JISC has today released a new report on the value and benefits of text mining. The evidence gathered and presented in the report illustrates that text mining promises economic and research benefit to both UKFHE and the wider economy, but that copyright law and other barriers are currently limiting its use.
The report contains a case study of JISC Journal Archives and how this platform’s search technology uses text mining to increase accessibility and relevance of retrieved scholarly journal content. The case study suggests that the text mining search technology used on JISC Journal Archives provides a cost saving of £416.16 per academic per year. Applied across the UK HE sector this would indicate £59.9m worth of academic time could be saved through the streamlined search process.
The full report is online on the JISC website, but here is the relevant section from the report:
4.3 Using text mining to increase accessibility and relevance of scholarly content
As this case study of the JISC Journal Archives illustrates, text mining can be used to provide more efficient searching, which returns higher quality results than traditional information retrieval techniques. JISC Journal Archives contains a selection of journal archives that have been licensed for perpetual access by member institutions. MIMAS has recently developed a service that enables simple and fast conceptual searching across more than 450 journals published by Brill, Institution of Civil Engineers, Institute of Physics, ProQuest, Oxford University Press and the Royal Society of Chemistry. The aim of this subscription service [xxiv] is to enable researchers to access well-targeted content through three simple clicks from one central interface rather than having to visit multiple content providers’ websites and negotiate their differing interfaces. As Box 4 below illustrates, it increases researcher efficiency.
Increased accessibility and relevance in information retrieval
Searching on JISC Journal Archives for journal articles relating to ‘graphene’ returned 137 results with one click. Each of the individual papers can then be accessed through two further clicks – the second to select the paper from the return list and the third to download the pdf. At a conservative estimate, this takes less than 45 seconds, assuming sufficient internet bandwidth.
Carrying out the same search manually over the individual archives would involve at least five clicks per archive – visiting the archive, logging in, searching for graphene, selecting the journal article to read and downloading the pdf.
Further, the returned list is automatically ranked for relevance, so at a conservative estimate the researcher can select papers to read in less than one minute. Tenopir and King estimate that the average US academic spends 5.2 minutes selecting a paper when browsing collections, so:
Time saved through text mining enhanced paper selection = 5.2 - 1 = 4.2 minutes
Assuming median researcher salary of £48,000 and 1,650 working hours per annum, then:
Cost saving per paper selected = £48,000 / (1,650 hours x 60 minutes) x 4.2 minutes = £2.04
This illustrates a very real productivity gain – the researcher only spends approximately 1/5th of the time they would normally spend on paper selection.
Tenopir et al. estimate that an average academic selects 204 papers per year. At £2.04 per paper, this implies a cost saving of £416.16 per academic per year. Applied across the UK HE sector this would indicate £59.9m worth of academic time could be saved through the streamlined search process [xxv].
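The arithmetic behind these figures can be retraced with a short sketch. The inputs are the values quoted in the case study. The variable and function names are illustrative only, and rounding the per-paper saving to the nearest penny before scaling up is an assumption about how the report arrived at £416.16.

```python
# Inputs quoted in the case study.
MEDIAN_SALARY_GBP = 48_000        # assumed median researcher salary
WORKING_HOURS_PER_YEAR = 1_650    # assumed working hours per annum
MANUAL_SELECTION_MIN = 5.2        # Tenopir and King: minutes per paper when browsing
TEXT_MINED_SELECTION_MIN = 1.0    # conservative estimate with ranked results
PAPERS_PER_YEAR = 204             # Tenopir et al.: papers selected per academic


def cost_per_minute(salary_gbp: float, hours_per_year: float) -> float:
    """Salary cost of one minute of researcher time, in pounds."""
    return salary_gbp / (hours_per_year * 60)


# Minutes saved per paper selected.
minutes_saved = MANUAL_SELECTION_MIN - TEXT_MINED_SELECTION_MIN

# Saving per paper, rounded to the nearest penny (assumption: the report
# rounds here before multiplying by the number of papers).
saving_per_paper = round(
    minutes_saved * cost_per_minute(MEDIAN_SALARY_GBP, WORKING_HOURS_PER_YEAR), 2
)

# Annual saving per academic, based on the rounded per-paper figure.
saving_per_year = round(saving_per_paper * PAPERS_PER_YEAR, 2)

print(f"Saving per paper: £{saving_per_paper:.2f}")
print(f"Saving per academic per year: £{saving_per_year:.2f}")
```

Carrying the unrounded per-paper saving through the multiplication gives roughly £415.42 per year instead, so the published £416.16 appears to be built on the rounded £2.04 figure.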
The JISC Journal Archives facility also increases the quality of the journal articles selected for further investigation through three means: it ranks the articles identified based on semantic content analysis; it provides a summary of the content; and it provides a list of other articles which are contextually similar. This includes identification of previously unknown links between documents.
Anecdotal evidence from users suggests that lists of conceptually similar articles are particularly helpful in improving the quality of literature reviews; however, current legal/process restrictions limit the value that can be achieved. For example, the Autonomy IDOL software automatically summarises the documents in the collections as part of the initial indexing process. However, the agreements with the rights holders prevent display of this derivative product. As in the systems biology case study (4.1), were this facility available it could increase research productivity by a factor of 6.2 [xxvi].
In summary, this case study illustrates the value of efficiency savings that text mining can make, as well as benefits of improved quality of literature reviews.