Literature-based Resurrection of Neglected Medical Discoveries.

It is possible to find in the medical literature many articles that have been neglected or ignored, in some cases for many years, but which are worth bringing to light because they report unusual findings that may be of current scientific interest. Resurrecting previously published but neglected hypotheses that have merit might be overlooked because it would seem to lack the novelty of "discovery" -- but the potential value of so doing is hardly arguable. Finding neglected hypotheses may be not only of great practical value, but also affords the opportunity to study the structure of such hypotheses in the hope of illuminating the more general problem of hypothesis generation.


Introduction
It is possible to find in the medical literature many articles that have been neglected or ignored, in some cases for many years, but which are worth bringing to light because they report unusual findings that may be of current scientific interest. An example will introduce my story about how to identify and characterize these articles. The following abbreviated Medline record represents a neglected article, later resurrected, that reports a medical discovery: Vosgerau reported consistent and rapid termination of migraine attacks in 10 patients who were treated with injections of magnesium glutamate. He was not the first to treat migraine with magnesium, but his report apparently was the first to become the main topic of a medical journal article. Earlier, Simon (1967, p.59) had claimed success in terminating migraine attacks with an injection of magnesium glutamate, but he wrote only six sentences about it, buried in a book on the pharmacology of magnesium.
A search of the Science Citation Index showed that during the next fifteen years after its publication Vosgerau's article had been cited only twice. Moreover, a title search of Medline for "migraine AND magnesium" as late as 1987 yielded only the single record by Vosgerau, and a similar but broader search that included medical subject headings (MeSH) yielded only 4 records at that time. The absence of citations to an article over an extended period immediately following its publication, together with the scarcity of other works that investigate the same problem, are plausible markers for neglect.
Vosgerau's article contains, in the title, a biologically meaningful co-occurrence of a substance term and a disease term. To a human observer, it is clear from the word "therapy" in the title that the substance is expected to influence (beneficially) the course of the disease. The same implication is reflected in the complementary subheadings assigned to the MeSH terms for magnesium and migraine in the full Medline record for that article. "Therapeutic use" is attached to the heading "magnesium" and "drug therapy" is attached to the heading "migraine". The possible role of subheadings in limiting the search space for other neglected discoveries merits investigation. Evidence that subheadings may be valuable in a similar though not identical context is discussed elsewhere (Swanson et al, 2006).
Hence, instead of searching for "discoveries" in general, we propose to address, at least initially, the better defined and more manageable problem of searching for substances or external agents that can or might influence diseases. Granted that large and important areas of medicine, including genetics, are not necessarily included in that description, this problem nonetheless covers much ground. Just how much can be estimated from the scope of the Medical Subject Heading (MeSH) schedules. The 2002 MeSH tree structures yielded a count of 9284 C-terms (diseases) and 32726 D-terms (substances, including entry terms). To learn which substances have been investigated for their effects on which diseases one can search Medline for each disease-substance pair of interest. The search space of all possible pairs is impressively large --over 300 million. At a later point, we report results of a small randomized search experiment.
The fact that the words "migraine" and "magnesium" both appear in the (translated) title of the Vosgerau paper should not be taken lightly. It signals that the author regarded the connection between migraine and magnesium as important enough to justify being the main topic of an entire article, a point perhaps not unrelated to whether a substance -disease term -co-occurrence is likely to be trivial or important, how readily the context of the relationship can be seen, and whether title-searching should be used to find other such articles.

On the question of resurrection
In 1988 I reported an analysis of eleven indirect or implicit connections between migraine and magnesium deficiency found by studying their separate literatures. Reports that directly connected magnesium with migraine were scarce. I discussed the few I had found, including those by Simon (1967) and by Vosgerau (1973) mentioned above, as well as papers by Altura (1985) and by Jain et al (1985). Although my purpose was primarily to bring to light possibly unnoticed implicit connections between migraine and magnesium, irrespective of any reports directly connecting them, it was secondarily to call attention to direct connections that may not have been appreciated as standing almost alone as a bridge between two otherwise unconnected literatures. (Swanson 1988, p. 546-547).
That my citing the few early papers reporting a direct magnesium-migraine connection may have awakened interest in them is suggested by the subsequent citation pattern. The number of citations to the above pre-1988 reports that I discussed was only 2 for Vosgerau and none for Altura or Jain, prior to 1988, but in the ensuing 10 years (1988)(1989)(1990)(1991)(1992)(1993)(1994)(1995)(1996)(1997)(1998) there were about 57 citing articles --14 for Vosgerau, 28 for Altura, and 15 for Jain. All but 2 of these 57 also cited my above paper (Swanson 1988), thus supporting the possibility that it was indeed the latter that was responsible for awakening a dormant interest in the pre-1988 migraine-magnesium work, and so could be considered as an example of literature-based resurrection.
Just why the paper by Vosgerau had been apparently ignored for 15 years may not be knowable, but a plausible guess is that not enough was known in 1973 about the biological mechanisms or pathways by which magnesium could influence migraine. Perhaps calling attention in 1988 to the indirect evidence in support of 11 possible pathways was sufficient to trigger wider interest in the role of magnesium in migraine.
Once a direct connection that seems to be lonely and neglected is detected, then the work of resurrection, by the above argument becomes that of analyzing pathways or mechanisms that might not be known. For the express purpose of assisting such an analysis, the set of programs and processes called Arrowsmith was created. Arrowsmith has two modes of operation --hypothesis generation, and hypothesis testing. The above analysis shows that finding "neglected discoveries" can contribute to hypothesis generation, and, by so doing, provide input to the Arrowsmith hypothesis-testing mode. The latter then can help find the intermediate mechanisms of biological action constituting pathways that might illuminate and ignite interest in the neglected discoveries.

Arrowsmith as pathfinder; hypothesis-testing mode
The biological mechanisms or pathways by which agent A might influence disease C are called, collectively, B. Pathways A-B established within one set of articles and separate pathways B-C within a second, non-overlapping, set, suggest that bringing together these two types of pathway might reveal previously unknown A-B-C pathways worth investigating.
The "pathway problem" as stated above divides naturally into two sub-problems, depending on our state of knowledge about A, and assuming that a single specific disease C is given.
If a single specific A is hypothesized at the outset, we define a "hypothesis-testing" subproblem, and attempt to find all plausible pathways, B, given both A and C. In this subproblem, a list of B-candidates is generated by computer from words, phrases, or index terms common to the A and C bibliographic records in a database such as Medline. To then recognize which among the candidates could plausibly be part of a biological pathway requires, in general, human expertise and judgment. If A is unknown at the outset, then we define "hypothesis generation" as the sub-problem of developing a list of plausible A-terms.

The context of literature-based discovery
In our previous work on literature-based discovery, the case in which the A and C literatures were disjoint --i.e. had no articles in common --has been of central interest, for it implies that no direct A-C connection had been documented. Hence creating a novel though implicit A-C connection became the main objective --to find undiscovered public knowledge (Swanson & Smalheiser, 1997).
In the present paper, attention is shifted to cases in which the A-C intersection is not a null set but a set which instead is populated by just few articles. Those few articles are very likely to contain the kind of hypotheses we are trying to generate, and may well have the additional advantage of reporting preliminary tests based on clinical or laboratory research, irrespective of whether the biological mechanisms or pathways have also been investigated or discovered. Resurrecting previously published but neglected hypotheses that have merit might be overlooked as an approach to hypothesis generation because it would seem to lack the novelty of "discovery" --but the potential value of so doing is hardly arguable. The connection explosion within the literature is vast, and even useful stuff gets lost. Finding at least some of it supplements, but does not replace, the more challenging problem of generating hypotheses de novo. The neglected hypotheses found can be treated as input to the hypothesis-testing mode of Arrowsmith in order to identify possible pathways or mechanisms.
Finding neglected hypotheses may be not only of great practical value, but also affords the opportunity to study the structure of such hypotheses in the hope of illuminating the more general problem of hypothesis generation.

A pilot test based on randomized searching
From 9284 C-terms extracted from MeSH, 100 terms were selected using a random number generator. Title searches derived from the selected MeSH terms were constructed. Some of the terms were then omitted, for several reasons (see MeSH stoplist discussion elsewhere (Swanson et al, 2006)), including terms that led to either too many or too few postings in a title-search. 47 terms remained, and are shown in Table 1.
From 32726 D-category substance entry-terms, 30 were selected at random. Entry terms were matched to their corresponding MeSH terms, and a suitable title search was constructed. 12 terms were omitted in this process and the remaining 18 are shown in Table 2. To these random selections, "migraine" was added to the disease list and "magnesium" to the substance list, to provide a known case as a benchmark.  intersections of the 18 substances and 47 diseases were formed from a title search, narrowed down in some cases by using appropriate subject headings. There were 789 null and 57 non-null intersections; of the latter, 36 were small enough to qualify as "neglected" (5 or fewer articles --based on a broader search than just titles, the threshold of 5 being somewhat arbitrary).
A search of the Science Citation Index (all years) was conducted for all articles corresponding to the 36 disease-substance intersections. Only those disease-substance articles that had 5 or fewer citations prior to 1988 were retained, of which there were 14. These 14 articles, which represented 9 disease-substance searches, fulfilled the two criteria for being considered as neglected. Table 3 shows, for each of these 9 searches (14 articles) the number of citing references before and after 1988. The migraine and magnesium results are shown separately from the randomized group. The latter accounted for 5 of the 9 searches and 8 of the 14 articles. Table 4 lists authors and titles for the 14 articles.

Observation on subheadings
The titles of 11 of the 14 articles make clear to a human observer that they involve the influence of an externally administered agent on a disease, which fulfills the original intent of the search; the other three articles however (#4B, #7, and #9) do not directly fulfill that intent. The same inference can be drawn from the subheadings assigned to the substance and disease MeSH terms. Should a larger scale study bear out these results, it is possible that automatic filtering of the output based on pre-specified subheadings will prove to be of value.

Addendum
The above reported experiment as it stands is incomplete; this draft reports work in progress. The next step is to repeat each of the final 9 substance-disease searches for the post 1988 time period. The purpose of conducting the above search as though the date were 1988 was to provide visibility on what happened after that date to these supposedly neglected discoveries. The next step after that is to conduct an Arrowsmith 2-node search using the same two terms as in the above 9 searches.
The purpose of the randomized approach was to see what would emerge in the way of "neglected discoveries" from a more or less mindless --randomized --combination of substances and diseases. Using this as a baseline makes it easier to evaluate more intelligently-selected categories of the MeSH schedule for disease-substance combinations, an evaluation that may also shed light on the design of a menu of AA choices for the kiwi-server based 1-node search [Editor's note: This refers to Don's website at http://kiwi.uchicago.edu, which is no longer being maintained.] Indeed it seems natural that the AA 1-node approach eventually will have built into it the "resurrection" approach presented here, simply because both are addressed to the problem of hypothesis generation. The role of the Science Citation Index at present isn't easily accomodated by Arrowsmith (http://arrowsmith.psych.uic.edu), but it is not yet clear that the SCI is indispensable for our purposes.

Note 1 (Neil R. Smalheiser):
Don outlines a set of search criteria to find initial candidate articles that are neglected. However, further work in this area should attempt to establish criteria to determine WHY each candidate was neglected and thus, to assess rapidly which ones (if any) are worth rehabilitating.
One can also question the robustness of the search criteria as defined in this article. Whereas lithium appears as a candidate neglected treatment for both melanoma and small cell carcinoma (Tables 3 and 4), the "neglect" is partially due to considering each form of cancer as a separate, independent entity. If one considers various forms of cancer in the aggregate, there are many papers on the topic both before and after 1988. This means that analysis of a "neglected" finding linking a treatment to a disease should include an examination of the larger biomedical context of the index paper(s) as well as the meaningfulness of the links per se.

Note 2 (Vetle I. Torvik):
Information retrieval systems tend to rank highly the most recent or the most obviously relevant papers, whereas neglected but important papers, particularly ones relevant for literature-based discovery, are by definition not obvious and not recent. For example, Google Scholar heavily weights citation counts for ranking search results, while PubMed lists the most recent papers first. If one were to identify the neglected papers, one would be better off starting at the end of these lists, that is, follow a strategy that goes against convention. This is partly why Don Swanson's paper represents an important step forward for the field of literature-based discovery, and even a cautionary tale for information retrieval in general. Looking forward, one may expect that being able to identify neglected findings will become increasingly important because papers need to be cited within a short time period after publication otherwise they may quickly become candidates for neglect.
When studying the lack of citations as a marker for neglect, it is important to understand not only on why papers are cited but also why papers are not cited. For example, a paper may have a low citation count because, for example, it is hard to find (unfortunate title choice, poorly indexed, obscure journal, foreign un-translated language), it is ahead of its time or otherwise not actionable at the time of publication, or it is of low quality and thus purposefully neglected. A paper may also be neglected if it presents findings that disagree with the majority of other publications on the subject. With the advent of web servers and their download logs, one now has the opportunity to distinguish usage from citation, and test alternative markers for neglect. It may be that papers with low download and citation counts are the ones that are hard to find, and are therefore most susceptible for neglect.
Identifying neglected findings is a more rich and difficult problem than identifying neglected papers. This is partly due to the fact that references point to papers and not necessarily to specific findings. Findings reported in the title or abstract of a paper are typically easy to identify, but what if an important finding is written as side-note in the full-text of the paper because the author did not realize its importance at the time?
Papers that received few citations during the first few years post-publication, followed by a steep increase in citation counts subsequently, are rare and have been called "sleeping beauties" (Van Raan, AFJ. Sleeping Beauties in science. Scientometrics. 2004;59(3): 467-472). Their resurrection might be explained, in at least in part, by a random process (Burrell, QL. Are "Sleeping Beauties" to be expected? Scientometrics. 2005; 65 (3): 381-385). However, there seems to be little existing systematic effort aimed at identifying and resurrecting neglected papers or findings. The so-called sleeping beauties may provide insight into certain types of neglect. It would also be useful to know how often neglect occurs generally (beyond the sleeping beauties that are [serendipitously] resurrected), and how this rate differs across time and disciplines. Identifying attributes that can help predict neglect and resurrection would be useful not only for investigators but for librarians, editors, publishers, and IR system developers, who could (and should) implement tools and strategies for tracking articles and their fates and impacts over time.