Artemy Kolchinsky1,2, Anália Lourenço3, Heng-Yi Wu4, Lang Li4 and Luis M. Rocha1,2,*
1School of Informatics, Indiana University, Bloomington IN, USA
2FLAD Computational Biology Collaboratorium, Instituto Gulbenkian de Ciencia, Portugal
3IBB/CEB, University of Minho, Campus Gualtar, Braga, Portugal
4Department of Medical and Molecular Genetics, Indiana University School of Medicine, USA
Citation: A. Kolchinsky, A. Lourenço, H. Wu, L. Li, and L.M. Rocha. [2015] "Extraction of Pharmacokinetic Evidence of Drug-drug Interactions from the literature." PLoS ONE 10(5): e0122199. doi:10.1371/journal.pone.0122199.
The full text and pdf re-print are available from the PLoS ONE site. Due to mathematical notation and graphics, only the abstract is presented here. The arXiv:1412.0744 pre-print is also available.
Drug-drug interactions (DDIs) are major causes of morbidity and mortality and a subject of intense scientific interest. Biomedical literature mining can aid DDI research by extracting evidence for large numbers of potential interactions from published literature and clinical databases. While evidence for DDI ranges in scale from intracellular biochemistry to human populations, literature mining methods have not been used to extract specific types of experimental evidence which are reported differently for distinct experimental goals. We focus on pharmacokinetic evidence for DDIs ... We used a manually curated corpus of PubMed abstracts and annotated sentences to evaluate the efficacy of literature mining in classifying PubMed abstracts containing pharmacokinetic evidence for DDIs, as well as extracting sentences containing such evidence. We implemented a text mining pipeline using several linear classifiers and a variety of feature transformation methods. The most important textual features in the abstract and sentence classification tasks were analyzed. We also investigated the performance benefits of using features derived from PubMed metadata fields, from various publicly-available named entity recognizers and from pharmacokinetic dictionaries. Several classifiers performed very well in distinguishing relevant and irrelevant abstracts (reaching F1 ~= 0.93, MCC ~= 0.74, iAUC ~= 0.99) and sentences (F1 ~= 0.76, MCC ~= 0.65, iAUC ~= 0.83). We found that word-bigram textual features were important for achieving optimal classifier performance, that features derived from Medical Subject Headings (MeSH) terms significantly improved abstract classification, and that some drug-related named entity recognition tools and dictionaries led to slight but significant improvements, especially in classification of evidence sentences.
Keywords: Drug-drug interaction, literature mining, text mining, text classification, machine learning, pharmacokinetics, translational science
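For readers unfamiliar with the terms used in the abstract, the sketch below illustrates what word-bigram features are and how the F1 and MCC (Matthews correlation coefficient) scores are computed from binary predictions. It is a minimal illustration, not the authors' pipeline: the function names `word_ngrams` and `f1_and_mcc`, the whitespace tokenization, and the toy labels are all assumptions made for this example.

```python
import math


def word_ngrams(text, n_max=2):
    """Return word unigrams and bigrams (hypothetical helper).

    Tokenization here is a plain lowercase whitespace split, which is an
    assumption for illustration only.
    """
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


def f1_and_mcc(y_true, y_pred):
    """Compute F1 and MCC for binary labels (1 = relevant, 0 = irrelevant)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # MCC uses all four confusion-matrix cells; define it as 0 when any
    # marginal count is zero (a common convention).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc


if __name__ == "__main__":
    # Toy sentence and toy labels, invented for this example.
    print(word_ngrams("Ketoconazole increased midazolam AUC"))
    print(f1_and_mcc([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1]))
```

Unlike F1, MCC also accounts for true negatives, which is one reason the two scores in the abstract can differ noticeably on the same task.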