We propose a novel semantic query expansion technique that combines association rules with ontologies and Natural Language Processing techniques. Our technique differs from others in that (1) it utilizes the explicit semantics as well as other linguistic properties of an unstructured text corpus, (2) it makes use of contextual properties of important terms discovered by association rules, and (3) it adds ontology entries to the query by disambiguating word senses. Using TREC ad hoc queries, we achieve improvements from 13.41% to 32.39% for P@20 and from 8.39% to 14.22% for the F-measure.
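For readers unfamiliar with the two reported measures, the following is a minimal sketch of how P@20 and the F-measure are commonly computed for a single query. It is not the authors' evaluation code: the function names, the document identifiers, and the choice of the balanced F-measure (beta = 1) are assumptions made for illustration, and the abstract does not state which F-measure variant was used.

```python
def precision_at_k(ranked_ids, relevant_ids, k=20):
    """Fraction of the top-k ranked documents that are relevant (P@k).

    Hypothetical helper: divides by k even when fewer than k documents
    were retrieved, which is the usual convention for P@k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_ids)
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    return hits / k


def f_measure(retrieved_ids, relevant_ids, beta=1.0):
    """Weighted harmonic mean of precision and recall over the retrieved set.

    Assumption: beta = 1 (balanced F-measure) unless stated otherwise.
    """
    retrieved = set(retrieved_ids)
    relevant = set(relevant_ids)
    hits = len(retrieved & relevant)
    if hits == 0:
        return 0.0  # no relevant document retrieved
    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    ranked = ["d1", "d2", "d3", "d4"]
    relevant = {"d1", "d3", "d9"}
    print(precision_at_k(ranked, relevant, k=4))
    print(f_measure(ranked, relevant))
```

In a TREC-style evaluation these per-query values would typically be averaged over all queries before an expanded run is compared with a baseline run.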
Bibliographical note
Funding Information:
The authors are grateful to Sourav Bhowmick, Andreas Hotho, and an anonymous reviewer for their excellent comments. This work is supported in part by NSF Career grant IIS 0448023, NSF CCF 0514679, PA Dept of Health Tobacco Settlement Formula Grant (No. 240205 and No. 240196), and PA Dept of Health Grant (No. 239667).
Xiaohua (Tony) Hu is currently an assistant professor at the College of Information Science and Technology, Drexel University. His current research interests are in biomedical literature data mining, bioinformatics, text mining, semantic web mining and reasoning, rough set theory and application, information extraction and information retrieval. He has published more than 120 peer-reviewed research papers in various journals, conferences and books (many of them in the top journals and conferences in his areas, such as IEEE/ACM Transactions, JIS, KAIS, SIG KDD, IEEE ICDM, IEEE ICDE, SIGIR, and ACM CIKM) and has co-edited 8 books/proceedings. He has received several prestigious awards, including the 2005 NSF Career award, the best paper award at the 2004 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, the 2006 IEEE Granular Computing Outstanding Service Award, and the 2001 IEEE Data Mining Outstanding Service Award. He has also served as a program co-chair/conference co-chair of 9 international conferences/workshops and as a program committee member of more than 40 international conferences in the above areas. He is the founding editor-in-chief of the International Journal of Data Mining and Bioinformatics, an associate editor/editorial board member of four international journals (KAIS, IJDWM, IJSOI and JCIB), and the founding advisory board member and secretary of the IEEE Granular Computing Task Force. His research projects are funded by the National Science Foundation (NSF), the US Dept. of Education, and the PA Dept. of Health.
All Science Journal Classification (ASJC) codes
- Information Systems and Management