Designing and developing a system that helps users digest and understand the available information has been a difficult challenge. In this paper, we discuss the design and development of an automatic, interactive keyphrase extraction system called KPSpotter, which can process data in various formats, such as XML, HTML, and plain text, over the Internet. KPSpotter combines the Information Gain data mining measure with several Natural Language Processing (NLP) techniques, such as Part-of-Speech (POS) tagging and the First Occurrence of Term. To improve extraction accuracy, WordNet is incorporated into KPSpotter. We used the Unified Modeling Language (UML) to design and develop KPSpotter. UML modeling helps formalize the preliminary analysis model and supports iterative system design and development. We also conducted experiments to test system performance, comparing the keyphrases extracted by KPSpotter with those extracted by KEA, a well-known naïve Bayesian keyphrase extraction system. The experiments show that KPSpotter outperforms KEA in most test cases.
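The abstract names the Information Gain measure and the First Occurrence of Term feature without defining them. The following is a minimal Python sketch, not KPSpotter's implementation. It assumes a KEA-style definition of first occurrence (tokens preceding the phrase's first appearance, divided by document length) and the standard entropy-based information gain of a discrete feature with respect to a keyphrase/non-keyphrase label. The function names, the convention of returning 1.0 for absent phrases, the 0.5 "early/late" threshold, and the sample data are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for a discrete feature X."""
    n = len(labels)
    conditional = 0.0
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional


def first_occurrence(phrase, tokens):
    """Relative position in [0, 1) of the first occurrence of `phrase` (a tuple of tokens).

    Hypothetical convention: a phrase that never occurs gets the maximum value 1.0.
    """
    k = len(phrase)
    for i in range(len(tokens) - k + 1):
        if tuple(tokens[i:i + k]) == phrase:
            return i / len(tokens)
    return 1.0


if __name__ == "__main__":
    # Hypothetical document and hand-labelled candidates (1 = keyphrase, 0 = not).
    tokens = "kpspotter extracts keyphrases from xml html and plain text documents".split()
    candidates = [("kpspotter",), ("keyphrases",), ("plain", "text"), ("documents",)]
    labels = [1, 1, 0, 0]

    # Discretize first occurrence into "early" (1) vs "late" (0) before computing IG.
    early = [1 if first_occurrence(c, tokens) < 0.5 else 0 for c in candidates]
    print(information_gain(early, labels))
```

In this toy data the early/late split separates keyphrases from non-keyphrases perfectly, so the feature attains the maximum gain for a balanced binary label. Discretizing first occurrence is one simple choice. A real system would more likely learn cut points from training data.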