An automatic hypothesis generation for plausible linkage between xanthium and diabetes

Arida Ferti Syafiandini, Gyuri Song, Yuri Ahn, Heeyoung Kim, Min Song

Research output: Contribution to journalArticlepeer-review


There has been a significant increase in text mining implementation for biomedical literature in recent years. Previous studies introduced the implementation of text mining and literature-based discovery to generate hypotheses of potential candidates for drug development. By conducting a hypothesis-generation step and using evidence from published journal articles or proceedings, previous studies have managed to reduce experimental time and costs. First, we applied the closed discovery approach from Swanson’s ABC model to collect publications related to 36 Xanthium compounds or diabetes. Second, we extracted biomedical entities and relations using a knowledge extraction engine, the Public Knowledge Discovery Engine for Java or PKDE4J. Third, we built a knowledge graph using the obtained bio entities and relations and then generated paths with Xanthium compounds as source nodes and diabetes as the target node. Lastly, we employed graph embeddings to rank each path and evaluated the results based on domain experts’ opinions and literature. Among 36 Xanthium compounds, 35 had direct paths to five diabetes-related nodes. We ranked 2,740,314 paths in total between 35 Xanthium compounds and three diabetes-related phrases: type 1 diabetes, type 2 diabetes, and diabetes mellitus. Based on the top five percentile paths, we concluded that adenosine, choline, beta-sitosterol, rhamnose, and scopoletin were potential candidates for diabetes drug development using natural products. Our framework for hypothesis generation employs a closed discovery from Swanson’s ABC model that has proven very helpful in discovering biological linkages between bio entities. The PKDE4J tools we used to capture bio entities from our document collection could label entities into five categories: genes, compounds, phenotypes, biological processes, and molecular functions. Using the BioPREP model, we managed to interpret the semantic relatedness between two nodes and provided paths containing valuable hypotheses. Lastly, using a graph-embedding algorithm in our path-ranking analysis, we exploited the semantic relatedness while preserving the graph structure properties.

Original languageEnglish
Article number17547
JournalScientific reports
Issue number1
Publication statusPublished - 2022 Dec

Bibliographical note

Funding Information:
This work was financially supported by the Research Grant of Federal Urdu University of Science, Arts and Technology.

Publisher Copyright:
© 2022, The Author(s).

All Science Journal Classification (ASJC) codes

  • General


Dive into the research topics of 'An automatic hypothesis generation for plausible linkage between xanthium and diabetes'. Together they form a unique fingerprint.

Cite this