There has been a significant increase in text mining implementation for biomedical literature in recent years. Previous studies introduced the implementation of text mining and literature-based discovery to generate hypotheses of potential candidates for drug development. By conducting a hypothesis-generation step and using evidence from published journal articles or proceedings, previous studies have managed to reduce experimental time and costs. First, we applied the closed discovery approach from Swanson’s ABC model to collect publications related to 36 Xanthium compounds or diabetes. Second, we extracted biomedical entities and relations using a knowledge extraction engine, the Public Knowledge Discovery Engine for Java or PKDE4J. Third, we built a knowledge graph using the obtained bio entities and relations and then generated paths with Xanthium compounds as source nodes and diabetes as the target node. Lastly, we employed graph embeddings to rank each path and evaluated the results based on domain experts’ opinions and literature. Among 36 Xanthium compounds, 35 had direct paths to five diabetes-related nodes. We ranked 2,740,314 paths in total between 35 Xanthium compounds and three diabetes-related phrases: type 1 diabetes, type 2 diabetes, and diabetes mellitus. Based on the top five percentile paths, we concluded that adenosine, choline, beta-sitosterol, rhamnose, and scopoletin were potential candidates for diabetes drug development using natural products. Our framework for hypothesis generation employs a closed discovery from Swanson’s ABC model that has proven very helpful in discovering biological linkages between bio entities. The PKDE4J tools we used to capture bio entities from our document collection could label entities into five categories: genes, compounds, phenotypes, biological processes, and molecular functions. Using the BioPREP model, we managed to interpret the semantic relatedness between two nodes and provided paths containing valuable hypotheses. Lastly, using a graph-embedding algorithm in our path-ranking analysis, we exploited the semantic relatedness while preserving the graph structure properties.
|Publication status||Published - 2022 Dec|
Bibliographical noteFunding Information:
This work was financially supported by the Research Grant of Federal Urdu University of Science, Arts and Technology.
© 2022, The Author(s).
All Science Journal Classification (ASJC) codes