C-Rank: A link-based similarity measure for scientific literature databases

Seok Ho Yoon, Sang Wook Kim, Sunju Park

Research output: Contribution to journal, Article

8 Citations (Scopus)

Abstract

As the number of people who use scientific literature databases has grown, the demand for literature retrieval services has steadily increased. One of the most popular retrieval service methods is to find a set of papers similar to the paper under consideration, which requires a measure that computes the similarities between the papers. Scientific literature databases exhibit two interesting characteristics that are not found in general databases. First, the papers cited by older papers are often not included in the database due to technical and economic reasons. Second, since a paper references previously published papers, few papers cite recently published papers. These two characteristics cause all existing similarity measures to fail in at least one of the following cases: (1) measuring the similarity between old, but similar papers, (2) measuring the similarity between recent, but similar papers, and (3) measuring the similarity between two similar papers: one old, the other recent. In this paper, we propose a new link-based similarity measure called C-Rank, which uses both in-link and out-link references, disregarding the direction of the references. In addition, we discuss the most suitable normalization method for scientific literature databases and we propose an evaluation method for measuring the accuracy of similarity measures. For the experiments, we used real-world papers from DBLP's database with reference information crawled from Libra. We then compared the performance of C-Rank with that of existing similarity measures. Experimental results showed that C-Rank achieved a higher accuracy than existing similarity measures.
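The abstract does not state the C-Rank formula, so the following is only a rough illustration of why ignoring reference direction can help in the three cases listed above. It swaps in a plain Jaccard overlap of neighbor sets, which is not the authors' measure, and the paper names and toy citation data are hypothetical.

```python
from collections import defaultdict


def build_links(citations):
    """Build out-link and in-link sets from (citing, cited) pairs."""
    out_links, in_links = defaultdict(set), defaultdict(set)
    for citing, cited in citations:
        out_links[citing].add(cited)
        in_links[cited].add(citing)
    return out_links, in_links


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(p, q, out_links, in_links, mode):
    """Stand-in similarity (NOT the C-Rank formula).

    mode "out":  compare reference lists only.
    mode "in":   compare citing papers only.
    mode "both": merge both link sets, ignoring direction.
    """
    if mode == "out":
        return jaccard(out_links[p], out_links[q])
    if mode == "in":
        return jaccard(in_links[p], in_links[q])
    return jaccard(out_links[p] | in_links[p], out_links[q] | in_links[q])


# Hypothetical toy database: recent papers have no in-links yet,
# and the references of the old papers are missing from the database.
citations = [
    ("new1", "mid1"), ("new1", "mid2"),
    ("new2", "mid1"), ("new2", "mid2"),
    ("mid1", "old1"), ("mid1", "old2"),
    ("mid2", "old1"), ("mid2", "old2"),
]
out_links, in_links = build_links(citations)

for p, q in [("old1", "old2"), ("new1", "new2"), ("old1", "new1")]:
    scores = {m: similarity(p, q, out_links, in_links, m)
              for m in ("out", "in", "both")}
    print(p, q, scores)
```

In this toy data the out-link-only score is zero for the two old papers, the in-link-only score is zero for the two recent papers, and both are zero for the old/recent pair, while the direction-agnostic score is nonzero in all three cases.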

Original language: English
Pages (from-to): 25-40
Number of pages: 16
Journal: Information Sciences
Volume: 326
DOI: 10.1016/j.ins.2015.07.036
Publication status: Published - 2016 Jan 1

Fingerprint

Similarity Measure
Retrieval
Evaluation Method
Normalization
Database
High Accuracy
Economics
Similarity
Experimental Results
Experiments

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence

Cite this

@article{819a43f0e657407ba350a64a93cfa1ff,
title = "C-Rank: A link-based similarity measure for scientific literature databases",
abstract = "As the number of people who use scientific literature databases has grown, the demand for literature retrieval services has steadily increased. One of the most popular retrieval service methods is to find a set of papers similar to the paper under consideration, which requires a measure that computes the similarities between the papers. Scientific literature databases exhibit two interesting characteristics that are not found in general databases. First, the papers cited by older papers are often not included in the database due to technical and economic reasons. Second, since a paper references previously published papers, few papers cite recently published papers. These two characteristics cause all existing similarity measures to fail in at least one of the following cases: (1) measuring the similarity between old, but similar papers, (2) measuring the similarity between recent, but similar papers, and (3) measuring the similarity between two similar papers: one old, the other recent. In this paper, we propose a new link-based similarity measure called C-Rank, which uses both in-link and out-link references, disregarding the direction of the references. In addition, we discuss the most suitable normalization method for scientific literature databases and we propose an evaluation method for measuring the accuracy of similarity measures. For the experiments, we used real-world papers from DBLP's database with reference information crawled from Libra. We then compared the performance of C-Rank with that of existing similarity measures. Experimental results showed that C-Rank achieved a higher accuracy than existing similarity measures.",
author = "Yoon, {Seok Ho} and Kim, {Sang Wook} and Park, Sunju",
year = "2016",
month = "1",
day = "1",
doi = "10.1016/j.ins.2015.07.036",
language = "English",
volume = "326",
pages = "25--40",
journal = "Information Sciences",
issn = "0020-0255",
publisher = "Elsevier Inc.",
}

C-Rank: A link-based similarity measure for scientific literature databases. / Yoon, Seok Ho; Kim, Sang Wook; Park, Sunju.

In: Information Sciences, Vol. 326, 01.01.2016, p. 25-40.


TY - JOUR
T1 - C-Rank
T2 - A link-based similarity measure for scientific literature databases
AU - Yoon, Seok Ho
AU - Kim, Sang Wook
AU - Park, Sunju
PY - 2016/1/1
Y1 - 2016/1/1

N2 - As the number of people who use scientific literature databases has grown, the demand for literature retrieval services has steadily increased. One of the most popular retrieval service methods is to find a set of papers similar to the paper under consideration, which requires a measure that computes the similarities between the papers. Scientific literature databases exhibit two interesting characteristics that are not found in general databases. First, the papers cited by older papers are often not included in the database due to technical and economic reasons. Second, since a paper references previously published papers, few papers cite recently published papers. These two characteristics cause all existing similarity measures to fail in at least one of the following cases: (1) measuring the similarity between old, but similar papers, (2) measuring the similarity between recent, but similar papers, and (3) measuring the similarity between two similar papers: one old, the other recent. In this paper, we propose a new link-based similarity measure called C-Rank, which uses both in-link and out-link references, disregarding the direction of the references. In addition, we discuss the most suitable normalization method for scientific literature databases and we propose an evaluation method for measuring the accuracy of similarity measures. For the experiments, we used real-world papers from DBLP's database with reference information crawled from Libra. We then compared the performance of C-Rank with that of existing similarity measures. Experimental results showed that C-Rank achieved a higher accuracy than existing similarity measures.

UR - http://www.scopus.com/inward/record.url?scp=84943806579&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84943806579&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2015.07.036
DO - 10.1016/j.ins.2015.07.036
M3 - Article
AN - SCOPUS:84943806579
VL - 326
SP - 25
EP - 40
JO - Information Sciences
JF - Information Sciences
SN - 0020-0255
ER -