Effective and scalable solutions for mixed and split citation problems in digital libraries

Dongwon Lee, Byung Won On, Jaewoo Kang, Sang Hyun Park

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

62 Citations (Scopus)

Abstract

In this paper, we consider two important problems that commonly occur in bibliographic digital libraries, which seriously degrade their data qualities: Mixed Citation (MC) problem (i.e., citations of different scholars with their names being homonyms are mixed together) and Split Citation (SC) problem (i.e., citations of the same author appear under different name variants). In particular, we investigate an effective yet scalable solution since citations in such digital libraries tend to be large-scale. After formally defining the problems and accompanying challenges, we present an effective solution that is based on the state-of-the-art sampling-based approximate join algorithm. Our claim is verified through preliminary experimental results.
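As an illustration of the Split Citation problem described in the abstract, the following minimal sketch flags author-name strings that may be variants of the same author. It is not the authors' method: it uses a naive all-pairs comparison with Jaccard similarity over character trigrams, whereas the paper relies on a sampling-based approximate join for scalability. The function names (`ngrams`, `jaccard`, `candidate_variants`) and the 0.4 threshold are hypothetical choices made only for this example.

```python
from itertools import combinations


def ngrams(name, n=3):
    """Return the set of character n-grams of a normalized name."""
    # Normalization is an assumption of this sketch: lowercase,
    # drop punctuation, and collapse runs of whitespace.
    s = "".join(ch for ch in name.lower() if ch.isalnum() or ch == " ")
    s = " ".join(s.split())
    return {s[i:i + n] for i in range(max(len(s) - n + 1, 1))}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_variants(names, threshold=0.4):
    """Return pairs of names whose trigram sets overlap enough.

    This compares every pair, which is quadratic in the number of
    names; it only illustrates the idea, not a scalable solution.
    """
    grams = {name: ngrams(name) for name in names}
    return [
        (x, y)
        for x, y in combinations(names, 2)
        if jaccard(grams[x], grams[y]) >= threshold
    ]


if __name__ == "__main__":
    names = ["Dongwon Lee", "Lee, Dongwon", "Jaewoo Kang"]
    print(candidate_variants(names))
```

The Mixed Citation problem is the converse case: identical name strings that belong to different scholars. Name similarity alone cannot separate them, so further evidence about each citation would be needed.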

Original language: English
Title of host publication: Proceedings of the 2nd International Workshop on Information Quality in Information Systems, IQIS 2005
Publisher: Association for Computing Machinery, Inc
Pages: 69-76
Number of pages: 8
ISBN (Electronic): 1595931600, 9781595931603
DOIs: https://doi.org/10.1145/1077501.1077514
Publication status: Published - 2005 Jun 17
Event: 2nd International Workshop on Information Quality in Information Systems, IQIS 2005 - Baltimore, United States
Duration: 2005 Jun 17 → …

Publication series

Name: Proceedings of the 2nd International Workshop on Information Quality in Information Systems, IQIS 2005

Other

Other: 2nd International Workshop on Information Quality in Information Systems, IQIS 2005
Country: United States
City: Baltimore
Period: 05/6/17 → …

Fingerprint

Digital libraries
Sampling

All Science Journal Classification (ASJC) codes

  • Information Systems

Cite this

Lee, D., On, B. W., Kang, J., & Park, S. H. (2005). Effective and scalable solutions for mixed and split citation problems in digital libraries. In Proceedings of the 2nd International Workshop on Information Quality in Information Systems, IQIS 2005 (pp. 69-76). (Proceedings of the 2nd International Workshop on Information Quality in Information Systems, IQIS 2005). Association for Computing Machinery, Inc. https://doi.org/10.1145/1077501.1077514
@inproceedings{25f7aead23af48a59cb5b0ccc76f1a4e,
title = "Effective and scalable solutions for mixed and split citation problems in digital libraries",
abstract = "In this paper, we consider two important problems that commonly occur in bibliographic digital libraries, which seriously degrade their data qualities: Mixed Citation (MC) problem (i.e., citations of different scholars with their names being homonyms are mixed together) and Split Citation (SC) problem (i.e., citations of the same author appear under different name variants). In particular, we investigate an effective yet scalable solution since citations in such digital libraries tend to be large-scale. After formally defining the problems and accompanying challenges, we present an effective solution that is based on the state-of-the-art sampling-based approximate join algorithm. Our claim is verified through preliminary experimental results.",
author = "Dongwon Lee and On, {Byung Won} and Jaewoo Kang and Park, {Sang Hyun}",
year = "2005",
month = "6",
day = "17",
doi = "10.1145/1077501.1077514",
language = "English",
series = "Proceedings of the 2nd International Workshop on Information Quality in Information Systems, IQIS 2005",
publisher = "Association for Computing Machinery, Inc",
pages = "69--76",
booktitle = "Proceedings of the 2nd International Workshop on Information Quality in Information Systems, IQIS 2005",

}

TY - GEN
T1 - Effective and scalable solutions for mixed and split citation problems in digital libraries
AU - Lee, Dongwon
AU - On, Byung Won
AU - Kang, Jaewoo
AU - Park, Sang Hyun
PY - 2005/6/17
Y1 - 2005/6/17
AB - In this paper, we consider two important problems that commonly occur in bibliographic digital libraries, which seriously degrade their data qualities: Mixed Citation (MC) problem (i.e., citations of different scholars with their names being homonyms are mixed together) and Split Citation (SC) problem (i.e., citations of the same author appear under different name variants). In particular, we investigate an effective yet scalable solution since citations in such digital libraries tend to be large-scale. After formally defining the problems and accompanying challenges, we present an effective solution that is based on the state-of-the-art sampling-based approximate join algorithm. Our claim is verified through preliminary experimental results.
UR - http://www.scopus.com/inward/record.url?scp=85019781759&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85019781759&partnerID=8YFLogxK
U2 - 10.1145/1077501.1077514
DO - 10.1145/1077501.1077514
M3 - Conference contribution
AN - SCOPUS:85019781759
T3 - Proceedings of the 2nd International Workshop on Information Quality in Information Systems, IQIS 2005
SP - 69
EP - 76
BT - Proceedings of the 2nd International Workshop on Information Quality in Information Systems, IQIS 2005
PB - Association for Computing Machinery, Inc
ER -
