A sensitivity analysis of factors influential to the popularity of shared data in data repositories

Qing Xie, Jiamin Wang, Giyeong Kim, Soobin Lee, Min Song

Research output: Contribution to journal › Article › peer-review

Abstract

With their rapid development, data repositories usually provide abundant metadata (including data types, keywords, downloads, stars, forks, and citations) along with the data content. These rich metadata can serve as valuable resources for studying the factors that facilitate data sharing. However, few previous studies have examined which metadata are correlated with the popularity of data. This study addresses this gap by extracting the major factors for each dataset from a well-known data repository, the UCI Machine Learning Repository, and from a popular open-source software repository, GitHub. We trained a neural network model and measured the influence of these factors on quantified popularity metrics using the product of the weights of connecting neurons. We grouped the UCI factors into two categories (intrinsic and extrinsic) and the GitHub factors into three categories (intrinsic, extrinsic, and web-related) to analyze their influence on popularity at each level. The quantified influence was then used to predict the popularity of the data or software. We also conducted a statistical analysis of the relationship between these factors and popularity across five domains (life sciences, physical sciences, computer science/engineering, social sciences, and others) for the UCI repository. The findings contribute to understanding the factors that affect the popularity of open datasets and software, and provide guidance on data sharing, reuse, and organization.
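The abstract does not give the exact formula behind "the product of the weights of connecting neurons". A minimal sketch of one common reading, the connection weight approach, follows under stated assumptions: a fully connected feed-forward network whose trained weight matrices are already available, with activations and biases ignored. For each input factor, it sums the products of the weights along every input-to-output path. The function names (`matmul`, `connection_weight_importance`, `relative_importance`), the layer layout, and the example weights are hypothetical and are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b (rows of a times columns of b)."""
    if not a or len(a[0]) != len(b):
        raise ValueError("incompatible matrix shapes")
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(len(b))) for j in range(cols)]
        for row in a
    ]


def connection_weight_importance(layers: List[Matrix], output_index: int = 0) -> List[float]:
    """Hypothetical helper: influence score of each input on one output neuron.

    layers[0] has shape (n_inputs, n_hidden_1) and the last matrix has shape
    (n_hidden_last, n_outputs). Chaining the matrix products sums, for every
    input, the product of the weights along each input -> ... -> output path.
    Activations and biases are ignored (an assumption of this sketch).
    """
    product = layers[0]
    for weights in layers[1:]:
        product = matmul(product, weights)
    return [row[output_index] for row in product]


def relative_importance(scores: List[float]) -> List[float]:
    """Scale absolute scores so they sum to 1; useful only for ranking factors."""
    total = sum(abs(s) for s in scores)
    if total == 0:
        return [0.0] * len(scores)
    return [abs(s) / total for s in scores]


if __name__ == "__main__":
    # Hypothetical weights: 3 input factors, 2 hidden neurons, 1 popularity output.
    w_input_hidden = [[0.5, 0.25], [0.25, 0.5], [-0.75, 0.0]]
    w_hidden_output = [[1.0], [2.0]]
    scores = connection_weight_importance([w_input_hidden, w_hidden_output])
    print(scores)  # → [1.0, 1.25, -0.75]
    print(relative_importance(scores))
```

Because the raw scores keep their sign, a positive score suggests that a larger factor value tends to raise the predicted popularity under this linear-path approximation, while `relative_importance` drops the sign and is only meant for comparing the magnitude of each factor's influence.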

Original language: English
Article number: 101142
Journal: Journal of Informetrics
Volume: 15
Issue number: 3
Publication status: Published, August 2021

Bibliographical note

Funding Information:
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. NRF-2019R1A2C2002577).

Publisher Copyright:
© 2021 Elsevier Ltd.

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Library and Information Sciences

