Data repository mapping for influenza protein sequence analysis

Donald Pellegrino, Chaomei Chen

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution

1 Citation (Scopus)

Abstract

This paper introduces a new method for creating an interactive sequence similarity map of all known influenza virus protein sequences and integrating the map with existing general-purpose analytical tools. The NCBI data model was designed to provide a high degree of interconnectedness among data objects. A substantial and continuous increase in data volume has led to a large and highly connected information space. Researchers seeking to explore this space are challenged to identify a starting point. They often choose data that is popular in the literature. References in the literature follow a power-law distribution, and popular data points may bias explorers toward paths that lead only to a dead end of what is already known. To help discover the unexpected, we developed an interactive visual analytics system to map the information space of influenza protein sequence data. The design is motivated by the needs of eScience researchers.

Original language: English
Title of host publication: Proceedings of SPIE-IS and T Electronic Imaging - Visualization and Data Analysis 2011
DOI: https://doi.org/10.1117/12.872266
Publication status: Published - 2011 Feb 28
Event: Visualization and Data Analysis 2011 - San Francisco, CA, United States
Duration: 2011 Jan 24 to 2011 Jan 25

Publication series

Name: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 7868
ISSN (Print): 0277-786X

Other

Other: Visualization and Data Analysis 2011
Country: United States
City: San Francisco, CA
Period: 11/1/24 to 11/1/25

Fingerprint

Influenza
Sequence Analysis
Protein Sequence
Repository
Proteins
Viruses
Data structures
Visual Analytics
E-Science
Power-law Distribution
Data Model
Virus
Choose
Path

All Science Journal Classification (ASJC) codes

  • Electronic, Optical and Magnetic Materials
  • Condensed Matter Physics
  • Computer Science Applications
  • Applied Mathematics
  • Electrical and Electronic Engineering

Cite this

Pellegrino, D., & Chen, C. (2011). Data repository mapping for influenza protein sequence analysis. In Proceedings of SPIE-IS and T Electronic Imaging - Visualization and Data Analysis 2011 [786804] (Proceedings of SPIE - The International Society for Optical Engineering; Vol. 7868). https://doi.org/10.1117/12.872266
@inproceedings{6a6bdd70246c464f8a36e779373e52f8,
title = "Data repository mapping for influenza protein sequence analysis",
abstract = "This paper introduces a new method for creating an interactive sequence similarity map of all known influenza virus protein sequences and integrating the map with existing general-purpose analytical tools. The NCBI data model was designed to provide a high degree of interconnectedness among data objects. A substantial and continuous increase in data volume has led to a large and highly connected information space. Researchers seeking to explore this space are challenged to identify a starting point. They often choose data that is popular in the literature. References in the literature follow a power-law distribution, and popular data points may bias explorers toward paths that lead only to a dead end of what is already known. To help discover the unexpected, we developed an interactive visual analytics system to map the information space of influenza protein sequence data. The design is motivated by the needs of eScience researchers.",
author = "Donald Pellegrino and Chaomei Chen",
year = "2011",
month = "2",
day = "28",
doi = "10.1117/12.872266",
language = "English",
isbn = "9780819484055",
series = "Proceedings of SPIE - The International Society for Optical Engineering",
booktitle = "Proceedings of SPIE-IS and T Electronic Imaging - Visualization and Data Analysis 2011",

}



TY - GEN

T1 - Data repository mapping for influenza protein sequence analysis

AU - Pellegrino, Donald

AU - Chen, Chaomei

PY - 2011/2/28

Y1 - 2011/2/28

N2 - This paper introduces a new method for creating an interactive sequence similarity map of all known influenza virus protein sequences and integrating the map with existing general-purpose analytical tools. The NCBI data model was designed to provide a high degree of interconnectedness among data objects. A substantial and continuous increase in data volume has led to a large and highly connected information space. Researchers seeking to explore this space are challenged to identify a starting point. They often choose data that is popular in the literature. References in the literature follow a power-law distribution, and popular data points may bias explorers toward paths that lead only to a dead end of what is already known. To help discover the unexpected, we developed an interactive visual analytics system to map the information space of influenza protein sequence data. The design is motivated by the needs of eScience researchers.

UR - http://www.scopus.com/inward/record.url?scp=79951873752&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79951873752&partnerID=8YFLogxK

U2 - 10.1117/12.872266

DO - 10.1117/12.872266

M3 - Conference contribution

SN - 9780819484055

T3 - Proceedings of SPIE - The International Society for Optical Engineering

BT - Proceedings of SPIE-IS and T Electronic Imaging - Visualization and Data Analysis 2011

ER -
