A cloud-based brain connectivity analysis tool

Laura Brattain, Mihnea Bulugioiu, Adam Brewster, Mark Hernandez, Heejin Choi, Taeyun Ku, Kwanghun Chung, Vijay Gadepally

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

With advances in high-throughput brain imaging at the cellular and sub-cellular level, there is growing demand for platforms that can support high-performance, large-scale brain data processing and analysis. In this paper, we present a novel pipeline that combines Accumulo, D4M, geohashing, and parallel programming to manage large-scale neuron connectivity graphs in a cloud environment. Our brain connectivity graph is represented using vertices (fiber start/end nodes), edges (fiber tracks), and the 3D coordinates of the fiber tracks. For optimal performance, we take a hybrid approach: vertices and edges are stored in Accumulo, while the fiber track 3D coordinates are saved in flat files. Accumulo database operations offer low latency on sparse queries, while flat files offer high throughput for storing, querying, and analyzing bulk data. We evaluated our pipeline using 250 gigabytes of mouse neuron connectivity data. Benchmarking experiments on retrieving vertices and edges from Accumulo demonstrate a 1-2 order of magnitude speedup in retrieval time compared to the same operation on traditional flat files. Graph analytics such as Breadth-First Search implemented with Accumulo and D4M offer consistently good performance regardless of data size and density, and thus scale to very large datasets. Indexing of neuron subvolumes is simple and logical with geohashing-based binary tree encoding. This hybrid data management backend drives an interactive web-based 3D graphical user interface, where users can examine the 3D connectivity map in a Google Maps-like viewer. Our pipeline is scalable and extensible to other data modalities.
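The geohashing-based binary tree encoding of subvolumes mentioned in the abstract can be sketched as bit interleaving of voxel coordinates (a Morton / Z-order code), which is how geohash-style schemes map a 3D position to a key whose prefixes form a binary tree. This is a minimal illustration of the general technique, not the authors' implementation; the function names, bit width, and voxel size below are assumptions.

```python
def morton3d(x, y, z, bits=10):
    """Interleave the bits of integer voxel coordinates (x, y, z)
    into a single Morton (Z-order) code. Nearby subvolumes share
    long code prefixes, so prefixes of the code form a binary tree
    over the volume."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)      # x bit -> position 3i
        code |= ((y >> i) & 1) << (3 * i + 1)  # y bit -> position 3i + 1
        code |= ((z >> i) & 1) << (3 * i + 2)  # z bit -> position 3i + 2
    return code


def subvolume_key(x_um, y_um, z_um, voxel_um=100, bits=10):
    """Map physical coordinates (micrometers) to a fixed-width binary
    string row key, so that lexicographic key order in a sorted
    key-value store such as Accumulo matches spatial Z-order.
    voxel_um (subvolume edge length) is an illustrative choice."""
    code = morton3d(int(x_um // voxel_um),
                    int(y_um // voxel_um),
                    int(z_um // voxel_um), bits)
    return format(code, "0{}b".format(3 * bits))
```

Because Accumulo stores rows in sorted order, a range scan over one shared key prefix retrieves exactly one subtree of subvolumes, which is what makes this indexing "simple and logical" for spatial queries.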

Original language: English
Title of host publication: 2017 IEEE High Performance Extreme Computing Conference, HPEC 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781538634721
DOI: 10.1109/HPEC.2017.8091080
Publication status: Published - 2017 Oct 30
Event: 2017 IEEE High Performance Extreme Computing Conference, HPEC 2017 - Waltham, United States
Duration: 2017 Sep 12 - 2017 Sep 14

Publication series

Name: 2017 IEEE High Performance Extreme Computing Conference, HPEC 2017

Conference

Conference: 2017 IEEE High Performance Extreme Computing Conference, HPEC 2017
Country: United States
City: Waltham
Period: 17/9/12 - 17/9/14

Fingerprint

  • Brain
  • Neurons
  • Fibers
  • Pipelines
  • Throughput
  • Binary trees
  • Parallel programming
  • Benchmarking
  • Graphical user interfaces
  • Information management
  • Imaging techniques
  • Experiments

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Hardware and Architecture
  • Computer Networks and Communications

Cite this

Brattain, L., Bulugioiu, M., Brewster, A., Hernandez, M., Choi, H., Ku, T., ... Gadepally, V. (2017). A cloud-based brain connectivity analysis tool. In 2017 IEEE High Performance Extreme Computing Conference, HPEC 2017 [8091080] (2017 IEEE High Performance Extreme Computing Conference, HPEC 2017). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/HPEC.2017.8091080
Scopus record: http://www.scopus.com/inward/record.url?scp=85041206794&partnerID=8YFLogxK (AN: SCOPUS:85041206794)