TY - GEN
T1 - A cloud-based brain connectivity analysis tool
AU - Brattain, Laura
AU - Bulugioiu, Mihnea
AU - Brewster, Adam
AU - Hernandez, Mark
AU - Choi, Heejin
AU - Ku, Taeyun
AU - Chung, Kwanghun
AU - Gadepally, Vijay
N1 - Publisher Copyright: © 2017 IEEE. Copyright 2018 Elsevier B.V., All rights reserved.
PY - 2017/10/30
Y1 - 2017/10/30
N2 - With advances in high-throughput brain imaging at the cellular and sub-cellular level, there is growing demand for platforms that can support high-performance, large-scale brain data processing and analysis. In this paper, we present a novel pipeline that combines Accumulo, D4M, geohashing, and parallel programming to manage large-scale neuron connectivity graphs in a cloud environment. Our brain connectivity graph is represented using vertices (fiber start/end nodes), edges (fiber tracks), and the 3D coordinates of the fiber tracks. For optimal performance, we take the hybrid approach of storing vertices and edges in Accumulo and saving the fiber track 3D coordinates in flat files. Accumulo database operations offer low latency on sparse queries, while flat files offer high throughput for storing, querying, and analyzing bulk data. We evaluated our pipeline using 250 gigabytes of mouse neuron connectivity data. Benchmarking experiments on retrieving vertices and edges from Accumulo demonstrate a speedup of 1-2 orders of magnitude in retrieval time compared with the same operation on traditional flat files. The implementation of graph analytics such as Breadth First Search using Accumulo and D4M offers consistently good performance regardless of data size and density, and is therefore scalable to very large datasets. Indexing of neuron subvolumes is simple and logical with geohashing-based binary tree encoding. This hybrid data management backend drives an interactive web-based 3D graphical user interface, where users can examine the 3D connectivity map in a Google Maps-like viewer. Our pipeline is scalable and extensible to other data modalities.
UR - http://www.scopus.com/inward/record.url?scp=85041206794&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85041206794&partnerID=8YFLogxK
U2 - 10.1109/HPEC.2017.8091080
DO - 10.1109/HPEC.2017.8091080
M3 - Conference contribution
AN - SCOPUS:85041206794
T3 - 2017 IEEE High Performance Extreme Computing Conference, HPEC 2017
BT - 2017 IEEE High Performance Extreme Computing Conference, HPEC 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2017 IEEE High Performance Extreme Computing Conference, HPEC 2017
Y2 - 12 September 2017 through 14 September 2017
ER -