Parallel Construction of Simultaneous Deterministic Finite Automata on Shared-Memory Multicores

Minyoung Jung, Jinwoo Park, Johann Blieberger, Bernd Burgstaller

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

String pattern matching with finite automata (FAs) is a well-established method across many areas in computer science. Until now, data dependencies inherent in the pattern matching algorithm have hampered effective parallelization. To overcome the dependency-constraint between subsequent matching steps, simultaneous deterministic finite automata (SFAs) have been recently introduced. Although an SFA facilitates parallel FA matching, SFA construction itself is limited by the exponential state-growth problem, which makes sequential SFA construction intractable for all but the smallest problem sizes.

In this paper, we propose several optimizations to leverage parallelism, improve cache and memory utilization and greatly reduce the processing steps required to construct an SFA. We introduce fingerprints and hashing for efficient comparisons of SFA states. Kernels of x86 SIMD-instructions facilitate cache-locality and leverage data-parallelism with the construction of SFA states. Our parallelization for shared-memory multicores employs lock-free synchronization to minimize cache-coherence overhead. Our dynamic work-partitioning scheme employs work-stealing with thread-local work-queues. The structural properties of FAs allow efficient compression of SFA states. Our construction algorithm dynamically switches to in-memory compression of SFA states for problem sizes which approach the main memory size limit of a given system.

We evaluate our approach with patterns from the PROSITE protein database. We achieve speedups of up to 312x on a 64-core AMD system and 193x on a 44-core (88 hyperthreads) Intel system. Our SFA construction algorithm shows scalability on both evaluation platforms.
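To make the idea of an SFA state and of fingerprint-based state comparison more concrete, the following is a minimal sequential sketch in Python. It is not the authors' implementation: it assumes the usual formulation in which an SFA state is a mapping from every DFA state to a DFA state, it uses CRC32 as a stand-in fingerprint, and it omits the SIMD kernels, lock-free synchronization, work-stealing, and compression described in the abstract. The names `fingerprint`, `build_sfa`, `match_in_chunks`, and the example `DFA` are hypothetical.

```python
from collections import deque
import zlib

def fingerprint(state):
    # Cheap 32-bit fingerprint of an SFA state (a tuple of DFA state ids).
    # Assumption: fewer than 256 DFA states, so bytes() can encode the tuple.
    return zlib.crc32(bytes(state))

def build_sfa(delta, alphabet):
    """Worklist construction; delta[q][a] is the DFA successor of q on a."""
    identity = tuple(range(len(delta)))       # SFA start state: q -> q
    states = [identity]                       # SFA state id -> mapping
    buckets = {fingerprint(identity): [0]}    # fingerprint -> candidate ids
    sfa_delta = {}
    work = deque([0])
    while work:
        sid = work.popleft()
        for a in alphabet:
            # Advance every component of the mapping by one DFA step.
            succ = tuple(delta[q][a] for q in states[sid])
            bucket = buckets.setdefault(fingerprint(succ), [])
            # Full comparison only against states sharing the fingerprint.
            target = next((i for i in bucket if states[i] == succ), None)
            if target is None:
                target = len(states)
                states.append(succ)
                bucket.append(target)
                work.append(target)
            sfa_delta[(sid, a)] = target
    return states, sfa_delta

def match_in_chunks(states, sfa_delta, chunks, start=0):
    # Each chunk is scanned from the SFA start state, independently of the
    # others (this inner loop is what could run in parallel); the resulting
    # mappings are then applied in order to the DFA start state.
    q = start
    for chunk in chunks:
        sid = 0
        for ch in chunk:
            sid = sfa_delta[(sid, ch)]
        q = states[sid][q]
    return q

# Hypothetical 3-state DFA over {a, b} that accepts strings containing "ab".
DFA = [{"a": 1, "b": 0}, {"a": 1, "b": 2}, {"a": 2, "b": 2}]
sfa_states, sfa_delta = build_sfa(DFA, "ab")
final = match_in_chunks(sfa_states, sfa_delta, ["bb", "ab"])
```

In this sketch the fingerprint only narrows the set of candidates; equality of the full mappings is still checked, so a fingerprint collision cannot merge two distinct SFA states.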

Original language: English
Title of host publication: Proceedings - 46th International Conference on Parallel Processing, ICPP 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 271-281
Number of pages: 11
ISBN (Electronic): 9781538610428
DOI: https://doi.org/10.1109/ICPP.2017.36
Publication status: Published - 2017 Sep 1
Event: 46th International Conference on Parallel Processing, ICPP 2017 - Bristol, United Kingdom
Duration: 2017 Aug 14 → 2017 Aug 17

Other

Other: 46th International Conference on Parallel Processing, ICPP 2017
Country: United Kingdom
City: Bristol
Period: 17/8/14 → 17/8/17

Fingerprint

Deterministic Finite Automata
Finite automata
Shared Memory
Data storage equipment
Finite Automata
Pattern Matching
Leverage
Parallelization
Cache
Pattern matching
Compression
Cache Coherence
Data Parallelism
Data Dependency
Hashing
Matching Algorithm
Fingerprint
Locality
Thread
Structural Properties

All Science Journal Classification (ASJC) codes

  • Software
  • Mathematics (all)
  • Hardware and Architecture

Cite this

Jung, M., Park, J., Blieberger, J., & Burgstaller, B. (2017). Parallel Construction of Simultaneous Deterministic Finite Automata on Shared-Memory Multicores. In Proceedings - 46th International Conference on Parallel Processing, ICPP 2017 (pp. 271-281). [8025301] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICPP.2017.36
Jung, Minyoung ; Park, Jinwoo ; Blieberger, Johann ; Burgstaller, Bernd. / Parallel Construction of Simultaneous Deterministic Finite Automata on Shared-Memory Multicores. Proceedings - 46th International Conference on Parallel Processing, ICPP 2017. Institute of Electrical and Electronics Engineers Inc., 2017. pp. 271-281
@inproceedings{df90406985134bb3b1fa1e41c31c1823,
title = "Parallel Construction of Simultaneous Deterministic Finite Automata on Shared-Memory Multicores",
abstract = "String pattern matching with finite automata (FAs) is a well-established method across many areas in computer science. Until now, data dependencies inherent in the pattern matching algorithm have hampered effective parallelization. To overcome the dependency-constraint between subsequent matching steps, simultaneous deterministic finite automata (SFAs) have been recently introduced. Although an SFA facilitates parallel FA matching, SFA construction itself is limited by the exponential state-growth problem, which makes sequential SFA construction intractable for all but the smallest problem sizes. In this paper, we propose several optimizations to leverage parallelism, improve cache and memory utilization and greatly reduce the processing steps required to construct an SFA. We introduce fingerprints and hashing for efficient comparisons of SFA states. Kernels of x86 SIMD-instructions facilitate cache-locality and leverage data-parallelism with the construction of SFA states. Our parallelization for shared-memory multicores employs lock-free synchronization to minimize cache-coherence overhead. Our dynamic work-partitioning scheme employs work-stealing with thread-local work-queues. The structural properties of FAs allow efficient compression of SFA states. Our construction algorithm dynamically switches to in-memory compression of SFA states for problem sizes which approach the main memory size limit of a given system. We evaluate our approach with patterns from the PROSITE protein database. We achieve speedups of up to 312x on a 64-core AMD system and 193x on a 44-core (88 hyperthreads) Intel system. Our SFA construction algorithm shows scalability on both evaluation platforms.",
author = "Minyoung Jung and Jinwoo Park and Johann Blieberger and Bernd Burgstaller",
year = "2017",
month = "9",
day = "1",
doi = "10.1109/ICPP.2017.36",
language = "English",
pages = "271--281",
booktitle = "Proceedings - 46th International Conference on Parallel Processing, ICPP 2017",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

Jung, M, Park, J, Blieberger, J & Burgstaller, B 2017, Parallel Construction of Simultaneous Deterministic Finite Automata on Shared-Memory Multicores. in Proceedings - 46th International Conference on Parallel Processing, ICPP 2017., 8025301, Institute of Electrical and Electronics Engineers Inc., pp. 271-281, 46th International Conference on Parallel Processing, ICPP 2017, Bristol, United Kingdom, 17/8/14. https://doi.org/10.1109/ICPP.2017.36

Parallel Construction of Simultaneous Deterministic Finite Automata on Shared-Memory Multicores. / Jung, Minyoung; Park, Jinwoo; Blieberger, Johann; Burgstaller, Bernd.

Proceedings - 46th International Conference on Parallel Processing, ICPP 2017. Institute of Electrical and Electronics Engineers Inc., 2017. p. 271-281 8025301.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

TY - GEN

T1 - Parallel Construction of Simultaneous Deterministic Finite Automata on Shared-Memory Multicores

AU - Jung, Minyoung

AU - Park, Jinwoo

AU - Blieberger, Johann

AU - Burgstaller, Bernd

PY - 2017/9/1

Y1 - 2017/9/1

AB - String pattern matching with finite automata (FAs) is a well-established method across many areas in computer science. Until now, data dependencies inherent in the pattern matching algorithm have hampered effective parallelization. To overcome the dependency-constraint between subsequent matching steps, simultaneous deterministic finite automata (SFAs) have been recently introduced. Although an SFA facilitates parallel FA matching, SFA construction itself is limited by the exponential state-growth problem, which makes sequential SFA construction intractable for all but the smallest problem sizes. In this paper, we propose several optimizations to leverage parallelism, improve cache and memory utilization and greatly reduce the processing steps required to construct an SFA. We introduce fingerprints and hashing for efficient comparisons of SFA states. Kernels of x86 SIMD-instructions facilitate cache-locality and leverage data-parallelism with the construction of SFA states. Our parallelization for shared-memory multicores employs lock-free synchronization to minimize cache-coherence overhead. Our dynamic work-partitioning scheme employs work-stealing with thread-local work-queues. The structural properties of FAs allow efficient compression of SFA states. Our construction algorithm dynamically switches to in-memory compression of SFA states for problem sizes which approach the main memory size limit of a given system. We evaluate our approach with patterns from the PROSITE protein database. We achieve speedups of up to 312x on a 64-core AMD system and 193x on a 44-core (88 hyperthreads) Intel system. Our SFA construction algorithm shows scalability on both evaluation platforms.

UR - http://www.scopus.com/inward/record.url?scp=85030642512&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85030642512&partnerID=8YFLogxK

U2 - 10.1109/ICPP.2017.36

DO - 10.1109/ICPP.2017.36

M3 - Conference contribution

AN - SCOPUS:85030642512

SP - 271

EP - 281

BT - Proceedings - 46th International Conference on Parallel Processing, ICPP 2017

PB - Institute of Electrical and Electronics Engineers Inc.

ER -

Jung M, Park J, Blieberger J, Burgstaller B. Parallel Construction of Simultaneous Deterministic Finite Automata on Shared-Memory Multicores. In Proceedings - 46th International Conference on Parallel Processing, ICPP 2017. Institute of Electrical and Electronics Engineers Inc. 2017. p. 271-281. 8025301 https://doi.org/10.1109/ICPP.2017.36