Online stochastic pattern matching

Marco Cognetta, Yo Sub Han

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The pattern matching problem is to find all occurrences of a given pattern in an input text. In particular, we consider the case when the pattern is a stochastic regular language where each pattern string has its own probability. Our problem is to find all matching patterns, given as (start, end) index pairs in the text, whose probability is larger than a given threshold probability. A pattern matching procedure is frequently used on streaming data in several applications, and it is often very challenging to find the start index of a match in streaming data. We design an efficient algorithm for the stochastic pattern matching problem over streaming data based on the transformation of the pattern PFA (probabilistic finite automaton) into a weighted automaton and a constant bound on the number of backtracks required to find a start index while reading the streaming input. We also employ heuristics that reduce the number of backtracks, which improves the practical runtime of our algorithm. We establish the tight theoretical runtime of the proposed algorithm and experimentally demonstrate its practical performance. Finally, we show a possible application of our algorithm to another stochastic pattern matching problem where we search for the maximum probability substring of a text that is a superstring of a specified string.
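The problem statement above (report every (start, end) pair whose substring probability exceeds a threshold, while the text arrives as a stream) can be pinned down with a naive online baseline. The sketch below is not the authors' algorithm: it does not build their weighted automaton or use their constant backtracking bound, and it keeps one forward vector per live start index, so its work per symbol can grow with the stream. The `PFA` class, its field names, and `stream_matches` are hypothetical. It assumes a normalized PFA in which, from every state, the outgoing transition probabilities plus the stopping probability sum to 1, and it uses 0-based, inclusive indices and ignores empty matches.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


class PFA:
    """Hypothetical minimal PFA representation (an assumption, not the paper's).

    init[q]          -- probability of starting in state q
    trans[a][p][q]   -- probability of reading symbol a while moving p -> q
    final[q]         -- probability of stopping in state q
    Assumed normalization: for every p, sum_a sum_q trans[a][p][q] + final[p] == 1.
    """

    def __init__(self, init: Vector, trans: Dict[str, Matrix], final: Vector):
        self.init = init
        self.trans = trans
        self.final = final

    def step(self, v: Vector, a: str) -> Vector:
        # Row vector times matrix: v' = v * M_a.
        n = len(self.final)
        m = self.trans.get(a)
        if m is None:
            # Assumption: a symbol outside the alphabet has probability 0.
            return [0.0] * n
        return [sum(v[p] * m[p][q] for p in range(n)) for q in range(n)]

    def accept(self, v: Vector) -> float:
        # Probability of stopping now: v . final
        return sum(x * f for x, f in zip(v, self.final))


def stream_matches(pfa: PFA, stream: Iterable[str],
                   theta: float) -> Iterator[Tuple[int, int]]:
    """Yield each (start, end) whose substring has probability > theta,
    as soon as the symbol at index `end` has been read."""
    active: Dict[int, Vector] = {}  # start index -> forward vector
    for j, a in enumerate(stream):
        active[j] = list(pfa.init)  # a candidate match may start at j
        for i in sorted(active):    # sorted() copies keys, so deletion is safe
            v = pfa.step(active[i], a)
            if pfa.accept(v) > theta:
                yield (i, j)
            # Under the normalization, any extension has probability <= sum(v),
            # so a start whose mass is <= theta can never match again.
            if sum(v) <= theta:
                del active[i]
            else:
                active[i] = v


if __name__ == "__main__":
    # One-state example: P(w) = 0.5^#a * 0.25^#b * 0.25
    pfa = PFA([1.0], {"a": [[0.5]], "b": [[0.25]]}, [0.25])
    print(list(stream_matches(pfa, "aab", 0.05)))
    # [(0, 0), (0, 1), (1, 1), (2, 2)]
```

The pruning rule is the only non-obvious choice here: because every row of the PFA is substochastic, each entry of M_w f is at most 1, so the probability of any continuation is bounded by the total mass of the current forward vector. Without the normalization assumption this bound, and hence the pruning, does not hold.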

Original language: English
Title of host publication: Implementation and Application of Automata - 23rd International Conference, CIAA 2018, Proceedings
Publisher: Springer Verlag
Pages: 121-132
Number of pages: 12
ISBN (Print): 9783319948119
DOIs: https://doi.org/10.1007/978-3-319-94812-6_11
Publication status: Published - 2018 Jan 1
Event: 23rd International Conference on Implementation and Application of Automata, CIAA 2018 - Charlottetown, Canada
Duration: 2018 Jul 30 - 2018 Aug 2

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10977 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 23rd International Conference on Implementation and Application of Automata, CIAA 2018
Country: Canada
City: Charlottetown
Period: 18/7/30 - 18/8/2

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Cognetta, M., & Han, Y. S. (2018). Online stochastic pattern matching. In Implementation and Application of Automata - 23rd International Conference, CIAA 2018, Proceedings (pp. 121-132). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10977 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-319-94812-6_11
Cognetta, Marco ; Han, Yo Sub. / Online stochastic pattern matching. Implementation and Application of Automata - 23rd International Conference, CIAA 2018, Proceedings. Springer Verlag, 2018. pp. 121-132 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{4177b5618139489fb3caf5c515d776bb,
title = "Online stochastic pattern matching",
abstract = "The pattern matching problem is to find all occurrences of a given pattern in an input text. In particular, we consider the case when the pattern is a stochastic regular language where each pattern string has its own probability. Our problem is to find all matching patterns—(start, end) indices in the text—whose probability is larger than a given threshold probability. A pattern matching procedure is frequently used on streaming data in several applications, and often it is very challenging to find the start index of a matching in streaming data. We design an efficient algorithm for the stochastic pattern matching problem over streaming data based on the transformation of the pattern PFA into a weighted automaton and a constant bound on the number of backtracks required to find a start index while reading the streaming input. We also employ heuristics that enable us to reduce the number of backtracks, which improves the practical runtime of our algorithm. We establish the tight theoretical runtime of the proposed algorithm and experimentally demonstrate its practical performance. Finally, we show a possible application of our algorithm to another stochastic pattern matching problem where we search for the maximum probability substring of a text that is a superstring of a specified string.",
author = "Cognetta, Marco and Han, {Yo Sub}",
year = "2018",
month = "1",
day = "1",
doi = "10.1007/978-3-319-94812-6_11",
language = "English",
isbn = "9783319948119",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
volume = "10977",
publisher = "Springer Verlag",
pages = "121--132",
booktitle = "Implementation and Application of Automata - 23rd International Conference, CIAA 2018, Proceedings",
address = "Germany",

}

Cognetta, M & Han, YS 2018, 'Online stochastic pattern matching', in Implementation and Application of Automata - 23rd International Conference, CIAA 2018, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10977 LNCS, Springer Verlag, pp. 121-132, 23rd International Conference on Implementation and Application of Automata, CIAA 2018, Charlottetown, Canada, 18/7/30. https://doi.org/10.1007/978-3-319-94812-6_11

Online stochastic pattern matching. / Cognetta, Marco; Han, Yo Sub.

Implementation and Application of Automata - 23rd International Conference, CIAA 2018, Proceedings. Springer Verlag, 2018. p. 121-132 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10977 LNCS).


TY  - GEN
T1  - Online stochastic pattern matching
AU  - Cognetta, Marco
AU  - Han, Yo Sub
PY  - 2018/1/1
Y1  - 2018/1/1
N2  - The pattern matching problem is to find all occurrences of a given pattern in an input text. In particular, we consider the case when the pattern is a stochastic regular language where each pattern string has its own probability. Our problem is to find all matching patterns—(start, end) indices in the text—whose probability is larger than a given threshold probability. A pattern matching procedure is frequently used on streaming data in several applications, and often it is very challenging to find the start index of a matching in streaming data. We design an efficient algorithm for the stochastic pattern matching problem over streaming data based on the transformation of the pattern PFA into a weighted automaton and a constant bound on the number of backtracks required to find a start index while reading the streaming input. We also employ heuristics that enable us to reduce the number of backtracks, which improves the practical runtime of our algorithm. We establish the tight theoretical runtime of the proposed algorithm and experimentally demonstrate its practical performance. Finally, we show a possible application of our algorithm to another stochastic pattern matching problem where we search for the maximum probability substring of a text that is a superstring of a specified string.
AB  - The pattern matching problem is to find all occurrences of a given pattern in an input text. In particular, we consider the case when the pattern is a stochastic regular language where each pattern string has its own probability. Our problem is to find all matching patterns—(start, end) indices in the text—whose probability is larger than a given threshold probability. A pattern matching procedure is frequently used on streaming data in several applications, and often it is very challenging to find the start index of a matching in streaming data. We design an efficient algorithm for the stochastic pattern matching problem over streaming data based on the transformation of the pattern PFA into a weighted automaton and a constant bound on the number of backtracks required to find a start index while reading the streaming input. We also employ heuristics that enable us to reduce the number of backtracks, which improves the practical runtime of our algorithm. We establish the tight theoretical runtime of the proposed algorithm and experimentally demonstrate its practical performance. Finally, we show a possible application of our algorithm to another stochastic pattern matching problem where we search for the maximum probability substring of a text that is a superstring of a specified string.
UR  - http://www.scopus.com/inward/record.url?scp=85051141243&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85051141243&partnerID=8YFLogxK
U2  - 10.1007/978-3-319-94812-6_11
DO  - 10.1007/978-3-319-94812-6_11
M3  - Conference contribution
SN  - 9783319948119
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 121
EP  - 132
BT  - Implementation and Application of Automata - 23rd International Conference, CIAA 2018, Proceedings
PB  - Springer Verlag
ER  - 

Cognetta M, Han YS. Online stochastic pattern matching. In Implementation and Application of Automata - 23rd International Conference, CIAA 2018, Proceedings. Springer Verlag. 2018. p. 121-132. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-319-94812-6_11