Online stochastic pattern matching

Marco Cognetta, Yo Sub Han

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

The pattern matching problem is to find all occurrences of a given pattern in an input text. We consider the case where the pattern is a stochastic regular language, in which each pattern string has its own probability. The problem is then to find all matches, reported as (start, end) index pairs in the text, whose probability is larger than a given threshold probability. Pattern matching is frequently applied to streaming data, where finding the start index of a match is often very challenging. We design an efficient algorithm for the stochastic pattern matching problem over streaming data. It is based on transforming the pattern PFA (probabilistic finite automaton) into a weighted automaton and on a constant bound on the number of backtracks required to find a start index while reading the streaming input. We also employ heuristics that reduce the number of backtracks, which improves the practical runtime of our algorithm. We establish a tight theoretical runtime bound for the proposed algorithm and experimentally demonstrate its practical performance. Finally, we show a possible application of our algorithm to another stochastic pattern matching problem, in which we search for the maximum-probability substring of a text that is a superstring of a specified string.
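To make the problem statement concrete, the following is a minimal brute-force sketch of threshold matching against a PFA. It is not the algorithm of the paper: it does no weighted-automaton transformation and no bounded backtracking, and it simply tries every start index offline. The PFA encoding (`initial`, `final`, `delta`), the function name `matches_above_threshold`, and the toy automaton are all assumptions introduced for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy encoding of a PFA (an assumption, not taken from the paper):
#   initial[q]      probability of starting in state q
#   final[q]        probability of stopping in state q
#   delta[q][a][r]  probability of reading symbol a while moving from q to r
PFA = Tuple[List[float], List[float], List[Dict[str, Dict[int, float]]]]


def matches_above_threshold(
    text: str, pfa: PFA, threshold: float
) -> List[Tuple[int, int, float]]:
    """Return (start, end, probability) for every substring text[start..end]
    (end inclusive) whose PFA probability is strictly above the threshold.

    Naive baseline: quadratic in the text length, one forward pass per start.
    """
    initial, final, delta = pfa
    n_states = len(initial)
    results = []
    for start in range(len(text)):
        weights = list(initial)  # forward weights before reading any symbol
        for end in range(start, len(text)):
            nxt = [0.0] * n_states
            for q, w in enumerate(weights):
                if w == 0.0:
                    continue
                for r, p in delta[q].get(text[end], {}).items():
                    nxt[r] += w * p
            weights = nxt
            if not any(weights):
                break  # no state is reachable, so no longer match from start
            prob = sum(w * f for w, f in zip(weights, final))
            if prob > threshold:
                results.append((start, end, prob))
    return results


# Toy one-state PFA over {a}: P(a^k) = 0.5^k * 0.5, which sums to 1 over k >= 0.
toy_pfa: PFA = ([1.0], [0.5], [{"a": {0: 0.5}}])
found = matches_above_threshold("abaa", toy_pfa, 0.2)
```

In this toy run only the single-letter substrings "a" (probability 0.25) clear the threshold of 0.2. Every start index is rescanned from scratch here, which is the cost that a bound on the number of backtracks is meant to avoid in the streaming setting.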

Original language: English
Title of host publication: Implementation and Application of Automata - 23rd International Conference, CIAA 2018, Proceedings
Editors: Cezar Campeanu
Publisher: Springer Verlag
Pages: 121-132
Number of pages: 12
ISBN (Print): 9783319948119
Publication status: Published - 2018
Event: 23rd International Conference on Implementation and Application of Automata, CIAA 2018 - Charlottetown, Canada
Duration: 2018 Jul 30 to 2018 Aug 2

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10977 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 23rd International Conference on Implementation and Application of Automata, CIAA 2018
Country: Canada
City: Charlottetown
Period: 18/7/30 to 18/8/2

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)
