Closest Substring Problems for Regular Languages

Yo Sub Han, Sang Ki Ko, Timothy Ng, Kai Salomaa

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

It is well known that the Consensus String problem is NP-complete: given a finite set of strings of equal length, decide whether there exists a consensus string whose distance from every string in the set is at most r. A similar problem, the Closest Substring problem, asks whether there exists a string w of length l such that each string in a given set L has a substring whose distance from w is at most r (called the radius). Since the Closest Substring problem generalizes the Consensus String problem, it is NP-hard for finite sets of strings. We show that the Closest Substring problem for regular languages represented by nondeterministic finite automata (NFAs) is PSPACE-complete. The main difference from previous work is that the input is a possibly infinite set of strings, recognized by an NFA, rather than a finite set of strings. We also prove that the Closest Substring problem for acyclic NFAs lies in the second level of the polynomial-time hierarchy, $\Sigma_2^P$, and is both NP-hard and coNP-hard.
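To make the two problem definitions concrete, here is a small brute-force sketch in Python. It is not the authors' method and says nothing about the complexity bounds above. It assumes Hamming distance as the distance measure (the usual choice for Consensus String), represents an acyclic NFA as a hypothetical dictionary `delta` mapping `(state, symbol)` to a set of successor states, and only handles finite inputs. The helper names (`hamming`, `has_close_substring`, `closest_substring`, `acyclic_nfa_language`) are illustrative. The outer loop over candidate strings w and the inner check over every string in the set mirror the "there exists w such that for every string x" shape of the problem.

```python
from itertools import product


def hamming(u, v):
    """Hamming distance between two equal-length strings (assumed distance measure)."""
    if len(u) != len(v):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(a != b for a, b in zip(u, v))


def has_close_substring(x, w, r):
    """True if some length-|w| substring of x is within distance r of w.

    A string shorter than w has no such substring, so this returns False.
    """
    l = len(w)
    return any(hamming(x[i:i + l], w) <= r for i in range(len(x) - l + 1))


def closest_substring(strings, alphabet, l, r):
    """Brute force: return some w of length l that is within radius r of a
    substring of every string in `strings`, or None if no such w exists.

    Tries all |alphabet|**l candidates, so it is exponential in l.
    """
    strings = list(strings)
    for letters in product(alphabet, repeat=l):   # "there exists w"
        w = "".join(letters)
        if all(has_close_substring(x, w, r) for x in strings):  # "for every x"
            return w
    return None


def acyclic_nfa_language(delta, start, finals):
    """Enumerate the finite language of an acyclic NFA.

    `delta` maps (state, symbol) -> set of states (hypothetical encoding).
    Assumes the NFA is acyclic: on a cycle the recursion would not terminate
    (Python would raise RecursionError). The language may be exponentially large.
    """
    words = set()

    def walk(state, prefix):
        if state in finals:
            words.add(prefix)
        for (q, a), targets in delta.items():
            if q == state:
                for p in targets:
                    walk(p, prefix + a)

    walk(start, "")
    return words


if __name__ == "__main__":
    # Finite input with radius 0: "ab" occurs exactly in both strings.
    print(closest_substring(["aab", "bab"], "ab", 2, 0))  # prints ab
    # Hypothetical acyclic NFA accepting {"ab", "bb", "b"}.
    delta = {(0, "a"): {1}, (0, "b"): {1, 2}, (1, "b"): {2}}
    lang = acyclic_nfa_language(delta, 0, {2})
    print(closest_substring(lang, "ab", 1, 0))  # prints b
```

Setting l to the common length of the input strings recovers Consensus String, since each string then has exactly one substring of length l. For an NFA with cycles the accepted language can be infinite, so this enumerate-and-check approach no longer applies at all.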

Original language: English
Title of host publication: Developments in Language Theory - 22nd International Conference, DLT 2018, Proceedings
Editors: Mizuho Hoshi, Shinnosuke Seki
Publisher: Springer Verlag
Pages: 392-403
Number of pages: 12
ISBN (Print): 9783319986531
DOIs: 10.1007/978-3-319-98654-8_32
Publication status: Published - 2018 Jan 1
Event: 22nd International Conference on Developments in Language Theory, DLT 2018 - Tokyo, Japan
Duration: 2018 Sep 10 to 2018 Sep 14

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11088 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 22nd International Conference on Developments in Language Theory, DLT 2018
Country: Japan
City: Tokyo
Period: 18/9/10 to 18/9/14

Fingerprint

Formal languages
Regular Languages
Finite automata
Strings
Finite Automata
Computational complexity
Finite Set
Polynomials
NP-complete problem
Polynomial time
Radius

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Han, Y. S., Ko, S. K., Ng, T., & Salomaa, K. (2018). Closest Substring Problems for Regular Languages. In M. Hoshi & S. Seki (Eds.), Developments in Language Theory - 22nd International Conference, DLT 2018, Proceedings (pp. 392-403). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11088 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-319-98654-8_32
Han, Yo Sub ; Ko, Sang Ki ; Ng, Timothy ; Salomaa, Kai. / Closest Substring Problems for Regular Languages. Developments in Language Theory - 22nd International Conference, DLT 2018, Proceedings. editor / Mizuho Hoshi ; Shinnosuke Seki. Springer Verlag, 2018. pp. 392-403 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{c4ed869827404966b7e74d250045980e,
title = "Closest Substring Problems for Regular Languages",
abstract = "It is well known that the Consensus String problem is NP-complete: given a finite set of strings of equal length, decide whether there exists a consensus string whose distance from every string in the set is at most r. A similar problem, the Closest Substring problem, asks whether there exists a string w of length l such that each string in a given set L has a substring whose distance from w is at most r (called the radius). Since the Closest Substring problem generalizes the Consensus String problem, it is NP-hard for finite sets of strings. We show that the Closest Substring problem for regular languages represented by nondeterministic finite automata (NFAs) is PSPACE-complete. The main difference from previous work is that the input is a possibly infinite set of strings, recognized by an NFA, rather than a finite set of strings. We also prove that the Closest Substring problem for acyclic NFAs lies in the second level of the polynomial-time hierarchy, $\Sigma_2^P$, and is both NP-hard and coNP-hard.",
author = "Han, {Yo Sub} and Ko, {Sang Ki} and Timothy Ng and Kai Salomaa",
year = "2018",
month = "1",
day = "1",
doi = "10.1007/978-3-319-98654-8_32",
language = "English",
isbn = "9783319986531",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "392--403",
editor = "Mizuho Hoshi and Shinnosuke Seki",
booktitle = "Developments in Language Theory - 22nd International Conference, DLT 2018, Proceedings",
address = "Germany",

}

Han, YS, Ko, SK, Ng, T & Salomaa, K 2018, Closest Substring Problems for Regular Languages. in M Hoshi & S Seki (eds), Developments in Language Theory - 22nd International Conference, DLT 2018, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11088 LNCS, Springer Verlag, pp. 392-403, 22nd International Conference on Developments in Language Theory, DLT 2018, Tokyo, Japan, 18/9/10. https://doi.org/10.1007/978-3-319-98654-8_32

Closest Substring Problems for Regular Languages. / Han, Yo Sub; Ko, Sang Ki; Ng, Timothy; Salomaa, Kai.

Developments in Language Theory - 22nd International Conference, DLT 2018, Proceedings. ed. / Mizuho Hoshi; Shinnosuke Seki. Springer Verlag, 2018. p. 392-403 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11088 LNCS).

TY - GEN

T1 - Closest Substring Problems for Regular Languages

AU - Han, Yo Sub

AU - Ko, Sang Ki

AU - Ng, Timothy

AU - Salomaa, Kai

PY - 2018/1/1

Y1 - 2018/1/1

N2 - It is well known that the Consensus String problem is NP-complete: given a finite set of strings of equal length, decide whether there exists a consensus string whose distance from every string in the set is at most r. A similar problem, the Closest Substring problem, asks whether there exists a string w of length l such that each string in a given set L has a substring whose distance from w is at most r (called the radius). Since the Closest Substring problem generalizes the Consensus String problem, it is NP-hard for finite sets of strings. We show that the Closest Substring problem for regular languages represented by nondeterministic finite automata (NFAs) is PSPACE-complete. The main difference from previous work is that the input is a possibly infinite set of strings, recognized by an NFA, rather than a finite set of strings. We also prove that the Closest Substring problem for acyclic NFAs lies in the second level of the polynomial-time hierarchy, $\Sigma_2^P$, and is both NP-hard and coNP-hard.

AB - It is well known that the Consensus String problem is NP-complete: given a finite set of strings of equal length, decide whether there exists a consensus string whose distance from every string in the set is at most r. A similar problem, the Closest Substring problem, asks whether there exists a string w of length l such that each string in a given set L has a substring whose distance from w is at most r (called the radius). Since the Closest Substring problem generalizes the Consensus String problem, it is NP-hard for finite sets of strings. We show that the Closest Substring problem for regular languages represented by nondeterministic finite automata (NFAs) is PSPACE-complete. The main difference from previous work is that the input is a possibly infinite set of strings, recognized by an NFA, rather than a finite set of strings. We also prove that the Closest Substring problem for acyclic NFAs lies in the second level of the polynomial-time hierarchy, $\Sigma_2^P$, and is both NP-hard and coNP-hard.

UR - http://www.scopus.com/inward/record.url?scp=85053888367&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85053888367&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-98654-8_32

DO - 10.1007/978-3-319-98654-8_32

M3 - Conference contribution

AN - SCOPUS:85053888367

SN - 9783319986531

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 392

EP - 403

BT - Developments in Language Theory - 22nd International Conference, DLT 2018, Proceedings

A2 - Hoshi, Mizuho

A2 - Seki, Shinnosuke

PB - Springer Verlag

ER -

Han YS, Ko SK, Ng T, Salomaa K. Closest Substring Problems for Regular Languages. In Hoshi M, Seki S, editors, Developments in Language Theory - 22nd International Conference, DLT 2018, Proceedings. Springer Verlag. 2018. p. 392-403. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-319-98654-8_32