Logical structure analysis and generation for structured documents: A syntactic approach

Kyong Ho Lee, Yoon Chul Choy, Sung Bae Cho

Research output: Contribution to journal (Article)

11 Citations (Scopus)

Abstract

This paper presents a syntactic method for logical structure analysis that transforms multipage document images with hierarchical structure into electronic documents based on SGML/XML. Previous approaches take text lines as their basic units; to produce a logical structure more accurately and quickly, the proposed parsing method instead takes hierarchically structured text regions as input. We also define a document model that efficiently describes the geometric characteristics and logical structure of documents, and present a method for creating it automatically. Experiments with 372 images scanned from the IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) show that the method performs logical structure analysis successfully and generates a document model automatically. Because the method outputs SGML/XML documents as the result of structural analysis, it enhances document reusability and platform independence.
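
As a rough illustration of the final step described in the abstract (emitting XML from text regions that already carry a hierarchical logical structure), the following is a minimal sketch in Python. It is not the authors' parsing method or document model: the Region class, the labels (article, title, section, heading, paragraph), and the sample text are all hypothetical, and the sketch assumes the logical labels have already been assigned.

```python
from dataclasses import dataclass, field
from typing import List
import xml.etree.ElementTree as ET


@dataclass
class Region:
    """A text region with a logical label (hypothetical, for illustration only)."""
    label: str                      # logical role, used as the XML tag name
    text: str = ""                  # recognized text of the region, if any
    children: List["Region"] = field(default_factory=list)


def region_to_xml(region: Region) -> ET.Element:
    """Recursively convert a labeled region tree into an XML element tree."""
    elem = ET.Element(region.label)
    if region.text:
        elem.text = region.text
    for child in region.children:
        elem.append(region_to_xml(child))
    return elem


# A tiny hand-built region tree standing in for the output of structure analysis.
doc = Region("article", children=[
    Region("title", "Sample title"),
    Region("section", children=[
        Region("heading", "Introduction"),
        Region("paragraph", "First paragraph text."),
    ]),
])

xml_string = ET.tostring(region_to_xml(doc), encoding="unicode")
print(xml_string)
```

The nesting of the output elements mirrors the nesting of the regions, which is the sense in which a hierarchical logical structure maps naturally onto SGML/XML markup.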

Original language: English
Pages (from-to): 1277-1294
Number of pages: 18
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 15
Issue number: 5
DOI: 10.1109/TKDE.2003.1232278
Publication status: Published - 2003 Sep 1

Fingerprint

SGML
Syntactics
XML
Reusability
Structural analysis

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics

Cite this

@article{c1c9faa4050c424fbb5fe92dfab1314b,
title = "Logical structure analysis and generation for structured documents: A syntactic approach",
abstract = "This paper presents a syntactic method for sophisticated logical structure analysis that transforms document images with multiple pages and hierarchical structure into an electronic document based on SGML/XML. To produce a logical structure more accurately and quickly than previous works of which the basic units are text lines, the proposed parsing method takes text regions with hierarchical structure as input. Furthermore, we define a document model that is able to describe geometric characteristics and logical structure information of documents efficiently and present its automated creation method. Experimental results with 372 images scanned from the IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) show that the method has performed logical structure analysis successfully and generated a document model automatically. Particularly, the method generates SGML/XML documents as the result of structural analysis, so that it enhances the reusability of documents and independence of platform.",
author = "Lee, {Kyong Ho} and Choy, {Yoon Chul} and Cho, {Sung Bae}",
year = "2003",
month = "9",
day = "1",
doi = "10.1109/TKDE.2003.1232278",
language = "English",
volume = "15",
pages = "1277--1294",
journal = "IEEE Transactions on Knowledge and Data Engineering",
issn = "1041-4347",
publisher = "IEEE Computer Society",
number = "5",
}

Logical structure analysis and generation for structured documents: A syntactic approach. / Lee, Kyong Ho; Choy, Yoon Chul; Cho, Sung Bae.

In: IEEE Transactions on Knowledge and Data Engineering, Vol. 15, No. 5, 01.09.2003, p. 1277-1294.

TY - JOUR

T1 - Logical structure analysis and generation for structured documents

T2 - A syntactic approach

AU - Lee, Kyong Ho

AU - Choy, Yoon Chul

AU - Cho, Sung Bae

PY - 2003/9/1

Y1 - 2003/9/1

AB - This paper presents a syntactic method for sophisticated logical structure analysis that transforms document images with multiple pages and hierarchical structure into an electronic document based on SGML/XML. To produce a logical structure more accurately and quickly than previous works of which the basic units are text lines, the proposed parsing method takes text regions with hierarchical structure as input. Furthermore, we define a document model that is able to describe geometric characteristics and logical structure information of documents efficiently and present its automated creation method. Experimental results with 372 images scanned from the IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) show that the method has performed logical structure analysis successfully and generated a document model automatically. Particularly, the method generates SGML/XML documents as the result of structural analysis, so that it enhances the reusability of documents and independence of platform.

UR - http://www.scopus.com/inward/record.url?scp=0141836805&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0141836805&partnerID=8YFLogxK

U2 - 10.1109/TKDE.2003.1232278

DO - 10.1109/TKDE.2003.1232278

M3 - Article

AN - SCOPUS:0141836805

VL - 15

SP - 1277

EP - 1294

JO - IEEE Transactions on Knowledge and Data Engineering

JF - IEEE Transactions on Knowledge and Data Engineering

SN - 1041-4347

IS - 5

ER -