Automatic extraction of apparent semantic structure from text contents of a structural calculation document

Bong Geun Kim, Sang Il Park, Hyo Jin Kim, Sang-Ho Lee

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

A generic method for the automatic extraction of apparent semantic document structure from a structural calculation document was proposed in this paper. The method consists of two processes: extracting subtitles and classifying depth levels of the subtitles. The subtitles become tree nodes of the apparent semantic structure. A context model of technical documents was built for the subtitle extraction from plain text information. In addition, a formal classification method for the determination of depth levels of the subtitles was developed and used to build a document tree with sequentially ordered subtitles. An application module of the proposed method, which transforms a plain text document into a semistructured XML document, was implemented. Performance of the developed application module was also evaluated with 40 test documents including structural calculation documents, technical reports, and theses.
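
The two processes named in the abstract (subtitle extraction, then depth-level classification into a tree that is written out as XML) can be illustrated with a minimal sketch. This is not the authors' context model or formal classification method, neither of which is described here. It assumes, purely for illustration, that subtitles are lines starting with dotted numbering such as "2.1", and that the depth level equals the number of numbering components. The names `SUBTITLE`, `extract_subtitles`, and `build_tree`, the `document`/`section` element names, and the sample text are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumed heading pattern (not from the paper): a dotted number such as
# "2", "2.1" or "2.1.3", an optional trailing dot, then the title text.
SUBTITLE = re.compile(r"^\s*(\d+(?:\.\d+)*)\.?\s+(\S.*)$")


def extract_subtitles(text):
    """Return (depth, number, title) for each line that looks like a numbered subtitle."""
    found = []
    for line in text.splitlines():
        match = SUBTITLE.match(line)
        if match:
            number, title = match.groups()
            depth = number.count(".") + 1  # "2.1" has two components, so depth 2
            found.append((depth, number, title.strip()))
    return found


def build_tree(subtitles):
    """Nest sequentially ordered subtitles into an XML tree by depth level."""
    root = ET.Element("document")
    stack = [(0, root)]  # (depth, element) pairs along the current path
    for depth, number, title in subtitles:
        # Climb back up until the top of the stack is a shallower node.
        while stack[-1][0] >= depth:
            stack.pop()
        node = ET.SubElement(stack[-1][1], "section", number=number, title=title)
        stack.append((depth, node))
    return root


# Made-up plain-text input standing in for a structural calculation document.
sample = """1. Design conditions
Span length is given by the client.
1.1 Materials
1.2 Loads
2. Section analysis
2.1 Bending moment
"""

tree = build_tree(extract_subtitles(sample))
print(ET.tostring(tree, encoding="unicode"))
```

A numbering-only rule like this would misread body lines that happen to begin with a number, which is presumably one reason a context model is needed in practice.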

Original language: English
Pages (from-to): 313-324
Number of pages: 12
Journal: Journal of Computing in Civil Engineering
Volume: 24
Issue number: 3
DOI: 10.1061/(ASCE)CP.1943-5487.0000047
Publication status: Published - 2010 Apr 28

Fingerprint

Semantics
XML

All Science Journal Classification (ASJC) codes

  • Civil and Structural Engineering
  • Computer Science Applications

Cite this

@article{f245623b75884bf492d351d5273142f3,
title = "Automatic extraction of apparent semantic structure from text contents of a structural calculation document",
author = "Kim, {Bong Geun} and Park, {Sang Il} and Kim, {Hyo Jin} and Lee, Sang-Ho",
year = "2010",
month = "4",
day = "28",
doi = "10.1061/(ASCE)CP.1943-5487.0000047",
language = "English",
volume = "24",
pages = "313--324",
journal = "Journal of Computing in Civil Engineering",
issn = "0887-3801",
publisher = "American Society of Civil Engineers (ASCE)",
number = "3",
}

Automatic extraction of apparent semantic structure from text contents of a structural calculation document. / Kim, Bong Geun; Park, Sang Il; Kim, Hyo Jin; Lee, Sang-Ho.

In: Journal of Computing in Civil Engineering, Vol. 24, No. 3, 28.04.2010, p. 313-324.

TY - JOUR
T1 - Automatic extraction of apparent semantic structure from text contents of a structural calculation document
AU - Kim, Bong Geun
AU - Park, Sang Il
AU - Kim, Hyo Jin
AU - Lee, Sang-Ho
PY - 2010/4/28
Y1 - 2010/4/28
N2 - A generic method for the automatic extraction of apparent semantic document structure from a structural calculation document was proposed in this paper. The method consists of two processes: extracting subtitles and classifying depth levels of the subtitles. The subtitles become tree nodes of the apparent semantic structure. A context model of technical documents was built for the subtitle extraction from plain text information. In addition, a formal classification method for the determination of depth levels of the subtitles was developed and used to build a document tree with sequentially ordered subtitles. An application module of the proposed method, which transforms a plain text document into a semistructured XML document, was implemented. Performance of the developed application module was also evaluated with 40 test documents including structural calculation documents, technical reports, and theses.
UR - http://www.scopus.com/inward/record.url?scp=77951212598&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77951212598&partnerID=8YFLogxK
U2 - 10.1061/(ASCE)CP.1943-5487.0000047
DO - 10.1061/(ASCE)CP.1943-5487.0000047
M3 - Article
VL - 24
SP - 313
EP - 324
JO - Journal of Computing in Civil Engineering
JF - Journal of Computing in Civil Engineering
SN - 0887-3801
IS - 3
ER -