Automatic extraction of apparent semantic structure from text contents of a structural calculation document

Bong Geun Kim, Sang Il Park, Hyo Jin Kim, Sang Ho Lee

Research output: Contribution to journal (Article)

3 Citations (Scopus)

Abstract

A generic method for the automatic extraction of apparent semantic document structure from a structural calculation document was proposed in this paper. The method consists of two processes: extracting subtitles and classifying depth levels of the subtitles. The subtitles become tree nodes of the apparent semantic structure. A context model of technical documents was built for the subtitle extraction from plain text information. In addition, a formal classification method for the determination of depth levels of the subtitles was developed and used to build a document tree with sequentially ordered subtitles. An application module of the proposed method, which transforms a plain text document into a semistructured XML document, was implemented. Performance of the developed application module was also evaluated with 40 test documents including structural calculation documents, technical reports, and theses.
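The two processes named in the abstract (subtitle extraction, then depth-level classification into a tree) can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not detail the context model or the formal classification method, so the sketch assumes subtitles carry dotted numeric prefixes such as "1.", "1.1", or "2.3.1" and takes the depth level from the number of components. The names `SUBTITLE_RE`, `extract_subtitles`, and `build_tree`, and the XML element names, are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: a subtitle is a line starting with a dotted number followed by text.
# This regex is only a stand-in for the paper's context model.
SUBTITLE_RE = re.compile(r"^\s*(\d+(?:\.\d+)*)\.?\s+(\S.*)$")


def extract_subtitles(lines):
    """Return (depth, number, title) for every line that looks like a subtitle."""
    found = []
    for line in lines:
        match = SUBTITLE_RE.match(line)
        if match:
            number, title = match.groups()
            # Assumption: depth level equals the number of dotted components.
            depth = number.count(".") + 1
            found.append((depth, number, title.strip()))
    return found


def build_tree(subtitles):
    """Turn sequentially ordered subtitles into a nested XML tree."""
    root = ET.Element("document")
    stack = [(0, root)]  # (depth, element); the root sits at depth 0
    for depth, number, title in subtitles:
        # Close every open section that is not an ancestor of the new one.
        while stack[-1][0] >= depth:
            stack.pop()
        node = ET.SubElement(stack[-1][1], "section", number=number, title=title)
        stack.append((depth, node))
    return root


if __name__ == "__main__":
    text = [
        "1. Design Conditions",
        "Some body text.",
        "1.1 Loads",
        "1.2 Materials",
        "2. Member Design",
        "2.1 Girder",
        "2.1.1 Bending",
    ]
    tree = build_tree(extract_subtitles(text))
    print(ET.tostring(tree, encoding="unicode"))
```

A pattern this simple will also match body lines that happen to start with a number, and it cannot handle unnumbered subtitles. Those are the cases where a context model and a separate depth classifier would be needed.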

Original language: English
Pages (from-to): 313-324
Number of pages: 12
Journal: Journal of Computing in Civil Engineering
Volume: 24
Issue number: 3
Publication status: Published - 2010 Apr 28

All Science Journal Classification (ASJC) codes

  • Civil and Structural Engineering
  • Computer Science Applications
