Logical structure analysis

From HTML to XML

Min Hyung Lee, Yeon Seok Kim, Kyong Ho Lee

Research output: Contribution to journal (Article)

4 Citations (Scopus)

Abstract

This paper presents an efficient method for extracting a logical structure from a Web document. The proposed method consists of three phases: visual grouping, element identification, and logical grouping. To produce a logical structure more accurately, the proposed method defines a document model that is able to describe logical structure information of a specific document class. Since the proposed method is based on a visual structure from the visual grouping phase as well as a document model that describes logical structure information of a document type, it supports sophisticated structure analysis. Experimental results with HTML documents from the Web show that the method has performed logical structure analysis successfully, compared with previous work. Particularly, the method generates XML documents as the result of structure analysis, so that it enhances the reusability of documents.

Original language: English
Pages (from-to): 109-124
Number of pages: 16
Journal: Computer Standards and Interfaces
Volume: 29
Issue number: 1
DOIs: 10.1016/j.csi.2006.02.001
Publication status: Published - 2007 Jan 1

Fingerprint

HTML
XML
Reusability
World Wide Web
grouping

All Science Journal Classification (ASJC) codes

  • Software
  • Hardware and Architecture
  • Law

Cite this

Lee, Min Hyung; Kim, Yeon Seok; Lee, Kyong Ho. / Logical structure analysis: From HTML to XML. In: Computer Standards and Interfaces. 2007; Vol. 29, No. 1. pp. 109-124.
@article{cb6c01cf98be4b758ce3417fb74d2ce8,
title = "Logical structure analysis: From HTML to XML",
abstract = "This paper presents an efficient method for extracting a logical structure from a Web document. The proposed method consists of three phases: visual grouping, element identification, and logical grouping. To produce a logical structure more accurately, the proposed method defines a document model that is able to describe logical structure information of a specific document class. Since the proposed method is based on a visual structure from the visual grouping phase as well as a document model that describes logical structure information of a document type, it supports sophisticated structure analysis. Experimental results with HTML documents from the Web show that the method has performed logical structure analysis successfully, compared with previous work. Particularly, the method generates XML documents as the result of structure analysis, so that it enhances the reusability of documents.",
author = "Lee, {Min Hyung} and Kim, {Yeon Seok} and Lee, {Kyong Ho}",
year = "2007",
month = "1",
day = "1",
doi = "10.1016/j.csi.2006.02.001",
language = "English",
volume = "29",
pages = "109--124",
journal = "Computer Standards and Interfaces",
issn = "0920-5489",
publisher = "Elsevier",
number = "1",

}

Logical structure analysis: From HTML to XML. / Lee, Min Hyung; Kim, Yeon Seok; Lee, Kyong Ho.

In: Computer Standards and Interfaces, Vol. 29, No. 1, 01.01.2007, p. 109-124.


TY - JOUR
T1 - Logical structure analysis
T2 - From HTML to XML
AU - Lee, Min Hyung
AU - Kim, Yeon Seok
AU - Lee, Kyong Ho
PY - 2007/1/1
Y1 - 2007/1/1
N2 - This paper presents an efficient method for extracting a logical structure from a Web document. The proposed method consists of three phases: visual grouping, element identification, and logical grouping. To produce a logical structure more accurately, the proposed method defines a document model that is able to describe logical structure information of a specific document class. Since the proposed method is based on a visual structure from the visual grouping phase as well as a document model that describes logical structure information of a document type, it supports sophisticated structure analysis. Experimental results with HTML documents from the Web show that the method has performed logical structure analysis successfully, compared with previous work. Particularly, the method generates XML documents as the result of structure analysis, so that it enhances the reusability of documents.
AB - This paper presents an efficient method for extracting a logical structure from a Web document. The proposed method consists of three phases: visual grouping, element identification, and logical grouping. To produce a logical structure more accurately, the proposed method defines a document model that is able to describe logical structure information of a specific document class. Since the proposed method is based on a visual structure from the visual grouping phase as well as a document model that describes logical structure information of a document type, it supports sophisticated structure analysis. Experimental results with HTML documents from the Web show that the method has performed logical structure analysis successfully, compared with previous work. Particularly, the method generates XML documents as the result of structure analysis, so that it enhances the reusability of documents.
UR - http://www.scopus.com/inward/record.url?scp=33751088046&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33751088046&partnerID=8YFLogxK
U2 - 10.1016/j.csi.2006.02.001
DO - 10.1016/j.csi.2006.02.001
M3 - Article
VL - 29
SP - 109
EP - 124
JO - Computer Standards and Interfaces
JF - Computer Standards and Interfaces
SN - 0920-5489
IS - 1
ER -