As document sharing over the World Wide Web continues to grow, so does the need to create hyperdocuments, in formats such as HTML and SGML/XML, that make documents accessible and retrievable via the Internet. Nevertheless, little work has been done on converting paper documents into hyperdocuments. Moreover, most existing studies have concentrated on the direct conversion of single-column document images that include only text and image objects. In this paper, we propose two methods for converting complex multi-column document images into HTML documents, and a method for generating a structured table of contents page based on logical structure analysis of the document image. Experiments with various kinds of multi-column document images show that the proposed methods generate HTML documents with the same visual layout as the document images, and also produce a structured table of contents page whose hierarchically ordered section titles are hyperlinked to the contents.
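As a rough illustration of the two outputs described above (an HTML page that preserves the visual layout, and a table of contents whose section titles link into it), the following minimal Python sketch may help. It is not the authors' implementation. The `Block` record, its fields, the `sec<N>` anchor naming, the `doc.html` file name, and the use of absolutely positioned `<div>` elements to stand in for the layer structure are all assumptions made for this example. Layout analysis and character recognition are assumed to have already produced the blocks.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Block:
    """One region from layout analysis (hypothetical structure, not from the paper)."""
    text: str
    x: int
    y: int
    width: int
    height: int
    level: int = 0  # assumed: 0 = body text; 1, 2, ... = section title depth


def layer_html(blocks):
    """Emit one absolutely positioned <div> per block so the page layout is kept.

    Section titles (level > 0) receive an id so the contents page can link to them.
    """
    parts = []
    for i, b in enumerate(blocks):
        style = (f"position:absolute;left:{b.x}px;top:{b.y}px;"
                 f"width:{b.width}px;height:{b.height}px")
        anchor = f' id="sec{i}"' if b.level > 0 else ""
        parts.append(f'<div{anchor} style="{style}">{escape(b.text)}</div>')
    return "\n".join(parts)


def toc_html(blocks, page="doc.html"):
    """Build a nested <ul> of section titles, each hyperlinked to its anchor.

    Assumes title levels start at 1 and never increase by more than one step.
    """
    out, depth = [], 0
    for i, b in enumerate(blocks):
        if b.level == 0:
            continue  # body text does not appear in the table of contents
        if b.level > depth:
            while depth < b.level:  # open a deeper list inside the current item
                out.append("<ul>")
                depth += 1
        else:
            out.append("</li>")  # close the previous item
            while depth > b.level:  # close deeper lists and their parent items
                out.append("</ul></li>")
                depth -= 1
        out.append(f'<li><a href="{page}#sec{i}">{escape(b.text)}</a>')
    while depth > 0:  # close whatever is still open
        out.append("</li></ul>")
        depth -= 1
    return "\n".join(out)


blocks = [
    Block("1 Introduction", 40, 100, 300, 20, level=1),
    Block("Body text of the first column.", 40, 130, 300, 400),
    Block("1.1 Scope", 360, 100, 300, 20, level=2),
    Block("2 Method", 360, 300, 300, 20, level=1),
]
page_body = layer_html(blocks)
toc = toc_html(blocks)
```

Here `page_body` would be wrapped in the body of `doc.html` and `toc` in a separate contents page. A table-based variant would place the same blocks in table cells instead of positioned `<div>` elements.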
Bibliographical note
Funding Information:
In this paper, we proposed two methods for converting multi-column document images into HTML documents, one using the table structure and the other using the layer structure, and also proposed a method for generating a structured table of contents page by extracting the section titles from input document images. The proposed conversion methods were tested on various kinds of complex multi-column document images. Experimental results revealed that the proposed methods performed well. However, they showed different performance on different types of document images. Hence, further research is needed on a scheme that determines which of the conversion methods should be applied to a given document image for better performance. For the generation of a structured table of contents page, experiments were carried out on technical papers. Experimental results showed that, using the proposed table of contents generation method, we could create a table of contents page by extracting the section titles from input document images without regard to the accuracy of character recognition. The performance of the proposed methods still depends on the result of character recognition and the geometric layout analysis of input document images. Therefore, the methods must be improved to perform well regardless of the result of preprocessing of the conversion.
About the Author: JI-YEON LEE was born in Seoul, Korea, in 1975. She received the B.S. degree in Information Processing from Sangmyung University, Korea, in 1997 and the M.S. degree in Computer Science and Engineering from Korea University, Seoul, Korea, in 2000. She is currently working as a research engineer at Samsung Electronics, Co., Ltd. in Korea. Her research interests include multimedia, hyperdocument and document structure analysis.
About the Author: JEONG-SEON PARK received the B.S. and M.S. degrees in Computer Science from Chungbuk National University, Cheongju, Korea, in 1988 and 1992, respectively. She is currently working toward the Ph.D. degree in computer science and engineering at Korea University, Seoul, Korea. From February 1994 to July 1996, she was a research engineer in the S/W R&D center at Hyundai Electronics, Co., Ltd. in Korea, and from August 1996 to March 1999 she worked as an advanced research engineer at Hyundai Information Technology, Co., Ltd. in Korea. She was the winner of the Annual Best Paper Award of the Korea Information Science Society in 1994. Her research interests include pattern recognition, image processing and computer vision.
About the Author: HYERAN BYUN received the B.S. and M.S. degrees in Mathematics from Yonsei University, Korea. She received her Ph.D. degree in Computer Science from Purdue University, West Lafayette, Indiana. She was an assistant professor at Hallym University, Chooncheon, Korea, from 1994 to 1995. Since 1995, she has been an associate professor of Computer Science at Yonsei University, Korea. Her research interests include multimedia, computer vision, image processing, and pattern recognition.
About the Author: JONGSUB MOON received the B.S. and M.S. degrees in Computer Science from Seoul National University, Korea, in 1981 and 1983, respectively. He received the Ph.D. degree in Computer Science from Illinois Institute of Technology, Illinois, U.S.A., in 1991. He worked at Gold Star Tele-electric Research Institute as a researcher between 1981 and 1985. After receiving the Ph.D. degree, he joined the Department of Information Engineering of Korea University, Korea, as an assistant professor. He is now an associate professor in the Department of Electric and Information Engineering of Korea University, Korea. His research interests include neural networks, image processing, pattern matching and cognitive science.
About the Author: SEONG-WHAN LEE received his B.S. degree in Computer Science and Statistics from Seoul National University, Seoul, Korea, in 1984, and M.S. and Ph.D. degrees in Computer Science from KAIST in 1986 and 1989, respectively. From February 1989 to February 1995, he was an Assistant Professor in the Department of Computer Science at Chungbuk National University, Cheongju, Korea. In March 1995, he joined the faculty of the Department of Computer Science and Engineering at Korea University, Seoul, Korea, as an Associate Professor, and he is now a Full Professor. Currently, Dr. Lee is the director of the National Creative Research Initiative Center for Artificial Vision Research (CAVR) supported by the Korean Ministry of Science and Technology. Dr. Lee was the winner of the Annual Best Paper Award of the Korea Information Science Society in 1986. He obtained the First Outstanding Young Researcher Award at the 2nd International Conference on Document Analysis and Recognition in 1993, and the First Distinguished Research Professor Award from Chungbuk National University in 1994. He also obtained the Outstanding Research Award from the Korea Information Science Society in 1996. He has been the Co-Editor-in-Chief of the International Journal on Document Analysis and Recognition since 1998 and the Associate Editor of the Pattern Recognition Journal, the International Journal of Pattern Recognition and Artificial Intelligence, and the International Journal of Computer Processing of Oriental Languages since 1997. He was the Program Co-Chair of the 6th International Workshop on Frontiers in Handwriting Recognition, the 2nd International Conference on Multimodal Interface, the 17th International Conference on Computer Processing of Oriental Languages, the 5th International Conference on Document Analysis and Recognition, and the 7th International Conference on Neural Information Processing. He was the Workshop Co-Chair of the 3rd International Workshop on Document Analysis Systems and the 1st IEEE International Workshop on Biologically Motivated Computer Vision. He served on the program committees of several well-known international conferences. He is a fellow of the International Association for Pattern Recognition, a senior member of the IEEE Computer Society and a life member of the Korea Information Science Society, the International Neural Network Society, and the Oriental Languages Computer Society. His research interests include pattern recognition, computer vision and neural networks. He has more than 200 publications in these areas in international journals and conference proceedings, and has authored five books.
All Science Journal Classification (ASJC) codes
- Signal Processing
- Computer Vision and Pattern Recognition
- Artificial Intelligence