Automating XML Markup using Machine Learning Techniques

In this paper we present a novel system for automatically marking up text documents into XML. The system uses the techniques of the Self-Organising Map (SOM) algorithm in conjunction with an inductive learning algorithm, C5.0. The SOM algorithm clusters the XML marked-up documents on a two-dimension...
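The neighbour lookup mentioned in the abstract can be illustrated with a minimal sketch. It assumes each document has already been reduced to a fixed-length numeric feature vector; the function names, grid size, learning rate and decay schedule below are hypothetical choices for illustration and are not taken from the paper. It covers only the SOM stage (placing documents on a two-dimensional map and finding the marked-up documents nearest to an unmarked one), not the C5.0 rule induction.

```python
import math
import random


def best_matching_unit(grid, vector):
    """Return the (row, col) of the map node whose weights are closest to vector."""
    return min(
        grid,
        key=lambda node: sum((w - x) ** 2 for w, x in zip(grid[node], vector)),
    )


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small SOM; returns a dict mapping (row, col) -> weight vector."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    grid = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                     # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 0.5)   # neighbourhood shrinks over time
        for vector in vectors:
            bmu = best_matching_unit(grid, vector)
            for node, weights in grid.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * radius * radius))  # Gaussian neighbourhood
                for i in range(dim):
                    weights[i] += lr * h * (vector[i] - weights[i])
    return grid


def nearest_marked_up(grid, marked, query, k=2):
    """Names of the k marked-up documents whose map cells lie closest to the query's cell."""
    qr, qc = best_matching_unit(grid, query)

    def map_distance(item):
        r, c = best_matching_unit(grid, item[1])
        return (r - qr) ** 2 + (c - qc) ** 2

    return [name for name, _ in sorted(marked.items(), key=map_distance)[:k]]


# Hypothetical feature vectors for four already marked-up documents.
marked = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
    "doc_d": [0.0, 0.1, 0.9],
}
grid = train_som(list(marked.values()), rows=3, cols=3)
print(nearest_marked_up(grid, marked, [0.95, 0.05, 0.0], k=2))
```

In the system described by the record, the documents returned by such a lookup would supply the training examples from which markup rules are learned; that step is not sketched here.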

Bibliographic Details
Main Authors: Shazia Akhtar, Ronan Reilly, John Dunnion
Format: Article
Language: English
Published: International Institute of Informatics and Cybernetics 2004-10-01
Series: Journal of Systemics, Cybernetics and Informatics
Subjects: XML; Automatic Markup; Machine Learning; Self-Organizing Map; C5.0
Online Access: http://www.iiisci.org/Journal/CV$/sci/pdfs/P817863.pdf
_version_ 1819053515039834112
author Shazia Akhtar
Ronan Reilly
John Dunnion
collection DOAJ
description In this paper we present a novel system for automatically marking up text documents into XML. The system uses the Self-Organising Map (SOM) algorithm in conjunction with an inductive learning algorithm, C5.0. The SOM algorithm clusters the XML marked-up documents on a two-dimensional map such that documents with similar content are placed close to each other. The C5.0 algorithm learns and applies markup rules derived from the nearest SOM neighbours of an unmarked document. The system is designed to be adaptive, so that it learns from errors in order to improve the markup of the resulting document. Experiments show that our system provides high accuracy and demonstrate that our approach is practical and feasible.
first_indexed 2024-12-21T12:36:57Z
format Article
id doaj.art-f2706dd35f6245278d74d4c859344f0e
institution Directory Open Access Journal
issn 1690-4524
language English
last_indexed 2024-12-21T12:36:57Z
publishDate 2004-10-01
publisher International Institute of Informatics and Cybernetics
record_format Article
series Journal of Systemics, Cybernetics and Informatics
spelling doaj.art-f2706dd35f6245278d74d4c859344f0e; 2022-12-21T19:03:53Z; eng; International Institute of Informatics and Cybernetics; Journal of Systemics, Cybernetics and Informatics; 1690-4524; 2004-10-01; 251216; Automating XML Markup using Machine Learning Techniques; Shazia Akhtar (0), Ronan Reilly (1), John Dunnion (2); (0) Department of Computer Science, University College Dublin, Ireland; (1) Department of Computer Science, National University of Ireland Maynooth, Maynooth, Co. Kildare, Ireland; (2) Department of Computer Science, University College Dublin, Ireland; In this paper we present a novel system for automatically marking up text documents into XML. The system uses the Self-Organising Map (SOM) algorithm in conjunction with an inductive learning algorithm, C5.0. The SOM algorithm clusters the XML marked-up documents on a two-dimensional map such that documents with similar content are placed close to each other. The C5.0 algorithm learns and applies markup rules derived from the nearest SOM neighbours of an unmarked document. The system is designed to be adaptive, so that it learns from errors in order to improve the markup of the resulting document. Experiments show that our system provides high accuracy and demonstrate that our approach is practical and feasible.; http://www.iiisci.org/Journal/CV$/sci/pdfs/P817863.pdf; XML; Automatic Markup; Machine Learning; Self-Organizing Map; C5.0
title Automating XML Markup using Machine Learning Techniques
topic XML
Automatic Markup
Machine Learning
Self-Organizing Map
C5.0
url http://www.iiisci.org/Journal/CV$/sci/pdfs/P817863.pdf