Automating XML Markup using Machine Learning Techniques
In this paper we present a novel system for automatically marking up text documents into XML. The system uses the techniques of the Self-Organising Map (SOM) algorithm in conjunction with an inductive learning algorithm, C5.0. The SOM algorithm clusters the XML marked-up documents on a two-dimension...
Main Authors: | Shazia Akhtar, Ronan Reilly, John Dunnion |
---|---|
Format: | Article |
Language: | English |
Published: | International Institute of Informatics and Cybernetics, 2004-10-01 |
Series: | Journal of Systemics, Cybernetics and Informatics |
Subjects: | XML, Automatic Markup, Machine Learning, Self-Organizing Map, C5.0 |
Online Access: | http://www.iiisci.org/Journal/CV$/sci/pdfs/P817863.pdf |
_version_ | 1819053515039834112 |
---|---|
author | Shazia Akhtar Ronan Reilly John Dunnion |
author_facet | Shazia Akhtar Ronan Reilly John Dunnion |
author_sort | Shazia Akhtar |
collection | DOAJ |
description | In this paper we present a novel system for automatically marking up text documents into XML. The system uses the techniques of the Self-Organising Map (SOM) algorithm in conjunction with an inductive learning algorithm, C5.0. The SOM algorithm clusters the XML marked-up documents on a two-dimensional map such that documents having similar content are placed close to each other. The C5.0 algorithm learns and applies markup rules derived from the nearest SOM neighbours of an unmarked document. The system is designed to be adaptive so that it learns from errors in order to improve the markup of the resulting document. Experiments show that our system provides high accuracy and demonstrate that our approach is practical and feasible. |
first_indexed | 2024-12-21T12:36:57Z |
format | Article |
id | doaj.art-f2706dd35f6245278d74d4c859344f0e |
institution | Directory Open Access Journal |
issn | 1690-4524 |
language | English |
last_indexed | 2024-12-21T12:36:57Z |
publishDate | 2004-10-01 |
publisher | International Institute of Informatics and Cybernetics |
record_format | Article |
series | Journal of Systemics, Cybernetics and Informatics |
spelling | doaj.art-f2706dd35f6245278d74d4c859344f0e 2022-12-21T19:03:53Z eng International Institute of Informatics and Cybernetics Journal of Systemics, Cybernetics and Informatics 1690-4524 2004-10-01 251216 Automating XML Markup using Machine Learning Techniques Shazia Akhtar 0 Ronan Reilly 1 John Dunnion 2 Department of Computer Science University College Dublin Ireland Department of Computer Science, National University of Ireland Maynooth, Maynooth, Co. Kildare, Ireland Department of Computer Science, University College Dublin, Ireland In this paper we present a novel system for automatically marking up text documents into XML. The system uses the techniques of the Self-Organising Map (SOM) algorithm in conjunction with an inductive learning algorithm, C5.0. The SOM algorithm clusters the XML marked-up documents on a two-dimensional map such that documents having similar content are placed close to each other. The C5.0 algorithm learns and applies markup rules derived from the nearest SOM neighbours of an unmarked document. The system is designed to be adaptive so that it learns from errors in order to improve the markup of the resulting document. Experiments show that our system provides high accuracy and demonstrate that our approach is practical and feasible. http://www.iiisci.org/Journal/CV$/sci/pdfs/P817863.pdf XML Automatic Markup Machine Learning Self-Organizing Map C5.0 |
spellingShingle | Shazia Akhtar Ronan Reilly John Dunnion Automating XML Markup using Machine Learning Techniques Journal of Systemics, Cybernetics and Informatics XML Automatic Markup Machine Learning Self-Organizing Map C5.0 |
title | Automating XML Markup using Machine Learning Techniques |
title_full | Automating XML Markup using Machine Learning Techniques |
title_fullStr | Automating XML Markup using Machine Learning Techniques |
title_full_unstemmed | Automating XML Markup using Machine Learning Techniques |
title_short | Automating XML Markup using Machine Learning Techniques |
title_sort | automating xml markup using machine learning techniques |
topic | XML Automatic Markup Machine Learning Self-Organizing Map C5.0 |
url | http://www.iiisci.org/Journal/CV$/sci/pdfs/P817863.pdf |
work_keys_str_mv | AT shaziaakhtar automatingxmlmarkupusingmachinelearningtechniques AT ronanreilly automatingxmlmarkupusingmachinelearningtechniques AT johndunnion automatingxmlmarkupusingmachinelearningtechniques |
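
The abstract describes two stages: a SOM places marked-up documents with similar content close together on a two-dimensional map, and markup rules are then learned from the nearest map neighbours of an unmarked document. The sketch below illustrates only the first stage and the neighbour lookup. It is not the authors' implementation: the bag-of-words vectors, the sample texts, the grid size, the training schedule, and every name in it (`vectorise`, `TinySOM`, `nearest_marked_up`) are assumptions made for illustration, and the C5.0 rule-learning step is not reproduced.

```python
import math
import random


def vectorise(text, vocab):
    """Bag-of-words term counts over a fixed vocabulary, L2-normalised."""
    words = text.lower().split()
    vec = [float(words.count(term)) for term in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class TinySOM:
    """A small rectangular Self-Organising Map trained online (illustrative only)."""

    def __init__(self, rows, cols, dim, seed=0):
        self._rng = random.Random(seed)
        self.rows, self.cols = rows, cols
        self.weights = {
            (r, c): [self._rng.random() for _ in range(dim)]
            for r in range(rows)
            for c in range(cols)
        }

    def bmu(self, vec):
        """Grid position of the best-matching unit (closest weight vector)."""
        return min(self.weights, key=lambda pos: sq_dist(self.weights[pos], vec))

    def train(self, data, epochs=50, lr0=0.5):
        sigma0 = max(self.rows, self.cols) / 2.0
        for epoch in range(epochs):
            frac = epoch / epochs
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
            for vec in self._rng.sample(data, len(data)):
                br, bc = self.bmu(vec)
                for (r, c), w in self.weights.items():
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                    for i in range(len(w)):
                        w[i] += lr * h * (vec[i] - w[i])


def nearest_marked_up(som, marked_vectors, query_vec, k=2):
    """Indices of the k marked-up documents whose map units are closest to the query's unit."""
    qr, qc = som.bmu(query_vec)

    def grid_dist(i):
        r, c = som.bmu(marked_vectors[i])
        return (r - qr) ** 2 + (c - qc) ** 2

    return sorted(range(len(marked_vectors)), key=grid_dist)[:k]


# Hypothetical toy corpus: the text content of already marked-up documents.
vocab = ["letter", "dear", "sincerely", "act", "scene", "enter"]
docs = [
    "dear sir this letter is sent sincerely",
    "dear madam a short letter yours sincerely",
    "act one scene two enter hamlet",
    "scene three enter ghost act one",
]
vectors = [vectorise(d, vocab) for d in docs]

som = TinySOM(rows=3, cols=3, dim=len(vocab), seed=0)
som.train(vectors)

unmarked = vectorise("dear john thank you for the letter", vocab)
neighbours = nearest_marked_up(som, vectors, unmarked, k=2)
print("nearest marked-up documents:", neighbours)
```

In a full pipeline, the documents returned by `nearest_marked_up` would supply the training examples for a rule learner that proposes tags for the unmarked text; here that step is left out rather than guessed at.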