RICH-CPL: Fact Extraction from Wikipedia-sized Corpora for Morphologically Rich Languages
This work deals with a never-ending learning approach to fact extraction from unstructured Russian text. It continues research on pattern learning techniques for morphologically rich, free-word-order languages. We introduce improvements to the CPL-RUS algorithm and choose best initial pa...
Main Authors: | Sergei Budkov, Kseniya Buraya, Andrey Filchenkov, Ivan Smetannikov, Antonina Puchkovskaia |
---|---|
Format: | Article |
Language: | English |
Published: | FRUCT 2018-11-01 |
Series: | Proceedings of the XXth Conference of Open Innovations Association FRUCT |
Subjects: | ontology, never ending learning, Wikipedia, part of speech |
Online Access: | https://fruct.org/publications/fruct23/files/Bud.pdf |
_version_ | 1811233144512184320 |
---|---|
author | Sergei Budkov Kseniya Buraya Andrey Filchenkov Ivan Smetannikov Antonina Puchkovskaia |
collection | DOAJ |
description | This work deals with a never-ending learning approach to fact extraction from unstructured Russian text. It continues research on pattern learning techniques for morphologically rich, free-word-order languages. We introduce improvements to the CPL-RUS algorithm and choose the best initial parameters. We conducted experiments with the extended version, the RICH-CPL algorithm, on a corpus containing over 1.3 million pages. This paper is a shortened version of our paper [7] that also includes new modifications of the proposed methods. |
first_indexed | 2024-04-12T11:16:36Z |
format | Article |
id | doaj.art-3a80790d11da4925aa16706c4f9d7f70 |
institution | Directory Open Access Journal |
issn | 2305-7254 2343-0737 |
language | English |
last_indexed | 2024-04-12T11:16:36Z |
publishDate | 2018-11-01 |
publisher | FRUCT |
record_format | Article |
series | Proceedings of the XXth Conference of Open Innovations Association FRUCT |
spelling | doaj.art-3a80790d11da4925aa16706c4f9d7f702022-12-22T03:35:29ZengFRUCTProceedings of the XXth Conference of Open Innovations Association FRUCT2305-72542343-07372018-11-01602237884RICH-CPL: Fact Extraction from Wikipedia-sized Corpora for Morphologically Rich LanguagesSergei Budkov0Kseniya Buraya1Andrey Filchenkov2Ivan Smetannikov3Antonina Puchkovskaia4ITMO University Saint-Petersburg, RussiaITMO University Saint-Petersburg, RussiaITMO University Saint-Petersburg, RussiaITMO University Saint-Petersburg, RussiaITMO University Saint-Petersburg, RussiaThis work deals with never-ending learning approach for fact extraction from unstructured Russian text. It continues the research in the field of pattern learning techniques for morphologically rich free-word-order language. We introduce improvements for CPL-RUS algorithm and choose best initial parameters. We conducted experiments with the extended version, RICH-CPL algorithm on the corpus containing over 1.3 million pages. This paper is shortened version of our paper [7] that includes also new modifications of the proposed methods.https://fruct.org/publications/fruct23/files/Bud.pdf ontologynever ending learningWikipediapart of speech |
title | RICH-CPL: Fact Extraction from Wikipedia-sized Corpora for Morphologically Rich Languages |
topic | ontology never ending learning Wikipedia part of speech |
url | https://fruct.org/publications/fruct23/files/Bud.pdf |