RICH-CPL: Fact Extraction from Wikipedia-sized Corpora for Morphologically Rich Languages

This work deals with a never-ending learning approach for fact extraction from unstructured Russian text. It continues the research on pattern learning techniques for morphologically rich free-word-order languages. We introduce improvements to the CPL-RUS algorithm and choose the best initial pa...


Bibliographic Details
Main Authors: Sergei Budkov, Kseniya Buraya, Andrey Filchenkov, Ivan Smetannikov, Antonina Puchkovskaia
Format: Article
Language: English
Published: FRUCT 2018-11-01
Series: Proceedings of the XXth Conference of Open Innovations Association FRUCT
Subjects: ontology, never ending learning, Wikipedia, part of speech
Online Access: https://fruct.org/publications/fruct23/files/Bud.pdf
collection DOAJ
description This work deals with a never-ending learning approach for fact extraction from unstructured Russian text. It continues the research on pattern learning techniques for morphologically rich free-word-order languages. We introduce improvements to the CPL-RUS algorithm and choose the best initial parameters. We conducted experiments with the extended version, the RICH-CPL algorithm, on a corpus containing over 1.3 million pages. This paper is a shortened version of our paper [7] that also includes new modifications of the proposed methods.
first_indexed 2024-04-12T11:16:36Z
format Article
id doaj.art-3a80790d11da4925aa16706c4f9d7f70
institution Directory Open Access Journal
issn 2305-7254
2343-0737
language English
last_indexed 2024-04-12T11:16:36Z
spelling doaj.art-3a80790d11da4925aa16706c4f9d7f70
2022-12-22T03:35:29Z
eng
FRUCT
Proceedings of the XXth Conference of Open Innovations Association FRUCT
2305-7254
2343-0737
2018-11-01
602237884
RICH-CPL: Fact Extraction from Wikipedia-sized Corpora for Morphologically Rich Languages
Sergei Budkov, Kseniya Buraya, Andrey Filchenkov, Ivan Smetannikov, Antonina Puchkovskaia (all: ITMO University, Saint-Petersburg, Russia)
https://fruct.org/publications/fruct23/files/Bud.pdf
ontology
never ending learning
Wikipedia
part of speech
topic ontology
never ending learning
Wikipedia
part of speech