Linguistic measures of chemical diversity and the “keywords” of molecular collections
Abstract Computerized linguistic analyses have proven of immense value in comparing and searching through large text collections (“corpora”), including those deposited on the Internet – indeed, it would nowadays be hard to imagine browsing the Web without, for instance, search algorithms extracting...
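Main Authors: | Michał Woźniak, Agnieszka Wołos, Urszula Modrzyk, Rafał L. Górski, Jan Winkowski, Michał Bajczyk, Sara Szymkuć, Bartosz A. Grzybowski, Maciej Eder |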
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2018-05-01 |
Series: | Scientific Reports |
Online Access: | https://doi.org/10.1038/s41598-018-25440-6 |
_version_ | 1818754205104472064 |
---|---|
author | Michał Woźniak; Agnieszka Wołos; Urszula Modrzyk; Rafał L. Górski; Jan Winkowski; Michał Bajczyk; Sara Szymkuć; Bartosz A. Grzybowski; Maciej Eder |
author_sort | Michał Woźniak |
collection | DOAJ |
description | Abstract Computerized linguistic analyses have proven of immense value in comparing and searching through large text collections (“corpora”), including those deposited on the Internet – indeed, it would nowadays be hard to imagine browsing the Web without, for instance, search algorithms extracting most appropriate keywords from documents. This paper describes how such corpus-linguistic concepts can be extended to chemistry based on characteristic “chemical words” that span more than traditional functional groups and, instead, look at common structural fragments molecules share. Using these words, it is possible to quantify the diversity of chemical collections/databases in new ways and to define molecular “keywords” by which such collections are best characterized and annotated. |
first_indexed | 2024-12-18T05:19:33Z |
format | Article |
id | doaj.art-cb3bb3ade27f4b13a2b20861629d946e |
institution | Directory of Open Access Journals |
issn | 2045-2322 |
language | English |
last_indexed | 2024-12-18T05:19:33Z |
publishDate | 2018-05-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
spelling | doaj.art-cb3bb3ade27f4b13a2b20861629d946e; 2022-12-21T21:19:42Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2018-05-01; 81110; 10.1038/s41598-018-25440-6; Linguistic measures of chemical diversity and the “keywords” of molecular collections; Michał Woźniak (0); Agnieszka Wołos (1); Urszula Modrzyk (2); Rafał L. Górski (3); Jan Winkowski (4); Michał Bajczyk (5); Sara Szymkuć (6); Bartosz A. Grzybowski (7); Maciej Eder (8); Institute of Polish Language, Polish Academy of Sciences; Institute of Organic Chemistry, Polish Academy of Sciences; Institute of Polish Language, Polish Academy of Sciences; Institute of Polish Language, Polish Academy of Sciences; Institute of Polish Language, Polish Academy of Sciences; Institute of Organic Chemistry, Polish Academy of Sciences; Institute of Organic Chemistry, Polish Academy of Sciences; Institute of Organic Chemistry, Polish Academy of Sciences; Institute of Polish Language, Polish Academy of Sciences; Abstract Computerized linguistic analyses have proven of immense value in comparing and searching through large text collections (“corpora”), including those deposited on the Internet – indeed, it would nowadays be hard to imagine browsing the Web without, for instance, search algorithms extracting most appropriate keywords from documents. This paper describes how such corpus-linguistic concepts can be extended to chemistry based on characteristic “chemical words” that span more than traditional functional groups and, instead, look at common structural fragments molecules share. Using these words, it is possible to quantify the diversity of chemical collections/databases in new ways and to define molecular “keywords” by which such collections are best characterized and annotated.; https://doi.org/10.1038/s41598-018-25440-6 |
title | Linguistic measures of chemical diversity and the “keywords” of molecular collections |
url | https://doi.org/10.1038/s41598-018-25440-6 |