Construction and analysis of the word network based on the Random Reading Frame (RRF) method

Bibliographic Details
Main Author: WenJun Zhang
Format: Article
Language: English
Published: International Academy of Ecology and Environmental Sciences, 2021-09-01
Series: Network Biology
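Subjects: word association; association rules; correlation measures; random reading frame; network construction; network analysis; algorithm; text mining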
Online Access:http://www.iaees.org/publications/journals/nb/articles/2021-11(3)/construction-and-analysis-of-word-network-from-Random-Reading-Frame.pdf
ISSN: 2220-8879
Collection: DOAJ (Directory of Open Access Journals)
Author affiliation: School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China
Volume/Issue: 11(3), pp. 154-193

Abstract
In the present study, a method was developed to construct and analyze word networks. The core of the method is the Random Reading Frame (RRF) method. First, word files on the topics of interest, in various formats (e.g., pdf, txt, doc, docx, rtf, html), are downloaded from the internet or collected from a local machine, and all files are combined into a single text file. With splitting words and stop words excluded, all remaining words are arranged in a word vector in the order in which they appear in the combined text file.

In the RRF method, for a given pair of unique words (x, y), where x, y ∈ {u1, u2, ..., um} and u1, ..., um are the m unique words, a reading frame of randomly varying width is placed at a random position on the word vector, and the number of occurrences of each of the two words within the frame is counted. Repeating this random procedure p times yields the paired counts (x1, y1), (x2, y2), ..., (xp, yp), and paired counts are obtained in the same way for every pair of unique words. Then, for each pair (x, y), the Pearson correlation, Pearson partial correlation, Spearman rank correlation, or point correlation is calculated from the paired counts (x1, y1), (x2, y2), ..., (xp, yp), and its statistical significance is determined by a t-test (Pearson correlation, Pearson partial correlation, Spearman rank correlation) or a chi-square test (point correlation). This yields all statistically significant word pairs for the correlation measure chosen by the user.
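The word-vector, sampling, and Pearson-testing steps described above can be sketched in Python. This is not the author's Matlab wordNetwork code; it is a minimal sketch under assumptions that are not in the original: the tokenizer (lower-case alphabetic runs), the tiny hypothetical STOP_WORDS set, the frame-width bounds min_w and max_w, drawing fresh random frames for each word pair (one literal reading of the description), and a normal approximation to Student's t for the p-value, which is reasonable only when p is large. All function names are hypothetical.

```python
import math
import random
import re
from statistics import NormalDist

# Hypothetical stop-word list, far shorter than a real one.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "for", "on"}


def word_vector(text):
    """Lower-case the text, split it into alphabetic tokens, drop stop words,
    and keep the remaining words in their original order."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def rrf_counts(vec, x, y, p, min_w, max_w, rng):
    """Place p frames of random width (min_w..max_w) at random positions on
    the word vector; return the counts (x_i, y_i) of x and y in each frame."""
    n = len(vec)
    if n < min_w:
        raise ValueError("word vector is shorter than the minimum frame width")
    pairs = []
    for _ in range(p):
        w = rng.randint(min_w, min(max_w, n))  # random frame width
        start = rng.randint(0, n - w)          # random frame position
        frame = vec[start:start + w]
        pairs.append((frame.count(x), frame.count(y)))
    return pairs


def pearson_test(pairs):
    """Pearson r of the paired counts and a two-sided p-value for
    t = r * sqrt((p - 2) / (1 - r^2)).
    ASSUMPTION: the p-value uses a normal approximation to Student's t
    with p - 2 degrees of freedom, reasonable only for large p."""
    p = len(pairs)
    if p < 3:
        raise ValueError("need at least 3 frames")
    mx = sum(a for a, _ in pairs) / p
    my = sum(b for _, b in pairs) / p
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0, 1.0  # a constant count gives no evidence of correlation
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1.0:
        return r, 0.0    # perfect correlation: t is unbounded
    t = r * math.sqrt((p - 2) / (1 - r * r))
    p_value = 2 * (1 - NormalDist().cdf(abs(t)))
    return r, p_value


if __name__ == "__main__":
    text = ("Word networks link words. A word network is built from word pairs; "
            "network analysis then ranks each word in the network.")
    vec = word_vector(text)
    pairs = rrf_counts(vec, "word", "network", p=200, min_w=3, max_w=8,
                       rng=random.Random(42))
    r, p_value = pearson_test(pairs)
    print(f"r = {r:.3f}, p = {p_value:.3g}")
```

Running rrf_counts and pearson_test over every pair of unique words gives the set of significant pairs. Sampling one shared set of frames and counting all words in each frame would be cheaper; whether the original method does this is not stated in the description.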

Finally, a word network for the chosen correlation measure is constructed from these word pairs: statistically significant pairs are linked, and statistically insignificant pairs are not. Network analysis is then performed on the word network built from the significant positive between-word correlations among all unique words, and word centrality measures, word trees, word chains, word modules, and other properties can be calculated. Matlab software implementing the method, wordNetwork, is also provided.
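The network-construction step can be sketched in Python as well, assuming the per-pair results are available as (x, y, r, p-value) tuples and an assumed significance level alpha = 0.05. Following the description, only positive, significant correlations become links. Normalized degree centrality and connected components are shown as simple, well-known examples of network analysis; the components are a crude stand-in and are not the module-detection procedure of the paper or of wordNetwork. The function names and the input values in the demo are hypothetical and invented for illustration.

```python
from collections import deque


def build_network(results, alpha=0.05):
    """results: iterable of (word_x, word_y, r, p_value).
    Link a pair only if its correlation is positive and significant
    (p_value < alpha); insignificant pairs get no link."""
    adj = {}
    for x, y, r, p_value in results:
        adj.setdefault(x, set())
        adj.setdefault(y, set())
        if r > 0 and p_value < alpha:
            adj[x].add(y)
            adj[y].add(x)
    return adj


def degree_centrality(adj):
    """Degree divided by (n - 1), the usual normalized degree centrality."""
    n = len(adj)
    if n < 2:
        return {w: 0.0 for w in adj}
    return {w: len(nbrs) / (n - 1) for w, nbrs in adj.items()}


def components(adj):
    """Connected components found by breadth-first search, each sorted."""
    seen, comps = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            w = queue.popleft()
            comp.append(w)
            for v in sorted(adj[w]):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        comps.append(sorted(comp))
    return comps


if __name__ == "__main__":
    results = [                               # hypothetical values
        ("word", "network", 0.42, 0.001),
        ("network", "analysis", 0.30, 0.01),
        ("word", "tree", -0.25, 0.02),        # significant but negative: no link
        ("tree", "chain", 0.05, 0.40),        # not significant: no link
    ]
    adj = build_network(results)
    print(degree_centrality(adj))
    print(components(adj))
```

Words that take part only in insignificant or negative pairs stay in the network as isolated nodes, which keeps every unique word visible in the centrality table.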