Written Documents Analyzed as Nature-Inspired Processes: Persistence, Anti-Persistence, and Random Walks—We Remember, as Along Came Writing—T. Holopainen

Bibliographic Details
Main Authors: Omar López-Ortega, Obed Pérez-Cortés, Heydy Castillejos-Fernández, Félix-Agustín Castro-Espinoza, Miguel González-Mendoza
Format: Article
Language: English
Published: MDPI AG 2020-09-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/10/18/6354
Description
Summary: Written communication is pivotal to the development of societies. However, lexicon and depth of information vary greatly among texts according to their purpose. Scientific texts, diffusion-of-science reports, and general and area-specific news are all written differently. We therefore explore the characterization of different text categories through a nature-inspired feature known as the Hurst parameter (also called the Hurst exponent). We contend that it is useful for unveiling the rhetorical structure within written documents. We collected and processed texts in five categories: scientific articles, diffusion-of-science reports, business news, entertainment news, and random texts. Each category contains 350 documents. Scientific texts have the highest median Hurst parameter (0.575), followed by business news (0.54); the median for randomly generated texts is 0.48, which lies in the region associated with random walks. The median value for diffusion texts is 0.49, and for entertainment texts it is 0.53. However, these two categories show high dispersion. We conclude that the Hurst parameter quantifies the structure of communication in the selected categories of texts. Application of our finding in the field of e-research is discussed.
ISSN: 2076-3417
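
Illustrative note: the summary reports Hurst parameter values without saying how they are computed. The sketch below shows one common estimator, rescaled-range (R/S) analysis, in plain Python. It is not taken from the article: the choice of R/S analysis, the function names (`rescaled_range`, `hurst_rs`, `word_lengths`), the `min_window` parameter, and the idea of turning a text into a series of word lengths are all assumptions made for illustration. Values near 0.5 suggest uncorrelated behavior, values above 0.5 persistence, and values below 0.5 anti-persistence.

```python
import math
import random


def word_lengths(text):
    """Hypothetical text-to-series mapping: one value per word (its length).

    The article's actual encoding is not given in this record.
    """
    return [float(len(word)) for word in text.split()]


def rescaled_range(window):
    """Return the R/S statistic of one window, or None if it is constant."""
    n = len(window)
    mean = sum(window) / n
    cumulative = 0.0
    lowest = highest = 0.0
    for value in window:
        # Running sum of deviations from the window mean.
        cumulative += value - mean
        lowest = min(lowest, cumulative)
        highest = max(highest, cumulative)
    std = math.sqrt(sum((value - mean) ** 2 for value in window) / n)
    if std == 0.0:
        return None
    return (highest - lowest) / std


def hurst_rs(series, min_window=8):
    """Estimate H as the slope of log(mean R/S) against log(window size)."""
    n = len(series)
    log_sizes, log_rs = [], []
    size = min_window
    while size <= n // 2:
        stats = []
        # Split the series into non-overlapping windows of this size.
        for start in range(0, n - size + 1, size):
            rs = rescaled_range(series[start:start + size])
            if rs is not None:
                stats.append(rs)
        if stats:
            log_sizes.append(math.log(size))
            log_rs.append(math.log(sum(stats) / len(stats)))
        size *= 2
    if len(log_sizes) < 2:
        raise ValueError("series too short for R/S analysis")
    # Ordinary least-squares slope of the log-log points.
    mean_x = sum(log_sizes) / len(log_sizes)
    mean_y = sum(log_rs) / len(log_rs)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_sizes, log_rs))
    den = sum((x - mean_x) ** 2 for x in log_sizes)
    return num / den


# Synthetic check: uncorrelated noise versus its running sum (strongly persistent).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
persistent = []
total = 0.0
for step in noise:
    total += step
    persistent.append(total)

h_noise = hurst_rs(noise)
h_persistent = hurst_rs(persistent)
print(f"noise: {h_noise:.2f}, persistent: {h_persistent:.2f}")
```

For a real document, `hurst_rs(word_lengths(text))` would give an estimate under this assumed encoding. R/S estimates are biased upward on short series, so documents of a few hundred words would yield noisy values.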