Summary: | Written communication is pivotal to the development of societies. However, the lexicon and depth of information vary greatly among texts according to their purpose: scientific texts, diffusion-of-science reports, and general and area-specific news are all written differently. We therefore explore the characterization of different text categories through a nature-inspired feature, the Hurst parameter (also called the Hurst exponent). We contend that the Hurst parameter is useful for unveiling the rhetorical structure of written documents. We collected and processed texts in five categories: scientific articles, diffusion-of-science reports, business news, entertainment news, and random texts. Each category contains 350 documents. Scientific texts have the highest median Hurst parameter (0.575), followed by business news (0.54); the median for randomly generated texts is 0.48, which lies close to 0.5, the region associated with random walks. The median is 0.49 for diffusion texts and 0.53 for entertainment texts, but these two categories show high dispersion. We conclude that the Hurst parameter quantifies the structure of communication in the selected categories of texts. Applications of our findings in the field of e-research are discussed.
|
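
To make the central quantity concrete, the following is a minimal sketch of how a Hurst parameter could be estimated for one document. The summary does not say how texts are turned into numeric series or which estimator is used, so both choices here are assumptions made only for illustration: `word_length_series` is a hypothetical mapping (one value per token, equal to its length), and `hurst_rs` uses classical rescaled-range (R/S) analysis, one of several common estimators.

```python
import math
import random


def word_length_series(text):
    # Hypothetical text-to-series mapping (an assumption, not the study's):
    # one value per whitespace-separated token, equal to its length.
    return [float(len(token)) for token in text.split()]


def rescaled_range(chunk):
    # R/S statistic of one window: range of the cumulative deviations
    # from the window mean, divided by the window standard deviation.
    mean = sum(chunk) / len(chunk)
    cumulative = 0.0
    lowest = highest = 0.0
    squared = 0.0
    for value in chunk:
        deviation = value - mean
        cumulative += deviation
        lowest = min(lowest, cumulative)
        highest = max(highest, cumulative)
        squared += deviation * deviation
    std = math.sqrt(squared / len(chunk))
    return (highest - lowest) / std if std > 0 else None


def hurst_rs(series, min_window=8):
    # Estimate H as the least-squares slope of log(mean R/S) against
    # log(window size), with window sizes doubling from min_window.
    n = len(series)
    points = []
    window = min_window
    while window <= n // 2:
        values = []
        for start in range(0, n - window + 1, window):
            rs = rescaled_range(series[start:start + window])
            if rs is not None and rs > 0:
                values.append(rs)
        if values:
            points.append((math.log(window), math.log(sum(values) / len(values))))
        window *= 2
    if len(points) < 2:
        raise ValueError("series too short or too flat for R/S analysis")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in points)
    denominator = sum((x - mean_x) ** 2 for x, _ in points)
    return numerator / denominator
```

Under these assumptions, a document would be scored with `hurst_rs(word_length_series(text))`, and the per-category medians quoted in the summary would be medians over such per-document values. R/S estimates tend to be biased upward on short series, so values from this sketch should not be expected to reproduce the reported figures.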