Wikipedia information flow analysis reveals the scale-free architecture of the semantic space.
In this paper we extract the topology of the semantic space in its encyclopedic sense, measuring the semantic flow between the different entries of the largest modern encyclopedia, Wikipedia, and thus creating a directed complex network of semantic flows. Notably, at the percolation threshold the...
Main Authors: | Adolfo Paolo Masucci, Alkiviadis Kalampokis, Victor Martínez Eguíluz, Emilio Hernández-García |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2011-02-01 |
Series: | PLoS ONE |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/21407801/?tool=EBI |
_version_ | 1831633109373681664 |
---|---|
author | Adolfo Paolo Masucci; Alkiviadis Kalampokis; Victor Martínez Eguíluz; Emilio Hernández-García |
author_sort | Adolfo Paolo Masucci |
collection | DOAJ |
description | In this paper we extract the topology of the semantic space in its encyclopedic sense, measuring the semantic flow between the different entries of the largest modern encyclopedia, Wikipedia, and thus creating a directed complex network of semantic flows. Notably, at the percolation threshold the semantic space is characterised by scale-free behaviour at different levels of complexity, and this relates the semantic space to a wide range of biological, social and linguistic phenomena. In particular, we find that the cluster size distribution, representing the size of different semantic areas, is scale-free. Moreover, the topology of the resulting semantic space is scale-free in the connectivity distribution and displays small-world properties. However, its statistical properties do not allow a classical interpretation via a generative model based on a simple multiplicative process. After giving a detailed description and interpretation of the topological properties of the semantic space, we introduce a stochastic model of a content-based network, built on a copy-and-mutation algorithm and on Heaps' law, that is able to capture the main statistical properties of the analysed semantic space, including Zipf's law for the word frequency distribution. |
first_indexed | 2024-12-19T05:29:59Z |
format | Article |
id | doaj.art-984340c671c34581beac705f09f2d2d6 |
institution | Directory Open Access Journal |
issn | 1932-6203 |
language | English |
last_indexed | 2024-12-19T05:29:59Z |
publishDate | 2011-02-01 |
publisher | Public Library of Science (PLoS) |
record_format | Article |
series | PLoS ONE |
spelling | doaj.art-984340c671c34581beac705f09f2d2d6; 2022-12-21T20:34:16Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2011-02-01; 6; 2; e17333; 10.1371/journal.pone.0017333; Wikipedia information flow analysis reveals the scale-free architecture of the semantic space.; Adolfo Paolo Masucci; Alkiviadis Kalampokis; Victor Martínez Eguíluz; Emilio Hernández-García; https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/21407801/?tool=EBI |
title | Wikipedia information flow analysis reveals the scale-free architecture of the semantic space. |
title_sort | wikipedia information flow analysis reveals the scale free architecture of the semantic space |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/21407801/?tool=EBI |