Unified, Labeled, and Semi-Structured Database of Pre-Processed Mexican Laws

This paper presents a corpus of pre-processed Mexican laws for computational tasks. The main contributions are the proposed JSON structure and the methodology used to achieve the semi-structured corpus with the selected algorithms. Law PDF documents were transformed into plain text, unified by a dec...


Bibliographic Details
Main Authors: Bella Martinez-Seis, Obdulia Pichardo-Lagunas, Harlan Koff, Miguel Equihua, Octavio Perez-Maqueo, Arturo Hernández-Huerta
Format: Article
Language: English
Published: MDPI AG 2022-07-01
Series: Data
Subjects: Mexican legislation, laws, natural language processing, legislative documents
Online Access: https://www.mdpi.com/2306-5729/7/7/91
author Bella Martinez-Seis
Obdulia Pichardo-Lagunas
Harlan Koff
Miguel Equihua
Octavio Perez-Maqueo
Arturo Hernández-Huerta
collection DOAJ
description This paper presents a corpus of pre-processed Mexican laws for computational tasks. The main contributions are the proposed JSON structure and the methodology used to build the semi-structured corpus with the selected algorithms. Law PDF documents were transformed into plain text, unified by a deconstruction of the law–document structure, and labeled with natural language processing techniques considering part of speech (PoS); entity extraction was also performed. The corpus includes the Mexican constitution and the Mexican laws that were collected from the official site in PDF format repealed before 14 October 2021. The collection has 305 documents: the Mexican constitution, 289 laws, 8 federal codes, 3 regulations, 2 statutes, 1 decree, and 1 ordinance. The semi-structured database includes the transformation of the set of laws from PDF format to a digital representation in order to facilitate their computational analysis. The documents were migrated to JSON files to represent internal hierarchical relations. In addition, basic natural language processing techniques were applied to the laws to identify parts of speech and named entities. The presented data set is mainly useful for text analysis and data science. It could be used for various legislative analysis tasks, including comprehension, interpretation, translation, classification, accessibility, coherence, and searches. Finally, we present some statistics on the identified entities and an example of the usefulness of the corpus for environmental laws.
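The description mentions migrating the plain-text laws to JSON files that represent internal hierarchical relations. The following is a minimal Python sketch of that general idea, not the authors' method: the heading patterns (TÍTULO, CAPÍTULO, Artículo), the function names, and every JSON key are assumptions chosen for illustration, and the structure actually proposed in the paper may differ.

```python
import json
import re

# Hypothetical heading patterns for Spanish-language laws (assumed, not from the paper).
HEADING_PATTERNS = [
    ("title", re.compile(r"^T[ÍI]TULO\s+(.+)$", re.IGNORECASE)),
    ("chapter", re.compile(r"^CAP[ÍI]TULO\s+(.+)$", re.IGNORECASE)),
    ("article", re.compile(r"^Art[íi]culo\s+(\d+)", re.IGNORECASE)),
]


def classify(line):
    """Return (kind, match) for a heading line, or (None, None) for body text."""
    for kind, pattern in HEADING_PATTERNS:
        match = pattern.match(line)
        if match:
            return kind, match
    return None, None


def law_to_dict(name, text):
    """Nest a law's plain text as titles -> chapters -> articles (illustrative keys)."""
    law = {"name": name, "titles": []}
    title = chapter = article = None

    def open_title(heading):
        node = {"heading": heading, "chapters": []}
        law["titles"].append(node)
        return node

    def open_chapter(parent, heading):
        node = {"heading": heading, "articles": []}
        parent["chapters"].append(node)
        return node

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        kind, match = classify(line)
        if kind == "title":
            title, chapter, article = open_title(line), None, None
        elif kind == "chapter":
            if title is None:
                title = open_title(None)  # law without an explicit title level
            chapter, article = open_chapter(title, line), None
        elif kind == "article":
            if title is None:
                title = open_title(None)
            if chapter is None:
                chapter = open_chapter(title, None)
            article = {"number": int(match.group(1)), "text": line}
            chapter["articles"].append(article)
        elif article is not None:
            # Body lines are appended to the open article; lines that follow a
            # title or chapter heading are skipped in this sketch.
            article["text"] += " " + line
    return law


if __name__ == "__main__":
    sample = (
        "TÍTULO PRIMERO\n"
        "CAPÍTULO I\n"
        "Artículo 1. Texto del primer artículo.\n"
        "Artículo 2. Texto del segundo artículo.\n"
    )
    print(json.dumps(law_to_dict("Ley de ejemplo", sample), ensure_ascii=False, indent=2))
```

Nesting articles under chapters and titles keeps each provision addressable by its position in the law, which is the kind of hierarchical relation a flat text file loses. Part-of-speech tags and named entities could be attached as additional keys on each article, but that step is not shown here.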
first_indexed 2024-03-09T03:33:02Z
format Article
id doaj.art-f63f34a94990465f8771b6ef01114fab
institution Directory Open Access Journal
issn 2306-5729
language English
last_indexed 2024-03-09T03:33:02Z
publishDate 2022-07-01
publisher MDPI AG
record_format Article
series Data
spelling doaj.art-f63f34a94990465f8771b6ef01114fab
2023-12-03T14:53:17Z
eng
MDPI AG
Data
2306-5729
2022-07-01
Volume 7, Issue 7, Article 91
10.3390/data7070091
Unified, Labeled, and Semi-Structured Database of Pre-Processed Mexican Laws
Bella Martinez-Seis (Engineering Department, UPIITA-IPN, Instituto Politécnico Nacional, Mexico City 07360, Mexico)
Obdulia Pichardo-Lagunas (Engineering Department, UPIITA-IPN, Instituto Politécnico Nacional, Mexico City 07360, Mexico)
Harlan Koff (Department of Geography and Spatial Planning, University of Luxembourg, Maison des Sciences Humaines, 11, Porte des Sciences, L-4366 Luxembourg, Luxembourg)
Miguel Equihua (Red de Ambiente y Sustentabilidad, Instituto de Ecología, A.C. (INECOL), Xalapa 91073, Mexico)
Octavio Perez-Maqueo (Red de Ambiente y Sustentabilidad, Instituto de Ecología, A.C. (INECOL), Xalapa 91073, Mexico)
Arturo Hernández-Huerta (Red de Ambiente y Sustentabilidad, Instituto de Ecología, A.C. (INECOL), Xalapa 91073, Mexico)
title Unified, Labeled, and Semi-Structured Database of Pre-Processed Mexican Laws
topic Mexican legislation
laws
natural language processing
legislative documents
url https://www.mdpi.com/2306-5729/7/7/91