Designing and compiling parallel aligned corpora: pitfalls and (some) solutions on the example of a corpus of translated museum texts (English-Spanish)

Text corpora are tools with both a long tradition in research and a wide range of applications. Of all existing types, this paper focuses specifically on parallel, aligned corpora. Taking one such corpus as a starting point (a parallel, aligned corpus of museum texts originally written in Eng...


Bibliographic Details
Main Author: Jorge Leiva Rojo
Format: Article
Language: English
Published: Universidad Politécnica de Valencia 2018-07-01
Series: Revista de Lingüística y Lenguas Aplicadas
Subjects: corpus linguistics; museum texts; translation; aligned parallel texts; bitexts
Online Access: https://polipapers.upv.es/index.php/rdlyla/article/view/7912
_version_ 1818955754697129984
author Jorge Leiva Rojo
collection DOAJ
description Text corpora are tools with both a long tradition in research and a wide range of applications. Of all existing types, this paper focuses specifically on parallel, aligned corpora. Taking one such corpus as a starting point (a parallel, aligned corpus of museum texts originally written in English and subsequently translated into Spanish), this article proposes a methodology consisting of four basic stages. Drawing on a review of previous literature on the topic and on multiple software programs (proprietary and free, some created specifically for corpus compilation and others for different purposes), it concludes that, although compiling a corpus of this kind is feasible, the procedure is full of obstacles. Some obstacles were overcome, while others were not; this is the case, for example, of the repetitions in the aligned corpus, which are not present in the corpus.
first_indexed 2024-12-20T10:43:05Z
format Article
id doaj.art-498b07130d2646258f61bb0b4e20b8cc
institution Directory Open Access Journal
issn 1886-2438
1886-6298
language English
last_indexed 2024-12-20T10:43:05Z
publishDate 2018-07-01
publisher Universidad Politécnica de Valencia
record_format Article
series Revista de Lingüística y Lenguas Aplicadas
spelling doaj.art-498b07130d2646258f61bb0b4e20b8cc; 2022-12-21T19:43:31Z; eng; Universidad Politécnica de Valencia; Revista de Lingüística y Lenguas Aplicadas; 1886-2438; 1886-6298; 2018-07-01; 13; 1; 59; 73; 10.4995/rlyla.2018.7912; 6376; Jorge Leiva Rojo; Universidad de Málaga; https://polipapers.upv.es/index.php/rdlyla/article/view/7912; lingüística de corpus; textos museísticos; traducción; textos paralelos alineados; bitextos
title Designing and compiling parallel aligned corpora: pitfalls and (some) solutions on the example of a corpus of translated museum texts (English-Spanish)
topic lingüística de corpus
textos museísticos
traducción
textos paralelos alineados
bitextos
url https://polipapers.upv.es/index.php/rdlyla/article/view/7912