Designing and compiling parallel aligned corpora: pitfalls and (some) solutions on the example of a corpus of translated museum texts (English-Spanish)
Text corpora are tools with both a long tradition in research and a variety of applications. Of all existing types, this paper focuses specifically on parallel, aligned corpora. Taking one such corpus as a starting point, namely a parallel, aligned corpus of museum texts originally written in Eng...
Main Author: | Jorge Leiva Rojo |
---|---|
Format: | Article |
Language: | English |
Published: | Universidad Politécnica de Valencia, 2018-07-01 |
Series: | Revista de Lingüística y Lenguas Aplicadas |
Subjects: | corpus linguistics; museum texts; translation; aligned parallel texts; bitexts |
Online Access: | https://polipapers.upv.es/index.php/rdlyla/article/view/7912 |
_version_ | 1818955754697129984 |
---|---|
author | Jorge Leiva Rojo |
collection | DOAJ |
description | Text corpora are tools with both a long tradition in research and a variety of applications. Of all existing types, this paper focuses specifically on parallel, aligned corpora. Taking one such corpus as a starting point, namely a parallel, aligned corpus of museum texts originally written in English and subsequently translated into Spanish, this article proposes a methodology consisting of four basic stages. Drawing on a review of previous literature and on multiple software programs (proprietary and free, some created specifically for corpus compilation and others for different purposes), it concludes that, although compiling a corpus such as the intended one is feasible, the procedure is full of obstacles. Some obstacles were overcome, while others were not; the latter is the case, for example, of the repetitions in the aligned corpus, which are not present in the corpus. |
first_indexed | 2024-12-20T10:43:05Z |
format | Article |
id | doaj.art-498b07130d2646258f61bb0b4e20b8cc |
institution | Directory of Open Access Journals |
issn | 1886-2438 1886-6298 |
language | English |
last_indexed | 2024-12-20T10:43:05Z |
publishDate | 2018-07-01 |
publisher | Universidad Politécnica de Valencia |
record_format | Article |
series | Revista de Lingüística y Lenguas Aplicadas |
spelling | doaj.art-498b07130d2646258f61bb0b4e20b8cc; 2022-12-21T19:43:31Z; eng; Universidad Politécnica de Valencia; Revista de Lingüística y Lenguas Aplicadas; 1886-2438; 1886-6298; 2018-07-01; 13; 1; 59; 73; 10.4995/rlyla.2018.7912; 6376; Designing and compiling parallel aligned corpora: pitfalls and (some) solutions on the example of a corpus of translated museum texts (English-Spanish); Jorge Leiva Rojo; 0; Universidad de Málaga; https://polipapers.upv.es/index.php/rdlyla/article/view/7912; corpus linguistics; museum texts; translation; aligned parallel texts; bitexts |
title | Designing and compiling parallel aligned corpora: pitfalls and (some) solutions on the example of a corpus of translated museum texts (English-Spanish) |
topic | corpus linguistics; museum texts; translation; aligned parallel texts; bitexts |
url | https://polipapers.upv.es/index.php/rdlyla/article/view/7912 |