Summary: | The article discusses the ongoing creation of the MoReThesisCorpus, outlining its major characteristics and giving an account of the considerations and issues encountered so far. The corpus, composed of the theses submitted to the University of Modena and Reggio Emilia between 2011 and 2020, is being developed as part of the CAP project (‘Comunicazione Accademica e Professionale’; Academic and Professional Communication). It is meant to foster research into academic language from a cross-disciplinary discourse perspective and to facilitate the production of educational materials aimed at university students. Following a data-driven learning approach, it aims to support the acquisition of discipline-related vocabularies and styles, and thereby to improve the learning of academic writing through corpus tools and resources. Technical details of the acquisition and subsequent processing of the data are discussed, along with a number of issues pertaining to both computer science and linguistics that directly affect the capability of the corpus to correctly support an investigation of academic discourse across different languages and disciplines.
|