Evaluating a typology of signals for automatic detection of complementarity

Bibliographic Details
Main Authors: Jackson Wilke da Cruz Souza, Ariani Di Felippo
Format: Article
Language: English
Published: Programa de Pós-Graduação em Estudos Linguísticos 2022-09-01
Series: Domínios de Lingu@gem
Online Access: https://seer.ufu.br/index.php/dominiosdelinguagem/article/view/63776
Description
Summary: In a cluster of news texts on the same event, two sentences from different documents might express different multi-document phenomena (redundancy, complementarity, and contradiction). Cross-Document Structure Theory (CST) provides labels to explicitly represent these phenomena. The automatic identification of the multi-document phenomena and their corresponding CST relations is useful for Automatic Multi-Document Summarization since it helps computers understand text meaning. In this paper, we evaluated a typology of (textual) signals for the automatic detection of the CST relations of complementarity (i.e., Historical background, Follow-up, and Elaboration) in a multi-document corpus of news texts in Brazilian Portuguese. Using algorithms from different machine-learning paradigms, we obtained classifiers that achieved high overall accuracy (above 90%), indicating the potential of the signals.
ISSN: 1980-5799