Summary: | AI has opened new directions for reforming traditional education, such as automating Grammatical Error Correction (GEC) to reduce teachers’ workload and improve efficiency. However, current GEC models remain imperfect because human language is highly variable and the available labeled datasets are often too small to cover that variability. A key principle of GEC is to preserve the correct parts of the input text while correcting grammatical errors. Earlier sequence-to-sequence (Seq2Seq) models, however, can be prone to over-correction because they generate corrections from scratch. Over-correction occurs when grammatically correct text is wrongly flagged as erroneous and then changed, producing edits that can alter the meaning or structure of the original sentence. This can significantly reduce the accuracy and usefulness of GEC systems, which motivates approaches that reduce over-correction and yield more accurate and natural corrections. Recently, sequence tagging-based models have been used to mitigate this issue by predicting only the edit operations that convert the source sentence into a corrected one. Despite their good performance on datasets with minimal edits, they struggle to restore texts that require substantial rewriting. This limitation artificially restricts the types of changes that can be made to a sentence and does not reflect the changes native speakers need in order to find a sentence fluent or natural-sounding. Moreover, sequence tagging-based models usually rely on human-designed, language-specific tag sets, which hinders generalization and limits their ability to capture the real error distribution produced by diverse learners of different nationalities. In this work, we introduce a novel Seq2Seq-based approach that can handle a wide variety of grammatical errors on a low-fluency dataset.
Our approach enhances the Seq2Seq architecture with a novel copy mechanism based on supervised attention. Instead of merely predicting the next token in context, the model also predicts additional correctness-related information for each token. This auxiliary objective shapes the model's weights during training and requires no extra labels at test time. Experimental results on benchmark datasets show that our model achieves performance competitive with state-of-the-art (SOTA) models.
|
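The summary describes the copy mechanism and the auxiliary correctness objective only at a high level. The following is a minimal Python sketch, built entirely on our own assumptions, of how the pieces could fit together: per-token correctness labels are derived by aligning each erroneous sentence with its correction using `difflib.SequenceMatcher` (so no extra annotation is needed), a pointer-generator-style mixture (a standard copy formulation swapped in here as a stand-in) combines generating from the vocabulary with copying from the source, and a binary cross-entropy term scores the per-token correctness predictions. The function names `correctness_labels`, `copy_mixture`, and `auxiliary_bce`, the alignment-based labeling, and the exact mixture form are hypothetical illustrations, not the authors' implementation.

```python
import difflib
import math
from typing import Dict, List, Sequence


def correctness_labels(source: Sequence[str], target: Sequence[str]) -> List[int]:
    """Return 1 for each source token kept unchanged in the target, else 0.

    Assumption (hypothetical): labels come from aligning the erroneous sentence
    with its correction, so they are free given parallel training data.
    """
    labels = [0] * len(source)
    matcher = difflib.SequenceMatcher(a=list(source), b=list(target), autojunk=False)
    for block in matcher.get_matching_blocks():  # last block is a size-0 sentinel
        for i in range(block.a, block.a + block.size):
            labels[i] = 1
    return labels


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax, used here to turn raw attention scores into weights."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def copy_mixture(p_vocab: Dict[str, float], attention: Sequence[float],
                 source: Sequence[str], p_copy: float) -> Dict[str, float]:
    """Pointer-generator-style mixture (assumed form, not the paper's):
    P(w) = (1 - p_copy) * P_vocab(w) + p_copy * (attention mass on source positions holding w).
    """
    out = {w: (1.0 - p_copy) * p for w, p in p_vocab.items()}
    for weight, token in zip(attention, source):
        out[token] = out.get(token, 0.0) + p_copy * weight
    return out


def auxiliary_bce(pred: Sequence[float], labels: Sequence[int], eps: float = 1e-9) -> float:
    """Mean binary cross-entropy between predicted per-token correctness and labels.

    Used only as a training loss; inference needs no correctness labels.
    """
    total = 0.0
    for p, y in zip(pred, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


if __name__ == "__main__":
    src = "She go to school yesterday".split()
    tgt = "She went to school yesterday".split()
    labels = correctness_labels(src, tgt)
    print(labels)  # [1, 0, 1, 1, 1]
    attn = softmax([0.5, 2.0, 0.1, 0.1, 0.1])
    dist = copy_mixture({"went": 0.6, "go": 0.1, "goes": 0.3}, attn, src, p_copy=0.3)
    print(round(sum(dist.values()), 6))  # 1.0
    print(auxiliary_bce([0.9, 0.2, 0.8, 0.9, 0.9], labels))
```

In this sketch, tokens labeled 1 are the ones a model should preferably copy rather than rewrite, which is one plausible way a correctness signal could discourage over-correction; the real model would compute `p_vocab`, the attention weights, and `p_copy` with learned networks rather than take them as fixed inputs.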