MaskedWiki

MaskedWiki is a large-scale dataset for coreference resolution. It contains 130M passages from Wikipedia in which a noun occurs at least twice. The second occurrence is masked, and the goal is to predict it correctly. The task is thus similar to coreference resolution, and the dataset can serve as a large pre-training dat...
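
The masking step described above can be illustrated with a minimal sketch. It is not the author's construction pipeline: the function name `mask_second_occurrence`, the `[MASK]` placeholder, and the case-sensitive whole-word matching are assumptions made for illustration, and the noun is passed in by hand rather than detected automatically.

```python
import re

# Assumed placeholder string; the record does not say which mask token is used.
MASK_TOKEN = "[MASK]"


def mask_second_occurrence(passage, noun):
    """Mask the second whole-word occurrence of `noun` in `passage`.

    Returns (masked_passage, target), or None when the noun occurs
    fewer than twice and the passage therefore does not qualify.
    """
    pattern = r"\b" + re.escape(noun) + r"\b"
    matches = list(re.finditer(pattern, passage))
    if len(matches) < 2:
        return None
    second = matches[1]
    masked = passage[:second.start()] + MASK_TOKEN + passage[second.end():]
    return masked, noun


example = mask_second_occurrence(
    "Alice met Bob after Alice finished work.", "Alice"
)
print(example)
```

A model trained on such pairs would be asked to recover the target noun from the masked passage, which is what makes the task resemble coreference resolution.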

Bibliographic Details
Main Author: Kocijan, V
Format: Dataset
Published: University of Oxford 2019