MaskedWiki
MaskedWiki is a large-scale dataset for coreference resolution. It contains 130M passages from Wikipedia in which a noun occurs at least twice. The second occurrence is masked, and the goal is to predict the masked noun. The task is thus similar to coreference resolution and can serve as a large pre-training dat...
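
The masking step described above can be illustrated with a minimal sketch. This is not the dataset authors' pipeline: the function name `mask_second_occurrence`, the `[MASK]` placeholder, and the assumption that the noun is already known (rather than found by a part-of-speech tagger) are all hypothetical choices made for illustration.

```python
import re

MASK = "[MASK]"  # hypothetical placeholder; the dataset's real mask token is not stated


def mask_second_occurrence(passage, noun, mask=MASK):
    """Mask the second whole-word occurrence of `noun` in `passage`.

    Returns (masked_passage, target), or None when the noun occurs
    fewer than twice and the passage would not qualify.
    """
    pattern = re.compile(r"\b" + re.escape(noun) + r"\b")
    matches = list(pattern.finditer(passage))
    if len(matches) < 2:
        return None
    second = matches[1]
    masked = passage[:second.start()] + mask + passage[second.end():]
    return masked, noun


example = "Alice met Bob at the station. Later, Alice went home."
print(mask_second_occurrence(example, "Alice"))
```

A model trained on such pairs has to recover the target from the masked passage, which is what makes the task resemble coreference resolution.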
Main Author: | |
---|---|
Format: | Dataset |
Published: | University of Oxford, 2019 |