A large dataset of scientific text reuse in Open-Access publications
Abstract: We present the Webis-STEREO-21 dataset, a massive collection of Scientific Text Reuse in Open-access publications. It contains 91 million cases of reused text passages found in 4.2 million unique open-access publications. Cases range from overlap of as few as eight words to near-duplicate p...
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2023-01-01 |
Series: | Scientific Data |
Online Access: | https://doi.org/10.1038/s41597-022-01908-z |