Data mining patented antibody sequences

Bibliographic Details
Main Authors: Konrad Krawczyk, Andrew Buchanan, Paolo Marcatili
Format: Article
Language: English
Published: Taylor & Francis Group 2021-01-01
Series: mAbs
Subjects: Patents; data mining; therapeutic antibodies
Online Access: https://www.tandfonline.com/doi/10.1080/19420862.2021.1892366
author Konrad Krawczyk
Andrew Buchanan
Paolo Marcatili
collection DOAJ
description The patent literature should reflect the past 30 years of engineering efforts directed toward developing monoclonal antibody therapeutics. Such information is potentially valuable for rational antibody design. Patents, however, are designed not to convey scientific knowledge, but to provide legal protection. It is not obvious whether antibody information from patent documents, such as antibody sequences, is useful in conveying engineering know-how, rather than as a legal reference only. To assess the utility of patent data for therapeutic antibody engineering, we quantified the amount of antibody sequences in patents destined for medicinal purposes and how well they reflect the primary sequences of therapeutic antibodies in clinical use. We identified 16,526 patent families covering major jurisdictions (e.g., US Patent and Trademark Office (USPTO) and World Intellectual Property Organization) that contained antibody sequences. These families held 245,109 unique antibody chains (135,397 heavy chains and 109,712 light chains) that we compiled in our Patented Antibody Database (PAD, http://naturalantibody.com/pad). We find that antibodies make up a non-trivial proportion of all patent amino acid sequence depositions (e.g., 11% of USPTO Full Text database). Our analysis of the 16,526 families demonstrates that the volume of patent documents with antibody sequences is growing, with the majority of documents classified as containing antibodies for medicinal purposes. We further studied the 245,109 antibody chains from patent literature to reveal that they very well reflect the primary sequences of antibody therapeutics in clinical use. This suggests that the patent literature could serve as a reference for previous engineering efforts to improve rational antibody design.
first_indexed 2024-04-13T19:33:55Z
format Article
id doaj.art-ed2beac4fa7e4f9ea560ddb296104365
institution Directory Open Access Journal
issn 1942-0862
1942-0870
language English
last_indexed 2024-04-13T19:33:55Z
publishDate 2021-01-01
publisher Taylor & Francis Group
series mAbs
spelling doaj.art-ed2beac4fa7e4f9ea560ddb296104365; 2022-12-22T02:33:07Z; eng; Taylor & Francis Group; mAbs; ISSN 1942-0862, 1942-0870; 2021-01-01; volume 13, issue 1; doi:10.1080/19420862.2021.1892366
Data mining patented antibody sequences
Konrad Krawczyk (0): Research and Development, Natural Antibody, Hamburg, Germany
Andrew Buchanan (1): Antibody Discovery & Protein Engineering, R&D, AstraZeneca, Cambridge, UK
Paolo Marcatili (2): Technical University of Denmark, Lyngby, Denmark
https://www.tandfonline.com/doi/10.1080/19420862.2021.1892366
Patents; data mining; therapeutic Antibodies
title Data mining patented antibody sequences
topic Patents
data mining
therapeutic Antibodies
url https://www.tandfonline.com/doi/10.1080/19420862.2021.1892366