Summary: | In cancer drug research, cell lines are invaluable tools for understanding drug-sensitive and drug-resistant tumors. The computational representation of cell lines is usually based on genomic profiling, even though this method cannot be applied at a large scale. This study introduces a novel approach to the computational representation of cell lines based on text mining. We extracted and analyzed textual data from the scientific literature, using Natural Language Processing (NLP) for text extraction and machine learning models for predictive analysis, and obtained a text-derived description of each cell line. To validate the representation, we computed a distance matrix over all cell lines and used it to construct a dendrogram of cell line relationships. This dendrogram resembles the established Cell Line Ontology (CLO). Our results connect cell line representation with text mining and offer a computational model that could benefit cancer drug research.
|
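The validation step described in the summary (distance matrix, then dendrogram) can be illustrated with a minimal sketch. This is not the study's pipeline: the toy text snippets, the plain term-count vectors, the cosine distance, and the average-linkage merging are all assumptions chosen for illustration, and the helper names (`tf_vector`, `cosine_distance`, `distance_matrix`, `average_linkage`) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy descriptions per cell line (illustrative, not data from the study).
docs = {
    "MCF-7": "breast adenocarcinoma estrogen receptor positive epithelial",
    "T-47D": "breast ductal carcinoma estrogen receptor positive epithelial",
    "A549": "lung carcinoma epithelial alveolar",
}

def tf_vector(text):
    """Bag-of-words term counts; a stand-in for a richer NLP representation."""
    return Counter(text.lower().split())

def cosine_distance(a, b):
    """1 - cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0

def distance_matrix(docs):
    """Pairwise distances between all cell lines, in sorted name order."""
    names = sorted(docs)
    vecs = {n: tf_vector(docs[n]) for n in names}
    dist = [[cosine_distance(vecs[a], vecs[b]) for b in names] for a in names]
    return names, dist

def average_linkage(names, dist):
    """Naive agglomerative clustering; returns the merge order of a dendrogram."""
    clusters = {i: [i] for i in range(len(names))}
    merges = []
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = clusters[ids[x]], clusters[ids[y]]
                # Average distance over all cross-cluster pairs.
                d = sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))
                if best is None or d < best[0]:
                    best = (d, ids[x], ids[y])
        d, i, j = best
        merges.append(
            ([names[k] for k in clusters[i]], [names[k] for k in clusters[j]], d)
        )
        clusters[i] = clusters[i] + clusters.pop(j)
    return merges

names, dist = distance_matrix(docs)
merges = average_linkage(names, dist)
for left, right, d in merges:
    print(f"merge {left} + {right} at distance {d:.3f}")
```

In this toy input the two breast cancer lines merge first and the lung line joins last. Comparing such a merge order against the grouping in a reference ontology is one way to read the kind of validation the summary describes.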