Summary: | Cancer is known to be a genetic disease. Because of its inherent complexity, studying it requires large-scale genomic datasets. The goal of this project is first to download publicly available cancer data and integrate it into a database. The database architecture was designed around the structured nature of the data and convenience of use, and it therefore adopts a data warehousing strategy. The database design and queries are discussed, and further developments, such as adopting a distributed systems approach and combining SQL and NoSQL capabilities, are briefly mentioned.
|