What Should be Taught in an Academic Program of Data Sciences?
Format: Article
Language: English
Published: Bulgarian Academy of Sciences, Institute of Mathematics and Informatics, 2019-09-01
Series: Digital Presentation and Preservation of Cultural and Scientific Heritage
Online Access: https://dipp.math.bas.bg/dipp/article/view/174
Summary: The new academic discipline of Data Sciences (DS) has developed in recent years mainly because of the need to make decisions based on huge amounts of data, known as Big Data. In parallel, machine learning and sophisticated inference techniques have driven great progress in technologies that make it possible to identify patterns, filter big data, and assign relevant meaning to information. The profession of Data Scientist (or Data Analyst) has been in high demand in recent years. It is required in the business sector, where data is the "oxygen" of business survival; it is needed in the government sector to improve services to citizens; and it is imperative in the scientific world, where large data repositories collected in various disciplines must be integrated, mined, and analyzed to enable interdisciplinary research. The purpose of this paper is to demonstrate how the scientific discipline of Data Sciences fits into academic programs intended to prepare data analysts for the business, public, government, and academic sectors.
The article first delineates the Data Cycle, which portrays the transformation of data and their derivatives along the route from generation to decision making.
The cycle includes the following stages: problem definition; identification of pertinent data sources; data collection and storage (including cleansing and backup); data integration; data mining, processing, and analysis; visualization; learning and decision making; and feedback for future cycles. Within this cycle there may be sub-cycles, in which a number of stages are repeated.
The data cycle is generic. It may vary slightly with circumstances; however, there is little difference between the scientific cycle and other cycles.
Each stage in the cycle requires its own tools, namely the hardware and software technologies that support it, and the article classifies these tools. The final part of the article suggests a typology for academic DS programs and outlines an academic program for those wishing to practice the Data Analyst profession. It also sketches an introductory course that should be mandatory for all students campus-wide.
ISSN: 1314-4006, 2535-0366