Summary: | One of the most difficult and time-consuming aspects of machine learning is gathering high-quality data with which to train an algorithm. Historical data is a valuable basis for making predictions in a given field with high confidence and accuracy, but it is typically available only in raw form, which in most circumstances is unsuitable for machine learning applications. The work presented here introduces SDPTool, a tool intended to assist considerably in generating a high-quality dataset: it starts with raw data collection, proceeds to data pre-processing and validation, and ends with prediction using a selected machine learning algorithm. So far, we have built the capability in SDPTool to collect data dynamically from a GitHub repository. Data pre-processing, various defect predictions, and other machine learning settings are in the pipeline. Once the data is ready to use, SDPTool will assist in selecting an appropriate machine learning algorithm and will perform the desired prediction in the associated field. The entire process, from data collection to prediction, will be orchestrated in a user-friendly Java Swing application.
|