Summary: | Security patches play an important role in detecting and fixing one-day vulnerabilities. However, collecting a large number of security patches from diverse data sources is not a simple task, because (1) each data source provides vulnerability information in a different way and (2) many security patches cannot be collected directly from Common Vulnerabilities and Exposures (CVE) information (e.g., National Vulnerability Database (NVD) references). In this paper, we propose a high-coverage approach that collects known security patches by tracking multiple data sources. Specifically, we consider three data sources: repositories (e.g., GitHub), issue trackers (e.g., Bugzilla), and Q&A sites (e.g., Stack Overflow). From these data sources, we also gather security patches that cannot be collected from CVE information alone (i.e., previously untracked security patches). In our experiments, we collected 12,432 CVE patches from repositories and issue trackers, and 12,458 insecure posts from Q&A sites. We collected at least four times more CVE patches than existing approaches, which demonstrates the efficacy of our approach. The collected security patches serve as a database on a public website (i.e., IoTcube) that is used to detect vulnerable code clones.
|