Identification-Method Research for Open-Source Software Ecosystems
In recent years, open-source software (OSS) development has grown, with many developers around the world working on different OSS projects. A variety of open-source software ecosystems have emerged, for instance, GitHub, StackOverflow, and SourceForge. One of the most typical social-programming and...
Main Authors: | Zhifang Liao, Ningwei Wang, Shengzong Liu, Yan Zhang, Hui Liu, Qi Zhang |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2019-02-01 |
Series: | Symmetry |
Subjects: | software engineering; symmetry; open-source-software ecosystems; identification; similarity |
Online Access: | https://www.mdpi.com/2073-8994/11/2/182 |
_version_ | 1798037983309332480 |
---|---|
author | Zhifang Liao; Ningwei Wang; Shengzong Liu; Yan Zhang; Hui Liu; Qi Zhang |
author_sort | Zhifang Liao |
collection | DOAJ |
description | In recent years, open-source software (OSS) development has grown, with many developers around the world working on different OSS projects. A variety of open-source software ecosystems have emerged, for instance, GitHub, StackOverflow, and SourceForge. One of the most typical social-programming and code-hosting sites, GitHub, has amassed numerous open-source-software projects and developers in the same virtual collaboration platform. Since GitHub itself is a large open-source community, it hosts a collection of software projects that are developed together and coevolve. The great challenge here is how to identify the relationship between these projects, i.e., project relevance. Software-ecosystem identification is the basis of other studies in the ecosystem. Therefore, how to extract useful information from GitHub and identify software ecosystems is particularly important, and it is also a research area in symmetry. In this paper, a Topic-based Project Knowledge Metrics Framework (TPKMF) is proposed. By collecting the multisource dataset of an open-source ecosystem, project-relevance analysis of the open-source software is carried out on the basis of software-ecosystem identification. We then use our Spectral Clustering algorithm based on Core Project (CP-SC) to identify software-ecosystem projects and further identify software ecosystems. We verified that most software ecosystems contain a core software project, and most other projects are associated with it. Furthermore, we analyzed the characteristics of the ecosystem and found that interactive information has a greater impact on project relevance. Finally, we summarize the Topic-based Project Knowledge Metrics Framework. |
first_indexed | 2024-04-11T21:34:00Z |
format | Article |
id | doaj.art-8554db0e8d1446d6bf7d9276b960a875 |
institution | Directory Open Access Journal |
issn | 2073-8994 |
language | English |
last_indexed | 2024-04-11T21:34:00Z |
publishDate | 2019-02-01 |
publisher | MDPI AG |
record_format | Article |
series | Symmetry |
spelling | doaj.art-8554db0e8d1446d6bf7d9276b960a875; 2022-12-22T04:01:49Z; eng; MDPI AG; Symmetry; ISSN 2073-8994; 2019-02-01; vol. 11, no. 2, art. 182; DOI 10.3390/sym11020182; sym11020182; Identification-Method Research for Open-Source Software Ecosystems; Zhifang Liao (School of Software, Central South University, Changsha 410075, China); Ningwei Wang (School of Software, Central South University, Changsha 410075, China); Shengzong Liu (Department of Information Management, Hunan University of Finance and Economics, Changsha 410075, China); Yan Zhang (Department of Computing, School of Computing, Engineering and Built Environment, Glasgow Caledonian University, Glasgow G4 0BA, UK); Hui Liu (Department of Computer Science, Missouri State University, Springfield, MO 65897, USA); Qi Zhang (School of Software, Central South University, Changsha 410075, China); https://www.mdpi.com/2073-8994/11/2/182; software engineering; symmetry; open-source-software ecosystems; identification; similarity |
title | Identification-Method Research for Open-Source Software Ecosystems |
topic | software engineering; symmetry; open-source-software ecosystems; identification; similarity |
url | https://www.mdpi.com/2073-8994/11/2/182 |