Identification-Method Research for Open-Source Software Ecosystems

In recent years, open-source software (OSS) development has grown, with many developers around the world working on different OSS projects. A variety of open-source software ecosystems have emerged, for instance, GitHub, StackOverflow, and SourceForge. One of the most typical social-programming and...

Bibliographic Details
Main Authors: Zhifang Liao, Ningwei Wang, Shengzong Liu, Yan Zhang, Hui Liu, Qi Zhang
Format: Article
Language: English
Published: MDPI AG 2019-02-01
Series: Symmetry
Subjects: software engineering, symmetry, open-source-software ecosystems, identification, similarity
Online Access: https://www.mdpi.com/2073-8994/11/2/182
author Zhifang Liao
Ningwei Wang
Shengzong Liu
Yan Zhang
Hui Liu
Qi Zhang
collection DOAJ
description In recent years, open-source software (OSS) development has grown, with many developers around the world working on different OSS projects. A variety of open-source software ecosystems have emerged, for instance, GitHub, StackOverflow, and SourceForge. One of the most typical social-programming and code-hosting sites, GitHub, has amassed numerous open-source-software projects and developers in the same virtual collaboration platform. Since GitHub itself is a large open-source community, it hosts a collection of software projects that are developed together and coevolve. The great challenge here is how to identify the relationship between these projects, i.e., project relevance. Software-ecosystem identification is the basis of other studies in the ecosystem. Therefore, how to extract useful information in GitHub and identify software ecosystems is particularly important, and it is also a research area in symmetry. In this paper, a Topic-based Project Knowledge Metrics Framework (TPKMF) is proposed. By collecting the multisource dataset of an open-source ecosystem, project-relevance analysis of the open-source software is carried out on the basis of software-ecosystem identification. Then, we used our Spectral Clustering algorithm based on Core Project (CP-SC) to identify software-ecosystem projects and further identify software ecosystems. We verified that most software ecosystems usually contain a core software project, and most other projects are associated with it. Furthermore, we analyzed the characteristics of the ecosystem, and we also found that interactive information has greater impact on project relevance. Finally, we summarize the Topic-based Project Knowledge Metrics Framework.
first_indexed 2024-04-11T21:34:00Z
format Article
id doaj.art-8554db0e8d1446d6bf7d9276b960a875
institution Directory Open Access Journal
issn 2073-8994
language English
last_indexed 2024-04-11T21:34:00Z
publishDate 2019-02-01
publisher MDPI AG
record_format Article
series Symmetry
spelling doaj.art-8554db0e8d1446d6bf7d9276b960a875
2022-12-22T04:01:49Z
eng
MDPI AG
Symmetry
2073-8994
2019-02-01
Volume 11, Issue 2, Article 182
10.3390/sym11020182
sym11020182
Identification-Method Research for Open-Source Software Ecosystems
Zhifang Liao (School of Software, Central South University, Changsha 410075, China)
Ningwei Wang (School of Software, Central South University, Changsha 410075, China)
Shengzong Liu (Department of Information Management, Hunan University of Finance and Economics, Changsha 410075, China)
Yan Zhang (Department of Computing, School of Computing, Engineering and Built Environment, Glasgow Caledonian University, Glasgow G4 0BA, UK)
Hui Liu (Department of Computer Science, Missouri State University, Springfield, MO 65897, USA)
Qi Zhang (School of Software, Central South University, Changsha 410075, China)
https://www.mdpi.com/2073-8994/11/2/182
title Identification-Method Research for Open-Source Software Ecosystems
topic software engineering
symmetry
open-source-software ecosystems
identification
similarity
url https://www.mdpi.com/2073-8994/11/2/182