Machine learning and statistical models for analyzing multilevel patent data
Abstract A recent surge of patent applications among public hospitals in China has aroused significant research interest. A country’s healthcare innovation capacity can be measured by its number of patents. This paper explores the link between the number of patents and ten independent variables. Mul...
Main Authors: | Sunyun Qi, Yu Zhang, Hua Gu, Fei Zhu, Meiying Gao, Hongxiao Liang, Qifeng Zhang, Yanchao Gao |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2023-08-01 |
Series: | Scientific Reports |
Online Access: | https://doi.org/10.1038/s41598-023-37922-3 |
_version_ | 1797576642853339136 |
---|---|
author | Sunyun Qi Yu Zhang Hua Gu Fei Zhu Meiying Gao Hongxiao Liang Qifeng Zhang Yanchao Gao |
collection | DOAJ |
description | A recent surge of patent applications among public hospitals in China has aroused significant research interest. A country’s healthcare innovation capacity can be measured by its number of patents. This paper explores the link between the number of patents and ten independent variables. Multicollinearity was detected using a variable selection method and removed using LASSO regression. The Poisson model and the negative binomial model were proposed to analyze the patent data. Three goodness-of-fit tests (the Pearson test, the deviance test, and the DHARMa non-parametric dispersion test) were conducted to assess model fit. After agglomerative hierarchical clustering revealed four clusters, these two models were replaced by the negative binomial mixed model. A likelihood ratio test was used to determine which model is more appropriate, and the results reveal that the negative binomial mixed model outperforms both the Poisson model and the negative binomial model. Three variables (the number of health technicians per 10,000 people, financial expenditure on science and technology, and the number of patent applications per 10,000 health personnel) have a significantly positive relationship with the number of patents in Chinese tertiary public hospitals. |
first_indexed | 2024-03-10T21:55:45Z |
format | Article |
id | doaj.art-a9a0b02c1e344a9883da66364583afee |
institution | Directory Open Access Journal |
issn | 2045-2322 |
language | English |
last_indexed | 2024-03-10T21:55:45Z |
publishDate | 2023-08-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
spelling | doaj.art-a9a0b02c1e344a9883da66364583afee; 2023-11-19T13:08:06Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2023-08-01; vol. 13, no. 1, pp. 1-10; 10.1038/s41598-023-37922-3; Machine learning and statistical models for analyzing multilevel patent data; Sunyun Qi (Zhejiang Provincial Center for Medical Science Technology and Education Development); Yu Zhang (Leuven Statistics Research Centre, Faculty of Science, KU Leuven (Katholieke Universiteit Leuven)); Hua Gu (Zhejiang Provincial Center for Medical Science Technology and Education Development); Fei Zhu (Zhejiang Provincial Center for Medical Science Technology and Education Development); Meiying Gao (Zhejiang Provincial Center for Medical Science Technology and Education Development); Hongxiao Liang (Department of Public Utilities Management, Faculty of Humanities and Management, Zhejiang Chinese Medical University); Qifeng Zhang (Zhejiang Provincial Center for Medical Science Technology and Education Development); Yanchao Gao (Zhejiang Provincial Center for Medical Science Technology and Education Development); https://doi.org/10.1038/s41598-023-37922-3 |
title | Machine learning and statistical models for analyzing multilevel patent data |
url | https://doi.org/10.1038/s41598-023-37922-3 |
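The description above names the Pearson test as one of the goodness-of-fit checks applied to the Poisson model. As a minimal sketch of that idea (not the authors' code), the snippet below computes the Pearson chi-square statistic and its ratio to the residual degrees of freedom; a ratio well above 1 suggests overdispersion, which is the usual motivation for moving from a Poisson to a negative binomial model. The function name `pearson_dispersion`, the intercept-only model, and the counts are all assumptions made for illustration and do not come from the paper.

```python
def pearson_dispersion(counts, fitted_means, n_params):
    """Return (Pearson chi-square, chi-square / residual degrees of freedom).

    For a Poisson model the variance equals the mean, so each squared
    residual is scaled by the fitted mean.
    """
    if len(counts) != len(fitted_means):
        raise ValueError("counts and fitted_means must have the same length")
    dof = len(counts) - n_params
    if dof <= 0:
        raise ValueError("need more observations than fitted parameters")
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(counts, fitted_means))
    return chi2, chi2 / dof


# Hypothetical patent counts for five hospitals (illustrative only).
counts = [0, 1, 2, 3, 50]

# Assumption: an intercept-only Poisson model, whose maximum likelihood
# fitted mean is simply the sample mean of the counts.
mean_count = sum(counts) / len(counts)
fitted = [mean_count] * len(counts)

chi2, ratio = pearson_dispersion(counts, fitted, n_params=1)
print(f"Pearson chi-square = {chi2:.2f}, dispersion ratio = {ratio:.2f}")
```

In a full analysis the fitted means would come from the regression model with covariates rather than from the sample mean, and the statistic would be compared against a chi-square distribution with the residual degrees of freedom.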