Machine learning and statistical models for analyzing multilevel patent data

Bibliographic Details
Main Authors: Sunyun Qi, Yu Zhang, Hua Gu, Fei Zhu, Meiying Gao, Hongxiao Liang, Qifeng Zhang, Yanchao Gao
Format: Article
Language: English
Published: Nature Portfolio 2023-08-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-023-37922-3
collection DOAJ
description Abstract A recent surge of patent applications among public hospitals in China has aroused significant research interest. A country’s healthcare innovation capacity can be measured by its number of patents. This paper explores the link between the number of patents and ten independent variables. Multicollinearity was carefully detected and removed by using the variable selection method and LASSO regression, respectively. The Poisson model and the negative binomial model were proposed to analyze the patent data. Three goodness of fit tests, the Pearson test, the deviance test, and the DHARMa non-parametric dispersion test, were conducted to investigate if the model has a good fit. After discovering four clusters by conducting agglomerative hierarchical clustering, these two models were replaced by the negative binomial mixed model. The likelihood ratio test was used to determine which model is more appropriate and the results reveal that the negative binomial mixed model outperforms both the Poisson model and the negative binomial model. Three variables, number of health technicians per 10,000 people, financial expenditure on science and technology as well as number of patent applications per 10,000 health personnel, have a significantly positive relationship with the number of patents in Chinese tertiary public hospitals.
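The comparison of a Poisson model with a negative binomial model described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes an intercept-only model with no covariates and no random effects, uses hypothetical counts, and all function names are illustrative. It computes the Pearson dispersion statistic for the Poisson fit, estimates the negative binomial dispersion parameter by a simple golden-section search, and forms the likelihood ratio statistic between the two models.

```python
import math

def poisson_loglik(y, mu):
    """Log-likelihood of i.i.d. Poisson(mu) counts."""
    return sum(k * math.log(mu) - mu - math.lgamma(k + 1) for k in y)

def negbin_loglik(y, mu, alpha):
    """Log-likelihood under the NB2 form, where Var(Y) = mu + alpha * mu**2."""
    r = 1.0 / alpha
    return sum(
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu))
        for k in y
    )

def pearson_dispersion(y, mu, n_params=1):
    """Pearson chi-square over residual degrees of freedom (about 1 under Poisson)."""
    chi2 = sum((k - mu) ** 2 / mu for k in y)
    return chi2 / (len(y) - n_params)

def fit_alpha(y, mu, lo=1e-4, hi=100.0, iters=100):
    """Golden-section search on log(alpha) for the maximum NB log-likelihood."""
    ratio = (math.sqrt(5) - 1) / 2
    a, b = math.log(lo), math.log(hi)
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    for _ in range(iters):
        if negbin_loglik(y, mu, math.exp(c)) > negbin_loglik(y, mu, math.exp(d)):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
        c, d = b - ratio * (b - a), a + ratio * (b - a)
    return math.exp((a + b) / 2)

# Hypothetical patent counts per hospital (illustrative only, not from the paper).
counts = [0, 1, 0, 3, 12, 2, 0, 7, 25, 1, 0, 4]

# With an intercept only, the sample mean is the ML estimate of mu in both models.
mu_hat = sum(counts) / len(counts)
dispersion = pearson_dispersion(counts, mu_hat)
alpha_hat = fit_alpha(counts, mu_hat)

ll_pois = poisson_loglik(counts, mu_hat)
ll_nb = negbin_loglik(counts, mu_hat, alpha_hat)
lrt_stat = 2.0 * (ll_nb - ll_pois)

# alpha = 0 lies on the boundary of the parameter space, so the reference
# distribution is a 50:50 mixture of a point mass at 0 and chi-square(1).
p_value = 0.5 * math.erfc(math.sqrt(max(lrt_stat, 0.0) / 2.0))

print(f"Pearson dispersion: {dispersion:.2f}")
print(f"alpha_hat: {alpha_hat:.3f}, LRT: {lrt_stat:.2f}, p-value: {p_value:.3g}")
```

A dispersion statistic well above 1 and a small p-value both point toward the negative binomial model. Extending the comparison to a negative binomial mixed model, as the paper does, would need a package that supports random effects.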
first_indexed 2024-03-10T21:55:45Z
id doaj.art-a9a0b02c1e344a9883da66364583afee
institution Directory Open Access Journal
issn 2045-2322
last_indexed 2024-03-10T21:55:45Z
spelling doaj.art-a9a0b02c1e344a9883da66364583afee, 2023-11-19T13:08:06Z, eng, Nature Portfolio, Scientific Reports, ISSN 2045-2322, 2023-08-01, vol. 13, no. 1, pp. 1-10, https://doi.org/10.1038/s41598-023-37922-3
author_affiliations Sunyun Qi: Zhejiang Provincial Center for Medical Science Technology and Education Development
Yu Zhang: Leuven Statistics Research Centre, Faculty of Science, KU Leuven (Katholieke Universiteit Leuven)
Hua Gu: Zhejiang Provincial Center for Medical Science Technology and Education Development
Fei Zhu: Zhejiang Provincial Center for Medical Science Technology and Education Development
Meiying Gao: Zhejiang Provincial Center for Medical Science Technology and Education Development
Hongxiao Liang: Department of Public Utilities Management, Faculty of Humanities and Management, Zhejiang Chinese Medical University
Qifeng Zhang: Zhejiang Provincial Center for Medical Science Technology and Education Development
Yanchao Gao: Zhejiang Provincial Center for Medical Science Technology and Education Development