Summary: | Multivariate time series clustering has become an important research topic in the task of time series analysis. Compared with univariate time series, the research of multivariate time series is more complex and difficult. Although many clustering algorithms for multivariate time series have been proposed, these algorithms still have difficulties in solving the accuracy and interpretation at the same time. Firstly, most of the current work does not consider the length redundancy and variable correlation of multivariable time series, resulting in large errors in the final similarity matrix. Secondly, the data are commonly used in the clustering process with the division paradigm, when the numerical space presents a complex distribution, this idea does not perform well, and it does not have the explanatory power of each variable and space. To address the above problems, this paper proposes a multivariate time series adaptive weight density clustering algorithm using Shapelet (high information-rich continuous subsequence) space (MDCS). This algorithm firstly performs a Shapelet search for each variable, and obtains its own Shapelet space through an adaptive strategy. Then, it weights the numerical distribution generated by each variable to obtain a similarity matrix that is more consistent with the characteristics of data distribution. Finally, the data are finally allocated using the shared nearest neighbor density peak clustering algorithm with improved density calculation and secondary allocation. Experimental results on several real datasets demonstrate that MDCS has better clustering results compared with current state-of-the-art clustering algorithms, with an average increase of 0.344 and 0.09 in the normalized mutual information and Rand index, balancing performance and interpretability.
|