Summary: | Existing short text stream clustering methods fall into two categories according to their scanning mode: one-pass and batch-based. One-pass methods process each text only once, but they handle the sparseness of short texts poorly. Batch-based methods obtain better results by iterating over each batch multiple times, but at the cost of lower efficiency. To overcome these problems, this paper presents the Lifelong learning Augmented Short Text stream clustering method (LAST), which incorporates the episodic memory module and the sparse experience replay module of lifelong learning into the clustering process. Specifically, LAST processes each text once, but at a fixed interval it randomly samples previously seen texts from the episodic memory and performs sparse experience replay on them to update the cluster features. Empirical studies on two public datasets show that LAST achieves clustering performance on par with batch-based methods while running at close to the speed of one-pass methods.
|
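The interplay of one-pass assignment, episodic memory, and sparse experience replay described in the summary can be illustrated with a minimal sketch. It is not the LAST implementation: the class name `ReplayStreamClusterer`, the token-overlap similarity, the parameters `threshold`, `replay_interval`, and `replay_size`, and the unbounded memory are all assumptions made for illustration, and the underlying clustering model used by the paper is not stated in the summary.

```python
import random
from collections import Counter


class ReplayStreamClusterer:
    """Toy one-pass clusterer with episodic memory and sparse replay (illustrative only)."""

    def __init__(self, threshold=0.2, replay_interval=4, replay_size=2, seed=0):
        self.threshold = threshold              # assumed: minimum similarity to join a cluster
        self.replay_interval = replay_interval  # assumed: replay after every N texts
        self.replay_size = replay_size          # assumed: texts sampled per replay
        self.rng = random.Random(seed)
        self.clusters = {}   # cluster id -> Counter of word counts (cluster features)
        self.memory = []     # episodic memory: [tokens, cluster id], unbounded here
        self.next_id = 0
        self.seen = 0

    @staticmethod
    def _similarity(tokens, features):
        # Fraction of the text's tokens that already occur in the cluster.
        if not tokens:
            return 0.0
        return sum(1 for t in tokens if features[t] > 0) / len(tokens)

    def _assign(self, tokens):
        # Pick the most similar cluster, or open a new one below the threshold.
        best_id, best_sim = None, 0.0
        for cid, features in self.clusters.items():
            sim = self._similarity(tokens, features)
            if sim > best_sim:
                best_id, best_sim = cid, sim
        if best_id is None or best_sim < self.threshold:
            best_id = self.next_id
            self.next_id += 1
            self.clusters[best_id] = Counter()
        self.clusters[best_id].update(tokens)
        return best_id

    def _replay(self):
        # Sparse experience replay: revisit a few remembered texts and
        # re-assign them against the current cluster features.
        k = min(self.replay_size, len(self.memory))
        for entry in self.rng.sample(self.memory, k):
            tokens, old_id = entry
            self.clusters[old_id].subtract(tokens)
            self.clusters[old_id] = +self.clusters[old_id]  # drop zero counts
            if not self.clusters[old_id]:
                del self.clusters[old_id]
            entry[1] = self._assign(tokens)

    def add(self, text):
        # Each incoming text is processed once; replay runs only at intervals.
        tokens = text.lower().split()
        cid = self._assign(tokens)
        self.memory.append([tokens, cid])
        self.seen += 1
        if self.seen % self.replay_interval == 0:
            self._replay()
        return cid

    def labels(self):
        # Current cluster id of every remembered text, in arrival order.
        return [cid for _, cid in self.memory]


model = ReplayStreamClusterer()
for text in ["apple banana fruit", "banana fruit salad",
             "python code bug", "code bug fix"]:
    model.add(text)
print(model.labels())
```

Because replay touches only a small random sample at each interval, the per-text cost stays close to that of a pure one-pass scheme, while earlier assignments still get a chance to be corrected as cluster features evolve.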