Summary: | Memory-based recommender systems with m users and n items typically require O(mn) space to store the rating information. In item-based collaborative filtering (CF) algorithms, the feature vector of each item has length m,and it takes O(m) time to compute the similarity between two items using the Pearson or cosine distances. In this paper, we propose an efficient CF algorithm based on a new measure called the M-distance, which is defined as the difference between the average ratings of two items. In the initialization stage, we compute the average ratings of items and store them in two vectors, which requires O(m) space. Scanning the rating dataset then takes O(mn) time. In the online prediction stage, a threshold δ is employed to identify similar items. To predictp ratings, our algorithm requires O(np) time compared with the O(mnp) time of the cosine-based kNN algorithm. Experiments are undertaken on four well-known datasets, and the proposed M-distance is compared with the cosine-based kNN, Pearson-based kNN, and Slope One methods. Our results show that the new algorithm is significantly faster than the conventional techniques, especially for large datasets, and that its prediction ability is no worse in terms of the mean absolute error and root mean square error.
|