Movie Box Office Prediction Based on Multi-Model Ensembles

This paper draws on box office data for films released in China, collected from ENDATA on 30 November 2021. Of the 5683 movie records obtained, the top 2000 were selected as the box office prediction dataset. In this paper,...

Bibliographic Details
Main Authors: Yuan Ni, Feixing Dong, Meng Zou, Weiping Li
Format: Article
Language: English
Published: MDPI AG 2022-06-01
Series: Information
Subjects: stacking predictor; box office predictor; impact of COVID-19 epidemic
Online Access: https://www.mdpi.com/2078-2489/13/6/299
_version_ 1827659839948455936
author Yuan Ni
Feixing Dong
Meng Zou
Weiping Li
author_facet Yuan Ni
Feixing Dong
Meng Zou
Weiping Li
author_sort Yuan Ni
collection DOAJ
description This paper draws on box office data for films released in China, collected from ENDATA on 30 November 2021. Of the 5683 movie records obtained, the top 2000 were selected as the box office prediction dataset. Several types of Chinese micro-data are used, and additional features are introduced, including the Baidu search index for each movie name over the 30 days before and after the release date and coronavirus disease 2019 (COVID-19) data for China. The stacking algorithm is optimized by adopting a two-layer model architecture. The first-layer base learners are Extreme Gradient Boosting (XGBoost), the Light Gradient Boosting Machine (LightGBM), Categorical Boosting (CatBoost), the Gradient Boosting Decision Tree (GBDT), random forest (RF), and support vector regression (SVR); the second-layer meta-learner is a multiple linear regression model. The resulting box office prediction model has a Mean Absolute Percentage Error (MAPE) of 14.49%. In addition, to study the impact of the COVID-19 epidemic on the movie box office, a LightGBM model is built on data for 187 movies released from January 2020 to November 2021, combined with a number of the features introduced earlier. Feature importance analysis of this model indicates that the state of the COVID-19 epidemic at the time of a movie's release had some impact on its box office.
first_indexed 2024-03-09T23:29:56Z
format Article
id doaj.art-1d12350973a44368afc20e06b1a3a173
institution Directory of Open Access Journals
issn 2078-2489
language English
last_indexed 2024-03-09T23:29:56Z
publishDate 2022-06-01
publisher MDPI AG
record_format Article
series Information
spelling doaj.art-1d12350973a44368afc20e06b1a3a173 2023-11-23T17:10:02Z eng MDPI AG Information 2078-2489 2022-06-01 13 6 299 10.3390/info13060299 Movie Box Office Prediction Based on Multi-Model Ensembles
Yuan Ni (School of Economics and Management, Beijing Information Science and Technology University, Beijing 100192, China)
Feixing Dong (Computer School, Beijing Information Science and Technology University, Beijing 100192, China)
Meng Zou (Computer School, Beijing Information Science and Technology University, Beijing 100192, China)
Weiping Li (School of Software & Microelectronics, Peking University, Beijing 102600, China)
https://www.mdpi.com/2078-2489/13/6/299
stacking predictor
box office predictor
impact of COVID-19 epidemic
spellingShingle Yuan Ni
Feixing Dong
Meng Zou
Weiping Li
Movie Box Office Prediction Based on Multi-Model Ensembles
Information
stacking predictor
box office predictor
impact of COVID-19 epidemic
title Movie Box Office Prediction Based on Multi-Model Ensembles
title_full Movie Box Office Prediction Based on Multi-Model Ensembles
title_fullStr Movie Box Office Prediction Based on Multi-Model Ensembles
title_full_unstemmed Movie Box Office Prediction Based on Multi-Model Ensembles
title_short Movie Box Office Prediction Based on Multi-Model Ensembles
title_sort movie box office prediction based on multi model ensembles
topic stacking predictor
box office predictor
impact of COVID-19 epidemic
url https://www.mdpi.com/2078-2489/13/6/299
work_keys_str_mv AT yuanni movieboxofficepredictionbasedonmultimodelensembles
AT feixingdong movieboxofficepredictionbasedonmultimodelensembles
AT mengzou movieboxofficepredictionbasedonmultimodelensembles
AT weipingli movieboxofficepredictionbasedonmultimodelensembles
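
The two-layer stacking architecture summarized in the description can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn is available, uses synthetic data in place of the ENDATA box office records, and includes only three of the six base learners (GBDT, RF, SVR), since XGBoost, LightGBM, and CatBoost are separate packages. All variable names, hyperparameters, and the `mape` helper are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error in percent; y_true must be non-zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)


# Synthetic stand-in for the movie feature table (assumption: 5 numeric
# features and a strictly positive target, so MAPE is well defined).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(300, 5))
y = 50.0 + 30.0 * X[:, 0] + 20.0 * X[:, 1] * X[:, 2] + rng.normal(0.0, 1.0, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# First layer: base learners. SVR is wrapped with a scaler because it is
# sensitive to feature scale.
base_learners = [
    ("gbdt", GradientBoostingRegressor(random_state=0)),
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ("svr", make_pipeline(StandardScaler(), SVR(C=10.0))),
]

# Second layer: a multiple linear regression meta-learner, fitted on
# out-of-fold predictions of the base learners (5-fold cross-validation).
stack = StackingRegressor(
    estimators=base_learners,
    final_estimator=LinearRegression(),
    cv=5,
)
stack.fit(X_train, y_train)

test_mape = mape(y_test, stack.predict(X_test))
print(f"held-out MAPE: {test_mape:.2f}%")
```

Training the meta-learner on out-of-fold predictions, rather than on predictions for the same rows the base learners were fitted on, keeps the second layer from simply rewarding whichever base learner overfits most. The MAPE printed here reflects the synthetic data only and is not comparable to the 14.49% reported in the record.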