Summary: | Air pollution is one of the most pressing issues facing cities around the world. However, most cities rely on air quality monitoring devices that record only past pollution levels and do not account for the factors that influence them. To obtain preliminary pollution information that accounts for environmental factors, we developed an embedded generative model, based on a variational autoencoder and a feedforward neural network, to examine the relationship between air quality and environmental factors. The model used actual <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>S</mi><msub><mi>O</mi><mn>2</mn></msub></mrow></semantics></math></inline-formula>, <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>N</mi><msub><mi>O</mi><mn>2</mn></msub></mrow></semantics></math></inline-formula>, <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>P</mi><msub><mi>M</mi><mrow><mn>2.5</mn></mrow></msub></mrow></semantics></math></inline-formula>, <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>P</mi><msub><mi>M</mi><mrow><mn>10</mn></mrow></msub></mrow></semantics></math></inline-formula>, and <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>C</mi><mi>O</mi></mrow></semantics></math></inline-formula> measurements from 2016 to 2020, collected at 15 ground monitoring stations at different locations in Ulaanbaatar city. A wide range of weather and fuel measurements, collected over the same period as the air pollution data, served as the influencing factors.
Predictions were produced for all measurement stations, and the results were visualized as a spatial–temporal distribution of pollution and as the performance of individual stations. The cross-validated <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><msup><mi>R</mi><mn>2</mn></msup></mrow></semantics></math></inline-formula> for the estimated pollution distribution across the regions was <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>S</mi><msub><mi>O</mi><mn>2</mn></msub></mrow></semantics></math></inline-formula>: 0.81, <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>P</mi><msub><mi>M</mi><mrow><mn>2.5</mn></mrow></msub></mrow></semantics></math></inline-formula>: 0.76, <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>P</mi><msub><mi>M</mi><mrow><mn>10</mn></mrow></msub></mrow></semantics></math></inline-formula>: 0.89, and <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>C</mi><mi>O</mi></mrow></semantics></math></inline-formula>: 0.83. Pearson’s chi-squared tests were used to assess each measurement station, and the contingency tables indicate a strong association between the actual and model results. The model can be applied to analyze specific interdependencies between pollution and environmental factors, and its performance improves with data covering a longer period.
|