Article

Prediction of Air Pollution Concentration Based on mRMR and Echo State Network

1 Department of Environmental Engineering, Kyoto University, Kyoto 615-8540, Japan
2 Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian 116024, China
* Authors to whom correspondence should be addressed.
Submission received: 15 March 2019 / Revised: 17 April 2019 / Accepted: 28 April 2019 / Published: 1 May 2019
(This article belongs to the Special Issue Air Quality Prediction Based on Machine Learning Algorithms)

Abstract

Air pollution has become a global environmental problem because of its great adverse impact on human health and the climate. One way to address this problem is to monitor and predict the air quality index (AQI) in an economical way. Accurate monitoring and prediction of AQI indicators, e.g., PM2.5 concentration, is a challenging task. In order to accurately predict the PM2.5 time series, we propose a supplementary leaky integrator echo state network (SLI-ESN) in this paper. It adds a supplementary historical-state term to the leaky integrator reservoir update, which strengthens the influence of the historical evolution of the state on the current state. Considering the redundancy and correlation between multivariate time series, the minimum redundancy maximum relevance (mRMR) feature selection method is introduced to reduce redundant and irrelevant information and to increase computation speed. A variety of evaluation indicators are used to assess the overall performance of the proposed method. The effectiveness of the proposed model is verified by an experiment on Beijing PM2.5 time series prediction. A comparison of learning times also shows the efficiency of the algorithm.

1. Introduction

With the rapid advancement of urbanization and industrialization, air quality has deteriorated severely, which has negatively affected the quality of the living environment and even hindered economic growth in some areas [1]. In particular, the inhalable particles produced by industrial pollution have small particle size, large diffusion area, and strong activity, so they can enter the human body through the respiratory tract and adversely affect human health. The prediction of air pollutants therefore plays a crucial role in the early warning and control of environmental pollution [2]. Modeling and forecasting the air quality index (e.g., PM2.5 concentration) has become an effective way to prevent and control air pollution, and it also provides a scientific basis for the development of effective measures [3]. Implementing this idea can effectively reduce the health hazard of air pollution, thus achieving early warning and rational planning [4].
For a long time, many scholars have conducted in-depth research on air pollution, and a variety of predictive models have been proposed, such as the autoregressive integrated moving average model [5], support vector machine [5], multiple linear regression model [6], neural networks [7,8], and so on [9]. All of them have been applied to predict air pollution concentration. Oprea et al. [10] applied an artificial neural network and an adaptive neuro-fuzzy inference system to predict PM2.5 concentration. Deng et al. [11] proposed heterogeneous space-time artificial neural networks to deal with spatial heterogeneity, which have been applied to predict the concentration of fine particles in Beijing–Tianjin–Hebei. Ong et al. [12] proposed a deep recurrent neural network for time series prediction, which can accurately predict the PM2.5 concentration. To address air pollution monitoring and prevention in Kunming and Yuxi, China, Li et al. [13] studied a co-integration flower-pollination-algorithm support vector machine for the prediction of PM10 concentration time series. In order to study the attenuation effect of haze on solar radiation scattering, Yao et al. [14] proposed a new SVM-based method that increases the accuracy of global solar radiation models by incorporating variables such as daily global solar radiation, sunshine hours, temperature, relative humidity, and air quality index. Reid et al. [15] selected an optimal prediction model based on 10-fold cross-validation to estimate PM2.5 concentrations during wildfires in Northern California in 2008, and reliably predicted major wildfire events. As can be seen, different types of predictive models can successfully solve different problems of air pollution time series prediction.
Because the causes of air pollution are very complicated, analyzing the main pollutants and influence variables of air quality index will lay the foundation for the establishment of predictive models. Although the methods described earlier are effective, they do not analyze the validity of the input variables, so the models are likely to contain irrelevant or redundant information. In order to build an accurate predictive model, Sun et al. [16] applied Pearson correlation coefficient to analyze the relationship between PM2.5 and other variables and selected the appropriate input variables according to the correlation order. Zhang et al. [17] investigated cross-correlations between PM2.5 and four meteorological factors based on multifractal detrended cross-correlation analysis method, which reveals the impact of meteorological variables on PM2.5 concentration. Zhu et al. [18] proposed a novel graphical causality analysis approach and analyzed the impact of meteorological and traffic variables on air quality indexes. Chen et al. [19] applied the convergent cross-mapping method to analyze the causal relationship between meteorological factors and PM2.5 concentration in Beijing–Tianjin–Hebei region, and obtained quantitative causality analysis results. Therefore, for complex air pollution problems, we also need to choose the appropriate correlation or causality analysis method to analyze the impact variables, thus achieving high-precision detection and forecasting.
At present, air pollution has become one of the major environmental problems. However, the causes of air pollution are very complicated. Taking the PM2.5 concentration as an example, its concentration is not only affected by the air pollutants such as NO2, CO, O3, and SO2, but also by meteorological variables such as temperature, pressure, humidity, wind speed and wind direction. How to choose effective information from a variety of variables for prediction is an important research topic. However, in the previous studies, the correlation between influencing factors and PM2.5 was not considered in the establishment of most predictive models. In order to solve the above problem, this paper considers using the minimum redundancy maximum relevance (mRMR) [20] feature selection method to select the appropriate input variables, which can select the most relevant information and reduce redundant information. Furthermore, considering the chaotic characteristics of variables, we use phase space reconstruction to extract evolutionary information of relevant variables. At last, the new input variables are transferred to the supplemental leaky integrator echo state network (SLI-ESN) for prediction. The improved model not only enhances the feature extraction and memory ability of the reservoir for multivariate time series, but also improves the influence of historical evolution state on the current state. In practical applications, accurate prediction is conducive to monitoring air quality and making reasonable and scientific decision-making on air pollution prevention. In order to verify the validity of the proposed method, we select the dataset of air quality index and meteorological time series to predict PM2.5 concentration in Beijing, China.
The rest of this paper is organized as follows: Section 2 describes the preliminary knowledge of this paper, including the feature selection method and the echo state network; Section 3 introduces in detail four aspects: the feature selection method, phase space reconstruction, the PM2.5 prediction model, and the algorithm flow; in Section 4, we analyze the experimental results of PM2.5 concentration time series prediction in Beijing, China; Section 5 gives conclusions and illustrates the challenges that remain for future work.

2. Preliminaries

2.1. Feature Selection Method

The time series that affect air pollution are high-dimensional data, which not only contain rich information but also include irrelevant or redundant factors. These irrelevant and redundant factors reduce the prediction accuracy and efficiency of the model. Thus, analyzing the relationships between variables and selecting valuable input variables are important for prediction.
Feature selection is the most typical data preprocessing method [21]. It consists of four parts: the generation process, the evaluation function, the stop criterion, and the verification process. Common feature selection algorithms include random forest (RF), correlation feature selection (CFS), fast correlation-based filter (FCBF), mutual information (MI) [22], information gain (IG) [23], regularization models, relief-based algorithms, and genetic algorithms. However, these feature selection algorithms usually ignore the redundancy relationships between features. Random forests tend to over-fit on noisy regression problems and do not give continuous output. The CFS and FCBF methods are slow to compute and cannot handle large-scale data efficiently. MI and IG make no assumption about the data distribution but have high computational complexity for high-dimensional data. Regularization models select well on high-dimensional data whose dimension is much larger than the number of samples, but they do not perform satisfactorily on low-dimensional data. For the prediction of PM2.5 concentration with high noise, choosing an appropriate feature selection method is very important. In this paper, minimum redundancy maximum relevance (mRMR) [20] is used for feature selection. It yields a subset of the purest features, removing redundant features while guaranteeing maximum relevance.

2.2. Echo State Network

The echo state network (ESN) proposed by Jaeger et al. [24] is a new type of recurrent neural network. It consists of an input layer, a reservoir, and an output layer. The structure of the ESN is shown in Figure 1. $W^{in}$ is the input weight matrix that connects the inputs and the reservoir, $W$ is the internal connection weight matrix of the reservoir, $W^{out}$ is the output weight matrix that connects the reservoir and the output, and $W^{back}$ is the output-to-reservoir feedback connection weight matrix, which is usually set as a zero vector.
Assume that the system obtains a time series $U(n) = [u_1(n), u_2(n), \ldots, u_K(n)]^T$ at time $n$, which is the input of the reservoir at this time. The target output will be $y_{in}(n) = U(n + \lambda)$, and prediction over different time steps is realized by adjusting $\lambda$. The reservoir receives two inputs: one is $U(n)$ from the input layer, and the other is $x(n-1)$, the previous state of the reservoir. The matrices $W$ and $W^{in}$ are randomly generated and remain unchanged during training. The weight matrix $W$ is a large-scale sparse matrix, in which non-zero elements indicate activated neurons in the reservoir. The update formula for the current state of the reservoir is as follows:
$$x(n+1) = \tanh\left(W^{in}\,[1; U(n+1)] + W\,x(n)\right) \tag{1}$$
where $x(n) \in \mathbb{R}^{L \times 1}$ is the state of the reservoir at time $n$, and its initial state is a zero vector. The operator $\tanh(\cdot)$ is the hyperbolic tangent activation function. As time grows without bound, the dependence of the current state on the initial state of the reservoir gradually decreases and eventually disappears [25]. The information in the reservoir is transferred to the output layer through a linear connection, and the output of the network is:
$$y(n) = W^{out}\,x(n) \tag{2}$$
Afterwards, Jaeger et al. [26] improved the ESN and proposed the leaky integrator ESN (LI-ESN) by introducing leaky integrator neurons into the reservoir. LI-ESN is a variant of the ESN whose reservoir contains leaky integrator neurons. The state update formula of the reservoir in LI-ESN is:
$$x(n+1) = a\,\tanh\left(W^{in}\,[1; U(n+1)] + W\,x(n)\right) + (1 - a)\,x(n) \tag{3}$$
where $a \in [0, 1]$ is the leaking rate. Each leaky neuron in the reservoir applies a leaky integration to its activation, partially remembering its previous activation and retaining the effect of the previous moment on the current state.
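The two update rules above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation; the input dimension K, reservoir size L, leaking rate, weight scales, and sparsity level are all assumed placeholder values.

```python
import numpy as np

rng = np.random.default_rng(0)
K, L = 3, 50          # input dimension and reservoir size (illustrative values)
a = 0.5               # leaking rate for the LI-ESN variant

# W_in connects [bias; input] to the reservoir; W is a sparse internal matrix
# rescaled so its spectral radius is below 1 (a common echo-state heuristic).
W_in = rng.uniform(-0.5, 0.5, (L, 1 + K))
W = rng.uniform(-0.5, 0.5, (L, L)) * (rng.random((L, L)) < 0.1)
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))

def esn_update(x, u):
    """Plain ESN update: x(n+1) = tanh(W_in [1; U(n+1)] + W x(n))."""
    return np.tanh(W_in @ np.concatenate(([1.0], u)) + W @ x)

def li_esn_update(x, u):
    """LI-ESN update: a * tanh(...) + (1 - a) * x(n)."""
    return a * esn_update(x, u) + (1 - a) * x

x = np.zeros(L)                      # initial reservoir state is a zero vector
for n in range(100):                 # drive the reservoir with a toy input
    u = np.sin(0.1 * n) * np.ones(K)
    x = li_esn_update(x, u)
```

Because the activation is bounded by $\tanh$, every reservoir state stays inside $(-1, 1)$, which is what makes the linear readout well behaved.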

3. PM2.5 Time Series Prediction Model

In order to accurately predict the concentration of PM2.5, this paper proposes a PM2.5 time series prediction model. To handle high-dimensional input variables, the mRMR method is utilized to select an optimal subset. Then, the phase space is reconstructed from the subset to obtain the evolution information of the time series. Finally, we predict the PM2.5 time series with the supplementary leaky integrator echo state network (SLI-ESN) model proposed in this paper.

3.1. mRMR Feature Selection Method

In this paper, the mRMR method [20] is used for feature selection. Its basic idea is to maximize the correlation between the input features and the output while minimizing the correlation among the input features, with the goal of finding the most representative feature subset. The correlation between features is evaluated by mutual information, and an appropriate feature subset is selected from the original feature set. The maximum relevance and minimum redundancy criteria of mRMR are calculated as follows:
$$\max D(S, C) = \frac{1}{|S|} \sum_{f_i \in S} I(f_i; C) \tag{4}$$
$$\min R(S) = \frac{1}{|S|^2} \sum_{f_i, f_k \in S} I(f_i; f_k) \tag{5}$$
where $f_i$ and $f_k$ represent the $i$th and $k$th feature in set $S$, respectively, and $C$ denotes the target output. $I(f_i; C)$ denotes the mutual information between the $i$th feature and the output $C$, and $I(f_i; f_k)$ the mutual information between the $i$th and $k$th features. Mutual information is defined as follows:
$$I(x; y) = \iint P(x, y) \log \frac{P(x, y)}{P(x)\,P(y)} \, dx \, dy \tag{6}$$
Therefore, the expression of mRMR is as follows:
$$\max_{S} \left[ \frac{1}{|S|} \sum_{f_i \in S} I(f_i; C) - \frac{1}{|S|^2} \sum_{f_i, f_k \in S} I(f_i; f_k) \right] \tag{7}$$
Applying the forward selection algorithm to the objective function (7) yields a ranking of the input features. Finally, the optimal input feature subset is obtained by cross-validation.
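As an illustration, the incremental forward selection under the mRMR criterion can be sketched with a histogram-based mutual information estimate. This is a simplified sketch, not the implementation of Peng et al. [20]; the bin count, the estimator, and the toy features below are all assumptions.

```python
import numpy as np

def mutual_info(x, y, bins=8):
    """Histogram estimate of I(x; y) for two 1-D continuous series."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0                      # avoid log(0) on empty bins
    return float((pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])).sum())

def mrmr(features, target, n_select):
    """Greedy forward selection maximizing relevance minus redundancy."""
    n_feat = features.shape[1]
    relevance = [mutual_info(features[:, j], target) for j in range(n_feat)]
    selected = [int(np.argmax(relevance))]       # start with the most relevant
    while len(selected) < n_select:
        best_j, best_score = None, -np.inf
        for j in range(n_feat):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info(features[:, j], features[:, k])
                                  for k in selected])
            score = relevance[j] - redundancy    # the mRMR criterion
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected

# Toy check: feature 0 drives the target, feature 1 is a noisy copy of 0,
# feature 2 is independent noise; mRMR picks 0 first and typically prefers
# the non-redundant feature 2 over the near-duplicate 1 at the second step.
rng = np.random.default_rng(1)
f0 = rng.normal(size=5000)
X = np.column_stack([f0, f0 + 0.3 * rng.normal(size=5000),
                     rng.normal(size=5000)])
y = f0 + 0.1 * rng.normal(size=5000)
order = mrmr(X, y, 3)
```

In the paper the analogous ranking over the 11 monitored variables produces the order reported in Section 4.2.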

3.2. Phase Space Reconstruction

Phase space reconstruction is a very important step in time series analysis and prediction. In order to extract useful information from a time series, Takens [27] proposed the embedding theorem of phase space reconstruction. In practical calculation, since numerical differentiation is sensitive to error, phase space reconstruction of a time series generally adopts the coordinate delay method. Its essence is to construct an m-dimensional vector from differently delayed copies of a one-dimensional time series. The calculation formula is as follows:
$$U_i(n) = [u_i(n),\, u_i(n + \tau_i),\, \ldots,\, u_i(n + (m_i - 1)\tau_i)] \tag{8}$$
where $\{u_i(n) \mid i = 1, 2, \ldots, K\}$ are the input time series, $m_i$ is the embedding dimension, and $\tau_i$ is the delay time. The reconstructed phase space contains all the evolution information of the original system.
The embedding theorem shows that for an infinitely long, noise-free one-dimensional time series observed from a chaotic attractor, an m-dimensional phase space preserving the attractor can be found as long as the embedding dimension satisfies $m \ge 2d + 1$, where $d$ is the dimension of the dynamic system [28]. However, real time series are finite-length sequences with noise. The embedding dimension and delay time therefore cannot be arbitrarily selected; otherwise, the quality of the reconstructed phase space will be seriously affected.
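The delay-coordinate construction above is straightforward to implement. In the sketch below, a sine wave is only a stand-in for a real pollutant series, and the (m, τ) values are examples rather than the C-C estimates used later in the paper.

```python
import numpy as np

def delay_embed(u, m, tau):
    """Delay-coordinate embedding: each row is [u(n), u(n+tau), ..., u(n+(m-1)tau)]."""
    n_points = len(u) - (m - 1) * tau
    return np.column_stack([u[i * tau : i * tau + n_points] for i in range(m)])

# Stand-in series of length 100, embedded with m = 2, tau = 8
u = np.sin(0.2 * np.arange(100))
U = delay_embed(u, m=2, tau=8)
```

Each variable in the optimal subset is embedded with its own $(m_i, \tau_i)$, and the resulting columns are concatenated to form the model input.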

3.3. Supplementary Leaky Integrator Echo State Network

The echo state network, including the model proposed in this paper, has the ability to memorize historical information. However, due to the unpredictability of chaotic time series, it is difficult to accurately retain information from long before the current state of the reservoir. Moreover, due to the complexity of the PM2.5 time series, the current concentration may depend most strongly on the most recent states. In the leaky integrator reservoir, the update of the reservoir state already depends strongly on the historical state. Inspired by this, and considering the impact of previous states on the current state, this paper improves the state update formula of the reservoir neurons and proposes the SLI-ESN model. The state update formula of the reservoir is as follows:
$$x(n+1) = a\,\tanh\left(W^{in}\,[1; U(n+1)] + W\,x(n)\right) + b\,x(n) + (1 - a - b)\,x(n-1) \tag{9}$$
As in LI-ESN, the meaning of the attenuation parameter $a$ is unchanged, and it satisfies $0 < a < 1$. The parameter $b$ is the supplement factor, and it also needs to satisfy $0 < b < 1$. More importantly, like the side lengths of a triangle, each coefficient must be greater than 0, and they must satisfy $a + b < 1$. Because the current state of the reservoir depends more on the states of neighboring moments, this paper considers the influence of the two most recent historical moments. When $n \ge 2$, the supplementary term $(1 - a - b)\,x(n-1)$ applies. Setting $W^{back} = 0$, the output layer is calculated as follows:
$$y(n) = W^{out}\,x(n) \tag{10}$$
The calculation from the reservoir to the output layer is solved by ridge regression [29], a biased estimation regression method for collinear data analysis. It overcomes the ill-posed problem of the least squares solution and prevents over-fitting. The expression is as follows:
$$W^{out} = (X^T X + k I)^{-1} X^T Y \tag{11}$$
where $k$ is the regularization parameter, $I$ is the identity matrix, and $x(n)$ and $y(n)$ are the column vectors of $X$ and $Y$, respectively.
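Combining the supplementary update rule above with the ridge-regression readout gives the following sketch. It is an illustrative reconstruction under assumed hyperparameters (reservoir size, spectral radius, a = 0.5, b = 0.05, k = 1e-6) and random placeholder data, not the authors' MATLAB code.

```python
import numpy as np

rng = np.random.default_rng(0)
K, L = 14, 100        # reconstructed input dimension and reservoir size (assumed)
a, b = 0.5, 0.05      # leaking rate and supplement factor, with a + b < 1

W_in = rng.uniform(-0.5, 0.5, (L, 1 + K))
W = rng.uniform(-0.5, 0.5, (L, L)) * (rng.random((L, L)) < 0.1)
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))

def sli_esn_states(inputs):
    """Collect reservoir states under the supplementary (two-step) update rule."""
    x_prev = np.zeros(L)   # x(n-1)
    x = np.zeros(L)        # x(n)
    states = []
    for u in inputs:
        x_new = (a * np.tanh(W_in @ np.concatenate(([1.0], u)) + W @ x)
                 + b * x + (1 - a - b) * x_prev)
        x_prev, x = x, x_new
        states.append(x)
    return np.array(states)

# Ridge-regression readout: W_out = (X^T X + k I)^-1 X^T Y
inputs = rng.normal(size=(500, K))       # placeholder input sequence
targets = rng.normal(size=(500, 1))      # placeholder target sequence
X = sli_esn_states(inputs)
k = 1e-6
W_out = np.linalg.solve(X.T @ X + k * np.eye(L), X.T @ targets)
y_pred = X @ W_out
```

Using `np.linalg.solve` rather than forming the explicit inverse is the numerically preferred way to evaluate Equation (11).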

3.4. Algorithm Flow

In order to accurately predict the PM2.5 time series, this paper proposes the prediction structure based on feature extraction and improved echo state network model. The process is summarized as follows.
  • Feature selection: select the optimal subset from the original dataset using the mRMR feature selection method.
  • Phase space reconstruction: reconstruct the phase space of the selected optimal subset based on Takens’ theorem and form a new set of input features.
  • Data division: divide the data into training and testing sets according to a certain proportion.
  • Model training: train the SLI-ESN model on the training set using the ridge regression algorithm.
  • Prediction: predict the PM2.5 time series on the testing set using the trained SLI-ESN model.
To briefly explain the prediction process of the proposed model, a schematic diagram of the method is shown in Figure 2.

4. Results and Discussion

In order to prove the validity and practicability of the proposed method, the SLI-ESN model was applied to predict the actually observed PM2.5 time series of Beijing. At the same time, this paper also carried out comparative experiments with ESN, LI-ESN [26], the extreme learning machine (ELM) [30], hierarchical ELM (H-ELM) [31], and the stacked auto-encoder (SAE) [32]. The experiments were run on a Windows 7 system with MATLAB 2016a, an Intel i3 CPU clocked at 3.50 GHz, and 6 GB of memory.
Firstly, the AQI dataset and the evaluation indicators are explained. Secondly, the feature selection experiment based on mRMR is introduced in detail, and the optimal subset is obtained. Finally, the reconstructed features of the optimal subset are used to train the SLI-ESN model, and the validity of the proposed method for PM2.5 time series prediction is verified by the predictive indicators.

4.1. Data Description

This paper selects 8759 samples of hourly air pollution data from January to December 2016 in Haidian District, Beijing. The dataset is from the US Embassy (Harvard University Geographic Analysis Center Dataverse), including the average concentration of PM2.5, PM10, NO2, CO, O3, and SO2 per hour, and hourly temperature (T), pressure (P), humidity (H), wind speed (WS) and wind direction (WD).
Because PM2.5 is particulate matter fine enough to enter the lungs, it has a great impact on human health and the quality of the atmospheric environment. Therefore, predicting PM2.5 concentration not only provides an estimate of environmental quality but also supports better environmental governance and the protection of human health. This paper uses five predictive indicators to evaluate the prediction results of the PM2.5 time series, namely the root mean square error (RMSE), normalized root mean square error (NRMSE), mean absolute error (MAE), symmetric mean absolute percentage error (SMAPE), and Pearson correlation coefficient (R).
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left[\hat{y}(t) - y(t)\right]^2} \tag{12}$$
$$\mathrm{NRMSE} = \frac{1}{\hat{y}_{\max} - \hat{y}_{\min}} \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left[\hat{y}(t) - y(t)\right]^2} \tag{13}$$
$$\mathrm{MAE} = \frac{1}{N} \sum_{t=1}^{N} \left|\hat{y}(t) - y(t)\right| \tag{14}$$
$$\mathrm{SMAPE} = \frac{1}{N} \sum_{t=1}^{N} \frac{\left|y(t) - \hat{y}(t)\right|}{\left(\left|\hat{y}(t)\right| + \left|y(t)\right|\right)/2} \tag{15}$$
$$R = \frac{\sum_{t=1}^{N} \left(y(t) - \bar{y}\right)\left(\hat{y}(t) - \bar{\hat{y}}\right)}{\sqrt{\sum_{t=1}^{N} \left(y(t) - \bar{y}\right)^2} \sqrt{\sum_{t=1}^{N} \left(\hat{y}(t) - \bar{\hat{y}}\right)^2}} \tag{16}$$
where $N$ is the number of samples, $\hat{y}(t)$ is the predicted output, and $y(t)$ is the target value.
For the above evaluation indicators, the smaller the values of RMSE, NRMSE, MAE, and SMAPE are, the better the prediction results of the model are. $R = 1$ indicates that $\hat{y}(t)$ and $y(t)$ are perfectly linearly correlated, and $R = 0$ indicates that there is no correlation. When $R \in (0, 1)$, there is a correlation, and the larger the value, the stronger the linear correlation.
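The five indicators can be computed directly from their definitions; the sketch below is a plain NumPy transcription with a small made-up example.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Compute the five evaluation indicators used in this paper."""
    err = y_pred - y_true
    rmse = np.sqrt(np.mean(err ** 2))
    nrmse = rmse / (y_pred.max() - y_pred.min())
    mae = np.mean(np.abs(err))
    smape = np.mean(np.abs(err) / ((np.abs(y_pred) + np.abs(y_true)) / 2))
    r = np.corrcoef(y_true, y_pred)[0, 1]       # Pearson correlation coefficient
    return dict(RMSE=rmse, NRMSE=nrmse, MAE=mae, SMAPE=smape, R=r)

# Made-up example: each prediction is off by exactly +1,
# so RMSE = MAE = 1.0, NRMSE = 0.5, and R = 1.0
m = evaluate(np.array([1.0, 2.0, 3.0]), np.array([2.0, 3.0, 4.0]))
```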

4.2. Data Processing

The optimal subset selection on the original data was performed using mRMR. In this paper, the first 75% of the dataset is used as the training set, and the last 25% as the testing set. The optimal subset is selected based on the training set, and every model parameter in the simulation is obtained from the training set only.
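The chronological 75/25 division can be expressed as follows; 8759 is the sample count reported above, and the index arrays stand in for the actual feature matrices.

```python
import numpy as np

n_samples = 8759                  # hourly records for 2016, as described above
split = int(0.75 * n_samples)     # chronological split point: no shuffling,
                                  # so the test period follows the training period
train_idx = np.arange(split)
test_idx = np.arange(split, n_samples)
```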
Firstly, the PM2.5 time series is chosen as the reference variable, and the other 10 variables are used as candidate variables. The data were quantitatively analyzed using mRMR to obtain the ranking: PM2.5, CO, WS, PM10, H, WD, SO2, NO2, P, T, and O3. The correlation between the different variables and PM2.5, reflecting the influence of each factor on PM2.5 concentration, is shown in Figure 3.
Using the mRMR method, the irrelevant and redundant variables in the original dataset are reduced. The selection result for the obtained optimal subset is shown in Figure 4.
According to the prediction results in Figure 4, the prediction error is smallest, namely 9.199, when the predictor uses 5 dimensions. Therefore, the optimal subset is PM2.5, CO, WS, PM10, and H. Phase space reconstruction is then performed on the selected optimal subset.
The delay time $\tau$ and embedding dimension $m$ calculated by the C-C method [33] are shown in Table 1, where the 5 variables in the optimal subset are shown in bold. As shown in Table 1, the delay times for the optimal subset are [8, 8, 6, 4, 4] and the embedding dimensions are [2, 2, 2, 4, 4] for PM2.5, PM10, CO, H, and WS, respectively.

4.3. Experimental Results and Analysis

Phase space reconstruction is performed on the optimal subset to obtain a 14-dimensional reconstructed time series, which is used as the input of the prediction model. For the LI-ESN model, the reasonable range of the leaking rate $a$ is (0, 1). For the SLI-ESN model, we use cross-validation to select the two parameters $a$ and $b$. According to experience, the feasible and effective range of the supplementary factor $b$ is mainly within (0, 0.1). In this paper, ESN, LI-ESN, ELM, H-ELM, and SAE are selected as comparison methods. The specific one-step (1 h) prediction results are shown in Table 2.
It can be seen from Table 2 that the proposed method achieves better prediction results in one-step (1 h) prediction. The one-step prediction result for PM2.5 concentration is shown in Figure 5, and Figure 6 plots the fit of the predicted values to the actual data. It can be seen from the figures that the predictions have a good linear relationship with the actual values. SLI-ESN performs satisfactorily at peaks and strongly fluctuating moments, which mainly depends on the full utilization of the historical state of the reservoir and the effective information obtained by phase space reconstruction.
At the same time, the simulation results of the five-step (5 h) prediction are given in Figure 7. The prediction curve tracks the original input well, so the medium-term prediction performance is also good. Table 3 gives the five-step (5 h) prediction results. The ten-step (10 h) prediction result of SLI-ESN is shown in Figure 8. As seen in Figure 8, at some of the peaks the prediction curve is still able to roughly fit the fluctuation trend of the original data, precisely because SLI-ESN makes full use of historical information. Its comparison with the other algorithms is shown in Table 4. The proposed algorithm achieves the optimal value on four of the five indicators, the exception being SMAPE, which demonstrates the effectiveness of SLI-ESN in long-term prediction.
A longitudinal comparison of Table 2, Table 3 and Table 4 shows that the longer the prediction horizon is, the larger the error becomes, which is consistent with the basic characteristics of chaotic time series. Comparing the three kinds of networks across these tables (SAE representing deep learning, ELM and H-ELM representing feedforward neural networks, and ESN representing recurrent neural networks), it can be found that the recurrent networks have the most satisfactory prediction performance. This demonstrates the validity of the reservoir structure for time series prediction.
To further illustrate the performance of the proposed method, the running times of all comparison methods are shown in Table 5. The results show that the training time of the ESN- and ELM-based models is much shorter than that of the deep learning models, mainly because the training process of deep learning consumes a lot of time, a cost the other neural networks do not incur. Moreover, SLI-ESN can complete training and testing within an acceptable time frame. This indicates that SLI-ESN achieves good results in both prediction accuracy and time consumption.

5. Conclusions

PM2.5 is a main component of air pollution, and predicting its concentration is of great significance for protecting the environment. To improve prediction accuracy and reliability, it is important to preprocess the data to eliminate irrelevant and redundant variables before prediction. In this paper, mRMR is used to screen the original dataset to obtain an optimal subset, phase space reconstruction is performed on that subset, and the reconstructed data are used as the new input time series of the SLI-ESN model for prediction. Experiments show the validity of the SLI-ESN model, which has high prediction accuracy in medium- and long-term prediction, good generalization performance, and good application prospects.
Although this paper has achieved the desired results, some issues still need to be addressed in future work. First of all, long-term predictions are not yet satisfactory; we want to extend the model to longer prediction intervals, such as one day, one week, or one month. In addition, optimal subset selection and model optimization take a lot of time. In the future, we expect to implement input variable selection and model optimization simultaneously with an optimization algorithm, where the optimization objects include, but are not limited to, the input variables, the model structure, and the model parameters.

Author Contributions

Conceptualization, X.X. and W.R.; methodology, X.X. and W.R.; data curation, X.X.; writing—original draft preparation, X.X. and W.R.; writing—review and editing, X.X. and W.R.

Funding

This research was funded by the National Natural Science Foundation of China (61773087).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Liu, T.; Lau, A.K.H.; Sandbrink, K.; Fung, J.C.H. Time Series Forecasting of Air Quality Based On Regional Numerical Modeling in Hong Kong. J. Geophys. Res. Atmos. 2018, 123, 4175–4196.
  2. Cai, S.; Wang, Y.; Zhao, B.; Wang, S.; Chang, X.; Hao, J. The impact of the “air pollution prevention and control action plan” on PM2.5 concentrations in Jing-Jin-Ji region during 2012–2020. Sci. Total Environ. 2017, 580, 197–209.
  3. Li, L.; Zhang, J.H.; Qiu, W.Y.; Wang, J.; Fang, Y. An Ensemble Spatiotemporal Model for Predicting PM2.5 Concentrations. Int. J. Environ. Res. Public Health 2017, 14, 549.
  4. Han, W.; Tong, L.; Chen, Y.; Li, R.; Yan, B.; Liu, X. Estimation of High-Resolution Daily Ground-Level PM2.5 Concentration in Beijing 2013–2017 Using 1 km MAIAC AOT Data. Appl. Sci. 2018, 8, 2624.
  5. Wang, P.; Zhang, H.; Qin, Z.; Zhang, G. A novel hybrid-Garch model based on ARIMA and SVM for PM2.5 concentrations forecasting. Atmos. Pollut. Res. 2017, 8, 850–860.
  6. Ausati, S.; Amanollahi, J. Assessing the accuracy of ANFIS, EEMD-GRNN, PCR, and MLR models in predicting PM2.5. Atmos. Environ. 2016, 142, 465–474.
  7. Huang, C.-J.; Kuo, P.-H. A Deep CNN-LSTM Model for Particulate Matter (PM2.5) Forecasting in Smart Cities. Sensors 2018, 18, 2220.
  8. Qiao, J.; Cai, J.; Han, H.; Cai, J. Predicting PM2.5 Concentrations at a Regional Background Station Using Second Order Self-Organizing Fuzzy Neural Network. Atmosphere 2017, 8, 10.
  9. Rybarczyk, Y.; Zalakeviciute, R. Machine Learning Approaches for Outdoor Air Quality Modelling: A Systematic Review. Appl. Sci. 2018, 8, 2570.
  10. Oprea, M.; Mihalache, S.F.; Popescu, M. Computational intelligence-based PM2.5 air pollution forecasting. Int. J. Comput. Commun. Control 2017, 12, 365–380.
  11. Deng, M.; Yang, W.; Liu, Q.; Jin, R.; Xu, F.; Zhang, Y. Heterogeneous Space–Time Artificial Neural Networks for Space–Time Series Prediction. Trans. GIS 2018, 22, 183–201.
  12. Ong, B.T.; Sugiura, K.; Zettsu, K. Dynamically pre-trained deep recurrent neural networks using environmental monitoring data for predicting PM 2.5. Neural Comput. Appl. 2016, 27, 1553–1566.
  13. Li, W.; Kong, D.; Wu, J. A New Hybrid Model FPA-SVM Considering Cointegration for Particular Matter Concentration Forecasting: A Case Study of Kunming and Yuxi, China. Comput. Intel. Neurosci. 2017, 2017, 2843651.
  14. Yao, W.; Zhang, C.; Hao, H.; Wang, X.; Li, X. A support vector machine approach to estimate global solar radiation with the influence of fog and haze. Renew. Energy 2018, 128, 155–162.
  15. Reid, C.E.; Jerrett, M.; Petersen, M.L.; Pfister, G.G.; Morefield, P.E.; Tager, I.B.; Raffuse, S.E.; Balmes, J.R. Spatiotemporal prediction of fine particulate matter during the 2008 northern California wildfires using machine learning. Environ. Sci. Technol. 2015, 49, 3887–3896.
  16. Sun, W.; Sun, J. Daily PM2.5 concentration prediction based on principal component analysis and LSSVM optimized by cuckoo search algorithm. J. Environ. Manag. 2017, 188, 144–152.
  17. Zhang, C.; Ni, Z.; Ni, L. Multifractal detrended cross-correlation analysis between PM2.5 and meteorological factors. Physica A 2015, 438, 114–123.
  18. Zhu, J.Y.; Zhang, C.; Zhang, H.; Zhi, S.; Li, V.O.; Han, J.; Zheng, Y. pg-causality: Identifying spatiotemporal causal pathways for air pollutants with urban big data. IEEE Trans. Big Data 2018, 4, 571–585.
  19. Chen, Z.; Xie, X.; Cai, J.; Chen, D.; Gao, B.; He, B.; Cheng, N.; Xu, B. Understanding meteorological influences on PM2.5 concentrations across China: a temporal and spatial perspective. Atmos. Chem. Phys. 2018, 18, 5343–5358.
  20. Peng, H.; Long, F.; Ding, C. Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1226–1238.
  21. Guyon, I.; Elisseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003, 3, 1157–1182.
  22. Brown, G.; Pocock, A.; Zhao, M.J.; Luján, M. Conditional likelihood maximisation: a unifying framework for information theoretic feature selection. J. Mach. Learn. Res. 2012, 13, 27–66.
  23. Uğuz, H. A two-stage feature selection method for text categorization by using information gain, principal component analysis and genetic algorithm. Knowledge-Based Syst. 2011, 24, 1024–1032.
  24. Jaeger, H.; Haas, H. Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science 2004, 304, 78–80.
  25. Ozturk, M.C.; Xu, D.; Príncipe, J.C. Analysis and design of echo state networks. Neural Comput. 2007, 19, 111–138.
  26. Jaeger, H.; Lukoševičius, M.; Popovici, D.; Siewert, U. Optimization and applications of echo state networks with leaky-integrator neurons. Neural Netw. 2007, 20, 335–352. [Google Scholar] [CrossRef]
27. Takens, F. Detecting strange attractors in turbulence. In Dynamical Systems and Turbulence; Springer: Berlin/Heidelberg, Germany, 1981; pp. 366–381. [Google Scholar]
28. Han, M.; Ren, W.; Xu, M.; Qiu, T. Nonuniform state space reconstruction for multivariate chaotic time series. IEEE Trans. Cybern. 2019, 49, 1885–1895. [Google Scholar] [CrossRef]
  29. Løkse, S.; Bianchi, F.M.; Jenssen, R. Training echo state networks with regularization through dimensionality reduction. Cogn. Comput. 2017, 9, 364–378. [Google Scholar] [CrossRef]
  30. Huang, G.B.; Zhou, H.; Ding, X.; Zhang, R. Extreme learning machine for regression and multiclass classification. IEEE Trans. Syst. Man Cybern. Part B-Cybern. 2012, 42, 513–529. [Google Scholar] [CrossRef]
  31. Tang, J.; Deng, C.; Huang, G.B. Extreme learning machine for multilayer perceptron. IEEE Trans. Neural Netw. Learn. Syst. 2016, 27, 809–821. [Google Scholar] [CrossRef]
  32. Chen, Y.; Lin, Z.; Zhao, X.; Wang, G.; Gu, Y. Deep learning-based classification of hyperspectral data. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2014, 7, 2094–2107. [Google Scholar] [CrossRef]
  33. Kim, H.; Eykholt, R.; Salas, J.D. Nonlinear dynamics, delay times, and embedding windows. Physica D 1999, 127, 48–60. [Google Scholar] [CrossRef]
Figure 1. The basic structure of echo state network.
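The leaky-integrator reservoir update and ridge-regression readout behind the network in Figure 1 can be sketched in a few lines. This is the standard LI-ESN baseline from Table 2, not the proposed SLI-ESN with its supplementary historical state term; the matrix sizes, leak rate, spectral radius, and ridge constant are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res = 3, 100                              # illustrative sizes
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))      # input weights (fixed, random)
W = rng.uniform(-0.5, 0.5, (n_res, n_res))        # reservoir weights (fixed, random)
W *= 0.9 / max(abs(np.linalg.eigvals(W)))         # scale spectral radius below 1

def run_reservoir(U, leak=0.3):
    """Collect leaky-integrator reservoir states for an input sequence U (T x n_in)."""
    x = np.zeros(n_res)
    states = []
    for u in U:
        x = (1 - leak) * x + leak * np.tanh(W_in @ u + W @ x)
        states.append(x.copy())
    return np.array(states)

# Ridge-regression readout: W_out = Y^T X (X^T X + lam I)^(-1); only W_out is trained.
U = rng.standard_normal((200, n_in))
Y = rng.standard_normal((200, 1))                 # dummy targets for illustration
X = run_reservoir(U)
lam = 1e-6
W_out = Y.T @ X @ np.linalg.inv(X.T @ X + lam * np.eye(n_res))
Y_hat = X @ W_out.T
```

Only the readout `W_out` is learned, which is what makes ESN training a single least-squares solve rather than backpropagation through time.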
Figure 2. Schematic diagram of time series prediction based on supplementary leaky integrator echo state network (SLI-ESN). mRMR: minimum redundancy maximum relevance.
Figure 3. Correlation between different variables and PM2.5.
Figure 4. Results of optimal subset selection.
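The mRMR subset selection behind Figure 4 greedily picks, at each step, the feature whose relevance to the target (mutual information) minus its mean redundancy with the already-selected features is largest. A minimal sketch, assuming a histogram-based mutual information estimate; the paper's exact estimator, binning, and stopping criterion are not reproduced here.

```python
import numpy as np

def mutual_info(x, y, bins=16):
    """Histogram-based mutual information estimate between two 1-D series."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def mrmr(X, y, k):
    """Greedy mRMR: maximize I(f; y) minus the mean of I(f; s) over selected s."""
    n = X.shape[1]
    relevance = [mutual_info(X[:, j], y) for j in range(n)]
    selected = [int(np.argmax(relevance))]        # start from the most relevant feature
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(n):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info(X[:, j], X[:, s]) for s in selected])
            if relevance[j] - redundancy > best_score:
                best, best_score = j, relevance[j] - redundancy
        selected.append(best)
    return selected

# Example: feature 0 is the target itself, the others are independent noise.
rng = np.random.default_rng(1)
y = rng.standard_normal(500)
X = np.column_stack([y, rng.standard_normal(500), rng.standard_normal(500)])
selected = mrmr(X, y, k=2)
```

The returned indices are ordered by selection step, so truncating the list at any length gives the corresponding smaller subset.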
Figure 5. PM2.5 concentration one-step (1 hour) prediction curve.
Figure 6. PM2.5 concentration one-step (1 hour) predicted output fit curve.
Figure 7. PM2.5 concentration five-step (5 hours) prediction curve.
Figure 8. PM2.5 concentration ten-step (10 hours) prediction curve.
Table 1. Phase space reconstruction parameters of variables. Abbreviations: hourly temperature (T), pressure (P), humidity (H), wind speed (WS), and wind direction (WD).
Variables   PM2.5   PM10   NO2   CO   O3   SO2   T   P    H   WS   WD
τ           8       8      4     6    4    6     4   12   4   4    6
m           2       2      3     2    4    2     6   2    4   4    3
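Given the delay τ and embedding dimension m for each variable in Table 1, phase space reconstruction forms delay vectors [x(t), x(t−τ), …, x(t−(m−1)τ)]. A generic Takens-style embedding, not the paper's exact implementation:

```python
import numpy as np

def delay_embed(x, m, tau):
    """Phase space reconstruction: rows are [x(t), x(t-tau), ..., x(t-(m-1)tau)]."""
    x = np.asarray(x)
    start = (m - 1) * tau                         # first index with a full delay vector
    return np.column_stack([x[start - i * tau : len(x) - i * tau] for i in range(m)])

# Example with the PM2.5 parameters from Table 1 (tau = 8, m = 2):
E = delay_embed(np.arange(100), m=2, tau=8)
```

A length-N series yields N − (m−1)τ delay vectors, so each variable contributes m columns to the multivariate input after reconstruction.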
Table 2. Comparison of one-step (1 hour) prediction results. Abbreviations: root mean square error (RMSE), normalized root mean square error (NRMSE), mean absolute error (MAE), symmetric mean absolute percentage error (SMAPE), Pearson correlation coefficient (R), echo state network (ESN), leaky integrator ESN (LI-ESN), extreme learning machine (ELM), hierarchical ELM (H-ELM), stacked auto-encoder (SAE), supplementary leaky integrator echo state network (SLI-ESN).
Methods    RMSE      NRMSE    MAE       SMAPE    R
ESN        10.2020   0.0160   6.6948    0.1053   0.9936
LI-ESN     9.7063    0.0152   6.2322    0.1001   0.9943
ELM        11.7330   0.0184   7.1902    0.1082   0.9914
H-ELM      14.1520   0.0222   8.0575    0.1102   0.9876
SAE        32.1700   0.0505   20.1840   0.2764   0.9448
SLI-ESN    9.3953    0.0147   5.8447    0.0894   0.9945
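The error measures compared in Tables 2–4 can be computed as below. Note that NRMSE and SMAPE each admit several common definitions; the range normalization for NRMSE and the symmetric denominator for SMAPE used here are assumptions about the variants intended, not taken from the paper.

```python
import numpy as np

def metrics(y, y_hat):
    """RMSE, NRMSE, MAE, SMAPE, and Pearson R between targets y and predictions y_hat."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    err = y_hat - y
    rmse = np.sqrt(np.mean(err ** 2))
    nrmse = rmse / (y.max() - y.min())            # normalized by the target range
    mae = np.mean(np.abs(err))
    smape = np.mean(2 * np.abs(err) / (np.abs(y) + np.abs(y_hat)))
    r = np.corrcoef(y, y_hat)[0, 1]               # Pearson correlation coefficient
    return rmse, nrmse, mae, smape, r

# Small illustrative call on hypothetical values:
rmse, nrmse, mae, smape, r = metrics([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])
```

Lower values are better for the first four measures, while R closer to 1 indicates a tighter fit, which matches the ranking of SLI-ESN in the tables.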
Table 3. Comparison of five-step (5 hours) prediction results.
Methods    RMSE      NRMSE    MAE       SMAPE    R
ESN        43.1243   0.0677   29.3879   0.3777   0.8907
LI-ESN     41.1574   0.0646   27.6611   0.3577   0.9027
ELM        45.7617   0.0718   29.7892   0.3726   0.8678
H-ELM      49.9159   0.0783   32.1977   0.3974   0.8368
SAE        51.9847   0.0816   34.5487   0.4039   0.8394
SLI-ESN    37.6874   0.0591   25.5871   0.3392   0.9108
Table 4. Comparison of ten-step (10 hours) prediction results.
Methods    RMSE      NRMSE    MAE       SMAPE    R
ESN        72.3723   0.2299   50.5928   0.5620   0.6858
LI-ESN     70.5700   0.2944   49.3628   0.5569   0.6895
ELM        71.6619   0.2429   47.1085   0.5385   0.6582
H-ELM      66.7932   0.2760   46.1529   0.5003   0.7053
SAE        71.7758   0.3631   49.7844   0.5617   0.6623
SLI-ESN    65.7108   0.1966   46.3633   0.5443   0.7314
Table 5. Comparison of running time of the six methods (in seconds).
Methods    One-Step (1 hour)            Five-Step (5 hours)          Ten-Step (10 hours)
           Training Time  Testing Time  Training Time  Testing Time  Training Time  Testing Time
ESN        0.1145         0.0238        0.1138         0.0213        0.1281         0.0226
LI-ESN     0.1139         0.0225        0.1648         0.0307        0.5086         0.0418
ELM        0.0624         0.0312        0.0624         0.0312        0.0624         0.0312
H-ELM      0.4292         0.1039        0.4894         0.1160        0.1748         0.0696
SAE        143.1321       0.0292        139.4019       0.0268        145.4450       0.0816
SLI-ESN    3.1061         0.1730        3.3148         0.1669        4.3705         0.2193

Xu, X.; Ren, W. Prediction of Air Pollution Concentration Based on mRMR and Echo State Network. Appl. Sci. 2019, 9, 1811. https://0-doi-org.brum.beds.ac.uk/10.3390/app9091811