Let’s compare the weather in winter versus other seasons.
[Figure: Comparison of the weather pattern in winter and other seasons]
Temperature, wind speed, and humidity are all lower in winter, but not by a large amount.
Now, let’s look at the relationship of each of these with the PM 2.5 level.
[Figure: Effect of temperature, wind speed, and humidity on PM 2.5 level]
Higher temperature (which disrupts the temperature inversion effect), higher wind speed, and higher humidity are all negatively correlated with the pollution level.
[Figure: Effect of wind on PM 2.5 level in winter]
On windy days, the air quality is clearly better.
The median of the PM 2.5 distribution is lower on windy days than on days without wind.
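The windy-versus-calm comparison comes down to splitting the hourly readings on a wind-speed threshold and taking the median of each group. Below is a minimal sketch that assumes (wind speed, PM 2.5) pairs. The threshold `CALM_KMH`, the helper `median_by_wind`, and the toy records are all hypothetical and are not taken from the study's data.

```python
from statistics import median

# Hypothetical cut-off: wind speeds at or below this are treated as "calm".
CALM_KMH = 2.0

def median_by_wind(records, calm_kmh=CALM_KMH):
    """Return (median PM 2.5 on windy hours, median PM 2.5 on calm hours)."""
    windy = [pm for wind, pm in records if wind > calm_kmh]
    calm = [pm for wind, pm in records if wind <= calm_kmh]
    return median(windy), median(calm)

# Toy (wind speed in km/h, PM 2.5) pairs for illustration only.
toy = [(0.0, 60), (1.0, 55), (2.0, 70), (8.0, 30), (12.0, 25), (6.0, 40)]
windy_med, calm_med = median_by_wind(toy)
print(windy_med, calm_med)  # 30 60
```

The median is a sensible summary here because a few extreme pollution hours would pull a mean upward.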
In fact, the pollution level also depends on the wind direction, as seen in this plot.
I selected only four major wind directions for simplicity.
[Figure: PM 2.5 relationship with the wind direction in winter]
On days when the wind comes from the south, the pollution level is lower, likely because the Gulf of Thailand lies to the south of Bangkok.
The clean ocean wind improves the air quality.
Wind from the other three directions passes over land.
However, having any wind is better than the stagnant atmospheric conditions on calm days.
The shift in the median PM 2.5 level between rainy days and days without rain is smaller than the shift between windy and calm days.
There are fewer rainy days during the winter season, so the data is somewhat noisy, but a difference can be observed in the cumulative distribution function.
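Rainy hours are scarce, so comparing whole distributions is more informative than comparing two medians. Here is a minimal sketch of an empirical CDF. The `ecdf` helper and the toy readings are hypothetical and do not come from the study's data.

```python
from bisect import bisect_right

def ecdf(values):
    """Return F(x) = fraction of observations <= x (the empirical CDF)."""
    data = sorted(values)
    n = len(data)

    def F(x):
        # bisect_right gives the number of sorted values that are <= x
        return bisect_right(data, x) / n

    return F

# Toy PM 2.5 readings, for illustration only.
rainy = [20, 25, 30, 45]
dry = [30, 40, 55, 70]
F_rain, F_dry = ecdf(rainy), ecdf(dry)
for x in (30, 50):
    print(x, F_rain(x), F_dry(x))
# 30 0.75 0.25
# 50 1.0 0.5
```

In this toy example the rainy-day curve sits at or above the dry-day curve at every level. That means a larger share of rainy hours falls below any given PM 2.5 value, which is the kind of gap a CDF plot makes visible even when the data is noisy.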
Traffic Index
One of the sources of PM 2.5 particles is car engine exhaust.
While campaigning for more public transportation use is generally good for the environment, its effectiveness in reducing PM 2.5 pollution is unclear.
Here is why.
[Figure: Traffic index and air pollution]
We have seen that PM 2.5 levels are related to the time of day.
Pollution drops around 3 pm but remains high during the night.
When the pollution level is plotted against the traffic index, the relationship is very noisy.
There does not seem to be a strong correlation.
[Figure: PM 2.5 relationship with traffic index]
Including the time of day and weekday-versus-weekend information in the model might make the relationship clearer.
Autoregression Process
The current PM 2.5 value can also depend on the previous value.
The partial autocorrelation plot below shows a strong correlation at a 1-hour lag, which means the PM 2.5 level behaves like an autoregressive process.
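The partial autocorrelation at lag k is the correlation between x_t and x_(t-k) after removing the linear effect of the lags in between. Below is a pure-Python sketch using the Durbin-Levinson recursion. It runs on a synthetic AR(1) series with coefficient 0.8, not on the PM 2.5 data. The function names are illustrative, and in practice `statsmodels.tsa.stattools.pacf` computes the same quantity.

```python
import random

def acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]

def pacf(x, max_lag):
    """Partial autocorrelations for lags 1..max_lag (Durbin-Levinson recursion)."""
    r = acf(x, max_lag)
    phi_prev = []  # AR coefficients phi_{k-1,1..k-1} from the previous step
    out = []
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j - 1] * r[k - j] for j in range(1, k))
        den = 1.0 - sum(phi_prev[j - 1] * r[j] for j in range(1, k))
        phi_kk = num / den
        # Update phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}, then append phi_kk.
        phi_prev = [phi_prev[j - 1] - phi_kk * phi_prev[k - j - 1]
                    for j in range(1, k)] + [phi_kk]
        out.append(phi_kk)
    return out

# Synthetic AR(1) series: x_t = 0.8 * x_{t-1} + noise (illustration only).
random.seed(0)
x = [0.0]
for _ in range(1999):
    x.append(0.8 * x[-1] + random.gauss(0, 1))
print([round(v, 2) for v in pacf(x, 3)])
```

For an AR(1) series, the PACF spikes at lag 1 (near 0.8 here) and falls to roughly zero at longer lags. That is the pattern one looks for before treating a series as autoregressive.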
Thus, I include the 24-hour average values in the model, with the restriction that, for future predictions, the model only sees the previous value.
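One way to build such history features without leaking the target hour is to compute them only from hours strictly before t. The sketch below works under that assumption. The `lag_features` helper, the 24-hour window, and the toy series are hypothetical, because the exact feature definitions used in the study are not shown here.

```python
def lag_features(pm25, window=24):
    """For each hour t >= window, return (t, previous-hour value, mean of the previous `window` hours).

    Only values strictly before t are used, so the features never see the target hour.
    """
    rows = []
    for t in range(window, len(pm25)):
        past = pm25[t - window:t]  # hours t-window .. t-1
        rows.append((t, pm25[t - 1], sum(past) / window))
    return rows

hourly = list(range(30))  # toy series 0, 1, ..., 29 standing in for hourly PM 2.5
rows = lag_features(hourly)
print(rows[0])    # (24, 23, 11.5)
print(len(rows))  # 6
```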
The importance of this feature should be directly related to how long the particles stay in the atmosphere.
Machine Learning Model
The picture below shows a dendrogram of all input features, calculated from the Spearman correlation.
The dendrogram helps to identify redundant features that can be removed from the model.
The number of fires within various distances and the PM 2.5 level are closely related.
Other features are further away.
I ended up using all of these features in the model.
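The distance behind such a dendrogram can be sketched as 1 - |rho|, where rho is Spearman's rank correlation. Spearman's rho is simply the Pearson correlation of the ranks, so a pure-Python version only needs a ranking helper. The column names and toy values below are hypothetical. In practice, the square matrix would usually be condensed with `scipy.spatial.distance.squareform` and passed to `scipy.cluster.hierarchy.linkage` and `dendrogram`.

```python
def ranks(values):
    """1-based average ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def spearman(a, b):
    # Spearman's rho is the Pearson correlation of the ranks.
    return pearson(ranks(a), ranks(b))

def distance_matrix(features):
    """Pairwise 1 - |rho| between named feature columns."""
    names = list(features)
    return names, [[1 - abs(spearman(features[p], features[q])) for q in names]
                   for p in names]

# Hypothetical columns with toy values.
toy = {
    "fire_0_240km": [10, 20, 30, 40, 50],
    "pm25": [15, 22, 35, 41, 60],
    "humidity": [70, 40, 80, 50, 60],
}
names, D = distance_matrix(toy)
for name, row in zip(names, D):
    print(name, [round(v, 2) for v in row])
```

Using the absolute value puts strongly negatively correlated features close together too, since either sign makes one of them redundant.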
[Figure: Dendrogram of input features]
To identify the major contributors to the pollution, I fit a random forest regression model because of its simplicity and ease of interpretation.
During hyperparameter tuning, 25% of the data was held out as a validation set. After tuning, the model was retrained on the entire dataset.
The model achieves an R-squared of 0.99 on the training set.
Since the purpose of this study is to understand the sources of the air pollution in the past, I focused on the training set.
The plot below ranks the importance of each contributing factor.
Importance is calculated as the decrease in R-squared when a column's values are randomly permuted, with the results re-normalized so that the importances of all columns sum to one.
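This permutation-based importance can be sketched without any library. The steps are: score the fitted model, shuffle one column at a time, record the drop in R-squared, and rescale the drops so they sum to one. Here `predict` stands in for any fitted model (a random forest in the study), and the toy model and data are hypothetical. scikit-learn's `sklearn.inspection.permutation_importance` is a library version, though it does not do the final renormalization.

```python
import random

def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def permutation_importance_normalized(predict, X, y, seed=0):
    """Drop in R^2 after shuffling each column, rescaled so the importances sum to 1.

    X is a list of rows (lists). Negative drops (noise) are kept as-is.
    """
    rng = random.Random(seed)
    base = r_squared(y, [predict(row) for row in X])
    drops = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        rng.shuffle(col)
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
        drops.append(base - r_squared(y, [predict(row) for row in X_perm]))
    total = sum(drops)
    if total == 0:
        return [0.0] * len(drops)
    return [d / total for d in drops]

# Hypothetical toy model: uses feature 0 and ignores feature 1.
def toy_predict(row):
    return 2 * row[0]

X = [[x, (7 * x) % 5] for x in range(1, 11)]
y = [2 * x for x in range(1, 11)]
imp = permutation_importance_normalized(toy_predict, X, y)
print(imp)
```

Because the toy model never reads feature 1, shuffling it leaves R-squared unchanged. Feature 1 therefore gets zero importance, and all of the normalized importance goes to feature 0.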
[Figure: Feature importance]
As expected, the previous pollution level is the most important predictor.
This is followed by the number of fires, in order from the closest distance band to the furthest.
The number of fires as far away as 720 km has more influence on the air quality than the local humidity, traffic, or even rain.
The hour of day is a more important predictor than the traffic index.
Among the weather features, humidity is the most important feature.
The influence of each feature is illustrated below using a tree interpreter for the data on Jan 13, 2019 at 8 am, when the predicted PM 2.5 level was 96.
[Figure: Model interpretation on Jan 23, 2019]
We start with the average value of 26.
The PM 2.5 level for the previous hour was 62, so the model adds 20.
There were 150 fires within a 240 km radius, thus the model adds 10 to the pollution level.
The value is now 56.
There were 1649 fires between 240 and 480 km and 896 fires between 480 and 720 km, so the model adds 9 and 8, respectively.
The low wind speed and the morning rush hour (8 am) together add 8.
Starting from the baseline of 26, these six top factors bring the prediction to 81 out of the total predicted PM 2.5 level of 96.
The remaining features, shown to the right, are less important and together add the last 15.
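A tree interpreter splits each prediction into a bias term (the average value the model starts from) plus one contribution per feature. The walk-through above is therefore a running sum. The sketch below replays the numbers quoted in the text for the January 2019 example. The dictionary labels are shorthand, and the last entry bundles wind speed and rush hour as the text does. For a scikit-learn forest, the `treeinterpreter` package's `ti.predict(model, X)` returns the prediction, bias, and contributions directly.

```python
# Contributions quoted in the text for the January 2019, 8 am example.
bias = 26  # average PM 2.5 value the model starts from
top_contributions = {
    "previous hour PM 2.5 (62)": 20,
    "fires within 240 km (150)": 10,
    "fires 240-480 km (1649)": 9,
    "fires 480-720 km (896)": 8,
    "low wind speed + 8 am rush hour": 8,
}

running = bias
for name, value in top_contributions.items():
    running += value
    print(f"{name:35s} +{value:2d} -> {running}")

predicted = 96
# Whatever the top factors do not explain must come from the remaining features.
print("remaining features:", predicted - running)  # remaining features: 15
```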
[Figure: Model interpretation on Feb 2, 2019]
On a good day such as Feb 2, 2019 at 7 pm, the PM 2.5 level was 10.
The pollution level in the hour before was low, thus the model subtracts a value of 10.
There were still a lot of fires in the area, and the model adds a value of 2.
The wind speed was high, reducing the value by 2.
The weather and traffic were good.
The combination of many factors results in a low predicted PM 2.5 level of 10.
Conclusions
The PM 2.5 level has a complex relationship with various factors: number of fires, weather patterns, and traffic.
But this analysis confirms the suspicion that many people have: agricultural burning is the root cause of PM 2.5 pollution in Thailand.
Burning activities as far as 720 km away from Bangkok, in an area that extends into Myanmar, Laos, and Cambodia, can cause air quality problems in Bangkok.
Solving this problem will not be easy.
It will require a collaborative international effort among the Southeast Asian countries.
I leave you with a fire map from March 17, 2019, one of the worst days ever!