Now you can tell everybody that you’re doing machine learning and you’ll be the coolest nerd at the nerd party, while your mum is concerned about rogue artificial intelligence and a machine uprising.
Model Validation

Training machine learning models is easy.
The hard part is training them well.
In order to do so, we need an appropriate algorithm, a reliable reference map, and enough CPU and memory resources.
Even then, the results might not be what you hoped for, so validating the classifier by calculating confusion matrices and other validation metrics is crucial if you want to add credibility to your work.
Confusion Matrix

Confusion matrices are the first thing to look at in order to evaluate model performance.
They show the number of correctly and incorrectly predicted labels for each label from the reference map or vice versa.
Usually, a normalised confusion matrix is shown, where each row is divided by its sum.
This shows if the classifier has a bias towards wrongly classifying certain types of land cover.
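With scikit-learn, this row-normalised matrix can be computed directly. A minimal sketch with made-up labels (the class codes here are hypothetical, not the ones from the reference map):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical per-pixel labels: 0 = water, 1 = forest, 2 = shrubland
y_true = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 0, 2, 2, 1, 1, 2])

# normalize="true" divides each row by the number of true pixels of
# that class, so every entry is a recall-like rate and rows sum to 1.
cm = confusion_matrix(y_true, y_pred, normalize="true")
print(cm)
```

Reading along a row shows where true pixels of a class end up; a large off-diagonal value flags exactly the kind of bias described above.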
Two views of the normalised confusion matrix of a trained model.
For most of the classes, the model seems to perform well.
Problems occur for some of the classes due to the unbalanced training set.
We see that the problematic cases are, for example, shrubland and water, where pixels truly belonging to these classes are often misidentified as other classes.
On the other hand, what is predicted as shrubland or water is in good agreement with the reference map.
In the image below we notice that the problems generally occur for the under-represented classes. Keep in mind that these statements are only superficial: the training sample in this example is relatively small and serves only as a proof of principle.
Frequency of pixels for each class in the training dataset.
In general, the distribution is not uniform.
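Checking for such imbalance is straightforward once the training labels are collected in a single 1-D array; a small sketch (the class codes are invented for illustration):

```python
import numpy as np

# Hypothetical flattened array of reference labels used for training:
# 1 = cultivated land, 2 = forest, 3 = water
labels = np.array([1, 1, 1, 2, 2, 3, 1, 1, 2, 1])

# Count how many training pixels fall into each class
classes, counts = np.unique(labels, return_counts=True)
for c, n in zip(classes, counts):
    print(f"class {c}: {n} pixels ({100 * n / labels.size:.0f}%)")
```

A strongly skewed histogram like this one is an early warning that the minority classes may be misclassified, as seen in the confusion matrix.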
Receiver Operating Characteristic (ROC) Curve

Classifiers predict the labels with a certain confidence, but the threshold for the prediction of a certain label can be adjusted.
The ROC curve shows the diagnostic ability of a classifier as its discrimination threshold is varied.
Usually, it is shown for a binary system, but it can also be used in multiclass classification, where we calculate the “one vs. rest” characteristics for each represented class.
The x-axis shows the false positive rate (we want it to be small), and the y-axis shows the true positive rate (we want it to be large) at different thresholds.
Good classifier performance is characterised by a curve with a large integral, also known as the area under the curve (AUC).
Based on the plot below, the same conclusion about under-represented classes can be made for shrubland. The ROC curve for water looks much better, however, because water is more easily distinguishable, even when under-represented.
ROC curves of the classifier, shown as “one vs. rest” for each class in the dataset. Numbers in brackets are the area under the curve (AUC) values.
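The post does not show the code behind this plot; a hedged scikit-learn sketch of the “one vs. rest” computation, with a toy dataset standing in for the real features and labels, could look like:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import auc, roc_curve
from sklearn.preprocessing import label_binarize

# Toy 3-class stand-in for the land-cover features and labels
X, y = make_classification(n_samples=300, n_classes=3,
                           n_informative=4, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X, y)
proba = clf.predict_proba(X)                 # per-class confidence scores
y_bin = label_binarize(y, classes=clf.classes_)

aucs = {}
for i, cls in enumerate(clf.classes_):
    # "one vs. rest": the current class is positive, all others negative
    fpr, tpr, _ = roc_curve(y_bin[:, i], proba[:, i])
    aucs[cls] = auc(fpr, tpr)
print(aucs)
```

In practice the scores should of course be computed on a held-out test set rather than on the training data, as in this compressed example.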
Feature Importance

A deeper insight into the intricate workings of the classifier can be obtained by looking at the feature importance, which tells us which features contributed most to the final results.
Some machine learning algorithms, like the one we use in this example, return these values as an output, while for others, these values have to be calculated separately.
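For tree-based models in scikit-learn, for instance, the importances are available as an attribute after fitting; a sketch with invented feature names standing in for the band/date combinations used in the post:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature names: spectral bands/indices at several dates
feature_names = [f"{band}_t{t}" for t in range(3) for band in ("B2", "NDVI")]

X, y = make_classification(n_samples=200, n_features=6,
                           n_informative=3, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X, y)

# Impurity-based importances: one non-negative value per feature,
# summing to 1, which can be ranked to find the dominant features
ranked = sorted(zip(feature_names, clf.feature_importances_),
                key=lambda p: -p[1])
for name, imp in ranked:
    print(f"{name}: {imp:.3f}")
```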
The plot below shows the importance of each feature at each specific date.
Feature importance map for the features used in this classification.
While other features (e.g. NDVI) in spring are generally more important, we see that there is a specific date when one of the bands (B2, blue) is the most important feature.
Taking a closer look, it turns out that the area at the time was covered with snow.
It seems that the snow coverage unveils information about the underlying texture, which helps the classifier determine the correct land cover type.
However, one should keep in mind that this fact is specific to the AOI that you are observing and generally cannot be relied upon.
A part of this AOI consisting of 3×3 EOPatches covered with snow.
Prediction Results

With the validation performed, we now understand the strengths and weaknesses of our prediction model.
If we are not satisfied with the situation, it is possible to tweak the workflow and try again.
Once the model is optimised, we define a simple EOTask which accepts the EOPatch and the classifier model, makes the prediction and applies it to the patch.
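Stripped of the EOPatch plumbing, the core of such a task is just a reshape, a per-pixel predict, and a reshape back. A sketch under those assumptions, with a toy classifier standing in for the trained model (`predict_patch` is a hypothetical helper, not part of eo-learn):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def predict_patch(features, clf):
    """Predict a label map for one patch.

    features: array of shape (height, width, n_features)
    returns:  label map of shape (height, width)
    """
    h, w, f = features.shape
    flat = features.reshape(h * w, f)   # one row per pixel
    labels = clf.predict(flat)          # per-pixel class labels
    return labels.reshape(h, w)         # back to the patch grid

# Toy demo: a dummy 2-class model applied to a random 8x8 patch
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] > 0).astype(int)
clf = RandomForestClassifier(random_state=0).fit(X, y)

patch = rng.normal(size=(8, 8, 3))
lulc = predict_patch(patch, clf)
print(lulc.shape)
```

In the real workflow, the EOTask's `execute` method would pull the stacked features out of the EOPatch, call logic like this, and store the resulting label map back on the patch.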
Sentinel-2 image (left), ground truth (centre) and prediction (right) for a random EOPatch in the selected AOI.
Some differences are visible, mostly due to the negative buffer applied to the reference map; otherwise, the agreement is satisfactory for this use case.
From this point on, the path is clear.
Repeat the procedure for all EOPatches.
You can even export the predictions as GeoTIFF images and merge them with the gdal_merge.py utility.
We also uploaded the merged GeoTIFF to our CloudGIS Geopedia portal, so you can view the results in greater detail here: https://www.geopedia.world/#T244
Screenshot of the land cover prediction for Slovenia 2017 using the approach shown in this blog post, available for detailed browsing in the CloudGIS Geopedia portal (https://www.geopedia.world).
You can also compare official land use data with automatically classified land cover data.
Note the difference between land use and land cover, which represents a common challenge in ML processes: it is not always easy to map from the classes used in official registers to the classes that can be observed in nature.
To illustrate this challenge, we show two airports in Slovenia.
The first one is in Levec, near the city of Celje.
This sports airfield is small, mostly used for private aeroplanes, and it is covered with grass.
The official land use data marks the grass-covered landing strip as artificial surface, while the classifier is able to correctly predict the land cover as grassland, as shown below.
Sentinel-2 image (left), ground truth (centre) and prediction (right) for the area around the small sports airfield Levec, near Celje, Slovenia.
The classifier correctly recognises the landing strip as grassland, which is marked as artificial surface in the official land use data.
On the other hand, looking at the Ljubljana Jože Pučnik Airport, the largest airport in Slovenia, the areas marked as artificial surface in the official land use data are tarmac runway areas and road networks.
In this case, the classifier is able to recognise the built-up areas, while still correctly identifying the grassland and cultivated land in the surrounding sites.
Sentinel-2 image (left), ground truth (centre) and prediction (right) for the area around the Ljubljana Jože Pučnik Airport, the largest airport in Slovenia.
The classifier recognises the tarmac runway and the road network, while still correctly identifying grassland and cultivated land in the surrounding area.
Voilà! Now you know how to make a reliable prediction on the country scale! Make sure you put that in your CV.
We hope that these two blog posts provide enough information for you to try it yourself.
Using the exemplary Jupyter Notebook, you should be able to perform land classification for just about any area in the world, assuming you have some (reliable) reference data.
We are looking forward to getting any comments, ideas, and examples from you on how the process could be further improved.
We also plan to publish Part 3, the last part of this series, where we will show how to experiment with the ML workflow and try to improve the results! Additionally, we will openly share all EOPatches for Slovenia 2017 (that’s right, you heard correctly: the whole dataset, containing Sentinel-2 L1C data, s2cloudless cloud masks, reference data, etc.), so everyone can try it out! We cannot wait to see how you apply your own methods inside of eo-learn.