How to use open source satellite data for your investigative reporting

techjournalism, Mar 6

Satellite data is undoubtedly a vast goldmine of information, perhaps too vast, and many journalists still shy away from working with it extensively.

Since the emergence of high-resolution data provided by firms like DigitalGlobe and Planet Labs, the supply of satellite stories has mushroomed.

But open source data, a valid and timely source for stories despite its lower resolution, remains under-leveraged.

With an untouched mountain of data at hand, many fear either missing the forest (the story, in this case) for the trees or misinterpreting what they see.

Both valid concerns.

Today, we will try to tackle some of the reservations that you might share with your colleagues and teach some basics in accessing, understanding and handling open source satellite data.

Recent technical commentary on the execution of remote sensing exercises within and outside newsrooms attempted to demystify how to take advantage of open source satellite data platforms.

Some attempted to explain how to streamline the process of gathering data.

Few have drawn viable connections between feasible stories and technical capabilities.

This tutorial attempts to change that.

We will walk you through some basic examples — from a beginner to a more advanced technical level.

Understanding how satellite imagery works:

Different satellites send different images to earth.

They differ in resolution (how crisp the images are), the number and types of bands they produce, and how frequently the images are updated.

Resolution matters

[Figure: Finding the balance between resolution, capabilities of bands and availability. Credit: Mark Corcoran, Reuters Institute Fellowship Paper, University of Oxford]

How frequently available?

[Figure. Credit: Mark Corcoran, Reuters Institute Fellowship Paper, University of Oxford]

What are spectral bands?

[Figure: Not a new musical genre. Spectral bands determine what type of analysis you can do with the data. Credit: Mark Corcoran, Reuters Institute Fellowship Paper, University of Oxford]

Sentinel-2 satellite image data, for instance, comes in the form of thirteen different spectral bands, ranging from visible and near infrared to shortwave infrared: four bands at 10 meters, six bands at 20 meters and three bands at 60 meters spatial resolution.

Consider these bands as a form of binoculars that allow you to spot things that otherwise would remain hidden in the data.

The right mix of these bands is key.

Various scripts (different band combos, if you will) can be run on the data, either on your local machine or on Sentinel Hub.

If you are reporting on a broad basis, you want to make yourself familiar with those different combinations and what they achieve for you, as it might come in handy at times when other outlets miss the beat.
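To make the idea of a band combination concrete, here is a minimal sketch in plain Python. The three combinations are widely used Sentinel-2 choices, but the `compose` helper, the dictionary layout and the reflectance numbers are made up for illustration and are not part of any Sentinel Hub API.

```python
# Illustrative only: a few widely used Sentinel-2 band combinations.
# Each entry maps a display name to the bands shown as (red, green, blue).
BAND_COMBOS = {
    "true_color": ("B04", "B03", "B02"),
    "false_color_infrared": ("B08", "B04", "B03"),
    "shortwave_infrared": ("B12", "B8A", "B04"),
}


def compose(pixel, combo):
    """Return the (r, g, b) triple for one pixel under a band combination.

    `pixel` is a dict of band name -> reflectance (0..1); this layout is a
    hypothetical one chosen for the example.
    """
    return tuple(pixel[band] for band in BAND_COMBOS[combo])


# Made-up reflectances for a single vegetated pixel.
pixel = {"B02": 0.03, "B03": 0.06, "B04": 0.04,
         "B08": 0.45, "B8A": 0.47, "B12": 0.10}

print(compose(pixel, "true_color"))
print(compose(pixel, "false_color_infrared"))
```

In the false-color combination the near-infrared band lands in the red channel, which is why healthy vegetation shows up bright red in such images.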

Tutorial: What do you need from here?

- Python 3.6
- An adequate TIFF reader (if you want to download raster files)
- Jupyter notebook and various Python packages
- A free account for Sentinel Hub (find a description in the Python tutorial)

2. Searching for a story using the Sentinel Hub browser tool:

If you are a passionate technologist, you might be slightly turned off by the idea of using a browser application.

But do hear me out.

For exploring and investigating, the EO Browser is a decent option (if you want to step back even further, ‘Sentinel Playground’ features fewer satellites but offers a slightly easier way to explore).

Predecessors and other open source satellite platforms may offer limited options for using Python in the workflow.

Sentinel Hub offers some useful options in this regard.

Also, there is no need to download the whole raster tiles in order to do something interesting (arguably, it’s rare that investigations require all the tile data at the same time).

Instead, it allows you to zoom in on specific areas.

Here is a list of data the EO Browser provides and a rationale to use them:

[Figure: Description of EO Browser data]

Let’s get going and indulge in a satellite frenzy.


Beginner

Tracking wildfires:

Our first case is the detection and reporting of the sudden proliferation and destruction of wildfires, which spiraled last year when record flames raged across the U.S. state of California.

It might not be the last time.

Such fires are likely to re-emerge in the near future, experts claim.

Freely available go-to sources include Landsat 8 data, kindly provided with the help of the U.S. Geological Survey, as well as Sentinel-2 data.

Sentinel-2, which offers higher-resolution imagery than its open source colleagues in the visible and infrared part of the spectrum, is well up to the task of monitoring vegetation, soil and water cover, inland waterways and coastal areas.

Challenge

- Go to EO Browser, sign up and log in (it's free)
- Select Sentinel-2
- Narrow down the data collection by limiting cloud coverage to 30%.
- Spot wildfires in the U.S. state of California that peaked between July and August of 2018 (they are so pronounced state-wide that you should have no problem spotting smoke plumes)

Possible examples of 2018 fires:

- Natchez Fire (July 20, 2018): 41.956°N 123.551°W
- Carr Fire (July 28, 2018): 40.6543°N 122.6236°W
- Mendocino Complex Fire (July 29, 2018): 39.243283°N 123.103367°W
- Ferguson Fire (July 14, 2018): 37.652°N 119.881°W

Next, we want to render a specific band combination to see more clearly where the action on the ground is happening.

Copy the ‘Wildfire script’:

The Wildfire script was kindly provided by Pierre Markuse.

Insert it into the ‘Custom’ section (under the Visualisation tab) where the ‘</>’ icon is shown (next to the hand button).

[Figure: Locating California wildfires (August 2018)]

Interesting angle: How successful are firefighters at containing or isolating fires? Example by Pierre Markuse: fire retardant dropped by firefighting planes, https://flic.kr/p/Wt8Vzo

If you successfully found a wildfire in the specified time range, you should spot yellow-reddish blotches.

Importantly: do not interpret these as flames.

When you show the image, tell your audience that what they see is not actual fire but merely an infrared (IR) overlay that, to some degree, lines up with the active fires and hot spots.

Challenge

New wildfires recently spread around Mount Kenya (lat, long: -0.152739, 37.309095), burning for nearly a week at the time of writing.

If you have time, discover those fires, apply the script as before and investigate.

MSI

Other band combinations can be used to illustrate areas potentially at risk of wildfires.

Dryness of vegetation is one such indicator.

The moisture stress index, or MSI, can reveal such dry areas and aid in what is called a ‘fire hazard condition analysis’.

The index is inverted relative to other water vegetation indices.

The higher the value, the greater the water stress level (and the less water content there is).
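As a rough illustration of how such an index behaves, here is a minimal Python sketch. It assumes the commonly cited definition of MSI as shortwave-infrared reflectance divided by near-infrared reflectance (on Sentinel-2 usually B11 over B08); the script used in EO Browser may differ in detail, and the sample values are made up.

```python
def moisture_stress_index(swir, nir):
    """MSI as the ratio SWIR / NIR (assumed definition, e.g. B11 / B08).

    Higher values suggest drier, more water-stressed vegetation.
    """
    if nir == 0:
        return float("nan")  # avoid division by zero on empty pixels
    return swir / nir


# Made-up reflectances: a moist canopy versus a dry one.
moist = moisture_stress_index(swir=0.125, nir=0.5)
dry = moisture_stress_index(swir=0.375, nir=0.25)
print(moist, dry)
```

The dry pixel yields the larger value, which matches the inverted reading of the index described above.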

Give it a try and follow the same procedure with a different band script and see what you can retrieve.

MSI script:

Now let's work with Python:

In order to use Sentinel Hub services, you need a Sentinel Hub account (sign up for free here https://www.


com/, if you haven’t done already).

Log in to the Sentinel Hub Configurator.

A configuration with an instance ID (alpha-numeric code of length 36) will already exist.

For this tutorial, it is recommended that you create a new configuration (via “Add new configuration”) and set the configuration to be based on ‘Python scripts template’.

Write down your configuration’s instance ID and paste it into the INSTANCE_ID variable declaration:

All requests require a bounding box to be given as an instance of sentinelhub.BBox with a corresponding Coordinate Reference System (sentinelhub.



We will use WGS84 and we can use the predefined WGS84 coordinate reference system from sentinelhub.
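If you only have a point, such as one of the fire coordinates listed earlier, you first need to turn it into a box. The sketch below is one simple way to do that in plain Python; `bbox_around` is a hypothetical helper, not part of the sentinelhub package, and it uses the rough approximation of 111.32 km per degree of latitude. The output order (min longitude, min latitude, max longitude, max latitude) is an assumption about what the bounding box class expects for WGS84, so check the package documentation before passing it on.

```python
import math

KM_PER_DEGREE_LAT = 111.32  # rough average, good enough for a search window


def bbox_around(lat, lon, half_size_km):
    """Return [min_lon, min_lat, max_lon, max_lat] in WGS84 degrees.

    Hypothetical helper: builds a square-ish box around a point.
    """
    dlat = half_size_km / KM_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    dlon = half_size_km / (KM_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return [lon - dlon, lat - dlat, lon + dlon, lat + dlat]


# Carr Fire coordinates from the list above (40.6543°N 122.6236°W).
carr_fire_box = bbox_around(40.6543, -122.6236, half_size_km=20)
print(carr_fire_box)
```

A 20 km half-size is an arbitrary choice; widen it if the smoke plume or burn scar runs off the edge of the image.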



Now we simply provide a URL address of a JS evalscript (a number of other clever scripts are available on this dedicated page).

Let’s select the fire script again and provide its URL as a value of parameter CustomUrlParam.


[Figure: Python output, downloaded Sentinel-2 image with the provided wildfire JS evalscript]

NBR

Another custom script, designed specifically to detect burn damage from fires, is NBR, short for normalized burn ratio (link to the script here).

If you report on the aftermath of a large fire, this could aid your analysis and coverage.
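For orientation, the ratio itself is simple arithmetic. The sketch below assumes the standard definition, NBR = (NIR - SWIR) / (NIR + SWIR), which on Sentinel-2 is usually computed from bands B08 and B12; the linked script may add scaling or coloring on top, and the reflectance values here are made up.

```python
def normalized_burn_ratio(nir, swir):
    """NBR = (NIR - SWIR) / (NIR + SWIR); assumed bands are B08 and B12."""
    total = nir + swir
    if total == 0:
        return float("nan")  # no signal in either band
    return (nir - swir) / total


# Made-up reflectances: healthy vegetation reflects strongly in the NIR,
# burned ground reflects more in the shortwave infrared.
healthy = normalized_burn_ratio(nir=0.5, swir=0.1)
burned = normalized_burn_ratio(nir=0.15, swir=0.3)

# The drop between a pre-fire and a post-fire scene is often called dNBR.
print(healthy, burned, healthy - burned)
```

Values close to 1 point to healthy vegetation, while low or negative values point to recently burned areas.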

Further explanation of NBR here.

Challenge

Locate burned vegetation with the NBR script.

[Figure: Python output, downloaded Sentinel-2 image with the provided NBR JS evalscript]

2. Intermediate: becoming more investigative

Relying on the fire script to locate Brigadier General Suheil al-Hassan’s hideout, ‘one of Syria’s most notorious warlords’ (post by The Atlantic)

The general with the frightening nickname ‘The Tiger’ is the helmsman of Qawat Al-Nimr, a.k.a. the Tiger Forces, a Russian-backed elite formation of the Syrian Arab Army that functions primarily as an offensive unit in the Syrian Civil War.

A recent operation by the Tiger Forces to retake eastern Ghouta killed at least 600 civilians, of whom at least 100 were children, according to the Violations Documentation Center (VDC).

To find out where Suheil al-Hassan was hiding back in 2016, we perform a typical act of intelligence work and start by taking a look at the following video:

We can spot wisps of smoke drifting towards the left.

In another sequence, we see the hideout the general operated out of.

[Figure: The hideout (left) of Brigadier General Suheil al-Hassan, a.k.a. the Tiger. “If you’re not with God then you’re with the devil. Be on the side of God so that God will be with you,” Hassan reportedly said during a more recent campaign on the edge of eastern Ghouta.]

[Figure: Smoke clouds from the nearby Aleppo thermal plant]

From the video, we learn that it was shot for ‘Palmyra Battle Against ISIS’.

This could be the Aleppo Thermal Plant.

[Figure: Confirmation. Google Earth images of the power plant show the extent of destruction after the fire raged (burned-out circles on the right)]

A simple web search provides us with a level of clarification:

- Search on Google for ‘Aleppo thermal plant’.
- The Wikipedia link supplies us with the latitude and longitude of the thermal power plant.
- Next, go to Google Earth or Google Maps and enter the coordinates you found: ‘36°10′30″N 37°26′22″E’.
- What you will see is a set of burned-out towers to the right of the plant.
- On EO Browser, turn on roads and enter the latitude and longitude (36.175000, 37.439444) from the Google Maps result into the search window of the EO Browser.
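The decimal pair used in EO Browser is the same position as the degrees, minutes and seconds notation from Wikipedia. If a tool does not convert it for you, a small helper like the following sketch does the arithmetic; `dms_to_decimal` is a hypothetical function written for this tutorial and only handles the exact notation shown here.

```python
import re

# Matches strings such as 36°10′30″N (degrees, minutes, seconds, hemisphere).
DMS_PATTERN = re.compile(r"(\d+)°(\d+)′(\d+(?:\.\d+)?)″([NSEW])")


def dms_to_decimal(dms):
    """Convert one degrees-minutes-seconds coordinate to decimal degrees."""
    match = DMS_PATTERN.fullmatch(dms.strip())
    if match is None:
        raise ValueError(f"Unrecognized coordinate format: {dms!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # South and west are negative in decimal notation.
    return -value if hemisphere in "SW" else value


lat = dms_to_decimal("36°10′30″N")
lon = dms_to_decimal("37°26′22″E")
print(round(lat, 6), round(lon, 6))  # 36.175 37.439444
```

Note the order: latitude comes first in this notation, which matters when a search field or a bounding box expects longitude first.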

In our case, we are interested in February 16 of 2016 (2016-02-16), for which we can see striking smoke plumes.

[Figure: Smoke plumes move to the left]

Next, we proceed as before and apply the fire script to visualize fires in Sentinel-2 imagery (Challenge: if you feel confident, do it in your Python environment; alternatively, do it analogously within your EO Browser window).

Now that we have a better understanding, we can infer the location of the Tiger’s hideout at the time of the fire.

[Figure: Al-Hassan hideout (left) and Aleppo thermal plant (right) on Google Maps]

To confirm our suspicion, we can check Google Maps satellite images and learn that the hideout has since been bombed.

Benjamin Strick, an open source expert for conflict, security, arms, and digital forensics (who also suggested this example) explains that it does help to show which of the towers were on fire at the time.

Later imagery of Al-Hassan in the plant would have confirmed: those four towers were on fire that day.

Spotting specific details in places from space has its merits.

One of them, in fact, lies in the field of human rights.

A recent investigation showed that satellite images can help reveal slavery from space.

Doreen Boyd, director of the data program at the Rights Lab at the University of Nottingham in the United Kingdom, estimates that one-third of all slavery would be visible from space, whether in the form of scars of kilns or illegal mines or the outlines of transient fish processing camps (arguably, high-resolution commercial images may be better suited for this kind of investigation).


Advanced: running an algorithm

Water level extraction

Let us assume you are reporting on distressed water levels, maybe covering a conflict that resulted from such (tension and fighting resulting from scarce water situations are becoming increasingly likely, according to recent research and covered by an Economist special report).

A Jupyter notebook* was composed to detect levels of water bodies, using Sentinel-2 multi-spectral and multi-temporal imagery.

We will run a water detection algorithm in python and extract surface water level for a single reservoir in a given time interval.

Here is what you will do:

- Define geometries of a few waterbodies
- Prepare and execute the full workflow for water detection: download Sentinel-2 data (true color and NDWI index) using Sentinel Hub services, detect clouds using the s2cloudless cloud detector, and finally detect water
- Visualize water bodies and the water level over a period of time
- Filter out cloudy scenes to improve the results

What do you need?

- `eo-learn`: https://github.com/sentinel-hub/eo-learn
- `Water Observatory Backend`: https://github.com/sentinel-hub/water-observatory-backend

Basic terminal/file setup:

As in the previous example, you will also need a Sentinel Hub account in order to run it.

You can create a free trial account on the Sentinel Hub webpage.

Once you have the account set up, log in to the Sentinel Hub Configurator.

By default, you will already have the default configuration with an instance ID (alpha-numeric code of length 36).

For this tutorial, we recommend that you create a new configuration ("Add new configuration") and set the configuration to be based on Python scripts template.

Such configuration will already contain all layers used in these examples.

Otherwise, you will have to define the layers for your configuration yourself.

After you have prepared a configuration, please put the configuration’s instance ID into the sentinelhub package's configuration file, following the configuration instructions.

Set your Python working environment by loading the following Python Libraries.

Make sure you run the Python virtual environment as instructed above.

Obtaining geometries for waterbodies

Let’s use the Theewaterskloof Dam in South Africa as an example. It is a substantial water reserve supplying the precious resource to a large chunk of the 4 million dwellers in Cape Town.

It is the largest dam in the Western Cape Water Supply System and can run low during droughts.

There are signs of increased consciousness for water shortages.

This example shows how to cover such a topic.

In the case of Theewaterskloof Dam — or any other large waterbody across the planet — you can easily obtain geometries via the BlueDot Water Observatory API.

By searching for a specific waterbody, you can copy the ID number in the URL in order to access the nominal geometry of the corresponding waterbody (i.e. number 38538 in url https://water.


com/38538/2019-02-05)

[Figure: The BlueDot Water Observatory]

Python code to download geometries:

Now we need a bounding box for this geometry, in order to download Sentinel-2 data.

We define a bounding box and inflate it a little bit in order to construct a BBox object which is used with Sentinel Hub services.

The BBox class also accepts the coordinate system (CRS), where we use the same one as in the case of the geometry (which is WGS84).
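Inflating a box simply means padding each side by a share of its width and height. Here is a minimal sketch of that step in plain Python; `inflate_bounds` and the 25 % padding are assumptions for illustration, and the sample bounds are made-up numbers rather than the dam's real geometry.

```python
def inflate_bounds(bounds, fraction=0.1):
    """Pad (min_x, min_y, max_x, max_y) by `fraction` of its size per side.

    Hypothetical helper; the padded tuple could then be handed to a
    bounding box class together with the WGS84 coordinate system.
    """
    min_x, min_y, max_x, max_y = bounds
    pad_x = (max_x - min_x) * fraction
    pad_y = (max_y - min_y) * fraction
    return (min_x - pad_x, min_y - pad_y, max_x + pad_x, max_y + pad_y)


# Made-up bounds in degrees, chosen only to keep the arithmetic easy to check.
nominal = (10.0, 20.0, 12.0, 21.0)
print(inflate_bounds(nominal, fraction=0.25))  # (9.5, 19.75, 12.5, 21.25)
```

The extra margin matters here because a reservoir at high water can extend beyond its nominal outline.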

Plotting the BBox and the geometry

Preparing/executing the full workflow for water detection

Sentinel Hub services are installed with eo-learn.

It is an open-source earth observation processing framework for machine learning in Python, which provides seamless access and abilities to process spatiotemporal image sequences acquired by any satellite fleet.

eo-learn works as a workflow — where a workflow consists of one or multiple tasks.

Each task achieves a specific job (downloading data, calculating band combinations, etc.) on a small patch of an area, called EOPatch.

EOPatch is a container for EO and non-EO data.

Let’s define a workflow to download and obtain the necessary data for water detection.

We will download the RGB bands in order to actually visualize the true-color image of the waterbody.

Additionally, we will download the NDWI band combination (Normalized Difference Water Index), which we will use for water detection.

It is defined as

$$\mathrm{NDWI} = \frac{B_3 - B_8}{B_3 + B_8},$$

where B3 and B8 are the green and near-infrared Sentinel-2 bands, respectively.
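To see how the index separates water from land, here is a minimal sketch that computes NDWI as (B3 - B8) / (B3 + B8) for single pixels and applies a fixed threshold of zero. The threshold and the `is_water` helper are assumptions for illustration; the notebook's own water detector may choose its threshold differently, and the reflectance values are made up.

```python
def ndwi(green, nir):
    """Normalized Difference Water Index from green (B3) and NIR (B8)."""
    total = green + nir
    if total == 0:
        return float("nan")  # no signal in either band
    return (green - nir) / total


def is_water(green, nir, threshold=0.0):
    """Classify a pixel as water when NDWI exceeds an assumed threshold."""
    return ndwi(green, nir) > threshold


# Water absorbs near-infrared light, so NIR is low over open water.
print(ndwi(green=0.75, nir=0.25), is_water(0.75, 0.25))
# Vegetation reflects near-infrared strongly, which pushes NDWI below zero.
print(ndwi(green=0.25, nir=0.75), is_water(0.25, 0.75))
```

Counting the pixels classified as water inside the reservoir outline, relative to the nominal area, gives a simple measure of the water level over time.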

Next: definitions of some custom tasks that will be used in the workflow

Initializations of EOTasks:

Output: Finished loading model, total used 170 iterations

Output: CPU times: user 3min 9s, sys: 14.7 s, total: 3min 24s
Wall time: 3min 23s

Structure of the `EOPatch`

Check the structure by typing:

Input: eopatch

Let’s now visualize the first few true-color images of the selected waterbody in the given time series.

We see below that some images contain clouds, which causes problems in proper water level detection.

Plot the NDWI to see how the water detector traces the waterbody outline:

[Figure: Plotting of Normalized Difference Water Index]

Plot true-color images with the detected water outlines:

[Figure: Clear as day, comparing true water levels with the Theewaterskloof Dam’s outline]

Plotting the detected water levels

You should see a lot of fluctuations in the data due to cloud interference (cloud coverage is plotted in grey; it shares the same dates as the water level outliers).

Let us now set a threshold for the maximum cloud coverage of 2 % and filter out the dates which correspond to cloudy scenes.

This is done by filtering out the dates which have a value of eopatch.scalar['COVERAGE'] larger than the 2 % threshold.
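The filtering step boils down to keeping only those dates whose cloud coverage stays at or below the threshold. The sketch below shows that logic on plain Python lists; the dates, water levels and coverage values are made up, and `filter_cloudy` is a hypothetical helper rather than an eo-learn function. The 0.02 default corresponds to the 2 % threshold mentioned above.

```python
def filter_cloudy(dates, water_levels, coverage, max_coverage=0.02):
    """Keep (date, water_level) pairs whose cloud coverage is low enough."""
    return [
        (date, level)
        for date, level, cloud in zip(dates, water_levels, coverage)
        if cloud <= max_coverage
    ]


# Made-up observations: the second scene is mostly cloud and reports an
# implausibly low water level.
dates = ["2018-05-01", "2018-05-11", "2018-05-21"]
water_levels = [0.42, 0.05, 0.40]
coverage = [0.00, 0.85, 0.01]

for date, level in filter_cloudy(dates, water_levels, coverage):
    print(date, level)
```

Only the first and third observations survive, which removes the outlier caused by clouds hiding the water surface.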


[Figure: A lot less vacillating]

Voilà, there you have it.

Water levels hit a three-year low in mid-2018 but have recovered since.

Conclusion:

Still hungry for more satellite image analysis? I covered some basic techniques related to economic nightlight analysis here (which served as a proxy for economic growth).

99 other ideas for application were pooled by Gisgeography.com and are listed below.

Knock yourself out.

The tutorial was composed by Ben Heubl, an investigative journalist, with kind support from Matic Lubej, data scientist at @sinergise; Pierre Markuse, a remote sensing evangelist; and Benjamin Strick, an open-source investigator for the BBC and instructor with the EUArms workshops.

