One of these things is not like the others: geological anomalies and anomaly detection with data
A data scientist and a geologist discuss multi-disciplinary approaches
Yhana Lucas, Apr 10
(Image: Shaan Hurley, Flickr)
In geoscience, we regularly talk about looking for anomalies in data as indicators of potential mineralisation.
But what do we define as an anomaly, and how can data scientists and geologists collaborate to generate new insights?
Geologist Holly Bridgwater and data scientist Dr Jess Robertson recently caught up for a chat about how to describe anomalies in a way that would make sense for someone looking at geoscience data from a generalist data science perspective.
Read on to see their take — or watch the original video here.
Note: this discussion was prompted by the Explorer Challenge, a global competition to use data to improve the minerals exploration process — the video makes some specific references to that, but this article generalises a bit more.
Holly: Today we are going to host a session to address the question ‘What is an anomaly and how do you detect it in data?’ Our aim is to help data scientists get a handle on geological data.
I’m going to play the role of the geologist, asking frustrating, poorly framed data science questions, and Jess will be able to translate that for you, hopefully in a meaningful way.
Fingers crossed!
So what we thought we’d do is focus on three specific types of anomalies which are really common to see in geological data.
Physical characteristics
Holly: First we’re going to talk about the physical characteristics of rocks, and specifically geophysical data.
As a geologist, what I’m looking for in this type of data is something that indicates a physical difference in the rock.
As an example, certain types of ore bodies, like sulfide, are more dense and more conductive than the surrounding rocks — so I go and look for signals that might indicate to me more dense or more conductive rocks.
How I do that is firstly I get someone to collect data, usually an airborne geophysical survey, probably magnetics and gravity.
Then, I get someone much smarter than me to do some nice maths on the data (inversions) and generate a number of gridded images that are typically coloured; red as high magnetic intensity, blue as low.
Geology is just paint by numbers then, right? (Flickr)
Still Holly: I look in those images for structures and trends that may indicate something different is going on.
I might be looking for a particular change in rock type or physical characteristics, or I might be looking for a structure or chemical change which is indicated by a difference in density.
Jess: I guess I’d start by noting that you just talked about five or six different steps with this data.
There’s the raw data that you get off the instrument.
There’s some processing of that data — for example, if it was magnetics, you want to remove the effect of the Earth’s magnetic field so that you’re just looking at what the local geology is contributing to that field.
Then, you mentioned inversion and you also mentioned pretty images which I think are two separate steps.
Holly: I guess a geologist’s basic description of why we do inversions is that we’re recording raw measurements from the Earth which are causing the signal to come back in a certain way, but what we actually want to know is what it’s telling us.
During the inversion process we apply some prior knowledge to help us determine that.
A savvy exploration geo would, before doing a survey, provide the geophysicists with some characteristics of the rock that are already known.
So if you already know about the rock types and some of their characteristics you can give those to geophysicists and they can generate a better model that honours your known geology.
Jess: So, coming at this from a data science perspective, essentially the thing that we don’t do very well in exploration [note, Jess also worked as a geologist], which we should do better at, is understanding the background as well as trying to understand the anomalies.
Obviously, we know we’re looking for mineralization.
We want to find that big spike of sulfide that’s going to make us all rich.
But sometimes we don’t spend enough time actually understanding what that background actually looks like and what the natural variability of the rock is — so that then we can threshold properly and say “this thing is actually an anomaly”.
One of the advantages that we think data science will be able to bring to exploration is potentially taking a broader look at the public data sets and trying to understand what natural variability there is.
Then when you’re presenting your anomalous target for minerals you can say, well actually we looked at all of the greenstones in South Australia and found they have these kinds of geophysical signals, and this thing is just different to that.
So being able to provide that kind of bigger data or higher level view is a bonus.
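As a rough illustration of "understand the background, then threshold", here is a minimal sketch that scores target measurements against a background population using the median and the median absolute deviation. The function name, the 3-MAD cut-off and the synthetic numbers are all assumptions made for the example, not something from the discussion.

```python
import numpy as np

def robust_anomaly_mask(values, background, n_mads=3.0):
    """Flag values that sit far outside the natural variability of a background population."""
    values = np.asarray(values, dtype=float)
    background = np.asarray(background, dtype=float)
    centre = np.median(background)
    # Median absolute deviation, scaled so it matches the standard deviation for normal data.
    mad = 1.4826 * np.median(np.abs(background - centre))
    if mad == 0:
        raise ValueError("background has no variability; cannot set a threshold")
    scores = (values - centre) / mad
    return np.abs(scores) > n_mads, scores

# Synthetic stand-in for a regional background (hypothetical units and values).
rng = np.random.default_rng(0)
background = rng.normal(loc=50.0, scale=5.0, size=2000)

# Three hypothetical target measurements to compare against that background.
targets = np.array([48.0, 55.0, 90.0])
mask, scores = robust_anomaly_mask(targets, background)
print(mask, np.round(scores, 1))
```

The median and MAD are used here rather than the mean and standard deviation so that a few genuine anomalies hiding in the background data do not inflate the threshold.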
Holly: How difficult is it for data scientists when you give them raw geophysical data with no context?
Jess: Part of a data scientist’s role is to create the context from the data! The things that we would identify as ‘useful context’ might not be quantitative enough or scale to all of the data.
So a data scientist is going to take a ‘more is more’ approach, where they try to think of a superset of all possible things that might be ‘useful context’, and then learn which ones are actually useful from the data.
There’s still a lot of things that you can kind of pull out of that data without having to do a full inversion.
Just to clarify some of the language, when I would talk about an inversion, coming from a physics background I would expect that we have some governing equations that describe how the physical properties of the rock lead to the signals that we see in the field.
When I’m doing an inversion I’m basically trying to go from the field back to those physical properties.
This works well when you’re just looking at one or two fields.
Where it becomes really hard is when you’re trying to actually stack a whole bunch of other data on top of there as well.
For example, we don’t have any governing equations that tell us simultaneously how a rock’s magnetic and density properties, plus the geochemistry, plus the mineralogy, plus the structure, invert.
You put all of those things together and that system of equations is just too complicated to invert properly.
This is where there’s a specific advantage for machine learning, because potentially with machine learning we don’t need to solve that inversion problem — but we can actually look for patterns instead.
This is kind of a data fusion approach.
Or, you can do that inversion and you can start to constrain it with some other data sets; that’s also a really valuable way to go and can actually then feed into a machine learning model as well.
So, again lots of different ways to apply machine learning in geology.
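To make "going from the field back to the physical properties" concrete, here is a toy sketch of a linear inversion that also honours prior knowledge, in the spirit of what Holly and Jess describe. It assumes the forward physics can be written as a matrix `G` acting on a property vector, which real geophysical inversions generally cannot do so simply. The operator, the property values and the damping weight `lam` are all hypothetical.

```python
import numpy as np

def invert_with_prior(G, d, m_prior, lam=1e-2):
    """Damped least squares: fit the data while staying close to a prior model."""
    G = np.asarray(G, dtype=float)
    d = np.asarray(d, dtype=float)
    m_prior = np.asarray(m_prior, dtype=float)
    # Minimise ||G m - d||^2 + lam * ||m - m_prior||^2 in closed form.
    lhs = G.T @ G + lam * np.eye(G.shape[1])
    rhs = G.T @ (d - G @ m_prior)
    return m_prior + np.linalg.solve(lhs, rhs)

rng = np.random.default_rng(1)
G = rng.normal(size=(20, 5))                      # hypothetical linear forward operator
m_true = np.array([1.0, 0.0, 2.0, 0.0, 0.5])      # "true" physical properties (made up)
d = G @ m_true + rng.normal(scale=0.01, size=20)  # noisy field measurements

m_prior = np.zeros(5)                             # prior guess supplied by the geologist
m_est = invert_with_prior(G, d, m_prior)
print(np.round(m_est, 2))
```

A larger `lam` pulls the answer towards the prior model, and a smaller one trusts the measurements more. That trade-off is one simple way to express "a model that honours your known geology".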
Geochemistry
Holly: So, let’s move on to my favourite thing: geochemistry!
The first thing to point out with regards to geochemistry is that when we talk about data we’re talking about a lab assay result or some real-time chemical reading or indicator of what the elements are within your rocks.
Often geochemistry databases are from drilling, but the density and spacing of drillholes will depend on whether you’re drilling out a resource or doing regional exploration.
So due to the possible sparsity of the data I imagine it’s really quite difficult from a data science point of view.
What we look for in a geochemical anomaly is any chemical change that is indicative of potential mineralization.
Yhana (Unearthed editor) to Holly: “which of these rock photos can be a good geochemistry example image?” Holly: “All rocks are awesome! [*blank stare*]… fine, the purple sparkly one.” (Flickr)
Still Holly: My process of going through geochemical data is firstly to segment that data.
I know that if I look at the data in bulk I’m unlikely to see any meaningful trends; I’m probably going to see different rock types but I need to separate those different rock types out into separate populations so that I can see if the actual anomalies stand out within those separate datasets.
So, I segment that data out into different rock types and even into potential subsets for alteration types.
Then I can really clearly see what the anomalous points are in there.
If we’re looking for a new deposit, we know we’re looking for something different but we don’t know exactly what the chemical signature might be.
It could be an enrichment of some elements, but it could also be a depletion, so really we’re just looking for something that stands out as different in a certain area.
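Holly's segment-then-look workflow can be sketched in a few lines. The example below splits assays by rock type and flags samples that sit far from their own population's median in either direction, so both enrichment and depletion are caught. The rock labels, the copper values and the 3.5-MAD cut-off are invented for illustration.

```python
import numpy as np

def flag_within_groups(rock_types, values, n_mads=3.5):
    """Flag samples that are unusual relative to their own rock-type population."""
    rock_types = np.asarray(rock_types)
    values = np.asarray(values, dtype=float)
    flags = np.zeros(values.shape, dtype=bool)
    for rock in np.unique(rock_types):
        idx = rock_types == rock
        group = values[idx]
        centre = np.median(group)
        mad = 1.4826 * np.median(np.abs(group - centre))
        if mad > 0:
            # Absolute deviation, so enrichment and depletion are both flagged.
            flags[idx] = np.abs(group - centre) / mad > n_mads
    return flags

# Hypothetical copper assays (ppm) for two rock types.
rock = np.array(["basalt"] * 8 + ["sandstone"] * 8)
cu = np.array([100, 95, 105, 110, 98, 102, 97, 20,
               10, 12, 9, 11, 8, 10, 60, 13], dtype=float)

segmented = flag_within_groups(rock, cu)
bulk = flag_within_groups(np.full(len(cu), "all"), cu)  # everything treated as one population

print("segmented:", np.flatnonzero(segmented))
print("bulk:     ", np.flatnonzero(bulk))
```

In this made-up data the depleted basalt sample and the enriched sandstone sample only stand out once the two rock types are separated. Lumped together, the spread between rock types hides them.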
Jess: This is where there’s a real opportunity to connect the chemistry data that you’ve got directly to a geological process actually forming the deposit.
Holly hasn’t mentioned this yet, but what a lot of geochemistry is about is trying to understand how different elements behave under each deposit-forming process.
For example, you might have a bunch of hot fluid moving through some rock and what that’ll do is change all of the minerals in that rock and you will see some elements enriched — like silica.
Minerals like plagioclase are altered into clay minerals, or lose the silica, which is transported elsewhere by that hot moving fluid.
When you can see that kind of operation that’s an indication that there’s been a hydrothermal process acting there.
You might be thinking about hydrothermal deposits like gold or some kind of copper.
Holly: As geologists we learn through our education what unaltered and altered rocks look like, chemically speaking.
Breaking it down really basically, we can consider sedimentary, metamorphic, and igneous rock.
Geologists understand the different chemistries of each in their natural state.
They’ll also know what elemental indicators can potentially show up if they’ve been altered.
We’ve learned about those physical systems.
So we apply that knowledge when we’re looking for evidence of economic mineralization.
Jess: There can be such a broad range of elements to look at.
Each element behaves differently in these alteration processes, but groups of elements behave similarly, so there’s actually a lot of information that you can get about the kind of processes that are happening by being able to compare groups of similar elements together.
Geologists have a whole bunch of language around things like hydrothermal elements.
Often what you can start to do is take ratios of elements — and it’s actually the ratio of the elements that changes, giving you this real big indication of… things.
Just as a technical point, you don’t want to just use the raw element data as it comes out of the chemistry.
The key thing about all of those results is that the parts of each sample need to add to 100%: you can’t have more than 100% silica in a rock, for example.
So you need to use log-ratio transforms or something similar to transform that data into a space where you can start to apply your traditional clustering methods.
Geologists use tools like ioGas to do that.
If you’re doing this in machine learning you’d probably want to look at implementing this yourself.
It’s actually really straightforward to do.
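For anyone who wants to see what "implementing this yourself" might look like, here is a minimal sketch of one common log-ratio transform, the centred log-ratio (clr). Jess does not say which transform she has in mind, so clr is an assumption here, as are the three-part compositions used as input.

```python
import numpy as np

def clr(compositions):
    """Centred log-ratio transform for rows of strictly positive compositional data."""
    x = np.asarray(compositions, dtype=float)
    if np.any(x <= 0):
        raise ValueError("clr needs strictly positive parts; deal with zeros or "
                         "below-detection values before transforming")
    # Closure: rescale each sample so its parts sum to 1.
    x = x / x.sum(axis=1, keepdims=True)
    log_x = np.log(x)
    # Subtract each sample's mean log value (the log of its geometric mean).
    return log_x - log_x.mean(axis=1, keepdims=True)

# Hypothetical three-part compositions in weight percent (each row sums to 100).
samples = np.array([[60.0, 15.0, 25.0],
                    [70.0, 10.0, 20.0]])
transformed = clr(samples)
print(np.round(transformed, 3))
```

Differences between clr columns are just log ratios of the original elements, which ties back to the point about element ratios carrying the signal. The transformed values can then go into ordinary clustering methods.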
Jess says it’s really straightforward. I’m sure it’s fine. (Flickr)
Structures
Holly: Our last anomaly type to talk about is structures.
What we’re talking about in structures are features in the crust that may have been involved with mineralization.
These are things like faults, breccias, veins, which may have facilitated mineralization occurring.
You might think of them like traps or sinks.
Basically there’s a reason why this certain feature has allowed fluid to flow along it or has created a chemical or physical barrier — so we look for those kind of features in the data.
Jess: I mean, if you think about it you’re really looking for places in the Earth’s crust where elements can be concentrated and there’s a few ways that they can do that.
We can have sort of transport pathways.
With those hot flowing fluids we mentioned earlier, they can pass through faults that are acting as a preferential pathway.
In those cases, you’re more likely to be bringing all of those elements together around that kind of fault, so that makes it a good target for trying to find some kind of mineralization.
Imagine for example that you start off with a fluid deep in the crust and you bring it up along a fault.
While deep in the crust it’s sitting in chemical equilibrium with all the rocks around it, but as you bring it up along that fault it comes into contact with other rocks and then you can have chemical reactions and as those chemical reactions occur that will cause precipitation of certain elements.
This makes it useful to look for places where we have these kind of guiding structures, or transport structures, that are going to be bringing material from a broad area into a really narrow area.
It’s not always the case that every deposit that you’re looking at will have formed this way though.
Sometimes you’re looking at variations in groundwater that are actually causing things to precipitate out.
Holly: It’s worth mentioning that often certain types of ore bodies can be fairly vertical, so if you looked from above it would look like a fairly straight, thin line; this is a kind of feature we look for in the data.
We often pick this up in geophysical data, because there’s often a contrast across that structure in density, conductivity or resistivity.
Jess: I would say that if you’re taking a data science approach to geology and you’re looking for changes in these faults then you should be thinking about trying to do some kind of spatial filtering, particularly on the potential field data or the magnetics and gravity.
Mainly because what you’re looking for are high frequency changes, so generally when you look at, say, the magnetic map, you’ll see broad features that are, you know, changes in cover thickness or changes in the bulk behaviour of the rock.
Then you’ll see really fine scale lumps and bumps; some of them are just noise, but some of them are actually these really fine-scale fault features.
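One simple form of the spatial filtering Jess mentions is a high-pass filter: estimate the broad regional signal with a Gaussian low-pass, then subtract it to leave the fine-scale residual. The sketch below does this with NumPy's FFT on a synthetic grid. The grid, the narrow "fault" column and the `sigma_cells` width are all made up, and the FFT treats the grid as periodic, which real survey grids are not.

```python
import numpy as np

def gaussian_highpass(grid, sigma_cells):
    """Split a gridded field into a smooth regional part and a fine-scale residual."""
    grid = np.asarray(grid, dtype=float)
    ny, nx = grid.shape
    ky = np.fft.fftfreq(ny)[:, None]   # spatial frequency in cycles per cell
    kx = np.fft.fftfreq(nx)[None, :]
    # Frequency response of a Gaussian blur with standard deviation sigma_cells.
    lowpass = np.exp(-2.0 * (np.pi * sigma_cells) ** 2 * (kx ** 2 + ky ** 2))
    regional = np.real(np.fft.ifft2(np.fft.fft2(grid) * lowpass))
    return regional, grid - regional

# Synthetic grid: a broad trend plus one narrow north-south linear feature.
x = np.arange(128)
row = 50.0 * np.sin(2.0 * np.pi * x / 128)   # broad feature, e.g. changing cover thickness
row[80] += 5.0                               # thin, fault-like bump (hypothetical)
grid = np.tile(row, (64, 1))

regional, residual = gaussian_highpass(grid, sigma_cells=4.0)
print("strongest raw column:     ", int(np.argmax(grid[0])))
print("strongest residual column:", int(np.argmax(np.abs(residual[0]))))
```

In the raw grid the broad trend dominates. After filtering, the narrow feature is the strongest thing left. Real data would also have noise in the residual, which is the "some of them are just noise" caveat above.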
South Australian Government Resources Information Gateway has just added some fresh magnetotellurics data, which is deep crustal stuff.
If you’re interested in trying to understand where deep fluids might be coming from in the crust, then it’s probably worth taking a look at that set.
Geology can provide context for what an anomaly is.
What we can do in data science is extract all of those as covariates to our predictor — essentially trying to say that given we know what all of the background rocks look like, tell me whether this is anomalous.
If you’re trying to convince a geologist that your model is worth something, you’re trying to say this is the context that my model is adding to your interpretation of this data.
So you can say, well we think this is anomalous because of all of these reasons; does that actually link really well to the geological processes?
Holly: And when you are working as a geologist for a company there are additional constraints applied on you.
So, we don’t just get asked, ‘just go and find anything, go and look for anomalies’.
Usually there are other things that are driving what we’re doing — for instance if a deposit of some type has been found nearby, we’ll often be looking to use that model for that style of mineralization to find other similar deposits.
This makes us very driven by a particular model looking for particular styles of anomalies, and it can be easy to miss the other types of anomalies in the data which weren’t expected.
Also, most exploration departments, whether they be small or within large miners, have a focus on a particular commodity or a particular style.
This focuses us and allows us to apply our funds in a particular way, but it does mean that we might be too biased, looking for ore bodies that the department already processes or trades in.
And that’s a wrap!
Many thanks to Holly and Jess for geeking out for 20 minutes to create this.
It’s our hope that it will introduce some more data scientists to the wonderful world of geology, because we need to find more resources to produce all the things society needs, but it’s a difficult game!