Given that GraalVM can perform translation between its multitude of supported languages, is it possible to define a “Domain” that can be accessed by all? This is, of course, a rhetorical question and the answer is “Yes”.
In this article I’ll demonstrate how to share domain objects between JVM languages and guest languages on the GraalVM platform.
I’m using Scala domain objects (because Scala is awesome), but you could do the same with, for instance, Java or Kotlin.
(If you’re new to GraalVM Polyglot abilities, consider also reading my previous article on the subject: using GraalVM to execute R files from Scala.)

The Problem

To demonstrate the problem we are trying to solve, we first need a pretend domain.
Let’s do something with Weather Forecasts, because people always talk about the weather!

Creating weather forecasts is the kind of terribly complicated modelling business that could be built in R, but luckily we don’t actually need a working model for this article.
So let’s just pretend we already have this awesome R functionality that creates weather forecasts, cleanly abstracted away in a separate file called fun_MagicHappensHere.R:

When brought into scope with R’s source, the above file will yield a magicHappensHere function that can be called and returns a data.frame with some weather forecast information.
We can then return the result to Scala by simply making it the return of our R function:

Wow, that doesn’t look too bad! This won’t get many complaints from the Data Scientist, I reckon.
So, what’s wrong with this? What’s the problem?

I’m glad you asked, interlocutor! Let’s take a look at the Scala/JVM side of this equation, to see what the Data Engineer has to deal with:

The problem: part 1!

Whoa… creating the Graal Context and Source is trivial, but look at the nasty type signature on that call to R! Let’s pick it apart for a bit:

A Map that contains Lists of each data.frame column keyed by its name… That makes sense, well done Graal! It’s just too bad it’s Stringly typed, rather than actual methods on an actual class, so any typo will mess us up at runtime.

Unknown content type of the Lists?… That’s unfortunate, we know that some columns should only contain String, while others contain Int, but this information is lost in conversion… We have to do a bunch of casting!

The returned Collections are Java? That’s just sad! The polyglot representation of collections doesn’t transfer to Scala, but Scala Map and List are much more powerful than their Java equivalents, so we’ll have to convert them!

Every element of each List doesn’t actually belong to the rest of the List, but instead should be combined with the corresponding position in every other List to actually make a WeatherForecast… (The first entry of “humidities” should be paired with the first entry of “temperatures”, etc.)

Let’s see what this means when we try to use the output of this function:

The problem: part 2!

I don’t know about you, but I’d feel quite uncomfortable at the thought of maintaining the code above.
It’s verbose, error-prone, brittle, annoying, and it fails at the wrong spot if any mistakes are introduced (namely at the place of conversion, rather than at the place of the programming error).
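To give a feel for this kind of column-to-row parsing, here is a minimal sketch; it is not the code from the example project. It assumes Scala 2.13 or later (for scala.jdk.CollectionConverters) and that the R result has already been converted to a Java Map of Java Lists. ForecastRow, ColumnParsing, the "summaries" column and the field types are hypothetical names picked for illustration.

```scala
import java.util.{List => JList, Map => JMap}
import scala.jdk.CollectionConverters._

// Hypothetical row type; field names and types are assumptions for this sketch.
final case class ForecastRow(temperature: Int, humidity: Int, summary: String)

object ColumnParsing {
  // Takes the column-oriented structure (column name -> values) and
  // combines the values that share an index into one row each.
  def toRows(columns: JMap[String, JList[AnyRef]]): List[ForecastRow] = {
    // Stringly typed lookup: a typo in a column name only surfaces at runtime.
    def column(name: String): List[AnyRef] =
      Option(columns.get(name))
        .map(_.asScala.toList)
        .getOrElse(throw new NoSuchElementException(s"missing column: $name"))

    val temperatures = column("temperatures")
    val humidities   = column("humidities")
    val summaries    = column("summaries")

    temperatures.zip(humidities).zip(summaries).map {
      case ((t, h), s) =>
        // Unchecked casts: a wrong element type blows up here,
        // far away from the code that produced the bad value.
        ForecastRow(t.asInstanceOf[Int], h.asInstanceOf[Int], s.asInstanceOf[String])
    }
  }
}
```

Even in this small form, every column name and every cast is a place where a mistake on the R side only shows up while converting on the Scala side.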
I wish the R function would just return a Set of WeatherForecast!

Whoops, hold on… Wait a minute… Why don’t we just make it do that?

The Solution: Bindings

GraalVM comes with an option that makes it possible to explicitly share instances of code across the language divide.
It makes it possible to add symbols to bindings that are accessible to other languages.
The Graal Context has two functions that can be used to do this in a very similar way:

getPolyglotBindings()
getBindings("nameOfLanguage")

In this article I will be using getBindings, because it doesn’t require an explicit import on the side of the using language and it allows you to limit which languages you are exposing each binding to.
Using getPolyglotBindings() is almost identical from a coding perspective though, so pick the one you like best.
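As a rough sketch of the two options side by side, assuming a GraalVM installation with the R language component available and the org.graalvm.polyglot API on the classpath: Greeter and BindingsSketch are hypothetical stand-ins, not part of the example project, and the R snippets in the strings are my best understanding of how the guest side would look.

```scala
import org.graalvm.polyglot.Context

// Hypothetical host object to share; any immutable object would do.
final class Greeter {
  def greet(name: String): String = s"Hello, $name"
}

object BindingsSketch {
  def main(args: Array[String]): Unit = {
    // allowAllAccess lets the guest language call methods on host objects.
    val context = Context.newBuilder("R").allowAllAccess(true).build()
    try {
      // Option 1: language bindings. Only R sees the symbol, and it is
      // available there as a plain top-level name.
      context.getBindings("R").putMember("Greeter", new Greeter)
      println(context.eval("R", "Greeter$greet('language bindings')"))

      // Option 2: polyglot bindings. Shared with all languages in the
      // context, but the guest code has to import the symbol explicitly.
      context.getPolyglotBindings().putMember("SharedGreeter", new Greeter)
      println(context.eval("R", "import('SharedGreeter')$greet('polyglot bindings')"))
    } finally context.close()
  }
}
```

The only difference on the JVM side is which bindings object receives the putMember call; the extra import step lives entirely in the guest code.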
Using Domain objects on both sides of the language divide

This is what our Domain object looks like:

Domain is basically a factory that can be used to spawn new instances of all the domain classes that we want to share. The class Domain itself is immutable! (As it happens, the spawned instances are too.)

WARNING: You probably don’t want to put a mutable object into bindings.
If you do, this object can be mutated from any language that can reach it.
Just as you don’t want multiple threads to tangle with the same mutable object, you don’t want multiple languages to access the same mutable state! (Really! Imagine having to debug race conditions across language boundaries.)

Any instance of the Domain class provides methods to spawn new instances of the following domain case classes:

Let’s put an instance of our Domain class into the bindings for R, so it can be accessed from the R guest language context:

Easy peasy.
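As a rough illustration of the factory idea described above, here is a minimal sketch of an immutable Domain with immutable case classes. The names Domain, WeatherForecast and WeatherForecastList come from the article, but the fields, method names and signatures below are assumptions made for this sketch and will differ from the actual Domain.scala in the example project.

```scala
// Field names and types are assumptions made for this sketch.
final case class WeatherForecast(temperature: Int, humidity: Int, summary: String)

// Immutable wrapper: add returns a new list instead of mutating this one.
final case class WeatherForecastList(forecasts: List[WeatherForecast] = Nil) {
  def add(forecast: WeatherForecast): WeatherForecastList =
    copy(forecasts = forecasts :+ forecast)
}

// Stateless factory: it holds no fields at all, so sharing one instance
// across languages cannot leak mutable state.
final class Domain {
  def weatherForecast(temperature: Int, humidity: Int, summary: String): WeatherForecast =
    WeatherForecast(temperature, humidity, summary)

  def weatherForecastList(): WeatherForecastList = WeatherForecastList()
}
```

Because add hands back a fresh WeatherForecastList, a guest language can build up a result step by step without ever changing an object that the JVM side might also be holding.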
From R, the new object will simply be known as Domain and its methods will be accessible like this: Domain$methodName(arguments)

We turn a new R file that uses this binding into our newest Source:

And then we define the function:

Now that this is our return type, all we need to do to work with the returned WeatherForecasts is this:

That is one very happy Data Engineer! (Don’t forget to compare with the incomplete parsing above.)

Now, let’s see the impact on the Data Scientist side:

As we can see, the code has become more verbose (although it’s actually quite efficient still, if you take out all the clarifying comments I put in), but not quite as bad as in the previous solution:

In this R file, we now need to convert the data.frame to proper WeatherForecast instances to be added to the WeatherForecastList we also got from Domain.
But rather than doing a Parse & Pray, as we had to do with the no-bindings solution, we can now use proper constructors that will fail with intelligible errors if we make a mistake.
(Sadly still only at runtime, because this is still R.) Cleanly taking values out of the data.frame is also better supported by its native language, and we could add more convenience methods to more succinctly create the domain classes if we wanted to.
If we have direct control over the function that creates the weather forecasts, we can even skip the data.frame altogether and exclusively use WeatherForecastList, which eliminates the extra code seen above.
The biggest advantage, though, is that we now have a very clearly defined interface.
Any user can open up the Domain.scala file to see what methods are available, what parameters they take and what things they return.
Conclusion

Using Bindings to provide a clean shared domain between guest languages (like R or Python) and JVM languages (like Scala, Java or Kotlin) in GraalVM is pretty easy and gets rid of a lot of ugly and fault-sensitive parsing.
It also provides a crucial stepping stone for further integration of functionalities across language boundaries.
PS: I could have added a factory for each separate domain class to the bindings, instead of giving them a shared factory.
This can make the code on the R side a little shorter, but creates a less clean interface (at least to my taste).
Source code

I have reused the example project from my previous article (on using GraalVM to execute R files from Scala) and branched it for this article.
The source code can be found here.
The snippets above are taken from the linked project and altered to better fit the sizing of the article.