The xtensor vision

Wolf Vollprecht, Nov 10, 2018

Here we lay out a vision for the xtensor project, an n-dimensional array library for the C++ language that makes it easy to write high-performance code and to bind it to the languages of data science (Python, Julia, and R).
Results of running the same C++ implementation from Julia, Python, R, and C++ (from left to right)

It is important to lay out a clear vision, especially for open-source projects: only a common goal can bring people together.
That is why we have decided to finally put together a document that lays out our vision on how C++ can play a major role in the future of data science, and why.
The three major languages of Data Science today are Python, Julia, and R.
There are lots of cool libraries in the Python scientific computing ecosystem that are not available in other languages, such as Julia and R — and vice versa.
For example, scikit-learn is a dedicated Python package.
Workarounds usually call into the Python interpreter from Julia or R, but this is neither fast nor elegant, and the price of converting data at the language boundary is high.
The numerical methods included in those packages are well established and have been thoroughly tested thanks to a large user base.
Unfortunately, choosing one of these three languages for a reference implementation, instead of a common-denominator language, hinders adoption by the other languages of data science.
The duplication of work in the different communities harms the sustainability of the ecosystem and reproducibility of the scientific results.
This is not only true for well-established numerical methods, but also for implementations of standard protocols, file-format readers, and common data structures for data science.
For this task, the key advantages of C++ are its speed, its maturity, and its large developer community.
While there are other promising languages, C++ has been a cornerstone of scientific computing for a long time, and it is not going away — even though it has long lacked an n-dimensional array library. On the contrary: modern C++ keeps improving, and the language and standard library are better now than they have ever been.
C++ can be very fast.
You generally find C++ at the top of benchmarks.
For example in the benchmarks game, C++ is consistently among the top 3 with C and Rust.
However, a language like Rust shows more variance in the results, while C and C++ remain stable.
You can be sure to always have the means to create the fastest code possible with C++ — the language is never going to prevent you from getting there.
C++ has a strong type system that can be used to great benefit, and a powerful, flexible template system that allows us to let the compiler generate extremely optimized assembly.
Universal means re-usable: all code should ideally be written only once.
That is why a universal language must be easy to use from other languages.
The three big ones in Data Science are Python, R, and Julia.
Each one of them has excellent tools to talk to C++: pybind11 for Python, Rcpp for R, and CxxWrap.jl for Julia.
When you write your code in C++, you can be sure to be able to reuse it from each of these languages.
How xtensor attempts to solve this problem

The xtensor data science stack

With xtensor, we aim to develop a solution that works for the whole data science ecosystem.
xtensor is an n-dimensional array container for C++, like NumPy for Python.
We follow the NumPy API closely, but in pure C++ and strongly typed (there is even a cheat sheet showing how closely the APIs match).
Furthermore, we offer official bindings to Python, R, and Julia.
This means that if you write your numerical algorithm once in C++ with xtensor, you can seamlessly use your routines from each of the big languages of data science.
xtensor is flexible enough to use the dynamic languages for memory management: if you allocate a Julia array through xtensor-julia, Julia’s own memory allocator and garbage collector are used.
xtensor also transparently handles any memory layout, such as Julia’s and R’s column-major and NumPy’s row-major layout.
C++ has made great strides in recent years to modernize itself (with C++14 and C++17), which makes it an incredibly productive language.
In the remainder of this blog post, we will look at the implementation of an algorithm for ray-shading height maps, which first came up as a challenge from the R community, was then picked up to show how GraalVM can speed up R, and was subsequently contested by Julia and Pythran. The aim of this article is to show how we can implement the algorithm once and for all in C++ with xtensor, and then create the bindings to each of those languages while retaining maximum speed.
The challenge

We ported the original R and Julia implementations (by Tyler Morgan-Wall and Simon Danisch, respectively) first to Python, and then to xtensor (you can find the Python version here). Our version lives at https://github.com/wolfv/xtensor-showcase. You might note that, apart from the use of curly brackets here and there, the ported C++ code is basically the same length as the original Python or R implementation.
NumPy users may also recognize functions such as linspace.
This is because we stick to the NumPy APIs for most high-level array operations.
The big difference is, of course, providing the <double> template parameter, which supplies the strong compile-time typing necessary for efficient computation and for the generation of fast code by our template expression engine (a great explanation of how this works can be found here).
We can now create bindings for the big three: Python, R, and Julia.
For each language, a dedicated xtensor package exists:

- xtensor-python: deals seamlessly with NumPy arrays
- xtensor-r: transfers R vectors and matrices back and forth
- xtensor-julia: binds Julia's n-dimensional arrays

These three packages all work similarly.
They use a language-specific C++ package (pybind11, CxxWrap.jl, and Rcpp) to natively create the data structures in the host language.
Using these packages, we create new xtypes in C++: xt::pyarray<T> for a NumPy-backed Python array, xt::rarray<T> for the R version, and xt::jlarray<T> for Julia.
Additionally, the packages also contain versions with a static number of dimensions (the equivalent of xt::xtensor<T, N>, where N is an integer indicating the dimensionality).
This enables a bunch of optimizations for the C++ compiler and in the template expression engine.
Let's look at the Python bindings, for example (the Julia and R bindings are just as simple). As input to the function, we take an xt::pyarray<float>.
This type is registered with pybind11, and it is converted automatically from a NumPy array — without copying the buffer contents at any step. Since the rayshade_impl function takes a template argument E as input, we can call it with any class that follows the xexpression interface.
Of course, the pyarray does that (as well as the rarray and jlarray).
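The embedded code is not reproduced here, but a binding in the same spirit could look like the following sketch; the module name `showcase` and the `sum_of_squares` function are our own illustrative choices, not the article's actual code:

```cpp
#include <pybind11/pybind11.h>

// FORCE_IMPORT_ARRAY must be defined before including xtensor-python
// headers so that the NumPy C API is initialized in this module.
#define FORCE_IMPORT_ARRAY
#include <xtensor-python/pyarray.hpp>
#include <xtensor/xmath.hpp>

// pybind11 converts an incoming NumPy array to xt::pyarray<float>
// without copying the buffer; the result travels back as a Python float.
double sum_of_squares(const xt::pyarray<float>& a)
{
    return xt::sum(a * a)();
}

PYBIND11_MODULE(showcase, m)
{
    xt::import_numpy();  // required once, before any pyarray is used
    m.def("sum_of_squares", &sum_of_squares,
          "Sum of squares of a NumPy array, computed in C++ with xtensor");
}
```

After compiling this into a Python extension, it would be called from Python as `showcase.sum_of_squares(np.ones(3, dtype=np.float32))`.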
Benchmarks

Below are the benchmarks. As you can see, they show how compiled languages like C++ and Julia can generate extremely efficient code, and that the xtensor bindings do not impact the speed at all. (By the way, the C++ result is a little slower than the Julia one because no dedicated benchmarking package was used and we only timed one iteration on a laptop — the CPU usually needs a few iterations to ramp up its speed.)
╔══════════════════╦═══════════╗
║ Language         ║ Time (s)  ║
╠══════════════════╬═══════════╣
║ C++              ║ 0.0164812 ║
║ Python / xtensor ║ 0.0220982 ║
║ Python / NumPy   ║ 14.224207 ║
║ Julia / xtensor  ║ 0.014221  ║
║ Julia            ║ 0.014501  ║
║ R / xtensor      ║ 0.01328   ║
║ R                ║ 9.905     ║
╚══════════════════╩═══════════╝

An honorable mention is due here: Brodie Gaslam showed impressively how to make R code performant by utilizing vectorization (the same principle can be used in NumPy, as well as in xtensor). His code performed almost as fast as R with xtensor and clocked in at 0.
How to get started

The code for this example is uploaded at https://github.com/wolfv/xtensor-showcase.
We’ve created a lot of documentation for xtensor as well as cookie-cutter project templates to help people get started with plugins for Python and Julia.
You could also check out the various xtensor-repositories on GitHub under the QuantStack project.
And we’re often online on the Gitter chat room or on Twitter.
Where do we want to go from here

There are many interesting and challenging tasks ahead for xtensor:

- More of the NumPy API: We already cover a lot, but there are still some missing parts! We have even compiled a GitHub project with some easy-to-tackle missing bits of the NumPy API. If you want to get your feet wet with C++ again, we are happy to mentor newcomers through the implementation of some missing functionality!
- Accelerate some functions further: There are still some bottlenecks in the functions we have implemented so far. It's time to get rid of all of them.
- GPU support: an obvious one. Everybody has it and we need to get it, hopefully next year. This can massively speed up operations on big data.
- More interoperability: We want deep support for Apache Arrow (work in progress) and PyTorch, for example. We would also be happy about more language bindings: Matlab/Octave, for example, or Java!
- Compile NumPy to xtensor: This is a big one.
With the help of Pythran, we could compile NumPy code to C++ (Pythran currently has its own NumPy implementation in C++).
We already work closely with the author of Pythran and Pythran is using the same SIMD library as xtensor (called xsimd).
The big vision here would be that one could write the numeric code once using NumPy for fast prototyping, and then automatically compile it to C++.
Once in C++, we could easily generate the binding code for Python (to export a standalone library) as well as for R and Julia. This could potentially make it extremely nice to write cross-language libraries from a high-level algorithm description.
You can follow the author on Twitter or visit the Gitter chat room at https://gitter.im/QuantStack/Lobby to discuss xtensor and related projects!

Links: xtensor on GitHub and ReadTheDocs.