# Understanding Machine Learning on Point Clouds through PointNet++

Zachary Singer, Jan 24

## Introduction

Data can take on a variety of forms.

For processing visual information, images are extremely common.

Images store a two-dimensional grid of pixels that often represent our three-dimensional world.

Some of the most successful advances in machine learning have come from problems involving images.

However, for capturing data in 3D directly, it is less common to have a three-dimensional array of pixels representing a full volume.

One of the simplest and most cost-effective ways to retrieve spatial data in 3D is through a point cloud.

Surprisingly, not much work has been done on machine learning for point clouds, and most people are unfamiliar with the concept.

In this article, I will:

- Define point clouds and the spaces they live in.
- Understand the problems in machine learning involving point clouds.
- Unpack one of the pioneering research papers on machine learning for point clouds: PointNet++.

## What are Point Clouds?

As the name suggests, point clouds are collections of data points in space.

A popular way to gather data in 3D is through a scanner, which detects the surfaces of objects through a series of coordinates.

Storing information as a collection of spatial coordinates can save space, since many objects don’t fill up a lot of the environment.

An image and a 3D point cloud representation detected by a scanner.

However, point clouds aren’t limited to 3D.

Any collection of high-dimensional objects can be considered a point cloud.

Even if the information is not visual in nature, interpreting data as a point cloud can help in understanding the relationship between multiple variables.

At its core, a point cloud is just a fancy name for a mathematical set, which is an unordered collection of objects.

Generally, we think of these objects as a bunch of isolated points in space that are loosely describing some solid structure or surface, but this is just the intuition.

As long as we have a way of calculating the distance between points, that’s all that matters.

A space with a notion of distance is called a metric space, and this is one of the most general settings for point clouds.
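To make the "set plus a distance" idea concrete, here is a minimal sketch in plain Python. The coordinates, the `euclidean` helper, and the `nearest_neighbor` helper are made up for illustration; any other metric could be swapped in for the Euclidean one.

```python
import math

# A tiny, made-up 3D point cloud: each point is a tuple of coordinates.
cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]

def euclidean(p, q):
    """Euclidean distance, one possible metric on the space."""
    return math.dist(p, q)

def nearest_neighbor(points, query, metric=euclidean):
    """Return the point closest to `query` under the given metric."""
    return min(points, key=lambda p: metric(p, query))

# Reordering the points changes nothing, because a point cloud is a set.
shuffled = list(reversed(cloud))
query = (0.9, 0.1, 0.0)
closest = nearest_neighbor(cloud, query)
closest_shuffled = nearest_neighbor(shuffled, query)
print(closest, closest_shuffled)
```

Both calls return the same point, since the answer depends only on distances and not on the order in which the points are stored.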

## Machine Learning on Point Clouds

### Motivation

The types of problems we’d like to solve on point clouds.

First of all, what are some of the tasks we’d like to perform on point clouds? There are two major types of problems that are common for doing machine learning on point clouds: Classification and Segmentation.

Classification asks the question: What type of object is this? The goal is to classify the entire point cloud with one label.

There can be two labels (e.g., is this data of a cat or a dog?) or multiple (e.g., is this data of a car, plane, boat, or bike?).

Segmentation asks the question: Can you separate this object into distinguished parts? If we have a point cloud describing a bike, maybe we want to separate the wheels, handles, and seat (Part Segmentation).

Segmentation is also used for handling complex point clouds that describe an entire environment rather than a single object.

For example, we may have a point cloud describing a traffic intersection, and want to distinguish each individual car, person, and stoplight (Semantic Segmentation).

### How to solve them?

Unlike other visual problems, we can’t just throw something like a convolutional neural network at this problem.

Remember: Point clouds don’t have a rigid structure, so we can’t as easily apply the typical tools for processing an image.

The basic idea is this: What we do have is a metric, which tells us the distance between points.

This allows us to group together points that are close to each other in tiny pockets, which are then compressed into a single point.

We can repeatedly apply this principle to summarize the geometric information and ultimately label the entire point cloud.
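As a rough sketch of this group-then-compress idea, the snippet below gathers the points within a radius of each chosen center and replaces every pocket with a single point. The helper names, the radius values, and the use of a centroid as the summary are all assumptions made for illustration; PointNet++ itself summarizes learned features rather than averaging coordinates.

```python
import math

def group_within_radius(points, center, radius):
    """Collect every point whose distance to `center` is at most `radius`."""
    return [p for p in points if math.dist(p, center) <= radius]

def centroid(group):
    """Average the coordinates of a non-empty group of points."""
    n = len(group)
    return tuple(sum(coords) / n for coords in zip(*group))

def compress(points, centers, radius):
    """Replace each pocket of nearby points with a single summary point.

    Centers are assumed to be taken from `points`, so no pocket is empty.
    """
    return [centroid(group_within_radius(points, c, radius)) for c in centers]

points = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0),
          (10.0, 10.0), (10.0, 12.0)]

# First pass: two pockets, one around each chosen center.
level_1 = compress(points, centers=[points[0], points[4]], radius=3.0)

# Second pass: the same principle applied to the summary points.
level_2 = compress(level_1, centers=[level_1[0]], radius=20.0)
print(level_1, level_2)
```

Six points become two, and then one, which is the repeated summarizing described above.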

## PointNet++: Deep Learning on Point Clouds

PointNet++ is a pioneering work in applying machine learning on point clouds.

The architecture is composed of multiple components that aggregate local information and pass it along to the next step.

Since point clouds are unordered, the aggregation steps cannot depend on the order of the input.

### How can a machine learning algorithm not depend on the order of its input?

Here’s a principle that helps regardless of the type of data or the type of problem: The essential problem in machine learning is function approximation.

Finding a function that is invariant to the order of its inputs sounds daunting, so can you think of any simple ones, even for a function that just takes in three numbers, f(x, y, z)? Such a function is called symmetric.

Here are some common symmetric functions:

- f(x, y, z) = xyz
- f(x, y, z) = x + y + z
- f(x, y, z) = max(x, y, z)

The components in PointNet++ actually utilize that last function! For each tiny group of points, after a few initial transformations there is a max operation that combines everything.
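The symmetry property is easy to check by brute force. The sketch below tries every ordering of three numbers for the functions listed above, plus one deliberately order-dependent function for contrast, and then applies a coordinate-wise max to a small group of feature vectors. The helper names and the feature values are hypothetical, and the "initial transformations" that PointNet++ learns are left out.

```python
import itertools
import math

def is_symmetric(f, values):
    """Check that f returns the same result for every ordering of `values`."""
    results = {f(perm) for perm in itertools.permutations(values)}
    return len(results) == 1

candidates = {
    "product": math.prod,
    "sum": sum,
    "max": max,
    "first_minus_second": lambda v: v[0] - v[1],  # depends on the order
}

values = (2, 5, 3)
checks = {name: is_symmetric(f, values) for name, f in candidates.items()}

def max_pool(features):
    """Coordinate-wise max over a group of per-point feature vectors."""
    return tuple(max(channel) for channel in zip(*features))

# Three points in one group, each carrying two made-up feature values.
group = [(1.0, 5.0), (3.0, 2.0), (2.0, 4.0)]
pooled = max_pool(group)
pooled_reversed = max_pool(list(reversed(group)))
print(checks, pooled, pooled_reversed)
```

The pooled result is the same for either ordering of the group, which is exactly what an unordered input requires.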

### The Architecture of PointNet++

The architecture for PointNet++, broken up into multiple stages.

There are a number of stages in the architecture of PointNet++, but each part has a well-defined goal.

Starting from the entire point cloud, points are grouped into some number of clusters, and condensed into a single point that carries new information.

In addition to its d spatial coordinates, each point also carries C pieces of information.

This process continues, taking the new points and grouping them into more clusters.
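One such grouping stage can be sketched as follows. The PointNet++ paper [2] picks cluster centers with farthest point sampling and groups neighbors within a radius; the code below imitates that with made-up 2D points (d = 2) and a single feature per point (C = 1). The function names are hypothetical, and the small learned network that the real model applies to each point before the max is omitted, so treat this as an illustration of the data flow only.

```python
import math

def farthest_point_sampling(points, n_centers):
    """Greedily pick indices of centers that are far apart, starting at 0."""
    centers = [0]
    dist_to_centers = [math.dist(p, points[0]) for p in points]
    while len(centers) < n_centers:
        nxt = max(range(len(points)), key=lambda i: dist_to_centers[i])
        centers.append(nxt)
        dist_to_centers = [min(d, math.dist(p, points[nxt]))
                           for d, p in zip(dist_to_centers, points)]
    return centers

def abstraction_level(points, features, n_centers, radius):
    """Sample centers, group neighbors within `radius`, max-pool features."""
    new_points, new_features = [], []
    for c in farthest_point_sampling(points, n_centers):
        # The center itself is always in its own group, so it is never empty.
        group = [f for p, f in zip(points, features)
                 if math.dist(p, points[c]) <= radius]
        new_points.append(points[c])
        new_features.append(tuple(max(ch) for ch in zip(*group)))
    return new_points, new_features

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (12, 10)]
features = [(1.0,), (2.0,), (3.0,), (4.0,), (5.0,), (6.0,)]

# Two stages: six points -> two points -> one point.
pts_1, feats_1 = abstraction_level(points, features, n_centers=2, radius=2.5)
pts_2, feats_2 = abstraction_level(pts_1, feats_1, n_centers=1, radius=20.0)
print(pts_1, feats_1, pts_2, feats_2)
```

Each stage outputs fewer points, and each surviving point carries a feature that summarizes its whole neighborhood.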

Depending on the problem, the process then reverses itself and tries to build back the original structure.

Especially when we would like to classify each original point, the network has a series of interpolating steps to go from one point to a group.

These steps all rely on utilizing the distance function.

The interpolation step uses an inverse distance weighted average, defined below.

Inverse distance weighted average for interpolation (quite a mouthful).
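Written out as it appears in the PointNet++ paper [2], the weighted average takes the form below. The symbols x_i (the k nearest known points), f_i (their feature values), d (the distance function), and the exponent p follow the paper's notation rather than the text above; the paper's default choices are p = 2 and k = 3.

$$
f^{(j)}(x) = \frac{\sum_{i=1}^{k} w_i(x)\, f_i^{(j)}}{\sum_{i=1}^{k} w_i(x)},
\qquad w_i(x) = \frac{1}{d(x, x_i)^{p}},
\qquad j = 1, \ldots, C
$$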

Here f(x) is the interpolated value at coordinate x, C is the number of feature channels each point carries, and k is the number of neighbors used (as in k-Nearest Neighbors).
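A small sketch of this interpolation in plain Python is shown below. The `interpolate` function, the sample points, and the `eps` term (added so that a query sitting exactly on a known point does not divide by zero) are assumptions for illustration; the defaults k = 3 and p = 2 follow the paper [2].

```python
import math

def interpolate(x, known_points, known_features, k=3, p=2, eps=1e-8):
    """Inverse distance weighted average over the k nearest known points."""
    nearest = sorted(zip(known_points, known_features),
                     key=lambda pf: math.dist(x, pf[0]))[:k]
    # Closer points get larger weights; eps guards against zero distance.
    weights = [1.0 / (math.dist(x, pt) ** p + eps) for pt, _ in nearest]
    total = sum(weights)
    n_channels = len(nearest[0][1])
    return tuple(
        sum(w * feat[j] for w, (_, feat) in zip(weights, nearest)) / total
        for j in range(n_channels)
    )

known_points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
known_features = [(1.0,), (3.0,), (5.0,), (100.0,)]

# A query between the first two points: the distant fourth point is ignored.
between = interpolate((1.0, 0.0), known_points, known_features)

# A query on top of a known point essentially recovers that point's feature.
on_point = interpolate((0.0, 0.0), known_points, known_features)
print(between, on_point)
```

Because only the k nearest points contribute, the far-away point with feature 100.0 has no influence on either query.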

### Results: How does PointNet++ do?

PointNet++ builds off a previous iteration by the same group, called PointNet.

For the task of semantic segmentation, the results for a scanned kitchen are shown.

PointNet is the original model, and PointNet++ is the new one (“Ours”).

In addition to the room layout, the individual objects such as chairs, doors, and tables are identified.

For the task of spatial recognition, PointNet++ is a great baseline.

## Conclusion

Point clouds can efficiently describe spatial datasets for a variety of situations.

One of the pioneering papers, PointNet++, demonstrates that semantic segmentation problems can be solved for point clouds in complex environments.

Many newer papers aim to apply these principles to more specific problems, such as segmenting branches in blood vessels for medical imaging.

Thanks for reading! I’d be interested in hearing about other papers on point clouds with machine learning, since it’s a relatively unexplored topic.

Feel free to add any questions or comments as well. I really enjoy reading your thoughts!

## References

[1] Qi, Charles R., Su, Hao, Mo, Kaichun, and Guibas, Leonidas J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. arXiv preprint arXiv:1612.00593, 2016. http://stanford.edu/~rqi/pointnet/

[2] Qi, Charles R., Yi, Li, Su, Hao, and Guibas, Leonidas J. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. arXiv preprint arXiv:1706.02413, 2017. http://stanford.edu/~rqi/pointnet2/