Connecting POIs to a road networkScalable interpolation based on the nearest edgeYuwen ChangBlockedUnblockFollowFollowingApr 20This article discusses the process of handling the integration of point geometries and a geospatial network.
Figures and references will be added soon.
IntroductionWhen we work with network analysis, our objects of analysis (represented as nodes or vertices) should be connected through a graph, with defined adjacencies and edge weights.
However, we often gather data from different sources and need to integrate them manually.
The process is actually a bit more complicated than I had imagined.
Let us look at this example.
In the scenario of analyzing Points of Interest (POIs) in a city, we may have a pedestrian network retrieved from OpenStreetMap and a set of restaurants queried from Yelp.
One common approach is to snap the POIs to the nearest vertices (nodes, junctions, road bends) on the network.
While this is a very simple an efficient way to implement, some problems with this method include (1) ignoring the edge between the POI and the network, and (2) inconsistent snapping results depending on the availability of nearby vertices.
In other words, although the walking distance of a POI to the network based on the snapping result may be somehow attributed to the vertex as access impedance, we still lose some geospatial information.
Moreover, if the only nearby road segment is a long and straight one with few vertices, the snapping distance will not be a realistic one.
While map matching techniques, as usually adopted in GPS trajectory recovery, address a similar problem, it often returns a calibrated trajectory itself instead of modifying the actual network.
In the reference, you may find some of the questions raised by the community, and yet, the solutions I have found are not very comprehensive or ready-to-use as a geoprocessing tool.
For my use case, it will be best to create a Python function that will be handy whenever I want to merge a set of points into the network.
The IdeaTo work things out, let us break down the process into several steps:Prepare two tables (geopandas.
GeoDataFrame in my case) representing the network: one for nodes and one for edges, referred to as "internal" nodes and edges below.
Prepare another table with Point geometries that we want to merge into the network in step 1.
They will be referred to as “external nodes”.
Update the internal nodes table with all the external nodes.
This should be as easy as a table concatenation, as the difficult part comes with the edges.
Find projected access points of all external nodes and update them to the table.
Now, unlike snapping, we need to generate a “projected access point (PAP)” for each external node.
These PAPs are the nearest point from the network to the external point.
In reality, this would be where you “hit the road” after leaving the building and walk straight to the road with the shortest route.
Update internal edges when necessary.
When we generate PAPs, chances are they may fall on an edge instead of on a vertex (which is basically common snapping).
To make sure our graph recognizes this connection correctly, we need to break the original edge into segments.
, from A — — — — B to A — — P — — B.
In the table, this would mean replacing the original record with multiple new records while inheriting the same road attributes (e.
, road name, number of lanes, etc.
) except for lengths.
Create and update external connections to external nodes.
In reality, this means establishing the shortest walking path from the road to the building.
The DetailsAs easy as these may sound, let us discuss the key issues within step 4 to step 6.
Do keep in mind that the most important goal for me is the scalability in terms of processing time.
Singapore, for instance, has 185k buildings 120k pedestrian road segments, and while this process may be a one-time effort, we really do not want it to take hours or days to finish.
The first naive proof-of-concept version of the function I wrote was still running (after a day and a night) by the time I have the optimized version crafted and debugged and tested and visualized and presented and… well, you know what I mean.
Be noted that the osmnx package has trimmed redundant nodes in the network retrieved from OpenStreetMap, meaning nodes that are just a bend in the road instead of a junction linking more than two edges will be removed.
This is important because it makes graph calculation and our processing more efficient while saving space.
However, with commonly adopted snapping method, we might want to keep as many of these nodes as possible as they may make the snapping result better as compared to sparse nodes.
In the extreme case where there are numerous nodes, the result will be the same as our proposed approach.
Step 4: This step can be further broken down into two parts: find the nearest edge and get the projected point.
Find the nearest edge:In order to make a projection and the interpolate a PAP, we need to determine which edge we are actually making the projection.
project and shapely.
intersection can be applied to MultiLineString, it does not return the target LineString segment and the results are not stable with some unreasonable projections present.
Therefore, we need to locate the nearest edge by ourselves.
A better way to do this is to have our edges stored in a proper data structure.
Since we are storing LineString instead of Point, I choose to go with R-tree and not k-d tree.
This stores each edge as a rectangle bounding box, making it very easy to query efficiently.
While one may use rtree.
nearest, this may yield undesirable results.
My understanding is that the distance being measured is actually that between the Point and the "rectangle", not the LineString.
A more comprehensive way (and scalable later on for variations of nearest edge) is to query k-nearest neighbor (kNN) based on R-tree and then calculate the actual distances to each LineString inside the box.
Get the projected point:With the nearest edge, we can easily get PAP with line.
Step 5: This step is also broken down as the following:a.
Determine edges and nodes to update:Since there can be more than one PAP on each edge, we want to process them all together instead of repeating the process.
We group all PAPs by the lines they are projected onto (i.
, the nearest edge).
Split the edges with careful processingNext, we may easily get all the line segments with the shapely.
split function (split A——B with P1, P2 returns A—P1, P1—P2, P2—B).
But, whenever one thinks that is the easiest part, that is often where bugs live.
So I ended up coming across a precision issue with all the above processing so far.
Simply put, when you project and interpolate a Point on a LineString, the returned PAP does NOT intersect with the LineString, making it unable to be used to split the line.
In order to solve this, we need to "snap the LineString to the PAP".
This results in a very slight change that is not recognizable to us but ensures the intersection is valid and, therefore, enabling the split.
Step 6: This step is simply generating new nodes by specifying the “from” and “to” nodes and update them to the edges table.
All updates including this one are just mere table manipulation and will not be discussed here.
Note that we used GeoDataFrame to store shapefiles and converted the CRS into epsg=3857 to calculate the road length in meters.
DiscussionAs you may have already known, one biggest difference between a geospatial network and an abstract graph structure (e.
, social network) is that the edges in the former are also concrete objects and have shapes.
This is why we have to navigate our way through these edges and manipulate them.
In a social network, it will not make sense to implement this node-to-edge interpolation.
I believe there is definitely room for improvement for this function.
For example, using cython to optimize the code may achieve better performance.
One may also try to carry out these geoprocessing steps with PostGIS, which I haven't tried yet since I prefer having it done in Python currently.
However, given the current runtime of 7 minutes for the Singapore data (the aforementioned buildings to the pedestrian network scenario), it should be acceptable at the moment.
For hundreds of POIs, this process should be done within seconds (if not second).
I am also considering a multi-directional kNN connection extension.
To illustrate the use case, imagine that the ideal bus stop is at your backdoor, but the nearest footway yield by this method connects your house to the road at the front door.
This gives you an unrealistic walk time and route calculation when you are to take a bus at your back door.
Simply establish kNN connections may not fully solve this problem, as they may all be segments at the front door.
Thus, fetching reasonable kNN edges from different directions of the Point will be desirable.
You may find a demonstration Jupyter Notebook and the script of the function here.