Geospatial binning with hexagons on Spark

Discrete global grid systems recently got quite a lot of attention in the GIS community when Uber released H3.

H3 is a hexagonal spatial index in which the neighbours of a cell can be accessed easily.

Several tasks are supported:

  • spatial index
  • spatial smoothing
  • efficient spatial join
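As a rough illustration of neighbour access, the following sketch calls the H3 Java bindings (version 3.x) directly from Scala, without Spark. The object name `NeighbourSketch`, the helper `neighbours` and the sample coordinates are made up for this example; `kRing` returns the cell itself together with all cells within `k` grid steps.

```scala
import com.uber.h3core.H3Core
import scala.collection.JavaConverters._

// Hypothetical helper object, only for illustration.
object NeighbourSketch {
  // Returns the cell itself plus all cells within k grid steps.
  def neighbours(h3: H3Core, cell: String, k: Int): Seq[String] =
    h3.kRing(cell, k).asScala.toList

  def main(args: Array[String]): Unit = {
    val h3 = H3Core.newInstance()
    // Index an arbitrary sample point (latitude, longitude) at resolution 7.
    val cell = h3.geoToH3Address(48.2082, 16.3738, 7)
    val ring = neighbours(h3, cell, 1)
    println(s"$cell has ${ring.size - 1} direct neighbours")
  }
}
```

Averaging a measure over such a ring is one simple way to approach spatial smoothing.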

A variety of libraries have been built around the core C code. One example nicely demonstrates how spatial anomaly detection can be quickly built using the Python API.

But wouldn’t it also be great to use this functionality in Spark? Uber does provide a Java library.

Adding this to Spark is straightforward. Using your build tool of choice, first add a dependency on the jar, for example with Gradle:

compile group: 'com.uber', name: 'h3', version: '3.6.0'
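For sbt, the same Maven coordinates would presumably be declared as follows. This is a sketch assuming the same artifact and version as the line above; a single `%` is used because the H3 bindings are a plain Java artifact without a Scala version suffix.

```scala
// build.sbt (assumed equivalent of the dependency line above)
libraryDependencies += "com.uber" % "h3" % "3.6.0"
```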

or just start an interactive shell such as:

spark-shell --packages 'com.uber:h3:3.6.0'

and run:

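import com.uber.h3core.H3Core
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, lit, udf}

// Keep one H3Core instance per thread, created lazily where the code runs.
@transient lazy val h3 = new ThreadLocal[H3Core] {
    override def initialValue() = H3Core.newInstance()
}

// H3 expects (latitude, longitude), so the arguments are swapped here.
def convertToH3Address(xLong: Double, yLat: Double, precision: Int): String = {
    h3.get.geoToH3Address(yLat, xLong, precision)
}

val geoToH3Address: UserDefinedFunction = udf(convertToH3Address _)

// Adds a column named `result` holding the H3 address of each row's coordinates.
def createSpatialIndexAddress(result: String,
                                XLongColumn: String,
                                YLatColumn: String,
                                precision: Int)(df: DataFrame): DataFrame = {
    df.withColumn(result,
      geoToH3Address(col(XLongColumn), col(YLatColumn), lit(precision)))
}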

which can now be used as:

val df = Seq((1,2), (3,4)).toDF("x", "y")
df.transform(createSpatialIndexAddress("h3","x", "y", 7)).show(false)

+---+---+---------------+
|x  |y  |h3             |
+---+---+---------------+
|1  |2  |877545796ffffff|
|3  |4  |8775642cdffffff|
+---+---+---------------+

Other functions of the H3 library can be supported similarly. Using hexagons greatly simplifies efficient spatial smoothing and aggregation (hexbinning).
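To make this more concrete, here is a minimal sketch of how hexbinning and a second H3 function (`kRing`) might be wrapped for Spark. It is not part of the original post: the object `HexbinSketch`, the names `toCell`, `neighbourCells` and `hexbin`, the output column `n_points` and the sample points are all assumptions chosen for illustration, and the H3 calls assume the 3.x Java API.

```scala
import com.uber.h3core.H3Core
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, count, lit, udf}
import scala.collection.JavaConverters._

// Hypothetical helper object, only for illustration.
object HexbinSketch {
  // One H3Core per thread, created lazily on whichever JVM runs the UDF.
  private val h3 = new ThreadLocal[H3Core] {
    override def initialValue(): H3Core = H3Core.newInstance()
  }

  // (longitude, latitude, resolution) => H3 address; H3 itself expects lat first.
  val toCell: UserDefinedFunction =
    udf((lng: Double, lat: Double, res: Int) => h3.get.geoToH3Address(lat, lng, res))

  // Another H3 function wrapped the same way: the cell plus all cells within k steps.
  val neighbourCells: UserDefinedFunction =
    udf((cell: String, k: Int) => h3.get.kRing(cell, k).asScala.toList)

  // Hexbinning: assign each point to a cell, then count the points per cell.
  def hexbin(lngCol: String, latCol: String, res: Int)(df: DataFrame): DataFrame =
    df.withColumn("h3", toCell(col(lngCol), col(latCol), lit(res)))
      .groupBy("h3")
      .agg(count(lit(1)).as("n_points"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("hexbin-sketch").getOrCreate()
    import spark.implicits._

    // Made-up sample points as (longitude, latitude).
    val points = Seq((16.37, 48.21), (16.38, 48.21), (2.35, 48.86)).toDF("lng", "lat")
    val binned = points.transform(hexbin("lng", "lat", 5))
    binned.show(false)

    // Attach the neighbourhood of each occupied cell, e.g. as input for smoothing.
    binned.withColumn("ring", neighbourCells(col("h3"), lit(1))).show(false)
    spark.stop()
  }
}
```

Because the aggregation key is a plain string column, the binned result can be joined or smoothed with ordinary DataFrame operations.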

This was a repost. The original post is found here:

Georg Heiler
infra & content

Bringing Innovation down to earth in the domain of data.