Geospatial binning with hexagons on spark

H3 hexagons for equidistant bins on spark.

By Georg Heiler

H3 enables users to partition the globe into hexagons for more accurate analysis

Discrete global grid systems recently got quite some attention in the GIS community when Uber released H3 https://eng.uber.com/h3/.

It is a hexagonal spatial index. Neighbours can be accessed easily.

Several tasks are supported: – spatial index – spatial smoothing – efficient spatial join

And a variety of libraries around the core c based code has been created. https://github.com/uber/h3-py-notebooks nicely demonstrates how spatial anomaly detection can be quickly built using the python API.

Adding this to spark is straight forward. Using your build tool of choice first add a dependency to the jar.

But wouldn’t it also be great to use this functionality in spark? They do provide a java library: https://github.com/uber/h3-java

// https://mvnrepository.com/artifact/com.uber/h3
compile group: 'com.uber', name: 'h3', version: '3.6.0'

or just start an interactive shell such as:

spark-shell --packages 'com.uber:h3:3.6.0'

and run:

import com.uber.h3core.H3Core
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.DataFrame

@transient lazy val h3 = new ThreadLocal[H3Core] {
    override def initialValue() = H3Core.newInstance()
  }


def convertToH3Address(xLong: Double, yLat: Double, precision: Int): String = {
    h3.get.geoToH3Address(yLat, xLong, precision)
  }

val geoToH3Address: UserDefinedFunction = udf(convertToH3Address _)

def createSpatialIndexAddress(result: String,
                                XLongColumn: String,
                                YLatColumn: String,
                                precision: Int)(df: DataFrame): DataFrame = {
    df.withColumn(
      result,
      geoToH3Address(col(XLongColumn), col(YLatColumn), lit(precision)))
  }

which can now be used as:

val df = Seq((1,2), (3,4)).toDF("x", "y")
df.transform(createSpatialIndexAddress("h3","x", "y", 7)).show(false)

+---+---+---------------+
|x  |y  |h3             |
+---+---+---------------+
|1  |2  |877545796ffffff|
|3  |4  |8775642cdffffff|
+---+---+---------------+

Other functions of the H3 library can be supported similarly. Using the hexagons greatly simplifies efficient spatial smoothing and aggregation (hexbinning).

Link to the orignial post: https://georgheiler.com/2019/11/20/geospatial-binning-with-hexagons-on-spark/

Leave a Reply

Your email address will not be published. Required fields are marked *

*

Pin It on Pinterest

Website Security Test