VDSG blog
Reproducible geospatial visualization in kepler.gl
Jinja templates in jupyter for kepler.gl
By Georg Heiler

Great-looking visualizations can be created with little effort using https://kepler.gl/. However, kepler.gl by itself is tedious to use: unlike in QGIS or ArcGIS, you cannot simply update the underlying data files on the website.…
Geospatial binning with hexagons on spark
H3 hexagons for equidistant bins on spark.
By Georg Heiler

Discrete global grid systems recently received considerable attention in the GIS community when Uber released H3 (https://eng.uber.com/h3/)…
DS Salon Vol. 3: Social Media – Monitoring Their Impact on Civil Society – revisited
On October 21 the Data4Good program led by VDSG held its third DS Salon. The agenda and additional details can be found on our meetup page for the event. The videos are now on our YouTube channel.
Knowledgefeed vol. 29: revisited
Jelena’s and Marcin’s talks are now on our YouTube channel.
Slides are available here.
You can find slides for this talk here.…
Spark descriptive name for cached dataframes
Concise names for cached tables.
By Georg Heiler
Have you ever wondered where the cryptic names of cached dataframes and RDDs in Spark’s web UI come from? Usually no specific name is set. When you call df.cache, Spark auto-generates the name as a snippet of the query plan.…
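One common way to get a readable name in the web UI is to register the dataframe as a temporary view and cache it through the catalog. The sketch below is a minimal illustration, not necessarily the approach the post takes; `cache_with_name` and the view name `events` are hypothetical, while `createOrReplaceTempView`, `spark.catalog.cacheTable` and `spark.table` are regular PySpark calls.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # type hints only; PySpark itself is needed when calling the helper
    from pyspark.sql import DataFrame, SparkSession


def cache_with_name(spark: "SparkSession", df: "DataFrame", name: str) -> "DataFrame":
    """Cache `df` under a readable name (hypothetical helper, not from the post)."""
    # Register the dataframe as a temporary view so that it has a table name.
    df.createOrReplaceTempView(name)
    # Cache the view through the catalog; the cache entry is tied to the view name.
    spark.catalog.cacheTable(name)
    # Return a handle to the named table for further queries.
    return spark.table(name)


# Usage, assuming a running SparkSession called `spark`:
#   events = cache_with_name(spark, spark.range(10).toDF("id"), "events")
#   events.count()  # caching is lazy, so an action is needed to materialize it
```

With this pattern the Storage tab should list the entry by the view name instead of a query plan snippet. For plain RDDs, `rdd.setName("...")` before `cache()` serves a similar purpose.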
Ultimate Open Vector Geoprocessing on Spark
By Georg Heiler

More and more people are starting to work with large quantities of geospatial data and are considering Spark together with one of its geospatial extensions such as geomesa, geospark or geotrellis. Soon they realize that their chosen tool lacks a commonly required function.…
Scalable cohort sampler
Use Dask to parallelize Python
By Georg Heiler
I have some binary cohorts and need to sample, for each cohort where the target label is 1, a matching observation from a dataframe where the target is 0.
sample data
First, I will generate some sample data.…