November 3, 2016 · Notes Elasticsearch ·

Getting Started with Geospatial Queries in Elasticsearch

As an Elasticsearch newbie, I recently bumped into geospatial queries, so I decided to write a brief post about them.

I do not intend this post to be a complete list of all geo-related features within Elasticsearch. For that, the official documentation is a good place to start. This is merely a set of notes that might help kickstart your Geo + Elasticsearch "adventures" if you are a beginner, like me!

For the examples below, the data we are working with is a list of Landsat8 scenes over the Philippines, captured sometime around 2015.

I would suggest cloning my scripts and sample data from GitHub if you'd like to follow along. The repository includes a Python script to load the data, as well as bash scripts for each sample HTTP request we'll talk about later.

For reference, here's a sample item in my dataset:

{
    "scene_center": {
        "lat": 8.673640241520552, 
        "lon": 121.13240623401293
    },
    "footprint": {
        "type": "Polygon", 
        "coordinates": [
            [
                [
                    120.47309, 
                    9.72408
                ], 
                [
                    122.167, 
                    9.36347
                ], 
                [
                    121.78675, 
                    7.62076
                ], 
                [
                    120.10088, 
                    7.9841
                ], 
                [
                    120.47309, 
                    9.72408
                ]
            ]
        ]
    }, 
    "scene_id": "LC81150542015162LGN00",
    "capture_time": "2015-06-11T00:00:00"
}
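Two details in this item are easy to get wrong: GeoJSON positions are written as [longitude, latitude] (the reverse of the lat/lon keys in scene_center), and a polygon ring has to end on the same position it starts with. Here is a minimal Python sketch of a sanity check for that. The helper name check_ring is made up for this note; it is not part of my scripts or of Elasticsearch.

```python
def check_ring(ring):
    """Return a list of problems found in one GeoJSON linear ring."""
    problems = []
    # A linear ring needs at least four positions: three corners plus the closing point.
    if len(ring) < 4:
        problems.append("ring has fewer than 4 positions")
    # The first and last positions must be identical for the ring to be closed.
    if ring and ring[0] != ring[-1]:
        problems.append("ring is not closed")
    for lon, lat in ring:
        # GeoJSON order is [lon, lat]; out-of-range values often mean the pair was swapped.
        if not (-180 <= lon <= 180 and -90 <= lat <= 90):
            problems.append("position out of range: [%s, %s]" % (lon, lat))
    return problems


# The footprint ring from the sample item above.
footprint_ring = [
    [120.47309, 9.72408],
    [122.167, 9.36347],
    [121.78675, 7.62076],
    [120.10088, 7.9841],
    [120.47309, 9.72408],
]
print(check_ring(footprint_ring))
```

An empty list means the ring passed these basic checks. It does not prove the polygon is valid in every sense (self-intersections, for example, are not checked).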

Part 1 - Create a new index

Okay. Let's start. In this example, we will name our Elasticsearch index landsat8.

curl -XPUT 'http://localhost:9200/landsat8'

Part 2 - Define Mapping

Before loading our data, we have to tell Elasticsearch that our scene_center field is a geo_point and our footprint field is a geo_shape. This is done by explicitly specifying mapping definitions.

To do so, run the following command on your terminal:

curl -XPUT 'http://localhost:9200/landsat8/_mapping/scene' -d '{
  "scene" : {
    "properties" : {
      "scene_center" : {
        "type": "geo_point"
      },
      "footprint" : {
        "type": "geo_shape"
      }
    }
  }
}'

Let's check the currently defined mappings for type scene by calling the _mapping endpoint.

curl -XGET 'http://localhost:9200/landsat8/_mapping/scene?pretty=true'

Mappings for the rest of the fields, such as dates and text, should be detected automatically as soon as our actual data is added.

Also, take note that we have referred to a new document type called scene. This will be our main type moving forward.
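If you would rather check the mapping from code than by eye, something like the sketch below could work. It assumes the _mapping response is nested as index, then "mappings", then type, then "properties", which is what I see on my setup; the function name field_types and the hand-written example response are only for illustration.

```python
def field_types(mapping_response, index="landsat8", doc_type="scene"):
    """Map each field name to its declared type from a GET _mapping response."""
    properties = mapping_response[index]["mappings"][doc_type]["properties"]
    return {name: spec.get("type") for name, spec in properties.items()}


# Hand-written stand-in for the response; a real one comes from the _mapping call.
example_response = {
    "landsat8": {"mappings": {"scene": {"properties": {
        "scene_center": {"type": "geo_point"},
        "footprint": {"type": "geo_shape"},
    }}}}
}

types = field_types(example_response)
print(types)
```

If scene_center or footprint shows up with any other type here, the geo queries later in this post will not behave as expected.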


Part 3 - Insert Data

To load the data, run the loader from my GitHub scripts:

python load.py

If you want to manually insert a single item just to try it out, you can do this instead:

curl -XPOST 'http://localhost:9200/landsat8/scene' -d '{
    "scene_center": {
        "lat": 8.673640241520552, 
        "lon": 121.13240623401293
    },
    "footprint": {
        "type": "Polygon", 
        "coordinates": [
            [
                [
                    120.47309, 
                    9.72408
                ], 
                [
                    122.167, 
                    9.36347
                ], 
                [
                    121.78675, 
                    7.62076
                ], 
                [
                    120.10088, 
                    7.9841
                ], 
                [
                    120.47309, 
                    9.72408
                ]
            ]
        ]
    }, 
    "scene_id": "LC81150542015162LGN00",
    "capture_time": "2015-06-11T00:00:00"
}'
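Posting documents one by one gets slow quickly, so for many items the _bulk endpoint is the usual route. Below is a rough sketch of how a bulk request body can be assembled. This is not necessarily what load.py does; the function name bulk_body is hypothetical and the document is shortened to two fields for brevity.

```python
import json


def bulk_body(docs, index="landsat8", doc_type="scene"):
    """Build the newline-delimited body expected by the _bulk endpoint."""
    lines = []
    for doc in docs:
        # Each document is preceded by an action line saying where to index it.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Shortened document; a real one would also carry scene_center and footprint.
docs = [{"scene_id": "LC81150542015162LGN00", "capture_time": "2015-06-11T00:00:00"}]
body = bulk_body(docs)
print(body)
```

The resulting text can be saved to a file and sent to http://localhost:9200/_bulk with curl's --data-binary option, which keeps the newlines intact.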

Bonus: I included a script to delete all entries from our landsat8 index and then run everything we've discussed so far:

./rebuild.sh

Part 4 - Sample Queries!

Finally, we can start playing with our data.

4a. Geo Distance

Here, we'd like to fetch scenes whose scene_center lies within 100 km of Manila.

curl -XPOST 'http://localhost:9200/landsat8/scene/_search?pretty=true' -d '{
    "query": {
        "bool" : {
            "must" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_distance" : {
                    "distance" : "100km",
                    "scene_center" : {
                        "lat" : 14.5995,
                        "lon" : 120.9842
                    }
                }
            }
        }
    }
}'
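To sanity-check which scenes ought to match, the distance can be approximated by hand with the haversine formula. This is only a sketch: Elasticsearch does its own distance calculation and may use a slightly different earth model, so expect small differences near the 100 km boundary. The function name and the 6371 km radius are my own choices.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


manila = (14.5995, 120.9842)
# scene_center of the sample item shown at the start of this post
scene_center = (8.673640241520552, 121.13240623401293)

distance = haversine_km(manila[0], manila[1], scene_center[0], scene_center[1])
print(round(distance), distance <= 100)
```

The sample item sits several hundred kilometres south of Manila, so it should not appear in the results of the query above.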

4b. Geo Shape

Since GeoJSON is pretty popular with us web developers, let's have an example that uses it. On this next one, let's fetch all the scenes whose footprints intersect a bounding polygon, which is passed as the GeoJSON shape in the query:

curl -XPOST 'http://localhost:9200/landsat8/scene/_search?pretty=true' -d '{
    "query": {
        "bool": {
            "must": {
                "match_all": {}
            },
            "filter": {
                "geo_shape": {
                    "footprint": {
                        "shape": {
                            "type": "Polygon",
                            "coordinates": [
                                [
                                    [120.59692382812499, 15.739388446649],
                                    [119.970703125,14.966013251567164],
                                    [120.38818359375, 14.019355706886051],
                                    [120.91552734375,14.583583455156525],
                                    [122.70629882812499,15.241789855961722],
                                    [120.59692382812499, 15.739388446649]
                                ]
                            ]
                        }
                    }
                }
            }
        }
    }
}'
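A cheap way to reason about which footprints could possibly match is to compare bounding boxes first: if the boxes do not overlap, the polygons cannot intersect. The sketch below does that for the query polygon and the sample item's footprint. It is a simplification of my own (it ignores shapes crossing the antimeridian, and overlapping boxes do not prove that the polygons themselves intersect), not how Elasticsearch evaluates geo_shape queries.

```python
def bbox(ring):
    """Return (min_lon, min_lat, max_lon, max_lat) for a ring of [lon, lat] pairs."""
    lons = [p[0] for p in ring]
    lats = [p[1] for p in ring]
    return min(lons), min(lats), max(lons), max(lats)


def bboxes_overlap(a, b):
    """True when two (min_lon, min_lat, max_lon, max_lat) boxes share any area or edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Ring of the polygon used in the geo_shape query.
query_ring = [
    [120.59692382812499, 15.739388446649],
    [119.970703125, 14.966013251567164],
    [120.38818359375, 14.019355706886051],
    [120.91552734375, 14.583583455156525],
    [122.70629882812499, 15.241789855961722],
    [120.59692382812499, 15.739388446649],
]
# Footprint ring of the sample item from the start of this post.
sample_footprint = [
    [120.47309, 9.72408],
    [122.167, 9.36347],
    [121.78675, 7.62076],
    [120.10088, 7.9841],
    [120.47309, 9.72408],
]

print(bboxes_overlap(bbox(query_ring), bbox(sample_footprint)))
```

For these two rings the boxes are far apart in latitude, so the sample scene should not be among the hits.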

4c. Geohash Grids

This next one aggregates results into buckets, one per grid cell, which is what lets a tool like Kibana plot the same sample data as a grid of counts on a map.

We can do this by using a geohash_grid aggregation on our scene_center field.

curl -XPOST 'http://localhost:9200/landsat8/scene/_search?pretty=true' -d '{
    "size": 0,
    "query": {
        "bool": {
            "must": [],
            "must_not": []
        }
    },
    "aggs": {
        "scene-grid": {
            "geohash_grid": {
                "field": "scene_center",
                "precision": 3
            }
        }
    }
}'

This should give us a list of buckets, each with a key and a doc_count, where the key is the geohash of the bucket's grid cell and doc_count is the number of scenes whose scene_center falls inside that cell.
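If geohashes are new to you, it may help to see how a point turns into one. The sketch below is a plain-Python version of the standard geohash encoding (bits alternate between longitude and latitude, five bits per base-32 character). It is written for illustration only and is not taken from Elasticsearch; the function name is my own. With precision 3 it produces three-character cells, the same length as the keys requested in the aggregation above.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def encode_geohash(lat, lon, precision=3):
    """Encode a lat/lon point as a geohash string of the given length."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits, value = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            value = value << 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # five bits make one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


# The Manila coordinates from the geo distance example.
print(encode_geohash(14.5995, 120.9842, precision=3))
```

Every extra character narrows the cell further, which is why a higher precision in the aggregation gives more, smaller buckets.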

Sample results:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1651,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "scene-grid" : {
      "buckets" : [
        {
          "key" : "wdr",
          "doc_count" : 46
        },
        {
          "key" : "wdy",
          "doc_count" : 31
        },
        ...

      ]
    }
  }
}
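Once you have a response like this, pulling the buckets out is straightforward. A small sketch, using only the two buckets visible in the sample above (the rest are elided there); the function name bucket_counts is just for this note.

```python
def bucket_counts(response, agg_name="scene-grid"):
    """Return {geohash: doc_count} from a geohash_grid aggregation response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {b["key"]: b["doc_count"] for b in buckets}


# Only the two buckets shown in the sample results; the others are elided there.
sample = {"aggregations": {"scene-grid": {"buckets": [
    {"key": "wdr", "doc_count": 46},
    {"key": "wdy", "doc_count": 31},
]}}}

counts = bucket_counts(sample)
# Busiest cells first.
print(sorted(counts.items(), key=lambda kv: -kv[1]))
```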

Alright! That's all for now. These were just a few of the things I have learned so far while trying to get familiar with Elasticsearch. I hope it helps, somehow.