Geospatial Queries with DynamoDB
DynamoDBGeo is a library that adds geospatial querying capabilities to Amazon DynamoDB. It allows for efficient storage and querying of geographic locations using geohashes. Geohashing is a method of encoding geographic coordinates into a string by dividing the globe into a grid of cells and sub-cells. The longer the geohash, the smaller the geographic area it represents, offering greater precision in locating points.
Geohash Example:
- Latitude: 37.7749 (San Francisco, CA)
- Longitude: -122.4194
Using the geohashing algorithm, the result would be "9q8yy".
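To make the encoding concrete, here is a minimal sketch of the standard geohash algorithm in plain Python. The function name geohash_encode is illustrative and is not part of dynamodbgeo; the library's internal encoding may differ, so treat this as an explanation of the idea rather than of its implementation.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash_encode(lat, lon, precision=5):
    """Encode a latitude/longitude pair as a geohash string (illustrative helper)."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)

print(geohash_encode(37.7749, -122.4194, 5))  # 9q8yy
```

Each extra character halves the cell several more times, which is why longer geohashes describe smaller areas.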
DynamoDBGeo handles the complexity of encoding, storing, and querying these geospatial data points, allowing for proximity-based queries (like finding locations near a specific point).
Configuring the table
The required table schema for using DynamoDBGeo includes the following index keys to support geospatial queries:
1. Partition Key (Hash Key):
- This will be part of the geohash generated by the DynamoDBGeo library. Typically, it’s represented as a prefix of the geohash. This allows for geo-points to be spread across multiple partitions for efficient queries.
- Example: San Francisco’s geohash is “9q8yy” and Oakland’s geohash is “9q9p1”. To group these cities on the same partition, the hash key should be “9q”.
- Name: “hashKey”
2. Sort Key (Range Key):
- This key ensures uniqueness for geospatial points. It can be a UUID or another unique identifier.
- Name: “rangeKey”
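The prefix idea from the partition key example can be sketched in a few lines. The helper names below are hypothetical and only illustrate how truncating a geohash groups nearby points; they are not how dynamodbgeo computes its keys internally.

```python
from collections import defaultdict

def hash_key_for(geohash, hash_key_length):
    """Return the leading characters of a geohash, used here as the partition key."""
    return geohash[:hash_key_length]

def group_by_hash_key(geohashes, hash_key_length):
    """Group named locations by the hash key derived from their geohash."""
    groups = defaultdict(list)
    for name, geohash in geohashes.items():
        groups[hash_key_for(geohash, hash_key_length)].append(name)
    return dict(groups)

cities = {"San Francisco": "9q8yy", "Oakland": "9q9p1"}
print(group_by_hash_key(cities, 2))  # {'9q': ['San Francisco', 'Oakland']}
print(group_by_hash_key(cities, 3))  # {'9q8': ['San Francisco'], '9q9': ['Oakland']}
```

With a two-character prefix both cities share one partition; with three characters they land on separate partitions.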
Installation
You’ll need to install the library. Go to your terminal and run the following command:
pip install boto3 dynamodbgeo s2sphere
The Starter Code
The following code sets up a connection to an AWS DynamoDB table with dynamodbgeo. It configures a GeoDataManager to handle geospatial queries and operations by wrapping a boto3 DynamoDB client.
import boto3
import dynamodbgeo
import uuid
# you need a low-level boto3 client to communicate with dynamodb; the typed
# attribute values used later (e.g. {"S": "US"}) are the client's item format
dynamodb = boto3.client('dynamodb', region_name='YOUR TABLE REGION')
# create a configuration instance for your dynamodb table
geo_table_config = dynamodbgeo.GeoDataManagerConfiguration(dynamodb, 'YOUR TABLE NAME')
# create a table manager instance
geo_data_manager = dynamodbgeo.GeoDataManager(geo_table_config)
Put Point
Parameters
- geo_point (dynamodbgeo.GeoPoint): the geographical coordinates of the item
- item_id (str): the unique id of the new item
- put_parameters (dict): DynamoDB standard parameters. Documentation can be found [here].
You will pass these parameters into a dynamodbgeo.PutPointInput instance and then execute the operation using the geo_data_manager we created earlier.
# define the dynamodb put_item parameters
put_parameters = {
    'Item': {
        'Country': {"S": "US"},
        'City': {"S": "San Francisco"}
    }
}
# define the geo point
geo_point = dynamodbgeo.GeoPoint(
    37.7749,   # latitude
    -122.4194  # longitude
)
# create a unique id for your item
item_id = str(uuid.uuid4())
# run the put operation
geo_data_manager.put_Point(
    dynamodbgeo.PutPointInput(
        geo_point,      # GeoPoint instance that represents the gps coordinates
        item_id,        # Use this to ensure uniqueness of the hash/range pairs.
        put_parameters  # dynamodb put item parameters go here
    )
)
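The Item above uses DynamoDB's typed attribute format, where every value is wrapped in a type tag such as "S" (string) or "N" (number, sent as a string). As a convenience, a small helper can build that structure from a plain dictionary. The function names here are hypothetical and cover only strings, numbers and booleans; boto3 also ships a more complete TypeSerializer in boto3.dynamodb.types.

```python
def to_attribute_value(value):
    """Wrap a plain Python value in DynamoDB's typed attribute format."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers are transmitted as strings
    raise TypeError(f"unsupported attribute type: {type(value).__name__}")

def build_put_parameters(attributes):
    """Build a put_parameters dict (illustrative helper) from plain key/value pairs."""
    return {"Item": {name: to_attribute_value(v) for name, v in attributes.items()}}

put_parameters = build_put_parameters({"Country": "US", "City": "San Francisco"})
print(put_parameters)
```

The resulting dictionary has the same shape as the hand-written put_parameters and can be passed to PutPointInput in the same way.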
Update Point
Parameters
- geo_point (dynamodbgeo.GeoPoint): the geographical coordinates of the item
- item_id (str): the unique id of the item to be updated
- update_parameters (dict): DynamoDB standard parameters. Documentation can be found [here].
You will pass these parameters into a dynamodbgeo.UpdateItemInput instance and then execute the operation using the geo_data_manager we created earlier.
# define the update_item parameters. do not add TableName or Key
update_parameters = {
    "UpdateExpression": "set City = :val1",
    "ConditionExpression": "City = :val2",
    "ExpressionAttributeValues": {
        ":val1": {"S": "Oakland"},
        ":val2": {"S": "San Francisco"}
    },
    "ReturnValues": "ALL_NEW"
}
# define the geopoint; use the coordinates the item was stored with,
# because they determine the item's partition key
geo_point = dynamodbgeo.GeoPoint(37.802663456, -122.26916559)
# run the update operation
geo_data_manager.update_Point(
    dynamodbgeo.UpdateItemInput(
        geo_point,         # GeoPoint instance that represents the gps coordinates
        "ITEM ID",         # ID of the item you are updating.
        update_parameters  # your update parameters
    )
)
Delete Point
Parameters
- geo_point (dynamodbgeo.GeoPoint): the geographical coordinates of the item
- item_id (str): the unique id of the item to be deleted
- delete_parameters (dict): DynamoDB standard parameters. Documentation can be found [here].
You will pass these parameters into a dynamodbgeo.DeleteItemInput instance and then execute the operation using the geo_data_manager we created earlier.
# define the delete item parameters, leave out the "Key" attribute
delete_parameters = {
    "ConditionExpression": "attribute_exists(Country)",
    "ReturnValues": "ALL_OLD"
}
# because the geohash is used to determine the item's partition key,
# you'll need the coordinates to delete the item as well
geo_point = dynamodbgeo.GeoPoint(37.802663456, -122.26916559)
# run the delete operation
geo_data_manager.delete_Point(
    dynamodbgeo.DeleteItemInput(
        geo_point,         # GeoPoint instance that represents the gps coordinates
        "ITEM ID",         # ID of the item you are deleting.
        delete_parameters  # your delete parameters
    )
)
Rectangular queries
In dynamodbgeo, a rectangular query is a type of geospatial query used to retrieve all items within a defined rectangular area on a map. This area is specified by providing the latitude and longitude of two opposite corners (typically southwest and northeast). This type of query is commonly used for tasks like finding all points of interest (e.g., stores, restaurants, events) within a specific geographical area, such as a city block or a region on a map.
Parameters
- southwest_geopoint (dynamodbgeo.GeoPoint): represents the bottom left coordinates
- northeast_geopoint (dynamodbgeo.GeoPoint): represents the top right coordinates
- query_parameters (dict): additional query parameters. Documentation can be found [here].
# Querying a rectangle
query_parameters = {
    "FilterExpression": "City = :val1",
    "ExpressionAttributeValues": {
        ":val1": {"S": "San Francisco"}
    }
}
southwest_geopoint = dynamodbgeo.GeoPoint(25.609826, -130.749344)
northeast_geopoint = dynamodbgeo.GeoPoint(49.014814, -112.456158)
# Execute the rectangle query
result = geo_data_manager.queryRectangle(
    dynamodbgeo.QueryRectangleRequest(
        southwest_geopoint,
        northeast_geopoint,
        query_parameters
    )
)
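Conceptually, a rectangular query keeps the points whose latitude and longitude both fall between the two corners. The sketch below shows that containment test in plain Python. The function in_rectangle is a hypothetical helper for illustration, not part of dynamodbgeo, and it assumes the rectangle does not cross the 180° meridian.

```python
def in_rectangle(point, southwest, northeast):
    """Return True if a (lat, lon) point lies inside the corner-defined rectangle."""
    lat, lon = point
    sw_lat, sw_lon = southwest
    ne_lat, ne_lon = northeast
    # Assumes sw_lon <= ne_lon, i.e. the box does not wrap around the antimeridian.
    return sw_lat <= lat <= ne_lat and sw_lon <= lon <= ne_lon

southwest = (25.609826, -130.749344)
northeast = (49.014814, -112.456158)
print(in_rectangle((37.7749, -122.4194), southwest, northeast))  # True (San Francisco)
print(in_rectangle((40.7128, -74.0060), southwest, northeast))   # False (New York City)
```

A check like this is also handy in unit tests for verifying that the items returned by a query really fall inside the requested area.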
Radius Queries
A radius query in DynamoDBGeo retrieves all items within a specified distance from a central point (a geographic radius). This is useful for finding nearby locations, such as stores or events, based on a central point’s latitude and longitude.
Parameters
- center_point (dynamodbgeo.GeoPoint): the central point from which the radius is measured.
- radius_in_meters (float): the radius distance in meters.
- query_parameters (dict): additional query parameters. Documentation can be found [here].
center_point = dynamodbgeo.GeoPoint(37.7749, -122.4194)  # Center point (San Francisco)
radius_in_meters = 5000  # 5 km radius
query_parameters = {
    "FilterExpression": "City = :val1",
    "ExpressionAttributeValues": {
        ":val1": {"S": "San Francisco"}
    }
}
# Execute the radius query
result = geo_data_manager.queryRadius(
    dynamodbgeo.QueryRadiusRequest(
        center_point,
        radius_in_meters,
        query_parameters
    )
)
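To see what "within a radius" means numerically, the great-circle distance between two coordinates can be computed with the haversine formula. The helpers below are an illustrative sketch assuming a spherical Earth with a mean radius of 6,371 km; they are not taken from dynamodbgeo, which may compute distances differently.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def within_radius(center, point, radius_m):
    """Return True if point lies within radius_m meters of center."""
    return haversine_m(center[0], center[1], point[0], point[1]) <= radius_m

san_francisco = (37.7749, -122.4194)
oakland = (37.8044, -122.2712)
print(round(haversine_m(*san_francisco, *oakland)))    # roughly 13 km, in meters
print(within_radius(san_francisco, oakland, 5000))     # False: outside the 5 km radius
```

Oakland is roughly 13 km from the center point, so a 5 km radius query centered on San Francisco would not be expected to return it.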
Choosing a hash key length
DynamoDBGeo spreads geospatial data over multiple partitions by breaking the world into a grid (via geohashing) and using the hash key to determine the partition. This allows for efficient querying and retrieval of nearby points in a geographical area.
How the Hash Key Length Affects Partitioning:
- Short Hash Key (2-3 digits): Represents a larger geographic area, which means that more points will be stored under the same hash key, concentrating the data within fewer partitions. This could result in hotspotting if too many queries are targeting that same area.
- Long Hash Key (5-7 digits): Represents a smaller geographic area, meaning the geohash is more precise. Each hash key will cover a smaller area, so the data gets distributed across more partitions. This reduces the chance of overloading any single partition but comes at the cost of needing to query multiple partitions for larger area queries.
Trade-offs of Longer Hash Keys:
- Wider Distribution: Data will be spread across more partitions, which reduces the chance of overloading a single partition with too much traffic (important if you have dense data or high query rates in certain areas).
- Higher Read Costs: If your queries span multiple partitions (e.g., searching for points in a wide radius), you might incur higher read costs because DynamoDB needs to query multiple partitions to retrieve all relevant data.
Why a Longer Hash Key May Not Always Be Better:
- Sparse Data: If your data points are spread over a large area, a longer hash key might result in DynamoDB scanning many empty partitions during queries, increasing read capacity unit (RCU) usage without retrieving additional results.
- Query Size: If your typical query covers a large geographic area, a long hash key might result in querying multiple partitions to retrieve data, which can impact performance.
Example:
- For dense urban data (e.g., restaurants in New York City), a longer hash key (e.g., 5-7 digits) will distribute points more evenly, ensuring that no single partition gets overloaded.
- For sparse data (e.g., national parks across a country), a shorter hash key (e.g., 2-3 digits) will prevent DynamoDB from querying too many empty partitions when searching across a large area.
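The read-cost trade-off can be made tangible by counting how many grid cells of a given prefix length overlap a query rectangle. The sketch below does this for textual geohash cells. It is an approximation for intuition only: the function names are hypothetical, and dynamodbgeo's own cell covering (built on s2sphere) will produce different counts.

```python
import math

def cell_size_degrees(length):
    """Width and height in degrees of a geohash cell with the given character length."""
    total_bits = 5 * length
    lon_bits = (total_bits + 1) // 2  # longitude receives the extra bit when odd
    lat_bits = total_bits // 2
    return 360 / 2 ** lon_bits, 180 / 2 ** lat_bits

def cells_covering(southwest, northeast, length):
    """Count the geohash cells of a given length that overlap a (lat, lon) rectangle."""
    width, height = cell_size_degrees(length)
    sw_lat, sw_lon = southwest
    ne_lat, ne_lon = northeast
    cols = math.floor((ne_lon + 180) / width) - math.floor((sw_lon + 180) / width) + 1
    rows = math.floor((ne_lat + 90) / height) - math.floor((sw_lat + 90) / height) + 1
    return cols * rows

southwest = (25.609826, -130.749344)
northeast = (49.014814, -112.456158)
for length in range(2, 6):
    print(length, cells_covering(southwest, northeast, length))
```

The count grows quickly with the prefix length, which is the reason a long hash key makes wide-area queries touch many more partitions.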
Geohash cheat sheet
The granularity of the geohash depends on its character length. Each additional character refines the precision of the location. Here’s a breakdown of approximate cell sizes (width × height, measured at the equator):
- 1 character: 5,000 km × 5,000 km, continent-level
- 2 characters: 1,250 km × 625 km, country-level
- 3 characters: 156 km × 156 km, regional-level
- 4 characters: 39 km × 19.5 km, city-level
- 5 characters: 4.9 km × 4.9 km, neighborhood-level
- 6 characters: 1.2 km × 0.61 km, street-level
- 7 characters: 153 m × 153 m, small-area level
- 8 characters: 38 m × 19 m, building-level
- 9 characters: 4.8 m × 4.8 m, room-level
- 10 characters: 1.2 m × 0.6 m, sub-room level
- 11 characters: 15 cm × 15 cm
- 12 characters: 3.7 cm × 1.9 cm, an area only a few centimeters across
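Cell dimensions follow directly from the bit layout of a geohash: each character adds 5 bits, split alternately between longitude and latitude. The sketch below derives approximate sizes from that rule, assuming about 111.32 km per degree at the equator; cells become narrower at higher latitudes. The function name is illustrative.

```python
KM_PER_DEGREE = 111.32  # approximate length of one degree at the equator (assumption)

def geohash_cell_size_km(length):
    """Approximate (width_km, height_km) of a geohash cell at the equator."""
    total_bits = 5 * length
    lon_bits = (total_bits + 1) // 2  # longitude gets the extra bit for odd lengths
    lat_bits = total_bits // 2
    width_deg = 360 / 2 ** lon_bits
    height_deg = 180 / 2 ** lat_bits
    return width_deg * KM_PER_DEGREE, height_deg * KM_PER_DEGREE

for length in range(1, 13):
    width_km, height_km = geohash_cell_size_km(length)
    print(f"{length:2d} characters: {width_km:.4g} km x {height_km:.4g} km")
```

Odd lengths give square cells in degrees, while even lengths give cells twice as wide as they are tall.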