generate vector zonal stat features

Aggregations

In order to generate zonal stats for an area (or areas) of interest (aoi) , we have come up with the concept of an aggregation specification or agg spec, which is a way to specify what aggregation functions (such as count,sum, mean,std etc.) are to be applied to columns in the source dataframe (data).

The method create_zonal_stats can then take in a list of these agg specs and apply them to create zonal stats from the data for the aoi.

Each agg spec consists of a dict with the following keys:

  • func: (Required) a str or a list [str] of aggregation functions. See the pandas documentation for agg

  • column: (Optional) an existing column in the data to generate the zonal statistic from. If not specified, the grouping key based on the index of the aoi applied to the data is used as default.

  • output: (Optional) a str or a list [str] of the name(s) of the output zonal statistic column. If not specified it is concatenated from the column and func i.e. {column}_{func} (e.g. 'func':'mean' on 'column':'population' has a default value 'output':'population_mean')

  • fillna: (Optional) a bool or a list [bool] of the flag(s) that indicates whether to to a fillna(0) step for the new zonal column, True meaning it will set any NA values in the resulting zonal stat to 0, and False will retain any NA values. The default value of the flag(s) is False.

Examples

  • The simplest aggregation spec. This will result in an output column named index_count as it will use the aoi index as the default column.

    {"func":"count"}
  • The sum function is applied to the data column population which will create an output column named total_population.

{
 "func:"sum",
 "column": "population",
 "output": "total_population"
}
  • Compute the zonal stats mean,sum,max on the population column and rename the output columns (by default) to population_mean, population_sum and population_max.
{
 "func": ["mean","sum","max"],
 "column": "population",
}
  • A full aggregation spec with fillna. fillna == False for std means it will remain an NA if there is no data for the column in the group. The default value for fillna is True which means that 0 is used to replace any NA in the output column.
    {
    "func": ["mean", "sum", "std"],
    "column":"population",
    "output": ["avg_pop", "total_pop", "std_dev"],
    "fillna": [True,True,False],
    }

The agg spec in the list of aggregations can contain the same columns, but the output columns must be unique since they will added as columns in the results.

Regular and Grid Zonal Stats

Vector zonal stats for user defined areas and grids (e.g. Admin areas)

create_zonal_stats[source]

create_zonal_stats(aoi:GeoDataFrame, data:GeoDataFrame, aggregations:List[typing.Dict[str, typing.Any]], overlap_method:str='intersects')

Create zonal stats for area of interest from data using aggregration operations on data columns. Returns the same aoi with additional columns containing the computed zonal features.

Type Default Details
aoi GeoDataFrame Area of interest for which zonal stats are to be computed for
data GeoDataFrame Source gdf containing data to compute zonal stats from
aggregations typing.List[typing.Dict[str, typing.Any]] List of agg specs, with each agg spec applied to a data column
overlap_method str intersects spatial predicate to used in spatial join of aoi and data geopandas.sjoin for more details
simple_aoi  # sample aoi
col1 lat0 lon0 lat1 lon1 lat2 lon2 lat3 lon3 geometry
0 1 0.0 0.0 0.0 1.0 1.0 1.0 1.0 0.0 POLYGON ((0.00000 0.00000, 0.00000 1.00000, 1....
1 2 1.0 0.0 1.0 1.0 2.0 1.0 2.0 0.0 POLYGON ((1.00000 0.00000, 1.00000 1.00000, 2....
2 3 2.0 0.0 2.0 1.0 3.0 1.0 3.0 0.0 POLYGON ((2.00000 0.00000, 2.00000 1.00000, 3....
simple_data  # sample data
col1 lat lon geometry
0 1 0.50 0.50 POINT (0.50000 0.50000)
1 2 1.50 0.50 POINT (1.50000 0.50000)
2 3 2.50 0.50 POINT (2.50000 0.50000)
3 4 0.45 0.50 POINT (0.45000 0.50000)
4 5 1.45 0.50 POINT (1.45000 0.50000)
5 6 2.45 0.50 POINT (2.45000 0.50000)
6 7 0.45 0.45 POINT (0.45000 0.45000)
7 8 1.45 0.45 POINT (1.45000 0.45000)
8 9 2.45 0.45 POINT (2.45000 0.45000)
9 10 0.45 1.45 POINT (0.45000 1.45000)
10 11 1.45 1.45 POINT (1.45000 1.45000)
11 12 2.45 1.45 POINT (2.45000 1.45000)
ax = simple_aoi.plot(
    ax=plt.axes(), facecolor="none", edgecolor=["red", "blue", "green"]
)
ax = simple_data.plot(ax=ax, color="purple")
results = create_zonal_stats(simple_aoi, simple_data, aggregations=[{"func": "count"}])
results
col1 lat0 lon0 lat1 lon1 lat2 lon2 lat3 lon3 geometry index_count
0 1 0.0 0.0 0.0 1.0 1.0 1.0 1.0 0.0 POLYGON ((0.00000 0.00000, 0.00000 1.00000, 1.... 3
1 2 1.0 0.0 1.0 1.0 2.0 1.0 2.0 0.0 POLYGON ((1.00000 0.00000, 1.00000 1.00000, 2.... 3
2 3 2.0 0.0 2.0 1.0 3.0 1.0 3.0 0.0 POLYGON ((2.00000 0.00000, 2.00000 1.00000, 3.... 3

Index name is not none

named_index_aoi = simple_aoi.copy()
named_index_aoi.index.name = "myindex"
named_index_aoi
col1 lat0 lon0 lat1 lon1 lat2 lon2 lat3 lon3 geometry
myindex
0 1 0.0 0.0 0.0 1.0 1.0 1.0 1.0 0.0 POLYGON ((0.00000 0.00000, 0.00000 1.00000, 1....
1 2 1.0 0.0 1.0 1.0 2.0 1.0 2.0 0.0 POLYGON ((1.00000 0.00000, 1.00000 1.00000, 2....
2 3 2.0 0.0 2.0 1.0 3.0 1.0 3.0 0.0 POLYGON ((2.00000 0.00000, 2.00000 1.00000, 3....
named_index_results = create_zonal_stats(
    named_index_aoi, simple_data, aggregations=[{"func": "count"}]
)
named_index_results.head()
col1 lat0 lon0 lat1 lon1 lat2 lon2 lat3 lon3 geometry index_count
myindex
0 1 0.0 0.0 0.0 1.0 1.0 1.0 1.0 0.0 POLYGON ((0.00000 0.00000, 0.00000 1.00000, 1.... 3
1 2 1.0 0.0 1.0 1.0 2.0 1.0 2.0 0.0 POLYGON ((1.00000 0.00000, 1.00000 1.00000, 2.... 3
2 3 2.0 0.0 2.0 1.0 3.0 1.0 3.0 0.0 POLYGON ((2.00000 0.00000, 2.00000 1.00000, 3.... 3

Bing Map Tile Grid Zonal Stats

Generating zonal stats for Bing tile grid AOIs

compute_quadkey[source]

compute_quadkey(data:GeoDataFrame, zoom_level:int, inplace:bool=False, quadkey_column:str='quadkey')

Computes the quadkeys for the geometries of the data. If geometries are not points, the quadkeys are computed from the centroids of the geometries.

Type Default Details
data GeoDataFrame The geodataframe
zoom_level int The quadkey zoom level (1-23)
inplace bool False Whether to change data inplace or not
quadkey_column str quadkey The name of the quadkey output column

If our existing data geodataframe doesn't have quadkeys, we can use the compute_quadkey to generate the quadkeys for the centroid of the data's geometries.

%%time
DATA_ZOOM_LEVEL = 19
AOI_ZOOM_LEVEL = 9
simple_data_quadkey = compute_quadkey(simple_data, DATA_ZOOM_LEVEL)
CPU times: user 13.9 ms, sys: 12.2 ms, total: 26.1 ms
Wall time: 24 ms
simple_data_quadkey.head()
col1 lat lon geometry quadkey
0 1 0.50 0.5 POINT (0.50000 0.50000) 1222222221211211222
1 2 1.50 0.5 POINT (1.50000 0.50000) 1222222320210201222
2 3 2.50 0.5 POINT (2.50000 0.50000) 1222222331200311222
3 4 0.45 0.5 POINT (0.45000 0.50000) 1222222221210201333
4 5 1.45 0.5 POINT (1.45000 0.50000) 1222222320200311333

create_bingtile_zonal_stats[source]

create_bingtile_zonal_stats(aoi:DataFrame, data:DataFrame, aggregations:List[typing.Dict[str, typing.Any]], aoi_quadkey_column:str='quadkey', data_quadkey_column:str='quadkey')

Type Default Details
aoi DataFrame An aoi with quadkey column
data DataFrame Data with quadkey column
aggregations typing.List[typing.Dict[str, typing.Any]] List of agg specs, with each agg spec applied to a data column
aoi_quadkey_column str quadkey Column name of aoi quadkey
data_quadkey_column str quadkey Column name of data quadkey

To create bingtile zonal stats, we need to compute the quadkeys for the areas of interest (AOI) and the data.

The geowrangler.grids module provides a BingTileGridGenerator that will generate the quadkeys for the areas covered by your AOIs.

import geowrangler.grids as gr

bgtile_generator = gr.BingTileGridGenerator(AOI_ZOOM_LEVEL)
simple_aoi_bingtiles = bgtile_generator.generate_grid(simple_aoi)

Using the data with the computed quadkeys, we can generate zonal stats for our bingtile grid aois. This just uses the regular pandas grouping and merging function and skips any geospatial joins which results in faster computation.

%%time
bingtile_results = create_bingtile_zonal_stats(
    simple_aoi_bingtiles,
    simple_data_quadkey,
    aggregations=[dict(func="count", fillna=True)],
)
CPU times: user 42.5 ms, sys: 375 ┬Ás, total: 42.9 ms
Wall time: 37.6 ms
bingtile_results
quadkey geometry index_count
0 122222220 POLYGON ((0.00000 0.70311, 0.00000 1.40611, 0.... 0.0
1 122222222 POLYGON ((0.00000 0.00000, 0.00000 0.70311, 0.... 3.0
2 122222221 POLYGON ((0.70313 0.70311, 0.70313 1.40611, 1.... 0.0
3 122222223 POLYGON ((0.70313 0.00000, 0.70313 0.70311, 1.... 0.0
4 122222230 POLYGON ((1.40625 0.70311, 1.40625 1.40611, 2.... 0.0
5 122222232 POLYGON ((1.40625 0.00000, 1.40625 0.70311, 2.... 3.0
6 122222231 POLYGON ((2.10938 0.70311, 2.10938 1.40611, 2.... 0.0
7 122222233 POLYGON ((2.10938 0.00000, 2.10938 0.70311, 2.... 3.0
8 122222320 POLYGON ((2.81250 0.70311, 2.81250 1.40611, 3.... 0.0
9 122222322 POLYGON ((2.81250 0.00000, 2.81250 0.70311, 3.... 0.0

We can also use any bingtile grid for any zoom level lower than the data's zoom level

bgtile_generator10 = gr.BingTileGridGenerator(AOI_ZOOM_LEVEL + 1)
simple_aoi_bingtiles10 = bgtile_generator10.generate_grid(simple_aoi)
bingtile_results10 = create_bingtile_zonal_stats(
    simple_aoi_bingtiles10,
    simple_data_quadkey,
    aggregations=[dict(func="count", fillna=True)],
)
bingtile_results10[bingtile_results10.index_count > 0]
quadkey geometry index_count
4 1222222221 POLYGON ((0.35156 0.35156, 0.35156 0.70311, 0.... 3.0
13 1222222320 POLYGON ((1.40625 0.35156, 1.40625 0.70311, 1.... 3.0
19 1222222330 POLYGON ((2.10938 0.35156, 2.10938 0.70311, 2.... 2.0
22 1222222331 POLYGON ((2.46094 0.35156, 2.46094 0.70311, 2.... 1.0
ax = results.plot(ax=plt.axes(), column="index_count", edgecolor="blue", alpha=0.2)
ax = simple_data.plot(ax=ax, color="purple")
ax = bingtile_results.plot(ax=ax, column="index_count", edgecolor="black", alpha=0.4)
ax = bingtile_results10.plot(ax=ax, column="index_count", edgecolor="red", alpha=0.4)