You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Support geospatial joins to a table of stored polygons.
We're investigating clickhouse (which has been awesome!) for a geo heavy product where we'd like to be able to aggregate a collection of rows with (lng,lat) points to a number of selected/filtered polygons. Since pointInPoly only works for polygon constants, I've seen some folks either chaining checks with ORs or UNION ALLs with different polygon constants. We have too many polygons for this to work well, and sometimes may want to "group" on other polygon metadata (ex: internal id), which would require the UNION ALL approach, with additional hard coded "select" columns. For our use case, we'd like to join to a geo table with stored polygons. For example:
createtablepoints (lng Float64, lat Float64) engine = MergeTree() order by (lng, lat);
insert into points (lng, lat)
values
(1, 1),
(2.5, 2.5),
(7.5, 7.5),
(10, 10)
;
-- Specifying Array(Tuple(Float64, Float64)) will automatically get decomposed into two separate array columns (geom.0-- and geom.1), so instead give better names. This decomposition does not happen for 3d+ arrays.createtablegeoms (id text, geom_lng Array(Float64), geom_lat Array(Float64)) engine = MergeTree() order by (geom_lng, geom_lat);
insert into geoms (id, geom_lng, geom_lat)
values ('test shape', array(0, 5, 5, 0, 0), array(0, 0, 5, 5, 0));
select id, arrayZip(geom_lng, geom_lat) from geoms;
-- ┌─id─────────┬─arrayZip(geom_lng, geom_lat)────┐-- │ test shape │ [(0,0),(5,0),(5,5),(0,5),(0,0)] │-- └────────────┴─────────────────────────────────┘-- Note that the polygon matches the arrayZip output above (but added type functions calls)select lng, lat from points where pointInPolygon(tuple(lng, lat), array(tuple(0,0), tuple(5, 0), tuple(5, 5), tuple(0, 5), tuple(0, 0)));
-- ┌─lng─┬─lat─┐-- │ 1 │ 1 │-- │ 2.5 │ 2.5 │-- └─────┴─────┘-- CH doesn't support non-equality join clauses, so imitate pointInPolygon inner join with cross join + where. In-- reality, we'd still probably inner join on some containing geo cell index (s2, h3, whatever) with further-- pointInPolygon filtering below.select lng, lat
from points
cross join (
select arrayZip(geom_lng, geom_lat) as geom from geoms where id in ('test shape')
)
where pointInPolygon(tuple(lng, lat), geom);
-- Received exception from server (version 20.1.3):-- Code: 44. DB::Exception: Received from <ip>:<port>. DB::Exception: Argument 2 for function pointInPolygon must be constant array of tuples..
If this support was added, the current pointInPolygon structure of taking the polygon exterior as the second argument and holes as additional arguments wouldn't be suitable for column storage, as there'd be no way to expand the hole polygons. Instead, a "Polygon" column structure like Array(Array(Tuple(Flaot64, Float64))) would be more flexible and match geojson (geo types appear to be on the extended roadmap). Here's a geojson shape with a hole for example:
I know we'll generally want to avoid joins and pre-calculate things like this, but that's not feasible because we store upwards of 20 different geometry layers (ex: neighborhoods vs cities vs states, etc) and are starting to supporting custom (created real time) geometry aggregation.
Describe the solution you'd like
Support for pointInPolygon with non-constant values, ideally exact column references (ie: no need for arrayZip(lngs, lats)). Column references would need to support holes, so could be a 3-dimensional array. Supporting non-equality joins (join geo on pointInPolygon(pnt, geom)) is out of scope for this request.
Describe alternatives you've considered
The only current alternatives are manually selecting out all of the geometry shapes we want to query and creating Clickhouse SQL strings for each one to use in pointInPolygon queries, either separated by OR or in multiple UNION ALL queries (if we need to select other geometry metadata). These are not desirable because:
we need to fetch the geometries client side (may be quite large to transport)
construct SQL representation of these geometries (ugly)
build very large/duplicated queries (hit query size/length limits, hard to work with)
Obviously the geo aspects of our system are a great fit for postgres+postgis, but the nature of our queries is entirerly aggregation based, hence our interest in Clickhouse. 😁 As the majority of our spatial needs can be pre-processed, we primarily need pointInPolygon for the realtime aggreagation. Even so, it might be worth trying to imitate parts of the SQL/MM Spatial standard for easier interop with other databases, like postgis or bigquery - in that scenario this could be something like ST_Covers with limited type support for now (only polygon as first arg, point as second).
The text was updated successfully, but these errors were encountered:
Use case
Support geospatial joins to a table of stored polygons.
We're investigating clickhouse (which has been awesome!) for a geo heavy product where we'd like to be able to aggregate a collection of rows with (lng,lat) points to a number of selected/filtered polygons. Since
pointInPoly
only works for polygon constants, I've seen some folks either chaining checks withOR
s orUNION ALL
s with different polygon constants. We have too many polygons for this to work well, and sometimes may want to "group" on other polygon metadata (ex: internalid
), which would require theUNION ALL
approach, with additional hard coded "select" columns. For our use case, we'd like to join to ageo
table with stored polygons. For example:If this support was added, the current pointInPolygon structure of taking the polygon exterior as the second argument and holes as additional arguments wouldn't be suitable for column storage, as there'd be no way to expand the hole polygons. Instead, a "Polygon" column structure like
Array(Array(Tuple(Flaot64, Float64)))
would be more flexible and match geojson (geo types appear to be on the extended roadmap). Here's a geojson shape with a hole for example:I know we'll generally want to avoid joins and pre-calculate things like this, but that's not feasible because we store upwards of 20 different geometry layers (ex: neighborhoods vs cities vs states, etc) and are starting to supporting custom (created real time) geometry aggregation.
Describe the solution you'd like
Support for pointInPolygon with non-constant values, ideally exact column references (ie: no need for
arrayZip(lngs, lats)
). Column references would need to support holes, so could be a 3-dimensional array. Supporting non-equality joins (join geo on pointInPolygon(pnt, geom)
) is out of scope for this request.Describe alternatives you've considered
The only current alternatives are manually selecting out all of the geometry shapes we want to query and creating Clickhouse SQL strings for each one to use in
pointInPolygon
queries, either separated byOR
or in multipleUNION ALL
queries (if we need to select other geometry metadata). These are not desirable because:Example `OR` queries
1 geom:
100 Chicago census blocks:
Obviously the geo aspects of our system are a great fit for postgres+postgis, but the nature of our queries is entirerly aggregation based, hence our interest in Clickhouse. 😁 As the majority of our spatial needs can be pre-processed, we primarily need
pointInPolygon
for the realtime aggreagation. Even so, it might be worth trying to imitate parts of the SQL/MM Spatial standard for easier interop with other databases, like postgis or bigquery - in that scenario this could be something likeST_Covers
with limited type support for now (only polygon as first arg, point as second).The text was updated successfully, but these errors were encountered: