Stratoscope 22 hours ago [-]
One task where GeoJSON falls down is simplification of a group of polygons with common boundaries, e.g. the 48 conterminous US states. If you start with a highly detailed set of polygons, you need to simplify them for practical display in an online map.
GeoJSON doesn't encode the fact that the boundary points are common between adjacent polygons. When you simplify those polygons, each one is handled separately, and you end up with "slivers" where the boundaries are misaligned:
TopoJSON solves this by encoding each such boundary only once. So when you simplify the polygons, they are all done together, and the same simplification applies to adjacent polygons. No more slivers!
Is this actually GeoJSON falling down, or decades of convention extended to JSON? Topology is great, but it is sidestepped by Shapefile/WKT/WKB/etc. in favor of independent primitives like POINT, LINE, POLYGON. If GeoJSON did not exist as a new JSON GIS data format encoding these primitives, TopoJSON would not have "replaced" it, due to the added mismatch with other non-topological formats.
From what I can tell, the top criticisms of GeoJSON are the under-enforced winding order specification and the handling of geometries that cross the antimeridian.
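For anyone who hasn't run into the winding order issue: RFC 7946 asks for exterior rings to be counterclockwise and holes clockwise, but plenty of files in the wild ignore that. A minimal sketch of checking and fixing it with the shoelace formula follows. The function names are made up for illustration, and it treats longitude/latitude as flat x/y, so it is not reliable for rings that cross the antimeridian or wrap a pole.

```typescript
type Position = [number, number]; // [longitude, latitude]

// Shoelace formula on a closed ring: positive for counterclockwise,
// negative for clockwise (longitude as x, latitude as y, flat plane).
function signedArea(ring: Position[]): number {
  let sum = 0;
  for (let i = 0; i < ring.length - 1; i++) {
    const [x1, y1] = ring[i];
    const [x2, y2] = ring[i + 1];
    sum += x1 * y2 - x2 * y1;
  }
  return sum / 2;
}

function isCounterClockwise(ring: Position[]): boolean {
  return signedArea(ring) > 0;
}

// Rewind a polygon so ring 0 (exterior) is CCW and all other rings (holes) are CW.
function rewindPolygon(rings: Position[][]): Position[][] {
  return rings.map((ring, i) => {
    const wantCcw = i === 0;
    return isCounterClockwise(ring) === wantCcw ? ring : [...ring].reverse();
  });
}

// A unit square drawn clockwise, i.e. the "wrong" way for an exterior ring.
const clockwiseSquare: Position[] = [[0, 0], [0, 1], [1, 1], [1, 0], [0, 0]];
const fixed = rewindPolygon([clockwiseSquare]);
```

Here `fixed[0]` is the same square reversed, so it now runs counterclockwise.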
jvanderbot 21 hours ago [-]
Right. Encoding a union algorithm into the data structure just introduces the reverse problem: Selecting a subset now requires extra logic beyond jq.
Stratoscope 17 hours ago [-]
Similarly, typical map APIs like the Google Maps API accept GeoJSON and not TopoJSON. I was not suggesting TopoJSON as a replacement for GeoJSON, but as a complement to it. With the tools on the TopoJSON GitHub, you can have GeoJSON input and output, but convert to TopoJSON for the simplification step to avoid the "slivers" problem.
pramsey 20 hours ago [-]
GeoJSON is not TopoJSON. Saying that it is "falling down" is like criticizing a zebra for not being a giraffe. GeoJSON is a mapping of the (non-topological) "simple features" model into JSON, full stop. It does that fine.
Stratoscope 17 hours ago [-]
Yes, the same "slivers" problem occurs when you try to simplify features in any format that uses individual polygons, such as shapefiles or whatnot. That's the only case I was referring to.
I don't think I would trust a zebra or a giraffe for this task either.
echoangle 20 hours ago [-]
How is that a geojson problem? If your dataset is correct, adjacent borders will just use the same points and will match exactly.
sdenton4 20 hours ago [-]
The problem is simplification. Suppose two regions share a border with some nonlinear points a, b, c, d. Simplifying the polygon for the first region might yield a, b, d while the second yield a, c, d. This creates gaps or overlaps between the two regions.
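A small sketch of that effect, under loud assumptions: the simplifier here is a toy "drop vertices too close to the last kept one" pass standing in for a real algorithm, and the coordinates are invented. Two rings share the border a-b-c-d but walk it in opposite directions, and each ring is simplified on its own.

```typescript
type Position = [number, number];

// Toy "radial distance" simplifier: walk the ring and drop any vertex closer than
// `tolerance` to the last kept vertex. The closing vertex is always kept.
function simplifyRing(ring: Position[], tolerance: number): Position[] {
  const kept: Position[] = [ring[0]];
  for (let i = 1; i < ring.length - 1; i++) {
    const [px, py] = kept[kept.length - 1];
    const [x, y] = ring[i];
    if (Math.hypot(x - px, y - py) >= tolerance) kept.push(ring[i]);
  }
  kept.push(ring[ring.length - 1]);
  return kept;
}

// Shared border, south to north.
const a: Position = [0, 0];
const b: Position = [0.3, 1];
const c: Position = [-0.3, 2];
const d: Position = [0, 3];

// Both rings are counterclockwise, so they traverse the border in opposite directions.
const west: Position[] = [a, b, c, d, [-4, 3], [-4, 0], a];
const east: Position[] = [d, c, b, a, [4, 0], [4, 3], d];

// Report which of the shared border vertices survive in a simplified ring.
const named: [string, Position][] = [["a", a], ["b", b], ["c", c], ["d", d]];
function keptBorderVertices(ring: Position[]): string[] {
  return named.filter(([, p]) => ring.indexOf(p) !== -1).map(([name]) => name);
}

const westKept = keptBorderVertices(simplifyRing(west, 1.5)); // ["a", "c"]
const eastKept = keptBorderVertices(simplifyRing(east, 1.5)); // ["b", "d"]
```

The two regions no longer agree on where their common border runs, which is exactly the gap/overlap being described. A better simplifier changes which vertices survive, not the fact that each ring is decided in isolation.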
qurren 19 hours ago [-]
But what is the border? Set the border to what it actually is, not a simplification of it. The state of Colorado is formally a 697-sided polygon; don't simplify it to a rectangle.
tomrod 19 hours ago [-]
This is not what OP is describing. It is very common to simplify objects to cut the number of boundary points by orders of magnitude. GeoJSON has no way to keep shared boundaries correlated when you do that, so simplifying country objects from a GeoJSON source can leave gaps between the country borders. So you either have a poor representation or a longer pipeline to convert objects to an amenable object set. It also breaks idempotency in some regards.
echoangle 19 hours ago [-]
To do the simplification, you detect shared borders, simplify and generate polygons again.
That doesn’t make topojson inherently superior. You can convert back and forth and for many applications geojson is easier to process.
Stratoscope 16 hours ago [-]
Yes, you could write code to do that. Or use the utilities provided in the TopoJSON GitHub and let them do it for you: convert to TopoJSON, simplify, convert back to GeoJSON. They have already written all the code for you.
It depends on what purpose you are using the polygons for. In an online map you need to simplify way down. Consider these Colorado maps at two different zoom levels:
Even the one zoomed in on the state appears to use maybe 15-20 vertices max.
In the second one, if I squint real hard I can just barely make out one slight dogleg on the western border and one on the south. And that is partly because I knew to look for them in the zoomed-in map.
If we use, say, the Census TIGER/Line boundary definitions for the states, we are probably talking about hundreds of thousands of vertices, perhaps millions. You won't be using those in an online map without simplifying.
AlotOfReading 18 hours ago [-]
The Texas border with Mexico is formally down the centerline of the Rio Grande, even as the river moves (ignoring fiddly complications). Even if you could somehow take a perfect snapshot of it at a given time, you'd run into the coastline paradox when sampling it.
echoangle 19 hours ago [-]
So don’t simplify the shapes on their own. Geojson is a storage and exchange format, you can still convert it to other formats if you want to modify it.
rented_mule 13 hours ago [-]
I think what the original comment is pointing out is that GeoJSON lacks a concept of a shared boundary. Shared boundaries expressed in GeoJSON get around that by duplicating data. Whenever data is duplicated, there's a risk that the copies will not be exactly the same. That makes the task of modification more challenging given that the real world is full of messy data, like duplicates not matching.
20-25 years ago I worked a lot with map data from otherwise high quality, and sometimes authoritative, sources like the USGS and NOAA that had this non-identical shared boundaries problem (in formats other than GeoJSON). If the format doesn't allow such mistakes to be expressed, then they have to fix their data to publish it in said format.
echoangle 13 hours ago [-]
Sure, but not every format is useful for everything. Geojson is great if you want a simple way to express a shape to show on a map. It’s like criticizing CSV because people put strings in choice value fields instead of doing a foreign key to another table. That’s just not what the format is used for.
rented_mule 10 hours ago [-]
I'd take your point further... No format is useful for everything. But we have to be aware of the trade-offs of each format (or language or tool or ...) in order to make the right choice of what to use for a given use case. We do that by sharing knowledge of where a given tool succeeds and where it falls down. Pointing out something a format doesn't handle well is not condemning that format for all use cases (I happily choose GeoJSON over other formats for many things).
NelsonMinar 18 hours ago [-]
I like TopoJSON and have used it in projects. But it's weird to set it up as opposition to GeoJSON. It's a complement. GeoJSON is a general data format meant to replace uses of ESRI Shapefiles and other complex formats. TopoJSON is more of a solution for a particular application need.
Is there much work developing or using TopoJSON these days? I haven't seen much about it in a few years.
Stratoscope 17 hours ago [-]
To be clear, I'm not suggesting TopoJSON as an alternative to GeoJSON. I like GeoJSON and was loosely involved with the working group that created and updated its spec.
I'm just saying that for the specific task I mentioned, GeoJSON or any format such as shapefiles that stores polygons individually naturally leads to the "sliver" problem.
A nice processing pipeline is:
1. Convert GeoJSON to TopoJSON.
2. Run the simplification on the TopoJSON.
3. Convert the resulting TopoJSON back to GeoJSON.
The TopoJSON GitHub has tools for each of these steps.
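To show why that pipeline avoids slivers, here is a rough sketch of the idea, not the TopoJSON tools' actual code: store each boundary once as an "arc", simplify every arc exactly once, then rebuild the rings from arc references. The toy simplifier, the sample coordinates, and the function names are all assumptions for illustration. The `~i` reference meaning "arc i, reversed" mirrors the convention TopoJSON uses for its arc indexes.

```typescript
type Position = [number, number];
type Arc = Position[];

// Toy arc simplifier: both endpoints are always kept; an interior vertex is kept
// only if it is at least `tolerance` away from the previously kept vertex.
function simplifyArc(arc: Arc, tolerance: number): Arc {
  const kept: Position[] = [arc[0]];
  for (let i = 1; i < arc.length - 1; i++) {
    const [px, py] = kept[kept.length - 1];
    const [x, y] = arc[i];
    if (Math.hypot(x - px, y - py) >= tolerance) kept.push(arc[i]);
  }
  kept.push(arc[arc.length - 1]);
  return kept;
}

// Rebuild a closed ring from arc references. A negative reference ~i means "arc i, reversed".
function stitchRing(arcs: Arc[], refs: number[]): Position[] {
  const ring: Position[] = [];
  for (const ref of refs) {
    const arc = ref < 0 ? [...arcs[~ref]].reverse() : arcs[ref];
    // After the first arc, skip each arc's first vertex: it repeats the previous arc's end.
    ring.push(...(ring.length > 0 ? arc.slice(1) : arc));
  }
  return ring;
}

const arcs: Arc[] = [
  [[0, 0], [0.3, 1], [-0.3, 2], [0, 3]], // 0: shared border, south to north
  [[0, 3], [-4, 3], [-4, 0], [0, 0]],    // 1: rest of the western region
  [[0, 0], [4, 0], [4, 3], [0, 3]],      // 2: rest of the eastern region
];

const simplified = arcs.map((arc) => simplifyArc(arc, 1.5));
const west = stitchRing(simplified, [0, 1]);  // border forward, then around the west
const east = stitchRing(simplified, [~0, 2]); // border reversed, then around the east
```

Both rings now contain the same three border vertices, one forward and one reversed, because the border was only ever simplified once.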
Waterluvian 23 hours ago [-]
I’ve applied GeoJSON (among many other GIS tech) for mapping and monitoring tens of thousands of warehouse robots. It works great as long as you squint just a bit, ignoring that it generally calls for long,lat and is designed with the assumption of a world CRS.
The dangerous part is that some tools fully assume this and will completely screw with calculations if you’re assuming a flatland CRS. So you’ve got to be careful in checking and setting those parameters.
One nice thing is that the structure of GeoJSON works incredibly well in typescript. It has discriminated unions built in so you can walk entire geodatasets in a pretty comfortable way.
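A quick sketch of what I mean by discriminated unions. The types below are hand-written and only cover a subset of the geometry types (no Multi* variants), so treat them as illustrative rather than as complete GeoJSON typings.

```typescript
type Position = number[];

// The "type" member is the discriminant: checking it narrows the rest of the object.
type Geometry =
  | { type: "Point"; coordinates: Position }
  | { type: "LineString"; coordinates: Position[] }
  | { type: "Polygon"; coordinates: Position[][] }
  | { type: "GeometryCollection"; geometries: Geometry[] };

// Count every position in a geometry, recursing into collections.
function countPositions(g: Geometry): number {
  switch (g.type) {
    case "Point":
      return 1;
    case "LineString":
      return g.coordinates.length;
    case "Polygon":
      return g.coordinates.reduce((n, ring) => n + ring.length, 0);
    case "GeometryCollection":
      return g.geometries.reduce((n, child) => n + countPositions(child), 0);
  }
}

const sample: Geometry = {
  type: "GeometryCollection",
  geometries: [
    { type: "Point", coordinates: [-105.5, 39.0] },
    { type: "LineString", coordinates: [[0, 0], [1, 1], [2, 0]] },
  ],
};

const total = countPositions(sample); // 1 + 3 = 4
```

Inside each `case` the compiler knows the exact shape of `coordinates`, and it complains if a new variant is added to the union without a matching case.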
sam_lowry_ 23 hours ago [-]
> tens of thousands of warehouse robots
Sounds like Amazon
Waterluvian 23 hours ago [-]
Emphatically not Amazon. Yuck.
DarkNova6 22 hours ago [-]
I’ve had nothing but problems using GeoJson. The specification has limitations everywhere and doesn’t even support z + m values at the same time.
But thankfully there is also the SQLite-backed GeoPackage, which is not only more flexible but also much smaller. It takes some extra steps to get testing teams working due to its binary nature, but other than that it is the best format in geospatial data analysis.
Long live SQLite!
sureglymop 4 hours ago [-]
I'm glad there are sqlite-backed file formats in that space. That said, they're not always the ideal choice.
For example, for map tiles mbtiles (sqlite) files can be used. In many applications though, pmtiles files are better because they allow for http range requests.
cr125rider 21 hours ago [-]
Made by Sean Gillies and a few others. Back when mapbox was doing all sorts of great open source stuff. Legends
GeoJSON is super useful. At Getcho (delivery, logistics) we use zip code GeoJSON encodings to draw polygons on zone maps and quickly generate rates. This has been a persistently annoying thing to do until we discovered this format. If you're curious, someone made a repo with all the 2010 census zips a while back [0].
About 25% of ZIP codes don't have a corresponding Census Bureau ZCTA, for example 10118. Do you end up needing special handling for those cases? Or has it not yet come up in practice?
nobleach 22 hours ago [-]
We used this extensively when I worked in this space (2010 - 2014). My favorite addition was using https://github.com/topojson/topojson to add arcs. That cut down on quite a bit of points to represent curves.
jtbaker 22 hours ago [-]
Dang, fun memories of when I was first getting into geo/data stuff and doing a lot of web mapping with D3, Leaflet and friends. Seems like tools such as vector tiles/PMTiles have supplanted TopoJSON for a lot of visualization-oriented use cases.
nobleach 20 hours ago [-]
I'm gonna have to dive into a rabbit-hole! I was working on an ESRI Shapefile to GeoJson converter back in those days. But D3 and Leaflet were such cool tech! MapBox too. Linking SagaGIS with PostGIS to do pre/post wildfire analysis was my jam.
ragebol 23 hours ago [-]
Have been using GeoJSON, very handy and human-readable, but we recently switched to GeoPackage files, as it allows for different layers, each with a different schema for additional data.
GeoPackages also allow you to set a proper CRS, which is not as easy in GeoJSON IIRC.
Interesting but, IMO, probably one of the worst uses of JSON. The data you would want to consume is already not "human readable" so it instead introduces a lot of bloat for really no benefit.
If you have a non-insignificant amount of data points to track this is going to eat just a ton of memory while also being pretty slow to encode/decode.
Imagine, for example, if we encoded this as a binary. First 2 bytes for the feature type, second 2 bytes for the geometry type, 3 bytes for a fixed point x, 3 bytes for a fixed point y, and you could optionally provide the properties as a json blob in a trailing string. That's 10 bytes for all the coordinate stuff. Fewer bytes than what currently stores the `"type": "Feature"` string.
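A sketch of that 10-byte layout, with several assumptions that are not in the proposal itself: big-endian byte order, signed 24-bit fixed point, a scale that maps ±180 degrees onto the full 24-bit range, and made-up numeric codes for the two type fields. The trailing properties blob is left out.

```typescript
// Map [-180, 180] degrees onto the signed 24-bit integer range (assumed scale).
const SCALE = 8388607 / 180;

function writeInt24(view: DataView, offset: number, value: number): void {
  const u = value & 0xffffff; // two's complement, truncated to 24 bits
  view.setUint8(offset, (u >> 16) & 0xff);
  view.setUint8(offset + 1, (u >> 8) & 0xff);
  view.setUint8(offset + 2, u & 0xff);
}

function readInt24(view: DataView, offset: number): number {
  const u =
    (view.getUint8(offset) << 16) |
    (view.getUint8(offset + 1) << 8) |
    view.getUint8(offset + 2);
  return (u << 8) >> 8; // sign-extend from 24 to 32 bits
}

// Layout: bytes 0-1 feature type, 2-3 geometry type, 4-6 x (lon), 7-9 y (lat).
function encodePoint(featureType: number, geometryType: number, lon: number, lat: number): ArrayBuffer {
  const buf = new ArrayBuffer(10);
  const view = new DataView(buf);
  view.setUint16(0, featureType);
  view.setUint16(2, geometryType);
  writeInt24(view, 4, Math.round(lon * SCALE));
  writeInt24(view, 7, Math.round(lat * SCALE));
  return buf;
}

function decodePoint(buf: ArrayBuffer) {
  const view = new DataView(buf);
  return {
    featureType: view.getUint16(0),
    geometryType: view.getUint16(2),
    lon: readInt24(view, 4) / SCALE,
    lat: readInt24(view, 7) / SCALE,
  };
}

const encoded = encodePoint(1, 1, -105.5, 39.0);
const decoded = decodePoint(encoded);
```

Worth noting the trade-off: with this scale, 24 bits give a step of about 0.00002 degrees, which is on the order of a couple of metres at the equator. Fine for a zoomed-out web map, not for survey data.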
doginasuit 20 hours ago [-]
Do you mean geocoordinates when you say not human readable? Those are obviously at the heart of geospatial information but there is quite a bit more to the spec that does benefit from being human readable, and I'd include longitude/latitude among them. There are also solutions like cbor which allow them to be transferred and decoded/encoded from binary. For performance critical data you can also use something like protobuf, but it would be a huge pain to handle everything that way. Json is a great choice as a general spec.
morganherlocker 15 hours ago [-]
> If you have a non-insignificant amount of data points to track this is going to eat just a ton of memory while also being pretty slow to encode/decode.
This is a fair critique, however, for any large GeoJSON, the coordinate arrays will dominate the size. I think it's also safe to assume this data will be gzipped at rest and over the wire, which will eliminate most of the "header" metadata size you mention. As you point out, it would be much more efficient to have a binary format, and there are good examples like these, that are ~2-3x smaller in benchmarks:
That said, I think GeoJSON should be compared against other human readable formats like KML, which has a lot of wasted space as well, while being more difficult to read/write.
17 hours ago [-]
dinkumthinkum 6 hours ago [-]
This is just pretty wrong. Sure, geojson can be bloated but it is not for "no benefit." It is a very popular format and it is easy to encode and decode, even if it is slow for large data. It is more for sharing than long term storage. Take a site like below, it is very convenient to render json this way.
GeoJSON is not just for geographical features! Shapes of any kind work just as well.
QuPath[1], a tool for digital pathology whole slide image analysis, can export annotations in GeoJSON format (and import too I suppose).[2] This makes it really very easy to make annotations transportable between tooling.
> The coordinate reference system for all GeoJSON coordinates is a geographic coordinate reference system, using the World Geodetic System 1984 (WGS 84) [WGS84] datum, with longitude and latitude units of decimal degrees.
So that seems to be a misuse of the format. Using a geojson library for this may get you into trouble with ranges or antimeridian cutting.
sam_lowry_ 24 hours ago [-]
Dunno whose website this is, but the format itself is great, and it allows for a relatively compact and relatively human-readable presentation.
A few weeks ago I (vibe)coded mxmap.be and if not for the ubiquity of geojson, it would have taken me significantly more effort.
rippeltippel 23 hours ago [-]
Nice work with mxmap. It's a very good way to appreciate to what extent the EU depends on the US, email providers being just one of several dimensions.
biosboiii 22 hours ago [-]
I love GeoJSON :) You can bring any Geo/GIS from 0 to visualization by just parsing it into GeoJSON.
geojson.io is a great editor/viewer by Mapbox. Also https://kepler.gl/demo is great for additional filtering, visualizations like heatmaps, arcs etc.
An extension to GeoJSON that works with JSONL-like semantics would be great for huge files, but this could also be solved by tiling.
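Roughly what I have in mind, as a sketch: one Feature per line, so a reader can hand off each feature as soon as its line is complete instead of parsing one giant FeatureCollection. The `Feature` interface is a loose hand-written shape and `parseFeatureLines` is a made-up name. The chunks are plain strings here, where in practice they would come from a file or network stream.

```typescript
interface Feature {
  type: "Feature";
  geometry: { type: string; coordinates: unknown } | null;
  properties: Record<string, unknown> | null;
}

// Yield one feature per complete line, buffering partial lines across chunks.
function* parseFeatureLines(chunks: Iterable<string>): Generator<Feature> {
  let buffer = "";
  for (const chunk of chunks) {
    buffer += chunk;
    let newline = buffer.indexOf("\n");
    while (newline !== -1) {
      const line = buffer.slice(0, newline).trim();
      buffer = buffer.slice(newline + 1);
      if (line !== "") yield JSON.parse(line) as Feature;
      newline = buffer.indexOf("\n");
    }
  }
  // A final line without a trailing newline is still a complete record.
  if (buffer.trim() !== "") yield JSON.parse(buffer) as Feature;
}

// Two features split across chunk boundaries at an arbitrary point.
const chunks = [
  '{"type":"Feature","geometry":{"type":"Point","coordinates":[4.35,50.85]},"prop',
  'erties":{"name":"A"}}\n{"type":"Feature","geometry":null,"properties":{"name":"B"}}\n',
];

const names: unknown[] = [];
for (const feature of parseFeatureLines(chunks)) {
  names.push(feature.properties?.name);
}
```

Memory use is bounded by the longest single line rather than by the whole file, which is the main point.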
CamouflagedKiwi 23 hours ago [-]
This is nice. I haven't worked with GIS data for ages but I really like the idea of a well-understood plain text container for it. Much nicer than wrangling with binary formats like shapefiles, especially when something goes wrong and you're not sure if it's your code (well more precisely your usage of whatever library you've got for it) or the data.
kitd 18 hours ago [-]
There's a map facility not linked here that allows you to build GeoJSON graphically:
One should be aware that Google, even though JSON is JSON, would sometimes use its own binary encoding for the content of polylines and generally large sets.
dnnddidiej 22 hours ago [-]
Looks like what any sensible dev would come up with if asked to "return this geo data as json". I like simple!
mtucker502 23 hours ago [-]
The properties key is plural but contains a dictionary. Does the schema allow for this to be a list?
trgn 22 hours ago [-]
nice and simple, great. but because it's json, most parsers are horribly inefficient, which is tough, because a lot of geodata is massive.
jeffbee 21 hours ago [-]
JSON parsing is probably one of the most thoroughly optimized subsystems in the whole industry at this point. Obviously there are ways to encode the same data that are easier to parse (e.g. instead of absolute floating point coordinates, use integer deltas along a path in some reasonable CRS) but because this inefficient representation is so common and so long-standing the parsing is faster than people think.
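To make the integer-delta idea concrete, a minimal sketch: quantize each coordinate to an integer grid, then store the difference from the previous vertex. The precision of 1e6 (about six decimal places) and the function names are assumptions picked for illustration, and this stays in plain degrees rather than some other CRS.

```typescript
type Position = [number, number];

// Quantize to integers, then store each vertex as a delta from the previous one.
// The first pair is a delta from (0, 0), i.e. the absolute quantized position.
function deltaEncode(path: Position[], precision = 1e6): number[] {
  const out: number[] = [];
  let prevX = 0;
  let prevY = 0;
  for (const [lon, lat] of path) {
    const x = Math.round(lon * precision);
    const y = Math.round(lat * precision);
    out.push(x - prevX, y - prevY);
    prevX = x;
    prevY = y;
  }
  return out;
}

// Reverse the process with a running sum.
function deltaDecode(deltas: number[], precision = 1e6): Position[] {
  const path: Position[] = [];
  let x = 0;
  let y = 0;
  for (let i = 0; i < deltas.length; i += 2) {
    x += deltas[i];
    y += deltas[i + 1];
    path.push([x / precision, y / precision]);
  }
  return path;
}

const path: Position[] = [[-105.5, 39.0], [-105.499, 39.001], [-105.498, 39.0005]];
const encoded = deltaEncode(path); // [-105500000, 39000000, 1000, 1000, 1000, -500]
const decoded = deltaDecode(encoded);
```

After the first vertex the numbers are small integers, which are cheaper to parse than long decimal strings and compress well.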
trgn 21 hours ago [-]
the default parsers all load the entire thing in mem, which is not good.
so you need a stream-based parser, which nobody makes an effort to write/use for json. especially since geojson is a web format, and people just default to JSON.parse, which is blocking. and even if you did use a custom one, it likely won't be a geojson-tailored one, so because key order isn't guaranteed, any parser for geojson will need to do some acrobatics to find the reference system, deal with arbitrarily nested geometries, etc.
it's a good format for what it is, but it's not a great geo-format. a geo format needs to be easily scannable and, even better, have a geometry index to be able to seek quickly.
kbolino 15 hours ago [-]
For whatever it's worth, you don't have to write anything special to handle the reference system, because the final, published version of RFC 7946 only allows WGS84 anyway.
vortegne 23 hours ago [-]
Recently I got into cartography software for a bit and the horrors of the data formats in this industry are real. Feels like everyone under the sun has their own.
All that said, GeoJSON was a great change of pace, I enjoyed using it. While I'm no professional and have no idea what the professional needs are, it was very good for my hobbyist needs.
Demiurge 21 hours ago [-]
Also, JSON! Wow.
cyberax 17 hours ago [-]
To add a bit of negative here: the format is incredibly inefficient in JS, because each point gets expanded into a full-blown JavaScript object.
You can save a lot of RAM by using an array of interleaved coordinates. For an additional bonus, you can also compress rings by storing the ring offsets inside a larger array.
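A sketch of that layout, with made-up names (`PackedPolygon`, `packPolygon`, `ringAt`): all vertices of a polygon go into one `Float64Array` as x0, y0, x1, y1, ..., and a second small array records where each ring starts.

```typescript
interface PackedPolygon {
  coords: Float64Array;     // [x0, y0, x1, y1, ...] for all rings, back to back
  ringOffsets: Uint32Array; // start of each ring in vertices, plus one final end marker
}

function packPolygon(rings: number[][][]): PackedPolygon {
  const vertexCount = rings.reduce((n, ring) => n + ring.length, 0);
  const coords = new Float64Array(vertexCount * 2);
  const ringOffsets = new Uint32Array(rings.length + 1);
  let v = 0;
  rings.forEach((ring, r) => {
    ringOffsets[r] = v;
    for (const [x, y] of ring) {
      coords[2 * v] = x;
      coords[2 * v + 1] = y;
      v++;
    }
  });
  ringOffsets[rings.length] = v;
  return { coords, ringOffsets };
}

// A view (no copy) of the interleaved coordinates belonging to ring `r`.
function ringAt(p: PackedPolygon, r: number): Float64Array {
  return p.coords.subarray(2 * p.ringOffsets[r], 2 * p.ringOffsets[r + 1]);
}

const polygon = [
  [[0, 0], [4, 0], [4, 4], [0, 4], [0, 0]], // exterior ring, 5 vertices
  [[1, 1], [1, 2], [2, 1], [1, 1]],         // hole, 4 vertices
];
const packed = packPolygon(polygon);
const hole = ringAt(packed, 1); // 8 numbers: 1,1, 1,2, 2,1, 1,1
```

Nine vertices end up as one 18-element typed array plus three offsets, instead of nine small arrays nested inside two more arrays.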
https://www.bing.com/images/search?q=map+slivers+betwen+poly...
https://github.com/topojson/topojson
https://github.com/topojson/topojson-simplify
https://maps.app.goo.gl/JH93ko96QcoLXuBJ9
https://maps.app.goo.gl/au53iTnsmNdFuEZV8
https://github.com/sgillies
[0] https://github.com/OpenDataDE/State-zip-code-GeoJSON/blob/ma... although you can generate newer versions from the last census.
Getting your CRSes wrong is fun...
https://flatgeobuf.org/ https://github.com/mapbox/geobuf
https://geojson.io
[1] https://qupath.github.io/
[2] https://github.com/qupath/qupath-docs/blob/main/docs/advance...
https://geojson.io/#map=12.42/51.50593/-0.13003
https://vega.github.io/vega-lite/docs/geoshape.html