

@bratter
Contributor

@bratter bratter commented Feb 1, 2026

Addresses #2998.

Summary

Adds some features to clean-coords:

  • Adds support for FeatureCollection and GeometryCollection types through a recursive call.
  • Improves consistency of structural sharing for both mutate options, and documents it (see note below).
  • Improves the TypeScript type signature (see note below).
  • Adds the ability to pass an epsilon through to the booleanPointOnLine calls.
  • Changes the MultiPoint de-duplication lookup from a POJO to a Set for a substantial speedup.
  • Makes some other minor cleanups: removing redundant checks, cleaning up the benchmarks file, etc.
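A Set-based lookup for MultiPoint de-duplication might look roughly like the sketch below. It assumes positions are keyed by joining all their ordinates into a string; `dedupeMultiPoint` is a hypothetical helper for illustration, not the code in this PR.

```typescript
type Position = number[];

// Hypothetical helper: removes duplicate positions from a MultiPoint's
// coordinates, keeping the first occurrence of each.
function dedupeMultiPoint(coords: Position[]): Position[] {
  const seen = new Set<string>();
  const out: Position[] = [];
  for (const pt of coords) {
    // Key on every ordinate so [x, y] and [x, y, z] stay distinct.
    const key = pt.join(",");
    if (!seen.has(key)) {
      seen.add(key);
      out.push(pt);
    }
  }
  return out;
}

const cleaned = dedupeMultiPoint([[0, 0], [1, 1], [0, 0], [1, 1, 5]]);
console.log(cleaned.length); // 3
```

A Set avoids the prototype-key pitfalls of a plain object used as a map and has dedicated `has`/`add` operations, which is the likely source of the speedup.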

Additional Notes

  1. I had originally suggested also changing the behavior so that output polygons with fewer than 4 positions would not throw but instead return a valid 4-position polygon. I changed my mind because these, while technically valid GeoJSON, are not valid OGC Simple Features polygons and could cause problems downstream.
  2. Wanted to (a) maximize structural sharing to improve performance wherever possible, and (b) ensure we are clear about what gets preserved vs. not. The basic premise is that:
    • The GeoJSON objects down to the Geometry are always reused when mutate=true and always new when mutate=false.
    • properties, id, and bbox are always reused, irrespective of the mutate value.
    • mutate=false is (as you would expect) guaranteed not to change any coordinates in the original. Beyond that, coordinate arrays may be reused or newly created (or, with mutate=true, mutated in place), with no specific guarantees.
  3. The type signature improvement aims to be maximally informative: it passes the geometry's generic through, so the function always either outputs the same type it receives or throws. If this is too constraining I can back it off, but I figured more type inference was better, and it matches the behavior.
  4. Benchmarks are variable, but more got faster than slower, with the main impact coming from the structural sharing changes. For example, the biggest slowdown was in the single Point test, where we replace the object rather than return it. Better sharing tends to boost performance when few changes are made to larger geometries, i.e., the more realistic cases.
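To make the sharing rules in note 2 concrete, here is a minimal sketch of a recursive clean over FeatureCollection, Feature and GeometryCollection. It is not the PR's implementation: the local types are simplified, only Point and LineString are handled, and the hypothetical `cleanLine` only drops consecutive duplicate positions (no collinear-point removal). It shows the described model: mutate=true reuses every input object, mutate=false builds new objects while sharing properties/id/bbox by reference, and an unchanged coordinate array is returned as-is rather than copied.

```typescript
type Position = number[];
type Point = { type: "Point"; coordinates: Position };
type LineString = { type: "LineString"; coordinates: Position[] };
type GeometryCollection = { type: "GeometryCollection"; geometries: Geometry[] };
type Geometry = Point | LineString | GeometryCollection;
type Feature = {
  type: "Feature";
  geometry: Geometry;
  properties: Record<string, unknown> | null;
  id?: string | number;
  bbox?: number[];
};
type FeatureCollection = { type: "FeatureCollection"; features: Feature[] };
type GeoJSONObject = Geometry | Feature | FeatureCollection;

function samePosition(a: Position, b: Position): boolean {
  return a.length === b.length && a.every((v, i) => v === b[i]);
}

// Hypothetical: drops consecutive duplicate positions. A new array is only
// allocated once a duplicate is found; otherwise the input array is returned.
function cleanLine(coords: Position[]): Position[] {
  let out: Position[] | null = null;
  for (let i = 1; i < coords.length; i++) {
    if (samePosition(coords[i], coords[i - 1])) {
      if (out === null) out = coords.slice(0, i); // first change: copy prefix
    } else if (out !== null) {
      out.push(coords[i]);
    }
  }
  return out ?? coords;
}

// mutate=true: update objects in place and return the same object.
// mutate=false: return new objects; spreads keep properties/id/bbox by reference.
function clean(g: GeoJSONObject, mutate: boolean): GeoJSONObject {
  switch (g.type) {
    case "FeatureCollection":
      if (mutate) {
        g.features.forEach((f) => clean(f, true));
        return g;
      }
      return { ...g, features: g.features.map((f) => clean(f, false) as Feature) };
    case "Feature":
      if (mutate) {
        clean(g.geometry, true);
        return g;
      }
      return { ...g, geometry: clean(g.geometry, false) as Geometry };
    case "GeometryCollection":
      if (mutate) {
        g.geometries.forEach((c) => clean(c, true));
        return g;
      }
      return { ...g, geometries: g.geometries.map((c) => clean(c, false) as Geometry) };
    case "LineString":
      if (mutate) {
        g.coordinates = cleanLine(g.coordinates);
        return g;
      }
      return { ...g, coordinates: cleanLine(g.coordinates) };
    case "Point":
      return mutate ? g : { ...g };
  }
}

const fc: FeatureCollection = {
  type: "FeatureCollection",
  features: [{
    type: "Feature",
    properties: { name: "a" },
    geometry: { type: "LineString", coordinates: [[0, 0], [0, 0], [1, 1]] },
  }],
};
const copy = clean(fc, false) as FeatureCollection;
console.log(copy === fc); // false: new objects down to the geometry
console.log(copy.features[0].properties === fc.features[0].properties); // true: shared
```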

Future Work

Would you be interested in a follow-up PR that adds a collapse option? I'm concerned that the workflow input > truncate > clean can throw errors for valid input. The collapse option would remove the offending geometry at the ring level rather than throw an error. Thoughts?


Please provide the following when creating a PR:

  • [x] Meaningful title, including the name of the package being modified.
  • [x] Summary of the changes.
  • [x] Heads up if this is a breaking change.
  • [x] Any issues this resolves.
  • [x] Inclusion of your details in the contributors field of package.json - you've earned it! 👏
  • [x] Confirmation you've read the steps for preparing a pull request.

- Added support for FeatureCollection and GeometryCollection types
- Improved GeoJSON type signatures, returning the same type as was passed
- Added ability to pass epsilon through to booleanPointOnLine
- Ensured clarity on structural sharing and documented
- Removed some redundant checks
- Update benchmark results
- Fix test formatting
- Minor typescript enhancements
@bratter bratter marked this pull request as draft February 1, 2026 04:30
- Explicit infer for better inference on union types
- Added overload equivalent to old type signature with deprecation
warning
@bratter
Contributor Author

bratter commented Feb 1, 2026

After the initial CI failure, I fixed the types to give better inference for union types, and also added a deprecated signature matching the old behavior so as not to break any build systems that might error on failed type resolution. This signature can be removed in v8.
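The overload pattern described here (a precise generic signature plus a deprecated loose one) can be sketched as follows. The names `cleanCoordsSketch` and `Options` and the simplified types are hypothetical, and the body is a stand-in (it returns the input, or a JSON deep copy) since the real cleaning logic is not shown.

```typescript
type Position = number[];
type Point = { type: "Point"; coordinates: Position };
type LineString = { type: "LineString"; coordinates: Position[] };
type Feature = {
  type: "Feature";
  geometry: Point | LineString;
  properties: Record<string, unknown> | null;
};
type GeoJSONObject = Point | LineString | Feature;

interface Options {
  mutate?: boolean;
}

// Precise overload: whatever GeoJSON type goes in is the type that comes out.
function cleanCoordsSketch<T extends GeoJSONObject>(geojson: T, options?: Options): T;
/** @deprecated Loosely typed signature, kept only for backwards compatibility. */
function cleanCoordsSketch(geojson: any, options?: Options): any;
// Implementation signature (not directly callable). Stand-in body only:
// returns the input when mutating, otherwise a JSON deep copy.
function cleanCoordsSketch(geojson: any, options: Options = {}): any {
  return options.mutate ? geojson : JSON.parse(JSON.stringify(geojson));
}

const line: LineString = { type: "LineString", coordinates: [[0, 0], [1, 1]] };
const result = cleanCoordsSketch(line); // inferred as LineString, not any
```

TypeScript resolves overloads in declaration order, so callers passing a known GeoJSON type get that type back, and only calls that don't fit the generic signature fall through to the deprecated `any` overload.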

@bratter bratter marked this pull request as ready for review February 1, 2026 22:20
@smallsaucepan
Member

Great additions. Thank you @bratter!

Reckon we should move away from epsilon as a config parameter. I've used it (erroneously now I believe) in a couple of places where I'm sure there's a more relatable and accurate parameter for what we're trying to convey. precision or tolerance might be more meaningful? Thoughts?

Regarding mutation, if this model is used consistently throughout Turf, it might be worth adding a section on it to the website, so we can keep the per-function documentation as brief as possible, and cross reference instead.

Btw, our approach to mutation gives me the heebie-jeebies. To call

const f2 = turf.doSomething(f1);

and then find out changing f2.properties.whatever changes f1? Know it's a performance thing, though possibly worth documenting more clearly.
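The aliasing concern can be reproduced in a few lines. `doSomething` here is a hypothetical stand-in for any function following the sharing model in this PR (new Feature object, reused properties); it is not a real Turf API.

```typescript
type Feature = {
  type: "Feature";
  geometry: unknown;
  properties: Record<string, unknown> | null;
};

// Hypothetical stand-in: returns a new Feature object but reuses the
// input's properties object by reference.
function doSomething(f: Feature): Feature {
  return { ...f };
}

const f1: Feature = { type: "Feature", geometry: null, properties: { whatever: 1 } };
const f2 = doSomething(f1);
f2.properties!.whatever = 2;
console.log(f1.properties!.whatever); // 2: f1 and f2 share one properties object

// Caller-side workaround: give the result its own shallow copy of properties.
// Nested objects inside properties would still be shared.
const f3: Feature = {
  ...doSomething(f1),
  properties: f1.properties ? { ...f1.properties } : null,
};
f3.properties!.whatever = 3;
console.log(f1.properties!.whatever); // still 2
```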

Can you please remind me about the collapse idea? Is it basically that truncating a geometry makes it more likely for points to become coincident and from there get cleaned beyond what's technically valid?

geojson: T,
options?: { mutate?: boolean; epsilon?: number }
): CleanCoordsResult<T>;
function cleanCoords(
Member

Do we need this variation as well as the one above it?

): any;
/**
* Removes redundant coordinates from any GeoJSON Geometry.
* Removes redundant coordinates from any GeoJSON Type.
Member

"any GeoJSON object" might be better here.

geojson: GeoJSON,
options?: { mutate?: boolean; epsilon?: number }
): GeoJSON;
/** @deprecated loosely typed version deprecated. Will be removed in next major version */
Member

Let's mark as deprecated now but not necessarily commit to when it will be removed.

): Position[][] {
// Re-use the polygon's rings array when mutating to maximize sharing
// It would be possible to follow a similar approach to cleanLine and only
// create a new raings array when we know something has changed even for
Member

Typo "raings"

@smallsaucepan smallsaucepan changed the title @tirf/clean-coords: Add collection support and typescript fixes cleanCoords: Add collection support and typescript fixes Feb 2, 2026
@bratter
Contributor Author

bratter commented Feb 2, 2026

Thanks for the review! Will get to this when I have a sec including responding to your queries.

@bratter
Contributor Author

bratter commented Feb 4, 2026

Reckon we should move away from epsilon as a config parameter. I've used it (erroneously now I believe) in a couple of places where I'm sure there's a more relatable and accurate parameter for what we're trying to convey. precision or tolerance might be more meaningful? Thoughts?

Hmmm. It seems the term is also used differently in different places. In the context of booleanPointOnLine, epsilon specifically targets floating-point error in the cross product === 0 comparison... It does not have an intuitive numerical interpretation for coordinates in degrees, the way "tolerance" implies. You could also argue that a small epsilon (like 1e-15) should be the default comparison anyway, given floating point.
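To illustrate why this epsilon isn't a distance, a minimal collinearity test in the spirit described is sketched below. It is an illustration under assumptions, not the actual booleanPointOnLine source (which also has to check that the point lies between the segment endpoints). The magnitude of the cross product equals the segment length times the point's perpendicular distance from the line, so a fixed epsilon maps to different distances for segments of different lengths, and its units are squared coordinate units rather than degrees.

```typescript
type Position = number[];

// 2D cross product of (b - a) and (p - a). It is zero exactly when p lies on
// the infinite line through a and b; its magnitude is |b - a| times the
// perpendicular distance from p to that line.
function cross(a: Position, b: Position, p: Position): number {
  return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]);
}

// Exact comparison by default, or |cross| <= epsilon when an epsilon is
// supplied to absorb floating-point error.
function isCollinear(a: Position, b: Position, p: Position, epsilon?: number): boolean {
  const c = cross(a, b, p);
  return epsilon === undefined ? c === 0 : Math.abs(c) <= epsilon;
}

console.log(isCollinear([0, 0], [2, 2], [1, 1])); // true
console.log(isCollinear([0, 0], [2, 2], [1, 1.5], 1e-9)); // false (cross = 1)
```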

On reflection, the best approach might be to just drop it from the API here again (back to how it was), then consider a change to booleanPointOnLine as a separate issue, where I might argue for a default epsilon. I'll just remove it from the options if you agree.

Btw, our approach to mutation gives me the heebie-jeebies.

True, but the only real alternatives are a deep clone in every function that mutates, or immutable data structures... so maybe there is no alternative. If it were me, I'd have all coordinate mutation functions just mutate by default, so that users would be forced to assume mutation 100% of the time and would deep clone if they want to preserve the original. Barring that, the current approach is probably best, and my being explicit about it is really just a codification of what was generally happening, with some slight consistency fixes.

If you think this works, I can probably work through all the coordinate mutation methods and check/refine their mutation behavior so they are all consistent, then do as you suggest and take it out of the docs here. Let me know if you want it taken out of the docs now.

Can you please remind me about the collapse idea?

The two cases I'd be trying to address are (a) someone passes in a valid polygon, truncates it, then runs cleanCoords, which could throw if, after truncation, cleaning degenerates the polygon; and (b) even if the original polygon passed in has zero area, it might be nice to provide a friction-free way to clean it.

What I would propose for the collapse option is that LineStrings that end up with only two coincident points get dropped instead of passed through, and polygon rings that end up with 3 or fewer points get dropped instead of throwing. If this completely eliminates the geometry, then we can return either null or a null geometry in the feature.

I think this provides a nice way of capturing degenerate geometries during cleaning in a consistent way without the user having to set up try...catch. I suspect many users would just drop the whole shape in such a block, but if, say, the issue is just a small hole in a polygon, you'd likely drop the whole polygon for something very easily fixed.
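One reading of the proposed collapse rules is sketched below with hypothetical helpers (`collapseLine`, `collapsePolygon`) that would run after cleaning. Degenerate holes are dropped, and if the line or the polygon's outer ring degenerates, the helper returns null so the caller can emit a null geometry instead of catching an exception. Treating a degenerate outer ring as collapsing the whole polygon is an assumption here.

```typescript
type Position = number[];

function samePosition(a: Position, b: Position): boolean {
  return a.length === b.length && a.every((v, i) => v === b[i]);
}

// Hypothetical collapse rule for a cleaned LineString: a line reduced to two
// coincident points (or fewer than two points) is dropped.
function collapseLine(coords: Position[]): Position[] | null {
  if (coords.length < 2) return null;
  if (coords.length === 2 && samePosition(coords[0], coords[1])) return null;
  return coords;
}

// Hypothetical collapse rule for a cleaned Polygon: rings with fewer than
// 4 positions (the minimum for a closed ring) are dropped instead of throwing.
// Assumption: if the outer ring is degenerate, the whole polygon is null.
function collapsePolygon(rings: Position[][]): Position[][] | null {
  if (rings.length === 0 || rings[0].length < 4) return null;
  return [rings[0], ...rings.slice(1).filter((ring) => ring.length >= 4)];
}

const outer = [[0, 0], [10, 0], [10, 10], [0, 0]];
const tinyHole = [[1, 1], [2, 1], [1, 1]]; // a hole left with 3 positions after cleaning
console.log(collapsePolygon([outer, tinyHole])?.length); // 1
console.log(collapseLine([[2, 2], [2, 2]])); // null
```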

That may not have made sense... lol. I can make a repro if that'll help. Would likely do as another PR anyway.
