This repository was archived by the owner on Jan 5, 2026. It is now read-only.

Localization for geocoding #221

@AbelVM

We may have different names for the same place, even within the official language(s) of a country. Requiring a strict match when geocoding leads to many failures and "holes" in the results. For example:

  • Gipuzkoa <> Guipúzcoa
  • Xaló <> Jalón
  • A Coruña <> La Coruña
  • JAEN <> Jaén

As of today, loading a CSV with the provinces of Spain may leave several holes, all of them caused by accents (accents are often dropped in uppercase text, so JAEN <> Jaén), optional articles, and the different co-official languages in different regions.
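To make the accent and article cases concrete, here is a minimal Python sketch of a normalization step that could run on both the stored and the input name before comparing them. The function name `normalize_name` and the `LEADING_ARTICLES` list are hypothetical, not part of any existing geocoder code, and the article list is only an illustrative guess.

```python
import unicodedata

# Hypothetical list of leading articles to ignore; adjust per language as needed.
LEADING_ARTICLES = ("a", "la", "el", "o", "os", "as", "los", "las")


def normalize_name(name):
    """Return a comparison key: case-folded, accent-free, without a leading article."""
    # Decompose accented characters (e.g. 'é' -> 'e' + combining acute) and drop the marks.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    words = stripped.casefold().split()
    # Drop a leading article only when something remains after it.
    if len(words) > 1 and words[0] in LEADING_ARTICLES:
        words = words[1:]
    return " ".join(words)
```

With this sketch, `JAEN` and `Jaén` share one key, and so do `A Coruña` and `La Coruña`. It does not help with names that differ between languages, such as Gipuzkoa <> Guipúzcoa, and stripping every mark also folds `ñ` into `n`, which may or may not be acceptable.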

Maybe we should use some form of approximate matching, such as PostgreSQL full-text search (tsvector) or trigram similarity (pg_trgm).
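For the trigram option, the idea can be sketched in plain Python as follows. This is only loosely modelled on how trigram similarity is usually described (each word padded with two leading spaces and one trailing space, score = shared trigrams / all distinct trigrams); it is not the pg_trgm implementation. The names `trigrams`, `similarity`, `best_match` and the 0.2 threshold are assumptions for illustration, and the inputs are assumed to be already lower-cased and free of accents.

```python
def trigrams(text):
    """Set of 3-character substrings of each word, padded with two leading spaces and one trailing space."""
    grams = set()
    for word in text.casefold().split():
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Jaccard similarity of the two trigram sets, between 0.0 and 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_match(query, candidates, threshold=0.2):
    """Return the most similar candidate, or None if nothing reaches the (arbitrary) threshold."""
    best = max(candidates, key=lambda c: similarity(query, c), default=None)
    if best is None or similarity(query, best) < threshold:
        return None
    return best
```

Under these assumptions, `best_match("gipuzkoa", ["guipuzcoa", "alava", "bizkaia"])` picks `guipuzcoa`, although its score is only 4/15. Pairs that differ more, such as `xalo` and `jalon`, share a single trigram and score much lower, so the threshold would need tuning against real data.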

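Tsvector sample pseudocode:

SELECT
the_geom
FROM
geometries_table
ORDER BY
-- Highest rank first, so that LIMIT 1 returns the best match rather than the worst.
-- plainto_tsquery accepts raw input containing spaces (e.g. 'A Coruña'); to_tsquery would reject it.
ts_rank(to_tsvector(storedname), plainto_tsquery(inputname))
DESC
LIMIT 1;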

It would be much faster if we precomputed a tsvector column in the geometries table, so that to_tsvector is not recomputed for every row on each query.

More comments about this at: CartoDB/dataservices-api#251

cc @ethervoid
