-
Notifications
You must be signed in to change notification settings - Fork 1
Description
AddressForAll recives work from municipalities and other institutions, that is, recive works from "original-data authorities". Is usual to not work with open standards, so, to avoid big barriers, the "AddressForAll reciver agent" will be accept into some minimal constraints:
-
The file is a valid MIME
text/csvas defined by RFC 7111. -
The file has a header line and authority define another or confirm that
,is the field separator. -
The authority say the number of lines and number of bytes of the file. And application that generated the file (e.g. MS-Excel, Autocad, etc.)
After a basic treatment at terminal, by unzip and dos2unix file.csv, check metadata and charset:
wc -l file.csvto confirm number of linesfile -i file.csvto check supposed charsetgrep -axv '.*' file.csvto check lines with strange characters
If no problem, jump to PostgreSQL ingestion, else the "AddressForAll reciver agent" decide if will edit and fix the problems (less tham half hour of work) or it is necessary to report problems and request a new file.
The most frequent fix procedure is to enforce UTF-8:
iconv -c -t UTF-8 < file.csv > new_file.csvAt PostgreSQL define minimal ingestion standards. For example for São Vicente (SP) used
CREATE or replace FUNCTION stable.br_housenumber(p text) RETURNS int AS $f$
SELECT CASE
WHEN x~ '^\d+$' THEN x::int
WHEN x~ '\d' THEN regexp_replace(x, '^[^\d]*(\d+).+$', '-\1')::int
ELSE NULL
END
FROM (select trim(p)) t(x)
$f$ LANGUAGE SQL IMMUTABLE;
COMMENT ON FUNCTION stable.br_housenumber(text)
IS 'Sanitize houseNumber by trim, on error changes to negative or NULL.';
CREATE FOREIGN TABLE ingest.brsp_tipo01 (
houseNumber text,
point_SRID29193 text
) SERVER files OPTIONS (
filename '/tmp/pg_io/mapa01_loteamentos-pontos-Oficial.csv',
format 'csv',
header 'true'
);
CREATE VIEW ingest.vw_brsp_tipo01 AS
SELECT stable.br_housenumber(housenumber) AS housenumber,
ST_PointFromText(point_SRID29193, 29193 ) AS geom
FROM ingest.brsp_tipo01
;