Merge trees to database #48
base: main
- Added a script to fetch the tables dynamically and pull their schemas into files
- Added the table schema for individual tables
- Updated the README with database update instructions
- First iteration of a table to hold all tree data
  - Fields are based on the ones pulled in the tree-source repo
  - Added a merge stored procedure, with the upserts split into 3 separate pieces for performance
  - The staging table is an unlogged table to avoid WAL writes
  - Added indexes similar to those on the treedata table
  - Added a migration script to move the treedata data into the tree table
zoobot
left a comment
Nice start to this!!
For now we should probably name the db serial something like id_source_staging, to follow the naming convention of the other tables' serials so it isn't confusing. We are using id as the id that gets passed around to the front end. In the tree-sources codebase, id is used both as the city id (basically the city name) and as the treedata's tree id that we create (not the db's serial). Basically, we are currently using the db's serial for nothing at all.
Are tree and tree_staging temporary tables?
If these are temporary tables, can we name them after the tables they are merging into?
Can you add an in-depth commit description for this? :)
Also just putting this here for your perusal:
https://standards.opencouncildata.org/#/trees
We've deviated from it. We should maybe submit a PR requesting more fields...
```sql
drop table if exists _tree;

create temporary table _tree as
SELECT t.id as id_tree,
```
id and id_tree are different: id is the id that we create with the tree-id repo, and id_tree is the db serial.
In this stored procedure, the name id_tree just signals that the id connects to the tree table, as opposed to the staging table. The distinction is mainly used further down to determine missing and matching data.
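To make the id_tree / id_tree_staging distinction concrete, here is a minimal sketch of how a temporary table like `_tree` can split staging rows into matching ones (already in `tree`) and missing ones (not yet in `tree`). It is only an illustration under assumptions: it runs on SQLite through Python's built-in `sqlite3` rather than PostgreSQL, the tables are trimmed down to an id and a `ref` column, and joining the two tables on `ref` is assumed, since the join condition is not shown in this excerpt.

```python
import sqlite3


def classify(tree_refs: list[str], staging_refs: list[str]) -> dict[str, list[str]]:
    """Split staging refs into 'matching' (already in tree) and 'missing' (not yet in tree)."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical, trimmed-down versions of tree / tree_staging.
        conn.executescript("""
            CREATE TABLE tree (id INTEGER PRIMARY KEY, ref TEXT UNIQUE);
            CREATE TABLE tree_staging (id INTEGER PRIMARY KEY, ref TEXT UNIQUE);
        """)
        conn.executemany("INSERT INTO tree (ref) VALUES (?)", [(r,) for r in tree_refs])
        conn.executemany("INSERT INTO tree_staging (ref) VALUES (?)", [(r,) for r in staging_refs])
        # id_tree comes from tree, id_tree_staging from tree_staging.
        # Joining on ref is an assumption made for this sketch.
        conn.execute("""
            CREATE TEMP TABLE _tree AS
            SELECT t.id AS id_tree, ts.id AS id_tree_staging, ts.ref AS id_reference
            FROM tree_staging ts LEFT JOIN tree t ON t.ref = ts.ref
        """)
        result: dict[str, list[str]] = {"matching": [], "missing": []}
        for id_tree, _id_staging, ref in conn.execute(
            "SELECT id_tree, id_tree_staging, id_reference FROM _tree ORDER BY id_reference"
        ):
            # A NULL id_tree means the staging row has no counterpart in tree yet.
            result["missing" if id_tree is None else "matching"].append(ref)
        return result
    finally:
        conn.close()


print(classify(tree_refs=["a", "b"], staging_refs=["b", "c"]))
# {'matching': ['b'], 'missing': ['c']}
```

Missing rows are the ones an upsert would insert, and matching rows are the ones it would update.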
```sql
create temporary table _tree as
SELECT t.id as id_tree,
       ts.id as id_tree_staging,
       ts.ref as id_reference,
```
id_reference is what we were using, and then we moved to ref because of the open data standard. Unfortunately, ref is a special prop in React, so ref is kind of a confusing name for an open data standard, imo.
@@ -0,0 +1,86 @@
```sql
-- drop table public.tree
CREATE TABLE public.tree (
    id bigserial NOT NULL primary key,
```
id_treedata_staging
Add an id based on one we create
@@ -0,0 +1,86 @@
```sql
-- drop table public.tree
CREATE TABLE public.tree (
```
treedata_staging
```sql
SELECT 1;

-- drop table public.tree_staging;
CREATE UNLOGGED TABLE public.tree_staging (
```
tree_sources_staging
```sql
    country character varying(255),
    neighborhood character varying(255),
    health character varying(255),
    dbh double precision,
```
dbh_min and dbh_max
Hi @tzinckgraf, are you still interested in working on this, or should I unassign you?
This is the first iteration for keeping the tree data in the database.
This PR introduces two new tables, `trees` and `trees_staging`.

The `trees` table would ideally replace the `treedata` table. The `treedata` table is currently a mix of time-based data and reference data, and there was a discussion around normalizing it further. However, we could also keep using the original `treedata` table instead of the proposed `tree` table with minimal updates. A migration file shows how the data is interchangeable between the two.

The `trees_staging` table is an unlogged table. Like a temporary table, it is not written to the WAL. Unlike a temporary table, its data is global: you can write it with one connection and then use that data from another. A temporary table's data and operations must stay on the same connection, which is a problem when using something like `ogr2ogr` to write data.

There is also a stored procedure that merges the two tables. It performs the full upsert logic in three parts using a temporary table. The procedure is idempotent.