[RFR] use pgmigrate for schema migrations #759
Maybe there's a better way to go about all of this, but to retain the current functionality / behaviour, you'd have to make sure of the following:
That's unfortunate. Let's skip it for now then.
PostgreSQL auth is a confusing mix between UNIX accounts and the database's own ones, so don't feel bad. Currently implemented: `docker run -it dockermediacloud/postgresql-server` (defaults to the ":latest" tag) and then `docker exec -it <container_id> psql`. So dunno, maybe run
not sure how to go about testing this in production and extremely paranoid about doing so, but i think the PR is in decent shape now, and behaving as intended locally
Thanks, will have a look soon.
pypt left a comment
Thank you for all of your work on this, and sorry for the 428342837th time for the delay.
Just some minor changes here and there, plus a single bug (migrations don't seem to work on second run of the database service in the container).
```
cd /opt/mediacloud && pgmigrate -t latest migrate
```
```
# Dump schema file for reference in development
psql -v ON_ERROR_STOP=1 mediacloud -c '\! pg_dump mediacloud > /tmp/mediawords.sql'
```
- `/tmp` and `/var/tmp` could be tmpfs filesystems mounted by Docker to the container; I think it's better to store the generated schema somewhere less temporary, e.g. `/`;
- Instead of running `pg_dump` from within `psql` (`psql`'s `\!` command just starts a shell), you can just run `pg_dump` directly;
- By the way, by default `psql` just ignores errors that it encounters in the input SQL. For example, if you had the following SQL file:

  ```sql
  CREATE TABLE foo (name TEXT);
  blergh;
  CREATE TABLE bar (name TEXT);
  ```

  and were to run `psql -d database_name -f that_file_with_a_typo_in_the_middle.sql`, it would `CREATE TABLE foo`, complain about the `blergh` statement, and then happily `CREATE TABLE bar`. This is something to be wary of when, for example, importing large dumps, because one might end up with an incomplete imported dump.
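The continue-on-error vs. stop-on-error contrast described above can be sketched in miniature. The snippet below is only an illustration using the stdlib `sqlite3` module rather than PostgreSQL, and `run_script` is a made-up helper, but the semantics mirror `psql`'s default behaviour versus `-v ON_ERROR_STOP=1`:

```python
import sqlite3

# The same "dump with a typo in the middle" as the SQL example above.
statements = [
    "CREATE TABLE foo (name TEXT);",
    "blergh;",  # broken statement
    "CREATE TABLE bar (name TEXT);",
]

def run_script(conn, stop_on_error):
    """Execute statements one by one; either keep going past errors
    (psql's default) or abort at the first bad one (ON_ERROR_STOP)."""
    for stmt in statements:
        try:
            conn.execute(stmt)
        except sqlite3.OperationalError:
            if stop_on_error:
                break  # abort the whole import at the first error

# Default-like behaviour: both foo and bar get created despite the typo.
lenient = sqlite3.connect(":memory:")
run_script(lenient, stop_on_error=False)

# ON_ERROR_STOP-like behaviour: bar is never created, so a partial
# import is immediately obvious instead of silently incomplete.
strict = sqlite3.connect(":memory:")
run_script(strict, stop_on_error=True)
```

The second mode is why `ON_ERROR_STOP=1` is worth setting when importing large dumps: an incomplete import fails loudly instead of leaving a half-populated database.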
pypt left a comment
A single tiny revert please, plus updates to a bunch of `docker-compose.tests.yml` files.
Edit: no, still volume problems; I'll keep investigating
@pypt alrighty, only test failures are crimson hexagon-related, so i think this is good to go assuming it looks okay to you
pypt left a comment
One more quick find-and-replace, and we're good to go!
Amazing, thank you so much!
Fixes #754.
Hey @jtotoole @pypt any suggestions on how I can resolve this?
From the comments, I see I can set
Progress here: pgmigrate is successfully creating the db and running the migrations on container start. Some questions (apologies in advance that these reflect a shaky grasp of the existing system and are therefore likely dumb):
- `pgmigrate` does the work both of initializing the schema and applying migrations, so I'm thinking that `initialize_schema.sh`, `apply_migrations.sh`, and `generate_empty_sql_migration.sh` can perhaps be consolidated into one file. The challenge, then, is whether there's a way of running a check, like what's done here, to determine whether new migrations are necessary on container start. Do you have thoughts on that? Maybe running all the migrations every time is unavoidable? One thing that `pgmigrate` can do is point to a specific migration number as opposed to applying them all at once (e.g. `pgmigrate -t 3 migrate` to run up to migration 3), and it creates a `schema_version` table in the DB. So, maybe there's a way of scanning all the files in the `migrations` folder, identifying the filename with the highest number, and comparing that to the highest number in the `schema_version` table?
- `pgmigrate` has a `--dryrun` option, which rolls back rather than committing, but it doesn't seem to actually log the SQL anywhere when that flag is set, so I'm not sure of the best way to output the pending SQL code without running it. I think, though, that all it's going to do each time is run the files in `/migrations` sequentially?
- Running `pg_dump` to get the reference schema file: based on the permissions errors I've been getting when attempting to run as `root` and `postgres`, as well as a read of the Postgres docs, it seems like I need to execute the command as the `mediacloud` superuser. When I've tried that (specifically, executing `pg_dump --dbname=mediacloud --username=mediacloud` in `initialize_db.sh`), I get the error: `pg_dump: [archiver (db)] connection to database "mediacloud" failed: FATAL: Peer authentication failed for user "mediacloud"`. Any thoughts on how to solve this one?

Perhaps it's easiest to talk through this via Google Meet; lmk!