[RFR] use pgmigrate for schema migrations #759
Maybe there's a better way to go about all of this, but to retain the current functionality / behaviour, you'd have to make sure of the following:
That's unfortunate. Let's skip it for now then.
PostgreSQL auth is a confusing mix between UNIX accounts and the database's own ones, so don't feel bad. Currently implemented: `docker run -it dockermediacloud/postgresql-server` (defaults to the ":latest" tag) and then `docker exec -it <container_id> psql`. So dunno, maybe run
not sure how to go about testing this in production and extremely paranoid about doing so, but i think the PR is in decent shape now, and behaving as intended locally
Thanks, will have a look soon.
pypt left a comment
Thank you for all of your work on this, and sorry for the 428342837th time for the delay.
Just some minor changes here and there, plus a single bug (migrations don't seem to work on second run of the database service in the container).
```
cd /opt/mediacloud && pgmigrate -t latest migrate
```
```
# Dump schema file for reference in development
psql -v ON_ERROR_STOP=1 mediacloud -c '\! pg_dump mediacloud > /tmp/mediawords.sql'
```
- `/tmp` and `/var/tmp` could be tmpfs filesystems mounted by Docker to the container; I think it's better to store the generated schema somewhere less temporary, e.g. `/`;
- Instead of running `pg_dump` from within `psql` (`psql`'s `\!` command just starts a shell), you can just run `pg_dump` directly;
- By the way, by default `psql` just ignores errors that it encounters in the input SQL. For example, if you had the following SQL file:

  ```sql
  CREATE TABLE foo (name TEXT);
  blergh;
  CREATE TABLE bar (name TEXT);
  ```

  and were to run `psql -d database_name -f that_file_with_a_typo_in_the_middle.sql`, it would `CREATE TABLE foo`, complain about the `blergh` statement, and then happily `CREATE TABLE bar`. This is something to be wary of when, for example, importing large dumps, because one might end up with an incomplete imported dump.
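The continue-on-error vs. stop-on-error contrast described above can be sketched in miniature. The snippet below is only an illustration using the stdlib `sqlite3` module rather than PostgreSQL, and `run_script` is a made-up helper, but the semantics mirror `psql`'s default behaviour versus `-v ON_ERROR_STOP=1`:

```python
import sqlite3

# The same "dump with a typo in the middle" as the SQL example above.
statements = [
    "CREATE TABLE foo (name TEXT);",
    "blergh;",  # broken statement
    "CREATE TABLE bar (name TEXT);",
]

def run_script(conn, stop_on_error):
    """Execute statements one by one; either keep going past errors
    (psql's default) or abort at the first bad one (ON_ERROR_STOP)."""
    for stmt in statements:
        try:
            conn.execute(stmt)
        except sqlite3.OperationalError:
            if stop_on_error:
                break  # abort the whole import at the first error

# Default-like behaviour: both foo and bar get created despite the typo.
lenient = sqlite3.connect(":memory:")
run_script(lenient, stop_on_error=False)

# ON_ERROR_STOP-like behaviour: bar is never created, so a partial
# import is immediately obvious instead of silently incomplete.
strict = sqlite3.connect(":memory:")
run_script(strict, stop_on_error=True)
```

The second mode is why `ON_ERROR_STOP=1` is worth setting when importing large dumps: an incomplete import fails loudly instead of leaving a half-populated database.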
pypt left a comment
A single tiny revert please, plus updates to a bunch of `docker-compose.tests.yml` files.
Edit: no, still volume problems; I'll keep investigating
@pypt alrighty, only test failures are crimson hexagon-related, so i think this is good to go assuming it looks okay to you
pypt left a comment
One more quick find-and-replace, and we're good to go!
Amazing, thank you so much!
Fixes #754.
Hey @jtotoole @pypt any suggestions on how I can resolve this?
From the comments, I see I can set
Progress here: pgmigrate is successfully creating the db and running the migrations on container start. Some questions (apologies in advance that these reflect a shaky grasp of the existing system and are therefore likely dumb):
- `pgmigrate` does the work both of initializing the schema and applying migrations, so I'm thinking that `initialize_schema.sh`, `apply_migrations.sh`, and `generate_empty_sql_migration.sh` can perhaps be consolidated into one file. The challenge, then, is whether there's a way of running a check, like what's done here, to determine whether new migrations are necessary on container start. Do you have thoughts on that? Maybe running all the migrations every time is unavoidable? One thing that `pgmigrate` can do is point to a specific migration number as opposed to applying them all at once (e.g. `pgmigrate -t 3 migrate` to run up to migration 3), and it creates a `schema_version` table in the DB. So, maybe there's a way of scanning all the files in the `migrations` folder, identifying the filename with the highest number, and comparing that to the highest number in the `schema_version` table?
- `pgmigrate` has a `--dryrun` option, which rolls back rather than committing, but it doesn't seem to actually log the SQL anywhere when that flag is set, so I'm not sure of the best way to output the pending SQL code without running it. I think, though, that all it's going to do each time is run the files in `/migrations` sequentially?
- Running `pg_dump` to get the reference schema file: based on the permissions errors I've been getting when attempting to run as `root` and `postgres`, as well as a read of the Postgres docs, it seems like I need to execute the command as the `mediacloud` superuser. When I've tried that (specifically, executing `pg_dump --dbname=mediacloud --username=mediacloud` in `initialize_db.sh`), I get the error: `pg_dump: [archiver (db)] connection to database "mediacloud" failed: FATAL: Peer authentication failed for user "mediacloud"`. Any thoughts on how to solve this one?

Perhaps it's easiest to talk through this via Google Meet; lmk!