Migrate from popit-django to django-popolo / multiple-django-popolo-sources#1181
Migrate from popit-django to django-popolo / multiple-django-popolo-sources#1181
Conversation
f352344 to
07352bd
Compare
07352bd to
500c1db
Compare
500c1db to
4c5fa8d
Compare
4c5fa8d to
2763382
Compare
2763382 to
1211646
Compare
1211646 to
ae24194
Compare
ae24194 to
a8a4f80
Compare
a8a4f80 to
d212d48
Compare
d212d48 to
dc98bea
Compare
dc98bea to
0ecc77e
Compare
1 similar comment
| return url | ||
|
|
||
|
|
||
| def forwards(apps, schema_editor): |
There was a problem hiding this comment.
woow this is incredibly cautious particularly in preserving the old data!
I get the impression that this is a pretty interesting piece of code and we should be aware of what comes out of processing this migration. Anyhow, 👍
There was a problem hiding this comment.
😄 The scariest bit of this migration for me (although this whole PR is quite scary!) is actually the update_source_url function above, since there are so many different PopIt URLs in the live database, lots of which don't work at the moment anyway. I hope that update_source_url will produce working URLs for those were working previously, but I think it's particularly worth double-checking that.
instance/models.py
Outdated
| class PopoloPersonQuerySet(models.QuerySet): | ||
|
|
||
| def get_from_api_uri(self, uri_from_api): | ||
| try: |
There was a problem hiding this comment.
Couldn't this be achieved by doing something like
PopoloPerson.objects.filter(identifiers__identifier=uri_from_api,identifiers__scheme__in=['popit_url', 'popolo_uri' ]).first()
the difference that I can see in both is that this one would return None if the Person isn't found, while in the current code it would raise PopoloPerson.DoesNotExist.
If the exception is there for a reason that I'm not seeing then it makes total sense.
There was a problem hiding this comment.
I think I wrote it that way to be clear that you should use the popit_url for finding people, if it exists, in preference to the popolo_uri - you wouldn't be sure of that using scheme_in unless you use an order_by on scheme as well, which I don't like much.
That's almost certainly being excessively defensive, however - given how they are formed, I don't think there should ever be two different PopoloPerson objects where one has a popolo_uri which is the popit_url of the other.
(I wanted it to raise a DoesNotExist if the person couldn't be found because the code that calls this is replacing a use of .get, so it seemed best to keep that behaviour.)
But yeah, I think we could save a query in lots of cases by doing what you suggest with get instead of filter & first.
There was a problem hiding this comment.
I've made the fix that you suggested in this fixup: d579dab
This include a migration that changes some foreign keys to use the PopoloPerson proxy model instead of the base Person model.
Rather than using the /persons collection endpoint of PopIt's API, the code now just uses export.json to extract Popolo JSON from PopIt. This commit has the results of re-recording these fixtures (using roughly the procedure here: https://gist.github.com/mhl/848f7980d256ca38ff603375be4f264e ). The exception was persons_with_null_values.yaml which seems to be missing the source JSON MongoDB fixture, so this was created by hand based on one of the others.
The start_local_popit_api.bash script is designed to make it easy to set up a local version of PopIt for testing against; this is still useful for re-recording the fixtures (see the commit message for 967f842529 and https://gist.github.com/mhl/848f7980d256ca38ff603375be4f264e ). However, this script installed a very old version of PopIt - this commit updates it to use the most recent version. (A key problem with the old version of this script is that it sets up a version of PopIt without the export.json endpoints.)
This doesn't seem to matter, and since there is no Identifier associated with a ContactDetail in django-popolo, it'd be awkward to preserve the ContactDetail identifiers. It doesn't seem to be actually used anyway.
5afb702 to
66160ec
Compare
It's possible for the test: nuntium.tests.home_view_tests:HomeViewTestCase.test_list_instances ... to fail because the order that instances are listed is not specified. This commit orders the instances by their slug.
Since the migration to syncing people using django-popolo and multiple-django-popolo-sources, we're now syncing all people (even historic representatives) from EveryPolitician, not just those in the current term(s). However, we should only be allowing people to write to current representatives. The problem here is that the old behaviour was just to include everyone from the data source who had a contact and hadn't been deliberately disabled - some (or perhaps most) of these non-EveryPolitician data sources won't have any way to tell who is current or not; the assumption should continue that they should all be made available to write to. We could divide these cases into sources that look like they're from EveryPolitician or not from the URL, but that wouldn't help with people serving EP data from elsewhere, or after modification. Probably the best approach would be to only show current representatives for sources that have Event objects with classification "legislative period", and then decide who is current by looking at those who have memberships with a legislative_period_id corresponding to a current legislative period. Unfortunately, at the moment django-popolo does not include the Event class, and we need a workaround quickly for the Alpaca project, so this commit uses a different heuristic for when to attempt to filter people for being current memberships of a legislature: if the people from that source have any memberships at all, it only includes those people who have a current membership (of any organisation - of course this might be completely spurious). This is good enough for the Alpaca case, which will give us enough time to update django-popolo and multiple-django-popolo-sources so that we can do this properly for the main production writeinpublic site. This commit also doesn't address whether historic representatives are available via the API.
This makes the handling of a missing 'identifiers' array for people less confusing and adds the option to preserve particular IDs on syncing.
The default behaviour is to update identifiers to remove any not in the upstream data source when syncing, but we've deliberately created some legacy IDs in migration that we want to keep. (n.b. the popolo:person scheme is used is implicitly preserved because we specified an id_prefix of 'popolo:')
We were getting one new (identical) popolo_uri Identifier with each sync of the data; this commit should fix that.
PopIt is being deprecated, so it no longer makes sense for write-it / WriteInPublic to depend on PopIt instances as sources of data, or for it to depend on the deprecated PopIt client,
popit-django. It makes more sense now to move to a model where data sources are all URLs of Popolo JSON, and to use thedjango-popoloproject to store the data from such Popolo JSON sources.multiple-django-popolo-sourcescan be used to keep the data from each of these sources distinct.(There is also the PopIt 2.0 / NG project from Sinar now; so long as that exposes an endpoint similar to PopIt's
export.json, this can be still be used - we should check that, and make a feature request if not.)This pull request migrates from using
popit-djangoand PopIt to usingdjango-popolo,multiple-django-popolo-sourcesand Popolo JSON URLs instead.Notes for reviewers / self review
This is a large pull request, so at the request of @chrismytton I'm going to split it into smaller ones targetting a branch called
reviewed-migrate-to-django-popolo, since it still needs to be deployed in one go. Many of the commits after the data migration (i.e. after "Remove a now unneeded popitapiinstance field") could be squashed into one commit which represents updates to the code and tests for the new models, but this would make it much harder to review.Another consideration is that there are a lot of instances in
writeinpublic.com- I've tried the data migration on various sources locally, but:Sub-Pull Requests
Part 1: add django-popolo data in parallel Migration to django-popolo part 1 #1193Part 2: remove references to old popit-django models Migration to django-popolo part 2 #1195Part 3: update all the code and tests Migration to django-popolo part 3 #1196