@erinspace
Member

Purpose

SHARE collects a lot of URIs, and it would be nice to also collect information about where those URIs actually go. This is the beginnings of a processor that will follow those links and save information about their status and where they resolve.
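
As a rough illustration of what "follow those links and save information about their status" could look like, here is a minimal sketch. It is not the code in this PR: `resolve_uri`, the `fetch` callable, and the returned field names are all hypothetical. The fetcher is injected so the logic can be exercised without touching the network; a real one might wrap something like `requests.head(uri, allow_redirects=False)`.

```python
from typing import Callable, Dict, List, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical fetcher: takes a URI and returns (status_code, Location header or None).
Fetch = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = {301, 302, 303, 307, 308}


def resolve_uri(uri: str, fetch: Fetch, max_hops: int = 10) -> Dict[str, object]:
    """Follow redirects by hand and record where the URI ends up."""
    chain: List[str] = [uri]
    current = uri
    status: Optional[int] = None
    for _ in range(max_hops):
        status, location = fetch(current)
        if status not in REDIRECT_CODES or not location:
            break  # final answer: not a redirect, or a redirect with no target
        # Location may be relative, so resolve it against the current URI.
        current = urljoin(current, location)
        chain.append(current)
    # If max_hops runs out, status is still a redirect code, which marks
    # the chain as unresolved.
    return {
        "uri": uri,
        "resolved": current,
        "status": status,
        "hops": len(chain) - 1,
        "chain": chain,
    }
```

The returned dictionary is only one possible shape for the saved information; where it gets stored is left open here.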

TODO

There is a lot to be done before this plan is viable, most notably creating a central "control" system that keeps track of how many times we've hit a particular website with requests, so that we stay within its terms of service.
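
To make the "control" idea a bit more concrete, here is a sketch of per-domain request tracking with a sliding window. Everything in it is an assumption for illustration: the `DomainBudget` name, the single global limit, and the in-memory storage. A truly central system would need shared state across workers (for example a database table), and real limits would come from each site's terms of service.

```python
import time
from collections import defaultdict, deque
from urllib.parse import urlparse


class DomainBudget:
    """Allow at most max_requests per domain within any per_seconds window."""

    def __init__(self, max_requests, per_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.per_seconds = per_seconds
        self.clock = clock  # injectable so the window can be tested without sleeping
        self._hits = defaultdict(deque)  # domain -> timestamps of recent requests

    def try_acquire(self, uri):
        """Return True and record the hit if a request to uri's domain is allowed now."""
        domain = urlparse(uri).netloc.lower()
        now = self.clock()
        hits = self._hits[domain]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.per_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

A caller would check `try_acquire(uri)` before each request and defer the URI when it returns `False`.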

@fabianvf
Contributor

fabianvf commented Aug 6, 2015

I would make this a new celery task and just make it an additional part of the pipe, I think.

erinspace and others added 27 commits August 6, 2015 17:16
@erinspace
Member Author

Moving development to #389 for more generalized post-processing of both URLs and Contributors together, adding a new Postgres model instead of just sticking it in the normalized document.

@erinspace erinspace closed this Oct 6, 2015

2 participants