@frostyshadows
Collaborator

🔨 Changes

Instead of adding jobs directly to the database, the script now sends a POST request to the job server, which adds them. This way we can run the script on AWS Lambda.
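
For reviewers who want a picture of the new flow, here is a minimal sketch of posting one job to the job server using only the standard library. The `/jobs` route, the JSON body shape, and the function names are assumptions for illustration, not taken from the script; only `localhost:5000` comes from the testing instructions.

```python
import json
import urllib.request

# Base URL from the testing instructions below.
JOB_SERVER_URL = "http://localhost:5000"
# Hypothetical route: the real job server endpoint may differ.
JOBS_ENDPOINT = "/jobs"


def build_job_request(job, base_url=JOB_SERVER_URL):
    """Build (but do not send) a JSON POST request for a single job dict."""
    body = json.dumps(job).encode("utf-8")
    return urllib.request.Request(
        base_url + JOBS_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_job(job, base_url=JOB_SERVER_URL):
    """Send the job to the job server and return the HTTP status code."""
    request = build_job_request(job, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Keeping the request construction separate from the network call makes the payload easy to check without a running server.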

:squirrel: Testing instructions

Have the job server running on localhost:5000. Run the script (pipenv run python3 data_acquisition.py) and make sure jobs are added to your local database.

📄 Relevant screenshots or documentation links

📋 Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Collaborator

@cowmanjoe cowmanjoe left a comment

Looks good! Left a comment. Also, I think it would be good if there was a log message indicating how many jobs were inserted.
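
A rough sketch of what that log message could look like, assuming the script posts jobs one at a time through some callable that returns an HTTP status code. The `post_all` name, the logger name, and the `post_job` parameter are hypothetical.

```python
import logging

logger = logging.getLogger("data_acquisition")


def post_all(jobs, post_job):
    """Post each job via post_job(job) and log how many were inserted.

    post_job is any callable returning an HTTP status code (an assumption
    about how the script talks to the job server).
    """
    inserted = 0
    for job in jobs:
        try:
            status = post_job(job)
        except OSError as exc:  # urllib.error.URLError is a subclass of OSError
            logger.warning("Failed to post job: %s", exc)
            continue
        if 200 <= status < 300:
            inserted += 1
    logger.info("Inserted %d of %d jobs", inserted, len(jobs))
    return inserted
```

The info-level line only shows up if logging is configured, for example with `logging.basicConfig(level=logging.INFO)` at the start of the script.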

One concern I have that you can't really address here is that the ZipRecruiter jobs seem to come back with a different URL every time for the same job. It appears they are tagging some kind of unique ID onto the URL, maybe because they want to count the number of clicks from that link. This is a concern because it undermines our plan to keep out duplicate jobs with the unique link index. I'm not sure how we get around this; maybe some analysis on the other fields. Anyway, it's out of scope for this PR.
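
To make the "analysis on the other fields" idea concrete, one possible approach is a fingerprint computed from fields that should stay stable across fetches, used in place of the link for the uniqueness check. This is only a sketch: `company_name` appears in the diff, but `title` and `city` are assumed field names, and whether these fields are enough to identify a job is untested.

```python
import hashlib

# Fields assumed to be stable for the same job across fetches.
# "title" and "city" are hypothetical names; only "company_name" is in the diff.
FINGERPRINT_FIELDS = ("title", "company_name", "city")


def job_fingerprint(job):
    """Return a hex digest that ignores the job URL entirely."""
    parts = []
    for field in FINGERPRINT_FIELDS:
        value = str(job.get(field) or "")
        # Lowercase and collapse whitespace so cosmetic differences don't matter.
        parts.append(" ".join(value.lower().split()))
    joined = "\x1f".join(parts)  # unit separator avoids accidental field merging
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()
```

Two fetches of the same posting with different tracking URLs would then map to the same fingerprint, at the cost of treating genuinely distinct postings with identical fields as duplicates.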

"longitude": 0.0,
"company_name": job["hiring_company"]["name"],
"start_date": None,
"salary_min": job["salary_min"]
Collaborator

I think we should use salary_min_annual here, because I believe salary_min can be hourly or monthly.
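
Roughly what the suggested change could look like, assuming the API response carries a `salary_min_annual` field as described above. The `build_job_payload` name is hypothetical and only the four keys visible in the diff are shown.

```python
def build_job_payload(job):
    """Map one API job dict to the fields shown in the diff context.

    Assumes the response includes salary_min_annual; .get() yields None
    when the field is missing instead of raising KeyError.
    """
    return {
        "longitude": 0.0,
        "company_name": job["hiring_company"]["name"],
        "start_date": None,
        # Annualized value, so hourly and monthly postings are comparable.
        "salary_min": job.get("salary_min_annual"),
    }
```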

3 participants