Books dataset #21
# Books Dataset

The books.json is a subset of the Open Library [books datasets](https://openlibrary.org/developers/dumps).
I think we would need to add the CC0 1.0 Universal license here: https://openlibrary.org/help/faq/using#ownership
@Haroenv To the best of my knowledge, the following rules apply under the CC0 1.0 Universal license:
- You may use the dataset for commercial purposes.
- No need to cite or reference the license.
- Attribution is optional, not required.
@Haroenv If you insist, I will add a copy of the license to the folder. Please advise.
Thanks for digging in on the licensing, Ankur. Based on your research I agree with you.
Hey @originalankur, thanks for the PR. I had a look at the content of the file, and I'm afraid some of the books might contain sensitive content (at least one suspicious case of doxxing, and mentions of child pornography) that we don't really want in our public list of data. I cleaned the list and shrank the number of books to ~24k rather than ~33k (which also puts the file size at 49MB, right below the suggested 50MB GitHub limit). Can you pull it in to replace your version, please?
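For illustration only, a minimal sketch of this kind of cleanup in Python. It is not necessarily how the list was actually cleaned: the blocklist terms, the helper names (`is_clean`, `filter_books`, `serialized_size`), and the assumption that each book is a flat JSON object with string fields are all hypothetical.

```python
import json

# Hypothetical blocklist; the terms actually used for the cleanup are not stated.
BLOCKED_TERMS = ("example-blocked-term",)

# The 50MB size limit mentioned in the comment above.
SIZE_LIMIT_BYTES = 50 * 1024 * 1024


def is_clean(book, blocked_terms=BLOCKED_TERMS):
    """Return True if no blocked term appears in any string field of the record."""
    text = " ".join(v for v in book.values() if isinstance(v, str)).lower()
    return not any(term.lower() in text for term in blocked_terms)


def filter_books(books, blocked_terms=BLOCKED_TERMS):
    """Keep only the records that pass the blocklist check."""
    return [book for book in books if is_clean(book, blocked_terms)]


def serialized_size(books):
    """Size in bytes of the records when written out as UTF-8 JSON."""
    return len(json.dumps(books, ensure_ascii=False).encode("utf-8"))
```

A keyword filter like this only narrows down candidates; flagged and borderline records would still need a manual look before the file is published.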
@pixelastic Thank you for cleaning the data; I should have thought of this. I will update the PR. Thanks, Tim.
Hey @originalankur, ping me once you've updated the PR and I'll merge it. Thanks.
Extracted from the Open Library dataset.