This repo contains the code and analysis used in this blog post.
If you are interested in running our complete analysis end-to-end, start by downloading the raw 1M+ notebook dataset from the UCSD Library Digital Collections. Then follow the instructions at the top of notebook/1-raw-notebook-processing.ipynb to clean the data, or download the cleaned data via Google Drive and run the processing yourself. Note that the processing pipeline takes around 20 minutes to run.
If you are only interested in the analysis portion, we have distilled the raw data into a smaller dataset containing the count of pandas API usage in each notebook. This smaller dataset (filtered_token_breakdown.csv) can be downloaded at this link. Once you have downloaded it, place it in the data/ folder and follow the analysis in notebook/2-pandas-usage-analysis.ipynb.
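To give a feel for what the analysis notebook does with the per-notebook usage counts, here is a minimal sketch of the kind of aggregation involved. The column names (`notebook_id`, `api`, `count`) are assumptions for illustration, not the actual schema of filtered_token_breakdown.csv; in practice you would load the CSV from the data/ folder with `pd.read_csv`.

```python
import pandas as pd

# Illustrative stand-in for data/filtered_token_breakdown.csv.
# Column names here are assumed for the sketch, not the real schema.
df = pd.DataFrame(
    {
        "notebook_id": ["nb1", "nb1", "nb2"],
        "api": ["read_csv", "merge", "read_csv"],
        "count": [3, 1, 2],
    }
)

# Total usage of each pandas API across all notebooks, most-used first.
totals = df.groupby("api")["count"].sum().sort_values(ascending=False)
print(totals)
```

With the real CSV in place, the same `groupby`/`sum` pattern yields a ranking of pandas API calls by how often they appear across the notebook corpus.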
If you have any questions or feedback on our blog post or analysis, please send us an email at contact@ponder.io.