Data Projects

A repository for all my data-related projects.

2019 Stack Overflow Developer Survey

Data analysis of the 2019 Stack Overflow Developer Survey results using four different tools:

Microsoft Excel: data transformation and creation of two dashboards
Microsoft Power BI: usage of the data transformed in Excel to recreate the same dashboards
Jupyter Notebook (Python): complete analysis from scratch with NumPy, pandas and Plotly
R Notebook: the same analysis as the Python version, replicated in R

The source dataset can be downloaded here (the file is too large to include in the repository).

You can view the Excel version of the analysis here.

You can find the results of the Power BI version report as a PDF in this folder of the repository or as a downloadable .pbix file in this folder.

You can view the Python version of the analysis in an HTML version of the resulting Jupyter Notebook here.

The R notebook is available here too.

Anime analysis

Data analysis of an anime dataset found on Kaggle. The dataset contains anime listings, the studios responsible for the animation, genres, warnings, etc. This project was divided into two parts:

Microsoft Power BI data analysis: a straightforward data analysis in Microsoft Power BI
anime_db: creation of a PostgreSQL database and data analysis in Python, using the tables created in the Power BI analysis (exported from Power BI as CSV files). The psycopg2 library was used to perform database operations, and pandas, NumPy and Plotly were used for the data analysis.
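
As an illustration of the anime_db step, here is a minimal sketch of loading one exported CSV file into an existing PostgreSQL table through psycopg2. It is not the code from the notebooks: the helper names (`build_insert_sql`, `read_csv_rows`, `load_csv`), the `studios` table, the `studios.csv` file and the connection settings are all assumptions made for the example.

```python
import csv


def build_insert_sql(table, columns):
    """Build a parameterised INSERT statement using psycopg2's %s placeholders.

    Table and column names are interpolated directly, so they must come from
    trusted input (here, the CSV header exported from Power BI).
    """
    col_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})"


def read_csv_rows(csv_lines):
    """Split CSV lines into (header, rows); empty fields become None (SQL NULL)."""
    reader = csv.reader(csv_lines)
    header = next(reader)
    rows = [tuple(value if value != "" else None for value in row) for row in reader]
    return header, rows


def load_csv(conn, table, csv_lines):
    """Insert every CSV row into an existing table and return the row count."""
    header, rows = read_csv_rows(csv_lines)
    sql = build_insert_sql(table, header)
    with conn.cursor() as cur:
        cur.executemany(sql, rows)
    conn.commit()
    return len(rows)


def example_usage():
    """Hypothetical usage; the credentials and file name are placeholders."""
    import psycopg2  # imported here so the helpers above work without a database

    conn = psycopg2.connect(dbname="anime_db", user="postgres",
                            password="change-me", host="localhost")
    try:
        with open("studios.csv", newline="", encoding="utf-8") as f:
            print(load_csv(conn, "studios", f), "rows inserted")
    finally:
        conn.close()
```

The helpers only need a DB-API connection object, so the statement building and CSV parsing can be checked without a running database. For large files, PostgreSQL's COPY is usually faster than row-by-row inserts.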

The original dataset is available on Kaggle here.

The results of the Power BI analysis are available here (a PDF of the report).

The results of the anime_db work are available here (two Jupyter Notebooks and the database ERD). You can read the first notebook here and the second here. The first covers the data engineering part, that is, writing the data to the database. The second notebook is the actual data analysis.

2020 Stack Overflow Developer Survey

Similar to the 2019 analysis, but using the 2020 data. In my opinion, it is the better of the two, given the knowledge and skills I've acquired since the first analysis.

So far, the 2020 Stack Overflow Developer Survey data has been analyzed in:

Microsoft Power BI: data transformations in Power Query and creation of four dashboards (general data, technology-related, professional status, and other data)
Microsoft Excel: similar to the Power BI analysis, but with only two dashboards that present mostly the same information, visually rearranged
Python/PostgreSQL: a two-phase project. The first phase is data engineering: I pre-processed the dataset, created a PostgreSQL database using the psycopg2 driver and then inserted the data. The second phase is the data analysis itself, using Plotly for the visualization, similar to what I did in Excel and Power BI
R: the Python data analysis replicated in R, with the data extracted from the same PostgreSQL database
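
The pre-processing mentioned in the Python/PostgreSQL phase is not detailed here. One task that typically comes up with this survey is that multi-select answers are stored as a single semicolon-separated string, which has to be split into one row per answer before it fits a relational table. A minimal sketch of that idea in plain Python, assuming the column names `Respondent` and `LanguageWorkedWith` and "NA" as the missing-value marker; the helper `explode_answers` and the sample records are made up for illustration and are not taken from the notebooks:

```python
from collections import Counter


def explode_answers(records, id_col, answer_col, sep=";"):
    """Turn one record per respondent into one (id, answer) pair per selected answer."""
    pairs = []
    for record in records:
        raw = record.get(answer_col)
        if not raw or raw == "NA":  # assumption: "NA" marks an unanswered question
            continue
        for answer in raw.split(sep):
            answer = answer.strip()
            if answer:
                pairs.append((record[id_col], answer))
    return pairs


# Made-up sample records shaped like rows of the survey CSV
sample = [
    {"Respondent": 1, "LanguageWorkedWith": "Python;SQL"},
    {"Respondent": 2, "LanguageWorkedWith": "R;Python"},
    {"Respondent": 3, "LanguageWorkedWith": "NA"},
]

pairs = explode_answers(sample, "Respondent", "LanguageWorkedWith")
counts = Counter(answer for _, answer in pairs)
print(pairs)
print(counts.most_common())
```

Each pair can then be inserted as a row of a junction table (respondent id, answer), which keeps the later analysis queries simple.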

The source dataset can be downloaded here (the file is too large to include in the repository).

You can find the results of the Power BI report as a PDF here and as a downloadable .pbix file here.

You can find the Excel file version of the data analysis, as well as screenshots of the resulting dashboards, here.

For the Python (and PostgreSQL) part, I divided the work into two Jupyter Notebooks. The first notebook covers the data engineering part, and the second covers the data analysis part. You can read the first notebook here and the second notebook here. Both notebooks are also available in this repository.

For the R part, you can read the notebook online here, and find all the code in this repository.

Demos

Smaller demos created for specific purposes, such as how to perform a certain data transformation in Python, data analyses in Power BI, etc., including the resulting files of tutorials I've completed. A link to the original dataset can always be found in the respective "source.txt" file. Some examples of the demos available so far:
