From 99ab89b5c884ed5df5c6dc859a600777b04f2f20 Mon Sep 17 00:00:00 2001 From: Anna Date: Thu, 2 May 2024 23:59:21 +0200 Subject: [PATCH] Completed lab --- your-code/challenge-1.ipynb | 1936 ++++++++++++++++++++++++++++++----- your-code/challenge-2.ipynb | 1060 +++++++++++++++---- your-code/challenge-3.ipynb | 1412 ++++++++++++++++++++++--- 3 files changed, 3790 insertions(+), 618 deletions(-) diff --git a/your-code/challenge-1.ipynb b/your-code/challenge-1.ipynb index cd674cb..eaee3b5 100644 --- a/your-code/challenge-1.ipynb +++ b/your-code/challenge-1.ipynb @@ -1,276 +1,1660 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Challenge 1\n", - "\n", - "In this challenge you will be working on **Pokemon**. You will answer a series of questions in order to practice dataframe calculation, aggregation, and transformation.\n", - "\n", - "![Pokemon](../images/pokemon.jpg)\n", - "\n", - "Follow the instructions below and enter your code.\n", - "\n", - "#### Import all required libraries." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# import libraries" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Import data set.\n", - "\n", - "Read the dataset `pokemon.csv` into a dataframe called `pokemon`.\n", - "\n", - "*Data set attributed to [Alberto Barradas](https://www.kaggle.com/abcsds/pokemon/)*" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# import dataset" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Print first 10 rows of `pokemon`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "When you look at a data set, you often wonder what each column means. 
Some open-source data sets provide descriptions of the data set. In many cases, data descriptions are extremely useful for data analysts to perform work efficiently and successfully.\n", - "\n", - "For the `Pokemon.csv` data set, fortunately, the owner provided descriptions which you can see [here](https://www.kaggle.com/abcsds/pokemon/home). For your convenience, we are including the descriptions below. Read the descriptions and understand what each column means. This knowledge is helpful in your work with the data.\n", - "\n", - "| Column | Description |\n", - "| --- | --- |\n", - "| # | ID for each pokemon |\n", - "| Name | Name of each pokemon |\n", - "| Type 1 | Each pokemon has a type, this determines weakness/resistance to attacks |\n", - "| Type 2 | Some pokemon are dual type and have 2 |\n", - "| Total | A general guide to how strong a pokemon is |\n", - "| HP | Hit points, or health, defines how much damage a pokemon can withstand before fainting |\n", - "| Attack | The base modifier for normal attacks (eg. Scratch, Punch) |\n", - "| Defense | The base damage resistance against normal attacks |\n", - "| SP Atk | Special attack, the base modifier for special attacks (e.g. fire blast, bubble beam) |\n", - "| SP Def | The base damage resistance against special attacks |\n", - "| Speed | Determines which pokemon attacks first each round |\n", - "| Generation | Number of generation |\n", - "| Legendary | True if Legendary Pokemon False if not |" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Obtain the distinct values across `Type 1` and `Type 2`.\n", - "\n", - "Exctract all the values in `Type 1` and `Type 2`. Then create an array containing the distinct values across both fields." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Cleanup `Name` that contain \"Mega\".\n", - "\n", - "If you have checked out the pokemon names carefully enough, you should have found there are junk texts in the pokemon names which contain \"Mega\". We want to clean up the pokemon names. For instance, \"VenusaurMega Venusaur\" should be \"Mega Venusaur\", and \"CharizardMega Charizard X\" should be \"Mega Charizard X\"." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here\n", - "\n", - "\n", - "# test transformed data\n", - "pokemon.head(10)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Create a new column called `A/D Ratio` whose value equals to `Attack` devided by `Defense`.\n", - "\n", - "For instance, if a pokemon has the Attack score 49 and Defense score 49, the corresponding `A/D Ratio` is 49/49=1." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Identify the pokemon with the highest `A/D Ratio`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Identify the pokemon with the lowest A/D Ratio." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Create a new column called `Combo Type` whose value combines `Type 1` and `Type 2`.\n", - "\n", - "Rules:\n", - "\n", - "* If both `Type 1` and `Type 2` have valid values, the `Combo Type` value should contain both values in the form of ` `. For example, if `Type 1` value is `Grass` and `Type 2` value is `Poison`, `Combo Type` will be `Grass-Poison`.\n", - "\n", - "* If `Type 1` has valid value but `Type 2` is not, `Combo Type` will be the same as `Type 1`. For example, if `Type 1` is `Fire` whereas `Type 2` is `NaN`, `Combo Type` will be `Fire`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Identify the pokemon whose `A/D Ratio` are among the top 5." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### For the 5 pokemon printed above, aggregate `Combo Type` and use a list to store the unique values.\n", - "\n", - "Your end product is a list containing the distinct `Combo Type` values of the 5 pokemon with the highest `A/D Ratio`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### For each of the `Combo Type` values obtained from the previous question, calculate the mean scores of all numeric fields across all pokemon.\n", - "\n", - "Your output should look like below:\n", - "\n", - "![Aggregate](../images/aggregated-mean.png)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.9" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Challenge 1\n", + "\n", + "In this challenge you will be working on **Pokemon**. You will answer a series of questions in order to practice dataframe calculation, aggregation, and transformation.\n", + "\n", + "![Pokemon](../images/pokemon.jpg)\n", + "\n", + "Follow the instructions below and enter your code.\n", + "\n", + "#### Import all required libraries." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# import libraries\n", + "\n", + "import pandas as pd\n", + "import numpy as np" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Import data set.\n", + "\n", + "Read the dataset `pokemon.csv` into a dataframe called `pokemon`.\n", + "\n", + "*Data set attributed to [Alberto Barradas](https://www.kaggle.com/abcsds/pokemon/)*" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "# import dataset\n", + "pokemon = pd.read_csv('/Users/anna/iron_hack/lab-dataframe-calculations/your-code/pokemon.csv')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Print first 10 rows of `pokemon`." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
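|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
| 3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
| 4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |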
\n", + "
" + ], + "text/plain": [ + " # Name Type 1 Type 2 Total HP Attack Defense \\\n", + "0 1 Bulbasaur Grass Poison 318 45 49 49 \n", + "1 2 Ivysaur Grass Poison 405 60 62 63 \n", + "2 3 Venusaur Grass Poison 525 80 82 83 \n", + "3 3 VenusaurMega Venusaur Grass Poison 625 80 100 123 \n", + "4 4 Charmander Fire NaN 309 39 52 43 \n", + "\n", + " Sp. Atk Sp. Def Speed Generation Legendary \n", + "0 65 65 45 1 False \n", + "1 80 80 60 1 False \n", + "2 100 100 80 1 False \n", + "3 122 120 80 1 False \n", + "4 60 50 65 1 False " + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your code here\n", + "pokemon.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "When you look at a data set, you often wonder what each column means. Some open-source data sets provide descriptions of the data set. In many cases, data descriptions are extremely useful for data analysts to perform work efficiently and successfully.\n", + "\n", + "For the `Pokemon.csv` data set, fortunately, the owner provided descriptions which you can see [here](https://www.kaggle.com/abcsds/pokemon/home). For your convenience, we are including the descriptions below. Read the descriptions and understand what each column means. This knowledge is helpful in your work with the data.\n", + "\n", + "| Column | Description |\n", + "| --- | --- |\n", + "| # | ID for each pokemon |\n", + "| Name | Name of each pokemon |\n", + "| Type 1 | Each pokemon has a type, this determines weakness/resistance to attacks |\n", + "| Type 2 | Some pokemon are dual type and have 2 |\n", + "| Total | A general guide to how strong a pokemon is |\n", + "| HP | Hit points, or health, defines how much damage a pokemon can withstand before fainting |\n", + "| Attack | The base modifier for normal attacks (eg. 
Scratch, Punch) |\n", + "| Defense | The base damage resistance against normal attacks |\n", + "| SP Atk | Special attack, the base modifier for special attacks (e.g. fire blast, bubble beam) |\n", + "| SP Def | The base damage resistance against special attacks |\n", + "| Speed | Determines which pokemon attacks first each round |\n", + "| Generation | Number of generation |\n", + "| Legendary | True if Legendary Pokemon False if not |" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Obtain the distinct values across `Type 1` and `Type 2`.\n", + "\n", + "Exctract all the values in `Type 1` and `Type 2`. Then create an array containing the distinct values across both fields." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array(['Grass', 'Fire', 'Water', 'Bug', 'Normal', 'Poison', 'Electric',\n", + " 'Ground', 'Fairy', 'Fighting', 'Psychic', 'Rock', 'Ghost', 'Ice',\n", + " 'Dragon', 'Dark', 'Steel', 'Flying', nan], dtype=object)" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your code here\n", + "\n", + "type1_values = pokemon['Type 1'].unique()\n", + "type2_values = pokemon['Type 2'].unique()\n", + "\n", + "type1_and_type2 = pd.unique(pd.concat([pokemon['Type 1'], pokemon['Type 2']]))\n", + "type1_and_type2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Cleanup `Name` that contain \"Mega\".\n", + "\n", + "If you have checked out the pokemon names carefully enough, you should have found there are junk texts in the pokemon names which contain \"Mega\". We want to clean up the pokemon names. For instance, \"VenusaurMega Venusaur\" should be \"Mega Venusaur\", and \"CharizardMega Charizard X\" should be \"Mega Charizard X\"." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
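|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
| 3 | 3 | Mega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
| 4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
| 5 | 5 | Charmeleon | Fire | NaN | 405 | 58 | 64 | 58 | 80 | 65 | 80 | 1 | False |
| 6 | 6 | Charizard | Fire | Flying | 534 | 78 | 84 | 78 | 109 | 85 | 100 | 1 | False |
| 7 | 6 | Mega Charizard X | Fire | Dragon | 634 | 78 | 130 | 111 | 130 | 85 | 100 | 1 | False |
| 8 | 6 | Mega Charizard Y | Fire | Flying | 634 | 78 | 104 | 78 | 159 | 115 | 100 | 1 | False |
| 9 | 7 | Squirtle | Water | NaN | 314 | 44 | 48 | 65 | 50 | 64 | 43 | 1 | False |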
\n", + "
" + ], + "text/plain": [ + " # Name Type 1 Type 2 Total HP Attack Defense Sp. Atk \\\n", + "0 1 Bulbasaur Grass Poison 318 45 49 49 65 \n", + "1 2 Ivysaur Grass Poison 405 60 62 63 80 \n", + "2 3 Venusaur Grass Poison 525 80 82 83 100 \n", + "3 3 Mega Venusaur Grass Poison 625 80 100 123 122 \n", + "4 4 Charmander Fire NaN 309 39 52 43 60 \n", + "5 5 Charmeleon Fire NaN 405 58 64 58 80 \n", + "6 6 Charizard Fire Flying 534 78 84 78 109 \n", + "7 6 Mega Charizard X Fire Dragon 634 78 130 111 130 \n", + "8 6 Mega Charizard Y Fire Flying 634 78 104 78 159 \n", + "9 7 Squirtle Water NaN 314 44 48 65 50 \n", + "\n", + " Sp. Def Speed Generation Legendary \n", + "0 65 45 1 False \n", + "1 80 60 1 False \n", + "2 100 80 1 False \n", + "3 120 80 1 False \n", + "4 50 65 1 False \n", + "5 65 80 1 False \n", + "6 85 100 1 False \n", + "7 85 100 1 False \n", + "8 115 100 1 False \n", + "9 64 43 1 False " + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your code here\n", + "\n", + "def cleaned_name(name):\n", + " if 'Mega ' in name:\n", + " return 'Mega ' + name.split('Mega ')[-1]\n", + " else:\n", + " return name\n", + "\n", + "pokemon['Name'] = pokemon['Name'].apply(cleaned_name)\n", + "\n", + "# test transformed data\n", + "pokemon.head(10)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Create a new column called `A/D Ratio` whose value equals to `Attack` devided by `Defense`.\n", + "\n", + "For instance, if a pokemon has the Attack score 49 and Defense score 49, the corresponding `A/D Ratio` is 49/49=1." + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
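|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | A/D Ration | A/D Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False | 1.000000 | 1.000000 |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False | 0.984127 | 0.984127 |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False | 0.987952 | 0.987952 |
| 3 | 3 | Mega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False | 0.813008 | 0.813008 |
| 4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False | 1.209302 | 1.209302 |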
\n", + "
" + ], + "text/plain": [ + " # Name Type 1 Type 2 Total HP Attack Defense Sp. Atk \\\n", + "0 1 Bulbasaur Grass Poison 318 45 49 49 65 \n", + "1 2 Ivysaur Grass Poison 405 60 62 63 80 \n", + "2 3 Venusaur Grass Poison 525 80 82 83 100 \n", + "3 3 Mega Venusaur Grass Poison 625 80 100 123 122 \n", + "4 4 Charmander Fire NaN 309 39 52 43 60 \n", + "\n", + " Sp. Def Speed Generation Legendary A/D Ration A/D Ratio \n", + "0 65 45 1 False 1.000000 1.000000 \n", + "1 80 60 1 False 0.984127 0.984127 \n", + "2 100 80 1 False 0.987952 0.987952 \n", + "3 120 80 1 False 0.813008 0.813008 \n", + "4 50 65 1 False 1.209302 1.209302 " + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your code here\n", + "pokemon['A/D Ratio'] = pokemon[\"Attack\"]/pokemon['Defense']\n", + "pokemon.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Identify the pokemon with the highest `A/D Ratio`." + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
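|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | A/D Ration | A/D Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 429 | 386 | DeoxysAttack Forme | Psychic | NaN | 600 | 50 | 180 | 20 | 180 | 20 | 150 | 3 | True | 9.0 | 9.0 |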
\n", + "
" + ], + "text/plain": [ + " # Name Type 1 Type 2 Total HP Attack Defense \\\n", + "429 386 DeoxysAttack Forme Psychic NaN 600 50 180 20 \n", + "\n", + " Sp. Atk Sp. Def Speed Generation Legendary A/D Ration A/D Ratio \n", + "429 180 20 150 3 True 9.0 9.0 " + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your code here\n", + "\n", + "pokemon_highest_ratio = pokemon[pokemon['A/D Ratio'] == pokemon['A/D Ratio'].max()]\n", + "pokemon_highest_ratio" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Identify the pokemon with the lowest A/D Ratio." + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
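|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | A/D Ration | A/D Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 230 | 213 | Shuckle | Bug | Rock | 505 | 20 | 10 | 230 | 10 | 230 | 5 | 2 | False | 0.043478 | 0.043478 |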
\n", + "
" + ], + "text/plain": [ + " # Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def \\\n", + "230 213 Shuckle Bug Rock 505 20 10 230 10 230 \n", + "\n", + " Speed Generation Legendary A/D Ration A/D Ratio \n", + "230 5 2 False 0.043478 0.043478 " + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your code here\n", + "pokemon_lowest_ratio = pokemon[pokemon['A/D Ratio'] == pokemon['A/D Ratio'].min()]\n", + "pokemon_lowest_ratio" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Create a new column called `Combo Type` whose value combines `Type 1` and `Type 2`.\n", + "\n", + "Rules:\n", + "\n", + "* If both `Type 1` and `Type 2` have valid values, the `Combo Type` value should contain both values in the form of ` `. For example, if `Type 1` value is `Grass` and `Type 2` value is `Poison`, `Combo Type` will be `Grass-Poison`.\n", + "\n", + "* If `Type 1` has valid value but `Type 2` is not, `Combo Type` will be the same as `Type 1`. For example, if `Type 1` is `Fire` whereas `Type 2` is `NaN`, `Combo Type` will be `Fire`." + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
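|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | A/D Ration | A/D Ratio | Combo Type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False | 1.000000 | 1.000000 | Grass-Poison |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False | 0.984127 | 0.984127 | Grass-Poison |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False | 0.987952 | 0.987952 | Grass-Poison |
| 3 | 3 | Mega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False | 0.813008 | 0.813008 | Grass-Poison |
| 4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False | 1.209302 | 1.209302 | Fire |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 795 | 719 | Diancie | Rock | Fairy | 600 | 50 | 100 | 150 | 100 | 150 | 50 | 6 | True | 0.666667 | 0.666667 | Rock-Fairy |
| 796 | 719 | Mega Diancie | Rock | Fairy | 700 | 50 | 160 | 110 | 160 | 110 | 110 | 6 | True | 1.454545 | 1.454545 | Rock-Fairy |
| 797 | 720 | HoopaHoopa Confined | Psychic | Ghost | 600 | 80 | 110 | 60 | 150 | 130 | 70 | 6 | True | 1.833333 | 1.833333 | Psychic-Ghost |
| 798 | 720 | HoopaHoopa Unbound | Psychic | Dark | 680 | 80 | 160 | 60 | 170 | 130 | 80 | 6 | True | 2.666667 | 2.666667 | Psychic-Dark |
| 799 | 721 | Volcanion | Fire | Water | 600 | 80 | 110 | 120 | 130 | 90 | 70 | 6 | True | 0.916667 | 0.916667 | Fire-Water |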
\n", + "

800 rows × 16 columns

\n", + "
" + ], + "text/plain": [ + " # Name Type 1 Type 2 Total HP Attack Defense \\\n", + "0 1 Bulbasaur Grass Poison 318 45 49 49 \n", + "1 2 Ivysaur Grass Poison 405 60 62 63 \n", + "2 3 Venusaur Grass Poison 525 80 82 83 \n", + "3 3 Mega Venusaur Grass Poison 625 80 100 123 \n", + "4 4 Charmander Fire NaN 309 39 52 43 \n", + ".. ... ... ... ... ... .. ... ... \n", + "795 719 Diancie Rock Fairy 600 50 100 150 \n", + "796 719 Mega Diancie Rock Fairy 700 50 160 110 \n", + "797 720 HoopaHoopa Confined Psychic Ghost 600 80 110 60 \n", + "798 720 HoopaHoopa Unbound Psychic Dark 680 80 160 60 \n", + "799 721 Volcanion Fire Water 600 80 110 120 \n", + "\n", + " Sp. Atk Sp. Def Speed Generation Legendary A/D Ration A/D Ratio \\\n", + "0 65 65 45 1 False 1.000000 1.000000 \n", + "1 80 80 60 1 False 0.984127 0.984127 \n", + "2 100 100 80 1 False 0.987952 0.987952 \n", + "3 122 120 80 1 False 0.813008 0.813008 \n", + "4 60 50 65 1 False 1.209302 1.209302 \n", + ".. ... ... ... ... ... ... ... \n", + "795 100 150 50 6 True 0.666667 0.666667 \n", + "796 160 110 110 6 True 1.454545 1.454545 \n", + "797 150 130 70 6 True 1.833333 1.833333 \n", + "798 170 130 80 6 True 2.666667 2.666667 \n", + "799 130 90 70 6 True 0.916667 0.916667 \n", + "\n", + " Combo Type \n", + "0 Grass-Poison \n", + "1 Grass-Poison \n", + "2 Grass-Poison \n", + "3 Grass-Poison \n", + "4 Fire \n", + ".. ... 
\n", + "795 Rock-Fairy \n", + "796 Rock-Fairy \n", + "797 Psychic-Ghost \n", + "798 Psychic-Dark \n", + "799 Fire-Water \n", + "\n", + "[800 rows x 16 columns]" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your code here\n", + "def combo_type(row):\n", + " if pd.notna(row['Type 1']) and pd.notna(row['Type 2']):\n", + " return row['Type 1'] + '-' + row['Type 2']\n", + " elif pd.notna(row['Type 1']):\n", + " return row['Type 1']\n", + " else:\n", + " return row['Type 2']\n", + "\n", + "# Apply the function to create the Combo Type column\n", + "pokemon['Combo Type'] = pokemon.apply(combo_type, axis=1)\n", + "pokemon" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Identify the pokemon whose `A/D Ratio` are among the top 5." + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
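|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | A/D Ration | A/D Ratio | Combo Type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 429 | 386 | DeoxysAttack Forme | Psychic | NaN | 600 | 50 | 180 | 20 | 180 | 20 | 150 | 3 | True | 9.000 | 9.000 | Psychic |
| 347 | 318 | Carvanha | Water | Dark | 305 | 45 | 90 | 20 | 65 | 20 | 65 | 3 | False | 4.500 | 4.500 | Water-Dark |
| 19 | 15 | Mega Beedrill | Bug | Poison | 495 | 65 | 150 | 40 | 15 | 80 | 145 | 1 | False | 3.750 | 3.750 | Bug-Poison |
| 453 | 408 | Cranidos | Rock | NaN | 350 | 67 | 125 | 40 | 30 | 30 | 58 | 4 | False | 3.125 | 3.125 | Rock |
| 348 | 319 | Sharpedo | Water | Dark | 460 | 70 | 120 | 40 | 95 | 40 | 95 | 3 | False | 3.000 | 3.000 | Water-Dark |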
\n", + "
" + ], + "text/plain": [ + " # Name Type 1 Type 2 Total HP Attack Defense \\\n", + "429 386 DeoxysAttack Forme Psychic NaN 600 50 180 20 \n", + "347 318 Carvanha Water Dark 305 45 90 20 \n", + "19 15 Mega Beedrill Bug Poison 495 65 150 40 \n", + "453 408 Cranidos Rock NaN 350 67 125 40 \n", + "348 319 Sharpedo Water Dark 460 70 120 40 \n", + "\n", + " Sp. Atk Sp. Def Speed Generation Legendary A/D Ration A/D Ratio \\\n", + "429 180 20 150 3 True 9.000 9.000 \n", + "347 65 20 65 3 False 4.500 4.500 \n", + "19 15 80 145 1 False 3.750 3.750 \n", + "453 30 30 58 4 False 3.125 3.125 \n", + "348 95 40 95 3 False 3.000 3.000 \n", + "\n", + " Combo Type \n", + "429 Psychic \n", + "347 Water-Dark \n", + "19 Bug-Poison \n", + "453 Rock \n", + "348 Water-Dark " + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your code here\n", + "top_ad_ratio = pokemon.sort_values(by='A/D Ratio', ascending=False).head(5)\n", + "top_ad_ratio" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### For the 5 pokemon printed above, aggregate `Combo Type` and use a list to store the unique values.\n", + "\n", + "Your end product is a list containing the distinct `Combo Type` values of the 5 pokemon with the highest `A/D Ratio`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['Psychic', 'Water-Dark', 'Bug-Poison', 'Rock']" + ] + }, + "execution_count": 45, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your code here\n", + "top_combo_types = top_ad_ratio['Combo Type'].unique().tolist()\n", + "top_combo_types" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### For each of the `Combo Type` values obtained from the previous question, calculate the mean scores of all numeric fields across all pokemon.\n", + "\n", + "Your output should look like the image below:\n", + "\n", + "![Aggregate](../images/aggregated-mean.png)" + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "metadata": {}, + "outputs": [], + "source": [ + "# your code here\n", + "top_combo_means = pokemon[pokemon['Combo Type'].isin(top_combo_types)]\n", + "\n", + "# numeric_only=True keeps the string columns (Name, Type 1, Type 2) out of the mean;\n", + "# recent pandas versions raise a TypeError without it\n", + "top_combo_means = top_combo_means.groupby('Combo Type').mean(numeric_only=True)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n",
+ "| Combo Type | # | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | A/D Ration | A/D Ratio |\n",
+ "| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |\n",
+ "| Bug-Poison | 199.166667 | 347.916667 | 53.750000 | 68.333333 | 58.083333 | 42.500000 | 59.333333 | 65.916667 | 2.333333 | 0.000000 | 1.315989 | 1.315989 |\n",
+ "| Psychic | 381.973684 | 464.552632 | 72.552632 | 64.947368 | 67.236842 | 98.552632 | 82.394737 | 78.868421 | 3.342105 | 0.236842 | 1.164196 | 1.164196 |\n",
+ "| Rock | 410.111111 | 409.444444 | 67.111111 | 103.333333 | 107.222222 | 40.555556 | 58.333333 | 32.888889 | 3.888889 | 0.111111 | 1.260091 | 1.260091 |\n",
+ "| Water-Dark | 347.666667 | 493.833333 | 69.166667 | 120.000000 | 65.166667 | 88.833333 | 63.500000 | 87.166667 | 3.166667 | 0.000000 | 2.291949 | 2.291949 |\n",
+ "
" + ], + "text/plain": [ + " # Total HP Attack Defense \\\n", + "Combo Type \n", + "Bug-Poison 199.166667 347.916667 53.750000 68.333333 58.083333 \n", + "Psychic 381.973684 464.552632 72.552632 64.947368 67.236842 \n", + "Rock 410.111111 409.444444 67.111111 103.333333 107.222222 \n", + "Water-Dark 347.666667 493.833333 69.166667 120.000000 65.166667 \n", + "\n", + " Sp. Atk Sp. Def Speed Generation Legendary \\\n", + "Combo Type \n", + "Bug-Poison 42.500000 59.333333 65.916667 2.333333 0.000000 \n", + "Psychic 98.552632 82.394737 78.868421 3.342105 0.236842 \n", + "Rock 40.555556 58.333333 32.888889 3.888889 0.111111 \n", + "Water-Dark 88.833333 63.500000 87.166667 3.166667 0.000000 \n", + "\n", + " A/D Ration A/D Ratio \n", + "Combo Type \n", + "Bug-Poison 1.315989 1.315989 \n", + "Psychic 1.164196 1.164196 \n", + "Rock 1.260091 1.260091 \n", + "Water-Dark 2.291949 2.291949 " + ] + }, + "execution_count": 48, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "top_combo_means" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.6" + }, + "toc": { + "base_numbering": 1, + "nav_menu": {}, + "number_sections": true, + "sideBar": true, + "skip_h1_title": false, + "title_cell": "Table of Contents", + "title_sidebar": "Contents", + "toc_cell": false, + "toc_position": {}, + "toc_section_display": true, + "toc_window_display": false + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/your-code/challenge-2.ipynb b/your-code/challenge-2.ipynb index d347731..20c04f6 100644 --- 
a/your-code/challenge-2.ipynb +++ b/your-code/challenge-2.ipynb @@ -1,195 +1,865 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Challenge 2\n", - "\n", - "In this challenge we will continue working with the `Pokemon` dataset. We will attempt solving a slightly more complex problem in which we will practice the iterative data analysis process you leaned in [this video](https://www.youtube.com/watch?v=xOomNicqbkk).\n", - "\n", - "The problem statement is as follows:\n", - "\n", - "**You are at a Pokemon black market planning to buy a Pokemon for battle. All Pokemon are sold at the same price and you can only afford to buy one. You cannot choose which specific Pokemon to buy. However, you can specify the type of the Pokemon - one type that exists in either `Type 1` or `Type 2`. Which type should you choose in order to maximize your chance of receiving a good Pokemon?**\n", - "\n", - "To remind you about the 3 steps of iterative data analysis, they are:\n", - "\n", - "1. Setting Expectations\n", - "1. Collecting Information\n", - "1. Reacting to Data / Revising Expectations\n", - "\n", - "Following the iterative process, we'll guide you in completing the challenge." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "## Problem Solving Iteration 1\n", - "\n", - "In this iteration we'll analyze the problem and identify the breakthrough. The original question statement is kind of vague because we don't know what a *good pokemon* really means as represented in the data. We'll start by understanding the dataset and see if we can find some insights." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Import libraries\n", - "import numpy as np\n", - "import pandas as pd" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true - }, - "outputs": [], - "source": [ - "# Importing the dataset" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "From the data it seems whether a pokemon is good depends on its abilities as represented in the fields of `HP`, `Attack`, `Defense`, `Sp. Atk`, `Sp. Def`, `Speed`, and `Total`. We are not sure about `Generation` and `Legendary` because they are not necessarily the decisive factors of the pokemon abilities.\n", - "\n", - "But `HP`, `Attack`, `Defense`, `Sp. Atk`, `Sp. Def`, `Speed`, and `Total` are a lot of fields! If we look at them all at once it's very complicated. This isn't Mission Impossible but it's ideal that we tackle this kind of problem after we learn Machine Learning (which you will do in Module 3). For now, is there a way to consolidate the fields we need to look into?\n", - "\n", - "Fortunately there seems to be a way. It appears the `Total` field is computed based on the other 6 fields. But we need to prove our theory. If we can approve there is a formula to compute `Total` based on the other 6 abilities, we only need to look into `Total`.\n", - "\n", - "We have the following expectation now:\n", - "\n", - "#### The `Total` field is computed based on `HP`, `Attack`, `Defense`, `Sp. Atk`, `Sp. Def`, and `Speed`.\n", - "\n", - "We need to collect the following information:\n", - "\n", - "* **What is the formula to compute `Total`?**\n", - "* **Does the formula work for all pokemon?**\n", - "\n", - "In the cell below, make a hypothesis on how `Total` is computed and test your hypothesis." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Problem Solving Iteration 2\n", - "\n", - "Now that we have consolidated the abilities fields, we can update the problem statement. The new problem statement is:\n", - "\n", - "### Which pokemon type is most likely to have the highest `Total` value?\n", - "\n", - "In the updated problem statement, we assume there is a certain relationship between the `Total` and the pokemon type. But we have two *type* fields (`Type 1` and `Type 2`) that have string values. In data analysis, string fields have to be transformed to numerical format in order to be analyzed. \n", - "\n", - "In addition, keep in mind that `Type 1` always has a value but `Type 2` is sometimes empty (having the `NaN` value). Also, the pokemon type we choose may be either in `Type 1` or `Type 2`.\n", - "\n", - "Now our expectation is:\n", - "\n", - "#### `Type 1` and `Type 2` string variables need to be converted to numerical variables in order to identify the relationship between `Total` and the pokemon type.\n", - "\n", - "The information we need to collect is:\n", - "\n", - "#### How to convert two string variables to numerical?\n", - "\n", - "Let's address the first question first. You can use a method called **One Hot Encoding** which is frequently used in machine learning to encode categorical string variables to numerical. The idea is to gather all the possible string values in a categorical field and create a numerical field for each unique string value. Each of those numerical fields uses `1` and `0` to indicate whether the data record has the corresponding categorical value. A detailed explanation of One Hot Encoding can be found in [this article](https://hackernoon.com/what-is-one-hot-encoding-why-and-when-do-you-have-to-use-it-e3c6186d008f). 
You will formally learn it in Module 3.\n", - "\n", - "For instance, if a pokemon has `Type 1` as `Poison` and `Type 2` as `Fire`, then its `Poison` and `Fire` fields are `1` whereas all other fields are `0`. If a pokemon has `Type 1` as `Water` and `Type 2` as `NaN`, then its `Water` field is `1` whereas all other fields are `0`.\n", - "\n", - "#### In the next cell, use One Hot Encoding to encode `Type 1` and `Type 2`. Use the pokemon type values as the names of the numerical fields you create.\n", - "\n", - "The new numerical variables you create should look like below:\n", - "\n", - "![One Hot Encoding](../images/one-hot-encoding.png)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Problem Solving Iteration 3\n", - "\n", - "Now we have encoded the pokemon types, we will identify the relationship between `Total` and the encoded fields. Our expectation is:\n", - "\n", - "#### There are relationships between `Total` and the encoded pokemon type variables and we need to identify the correlations.\n", - "\n", - "The information we need to collect is:\n", - "\n", - "#### How to identify the relationship between `Total` and the encoded pokemon type fields?\n", - "\n", - "There are multiple ways to answer this question. The easiest way is to use correlation. In the cell below, calculate the correlation of `Total` to each of the encoded fields. Rank the correlations and identify the #1 pokemon type that is most likely to have the highest `Total`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Bonus Question\n", - "\n", - "Say now you can choose both `Type 1` and `Type 2` of the pokemon. 
In order to receive the best pokemon, which types will you choose?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.9" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Challenge 2\n", + "\n", + "In this challenge we will continue working with the `Pokemon` dataset. We will attempt to solve a slightly more complex problem in which we will practice the iterative data analysis process you learned in [this video](https://www.youtube.com/watch?v=xOomNicqbkk).\n", + "\n", + "The problem statement is as follows:\n", + "\n", + "**You are at a Pokemon black market planning to buy a Pokemon for battle. All Pokemon are sold at the same price and you can only afford to buy one. You cannot choose which specific Pokemon to buy. However, you can specify the type of the Pokemon - one type that exists in either `Type 1` or `Type 2`. Which type should you choose in order to maximize your chance of receiving a good Pokemon?**\n", + "\n", + "As a reminder, the 3 steps of iterative data analysis are:\n", + "\n", + "1. Setting Expectations\n", + "1. Collecting Information\n", + "1. Reacting to Data / Revising Expectations\n", + "\n", + "Following the iterative process, we'll guide you in completing the challenge." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "## Problem Solving Iteration 1\n", + "\n", + "In this iteration we'll analyze the problem and identify the breakthrough. 
The original question statement is kind of vague because we don't know what a *good pokemon* really means as represented in the data. We'll start by understanding the dataset and see if we can find some insights." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# Import libraries\n", + "import numpy as np\n", + "import pandas as pd" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n",
+ "| | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |\n",
+ "| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |\n",
+ "| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |\n",
+ "| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |\n",
+ "| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |\n",
+ "| 3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |\n",
+ "| 4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |\n",
+ "
" + ], + "text/plain": [ + " # Name Type 1 Type 2 Total HP Attack Defense \\\n", + "0 1 Bulbasaur Grass Poison 318 45 49 49 \n", + "1 2 Ivysaur Grass Poison 405 60 62 63 \n", + "2 3 Venusaur Grass Poison 525 80 82 83 \n", + "3 3 VenusaurMega Venusaur Grass Poison 625 80 100 123 \n", + "4 4 Charmander Fire NaN 309 39 52 43 \n", + "\n", + " Sp. Atk Sp. Def Speed Generation Legendary \n", + "0 65 65 45 1 False \n", + "1 80 80 60 1 False \n", + "2 100 100 80 1 False \n", + "3 122 120 80 1 False \n", + "4 60 50 65 1 False " + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Importing the dataset\n", + "# Relative path: assumes pokemon.csv sits next to this notebook in your-code/\n", + "pokemon = pd.read_csv('pokemon.csv')\n", + "pokemon.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "From the data it seems whether a pokemon is good depends on its abilities as represented in the fields of `HP`, `Attack`, `Defense`, `Sp. Atk`, `Sp. Def`, `Speed`, and `Total`. We are not sure about `Generation` and `Legendary` because they are not necessarily the decisive factors of the pokemon abilities.\n", + "\n", + "But `HP`, `Attack`, `Defense`, `Sp. Atk`, `Sp. Def`, `Speed`, and `Total` are a lot of fields! If we look at them all at once it's very complicated. This isn't Mission Impossible but it's ideal that we tackle this kind of problem after we learn Machine Learning (which you will do in Module 3). For now, is there a way to consolidate the fields we need to look into?\n", + "\n", + "Fortunately there seems to be a way. It appears the `Total` field is computed based on the other 6 fields. But we need to prove our theory. 
If we can prove that there is a formula to compute `Total` based on the other 6 abilities, we only need to look into `Total`.\n", + "\n", + "We have the following expectation now:\n", + "\n", + "#### The `Total` field is computed based on `HP`, `Attack`, `Defense`, `Sp. Atk`, `Sp. 
Def`, and `Speed`.\n", + "\n", + "We need to collect the following information:\n", + "\n", + "* **What is the formula to compute `Total`?**\n", + "* **Does the formula work for all pokemon?**\n", + "\n", + "In the cell below, make a hypothesis on how `Total` is computed and test your hypothesis." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "# your code here\n", + "# Hypothesis: Total is the sum of the six ability fields.\n", + "# Test it on every row, since the question asks whether the formula holds for all pokemon.\n", + "pokemon_check = pokemon.copy()\n", + "\n", + "pokemon_check['Calculated Total'] = pokemon_check['HP'] + pokemon_check['Attack'] + \\\n", + " pokemon_check['Defense'] + pokemon_check['Sp. Atk'] + \\\n", + " pokemon_check['Sp. Def'] + pokemon_check['Speed']\n" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Your hypothesis is correct!\n" + ] + } + ], + "source": [ + "if pokemon_check['Total'].equals(pokemon_check['Calculated Total']):\n", + " print('Your hypothesis is correct!')\n", + "else:\n", + " print('Your hypothesis is incorrect!')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Problem Solving Iteration 2\n", + "\n", + "Now that we have consolidated the abilities fields, we can update the problem statement. The new problem statement is:\n", + "\n", + "### Which pokemon type is most likely to have the highest `Total` value?\n", + "\n", + "In the updated problem statement, we assume there is a certain relationship between the `Total` and the pokemon type. But we have two *type* fields (`Type 1` and `Type 2`) that have string values. In data analysis, string fields have to be transformed to numerical format in order to be analyzed. \n", + "\n", + "In addition, keep in mind that `Type 1` always has a value but `Type 2` is sometimes empty (having the `NaN` value). 
Also, the pokemon type we choose may be either in `Type 1` or `Type 2`.\n", + "\n", + "Now our expectation is:\n", + "\n", + "#### `Type 1` and `Type 2` string variables need to be converted to numerical variables in order to identify the relationship between `Total` and the pokemon type.\n", + "\n", + "The information we need to collect is:\n", + "\n", + "#### How to convert two string variables to numerical?\n", + "\n", + "Let's address the first question first. You can use a method called **One Hot Encoding** which is frequently used in machine learning to encode categorical string variables to numerical. The idea is to gather all the possible string values in a categorical field and create a numerical field for each unique string value. Each of those numerical fields uses `1` and `0` to indicate whether the data record has the corresponding categorical value. A detailed explanation of One Hot Encoding can be found in [this article](https://hackernoon.com/what-is-one-hot-encoding-why-and-when-do-you-have-to-use-it-e3c6186d008f). You will formally learn it in Module 3.\n", + "\n", + "For instance, if a pokemon has `Type 1` as `Poison` and `Type 2` as `Fire`, then its `Poison` and `Fire` fields are `1` whereas all other fields are `0`. If a pokemon has `Type 1` as `Water` and `Type 2` as `NaN`, then its `Water` field is `1` whereas all other fields are `0`.\n", + "\n", + "#### In the next cell, use One Hot Encoding to encode `Type 1` and `Type 2`. Use the pokemon type values as the names of the numerical fields you create.\n", + "\n", + "The new numerical variables you create should look like below:\n", + "\n", + "![One Hot Encoding](../images/one-hot-encoding.png)" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n",
+ "| | Bug | Dark | Dragon | Electric | Fairy | Fighting | Fire | Flying | Ghost | Grass | Ground | Ice | Normal | Poison | Psychic | Rock | Steel | Water |\n",
+ "| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |\n",
+ "| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |\n",
+ "| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |\n",
+ "| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |\n",
+ "| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |\n",
+ "| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |\n",
+ "| 795 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |\n",
+ "| 796 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |\n",
+ "| 797 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |\n",
+ "| 798 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |\n",
+ "| 799 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |\n",
+ "\n",
+ "800 rows × 18 columns\n",
+ "
" + ], + "text/plain": [ + " Bug Dark Dragon Electric Fairy Fighting Fire Flying Ghost Grass \\\n", + "0 0 0 0 0 0 0 0 0 0 1 \n", + "1 0 0 0 0 0 0 0 0 0 1 \n", + "2 0 0 0 0 0 0 0 0 0 1 \n", + "3 0 0 0 0 0 0 0 0 0 1 \n", + "4 0 0 0 0 0 0 1 0 0 0 \n", + ".. ... ... ... ... ... ... ... ... ... ... \n", + "795 0 0 0 0 1 0 0 0 0 0 \n", + "796 0 0 0 0 1 0 0 0 0 0 \n", + "797 0 0 0 0 0 0 0 0 1 0 \n", + "798 0 1 0 0 0 0 0 0 0 0 \n", + "799 0 0 0 0 0 0 1 0 0 0 \n", + "\n", + " Ground Ice Normal Poison Psychic Rock Steel Water \n", + "0 0 0 0 1 0 0 0 0 \n", + "1 0 0 0 1 0 0 0 0 \n", + "2 0 0 0 1 0 0 0 0 \n", + "3 0 0 0 1 0 0 0 0 \n", + "4 0 0 0 0 0 0 0 0 \n", + ".. ... ... ... ... ... ... ... ... \n", + "795 0 0 0 0 0 1 0 0 \n", + "796 0 0 0 0 0 1 0 0 \n", + "797 0 0 0 0 1 0 0 0 \n", + "798 0 0 0 0 1 0 0 0 \n", + "799 0 0 0 0 0 0 0 1 \n", + "\n", + "[800 rows x 18 columns]" + ] + }, + "execution_count": 39, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your code here\n", + "one_hot_encoded_type1 = pd.get_dummies(pokemon['Type 1'])\n", + "one_hot_encoded_type2 = pd.get_dummies(pokemon['Type 2'])\n", + "\n", + "pokemon_encoded = one_hot_encoded_type1+one_hot_encoded_type2\n", + "pokemon_encoded" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Problem Solving Iteration 3\n", + "\n", + "Now we have encoded the pokemon types, we will identify the relationship between `Total` and the encoded fields. Our expectation is:\n", + "\n", + "#### There are relationships between `Total` and the encoded pokemon type variables and we need to identify the correlations.\n", + "\n", + "The information we need to collect is:\n", + "\n", + "#### How to identify the relationship between `Total` and the encoded pokemon type fields?\n", + "\n", + "There are multiple ways to answer this question. The easiest way is to use correlation. In the cell below, calculate the correlation of `Total` to each of the encoded fields. 
Rank the correlations and identify the #1 pokemon type that is most likely to have the highest `Total`." + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Dragon 0.229705\n", + "Psychic 0.124688\n", + "Steel 0.109703\n", + "Fire 0.078726\n", + "Fighting 0.077786\n", + "Ice 0.060248\n", + "Flying 0.059383\n", + "Dark 0.056154\n", + "Rock 0.032731\n", + "Electric 0.020971\n", + "Ground 0.015060\n", + "Ghost 0.003641\n", + "Water -0.021665\n", + "Fairy -0.036698\n", + "Grass -0.052592\n", + "Poison -0.090441\n", + "Normal -0.105331\n", + "Bug -0.145781\n", + "dtype: float64" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your code here\n", + "correlations = pokemon_encoded.corrwith(pokemon['Total']).sort_values(ascending=False)\n", + "correlations\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Bonus Question\n", + "\n", + "Say now you can choose both `Type 1` and `Type 2` of the pokemon. In order to receive the best pokemon, which types will you choose?" + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n",
+ "| | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | Calculated Total |\n",
+ "| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |\n",
+ "| 417 | 380 | Latias | Dragon | Psychic | 600 | 80 | 80 | 90 | 110 | 130 | 110 | 3 | True | NaN |\n",
+ "| 418 | 380 | LatiasMega Latias | Dragon | Psychic | 700 | 80 | 100 | 120 | 140 | 150 | 110 | 3 | True | NaN |\n",
+ "| 419 | 381 | Latios | Dragon | Psychic | 600 | 80 | 90 | 80 | 130 | 110 | 110 | 3 | True | NaN |\n",
+ "| 420 | 381 | LatiosMega Latios | Dragon | Psychic | 700 | 80 | 130 | 100 | 160 | 120 | 110 | 3 | True | NaN |\n",
+ "
" + ], + "text/plain": [ + " # Name Type 1 Type 2 Total HP Attack Defense \\\n", + "417 380 Latias Dragon Psychic 600 80 80 90 \n", + "418 380 LatiasMega Latias Dragon Psychic 700 80 100 120 \n", + "419 381 Latios Dragon Psychic 600 80 90 80 \n", + "420 381 LatiosMega Latios Dragon Psychic 700 80 130 100 \n", + "\n", + " Sp. Atk Sp. Def Speed Generation Legendary Calculated Total \n", + "417 110 130 110 3 True NaN \n", + "418 140 150 110 3 True NaN \n", + "419 130 110 110 3 True NaN \n", + "420 160 120 110 3 True NaN " + ] + }, + "execution_count": 51, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your code here\n", + "pokemon[(pokemon['Type 1'] == 'Dragon') & (pokemon['Type 2'] == 'Psychic')]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.6" + }, + "toc": { + "base_numbering": 1, + "nav_menu": {}, + "number_sections": true, + "sideBar": true, + "skip_h1_title": false, + "title_cell": "Table of Contents", + "title_sidebar": "Contents", + "toc_cell": false, + "toc_position": {}, + "toc_section_display": true, + "toc_window_display": false + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/your-code/challenge-3.ipynb b/your-code/challenge-3.ipynb index a42a586..1c21333 100644 --- a/your-code/challenge-3.ipynb +++ b/your-code/challenge-3.ipynb @@ -1,147 +1,1265 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Challenge 3\n", - "\n", - "In this challenge we will work on the `Orders` data set. 
In your work you will apply the thinking process and workflow we showed you in Challenge 2.\n", - "\n", - "You are serving as a Business Intelligence Analyst at the headquarter of an international fashion goods chain store. Your boss today asked you to do two things for her:\n", - "\n", - "**First, identify two groups of customers from the data set.** The first group is **VIP Customers** whose **aggregated expenses** at your global chain stores are **above the 95th percentile** (aka. 0.95 quantile). The second group is **Preferred Customers** whose **aggregated expenses** are **between the 75th and 95th percentile**.\n", - "\n", - "**Second, identify which country has the most of your VIP customers, and which country has the most of your VIP+Preferred Customers combined.**" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Q1: How to identify VIP & Preferred Customers?\n", - "\n", - "We start by importing all the required libraries:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# import required libraries\n", - "import numpy as np\n", - "import pandas as pd" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Next, extract and import `Orders` dataset into a dataframe variable called `orders`. Print the head of `orders` to overview the data:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---\n", - "\n", - "\"Identify VIP and Preferred Customers\" is the non-technical goal of your boss. 
You need to translate that goal into technical languages that data analysts use:\n", - "\n", - "## How to label customers whose aggregated `amount_spent` is in a given quantile range?\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We break down the main problem into several sub problems:\n", - "\n", - "#### Sub Problem 1: How to aggregate the `amount_spent` for unique customers?\n", - "\n", - "#### Sub Problem 2: How to select customers whose aggregated `amount_spent` is in a given quantile range?\n", - "\n", - "#### Sub Problem 3: How to label selected customers as \"VIP\" or \"Preferred\"?\n", - "\n", - "*Note: If you want to break down the main problem in a different way, please feel free to revise the sub problems above.*\n", - "\n", - "Now in the workspace below, tackle each of the sub problems using the iterative problem solving workflow. Insert cells as necessary to write your codes and explain your steps." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now we'll leave it to you to solve Q2 & Q3, which you can leverage from your solution for Q1:\n", - "\n", - "## Q2: How to identify which country has the most VIP Customers?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Q3: How to identify which country has the most VIP+Preferred Customers combined?" 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.9" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Challenge 3\n", + "\n", + "In this challenge we will work on the `Orders` data set. In your work you will apply the thinking process and workflow we showed you in Challenge 2.\n", + "\n", + "You are serving as a Business Intelligence Analyst at the headquarter of an international fashion goods chain store. Your boss today asked you to do two things for her:\n", + "\n", + "**First, identify two groups of customers from the data set.** The first group is **VIP Customers** whose **aggregated expenses** at your global chain stores are **above the 95th percentile** (aka. 0.95 quantile). 
The second group is **Preferred Customers** whose **aggregated expenses** are **between the 75th and 95th percentile**.\n", + "\n", + "**Second, identify which country has the most of your VIP customers, and which country has the most of your VIP+Preferred Customers combined.**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Q1: How to identify VIP & Preferred Customers?\n", + "\n", + "We start by importing all the required libraries:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# import required libraries\n", + "import numpy as np\n", + "import pandas as pd" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, extract and import `Orders` dataset into a dataframe variable called `orders`. Print the head of `orders` to overview the data:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Unnamed: 0InvoiceNoStockCodeyearmonthdayhourDescriptionQuantityInvoiceDateUnitPriceCustomerIDCountryamount_spent
0053636585123A20101238white hanging heart t-light holder62010-12-01 08:26:002.5517850United Kingdom15.30
115363657105320101238white metal lantern62010-12-01 08:26:003.3917850United Kingdom20.34
2253636584406B20101238cream cupid hearts coat hanger82010-12-01 08:26:002.7517850United Kingdom22.00
3353636584029G20101238knitted union flag hot water bottle62010-12-01 08:26:003.3917850United Kingdom20.34
4453636584029E20101238red woolly hottie white heart.62010-12-01 08:26:003.3917850United Kingdom20.34
\n", + "
" + ], + "text/plain": [ + " Unnamed: 0 InvoiceNo StockCode year month day hour \\\n", + "0 0 536365 85123A 2010 12 3 8 \n", + "1 1 536365 71053 2010 12 3 8 \n", + "2 2 536365 84406B 2010 12 3 8 \n", + "3 3 536365 84029G 2010 12 3 8 \n", + "4 4 536365 84029E 2010 12 3 8 \n", + "\n", + " Description Quantity InvoiceDate \\\n", + "0 white hanging heart t-light holder 6 2010-12-01 08:26:00 \n", + "1 white metal lantern 6 2010-12-01 08:26:00 \n", + "2 cream cupid hearts coat hanger 8 2010-12-01 08:26:00 \n", + "3 knitted union flag hot water bottle 6 2010-12-01 08:26:00 \n", + "4 red woolly hottie white heart. 6 2010-12-01 08:26:00 \n", + "\n", + " UnitPrice CustomerID Country amount_spent \n", + "0 2.55 17850 United Kingdom 15.30 \n", + "1 3.39 17850 United Kingdom 20.34 \n", + "2 2.75 17850 United Kingdom 22.00 \n", + "3 3.39 17850 United Kingdom 20.34 \n", + "4 3.39 17850 United Kingdom 20.34 " + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your code here\n", + "orders = pd.read_csv('/Users/anna/iron_hack/lab-dataframe-calculations/your-code/Orders.csv')\n", + "orders.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "\"Identify VIP and Preferred Customers\" is the non-technical goal of your boss. 
You need to translate that goal into technical languages that data analysts use:\n", + "\n", + "## How to label customers whose aggregated `amount_spent` is in a given quantile range?\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We break down the main problem into several sub problems:\n", + "\n", + "#### Sub Problem 1: How to aggregate the `amount_spent` for unique customers?\n", + "\n", + "#### Sub Problem 2: How to select customers whose aggregated `amount_spent` is in a given quantile range?\n", + "\n", + "#### Sub Problem 3: How to label selected customers as \"VIP\" or \"Preferred\"?\n", + "\n", + "*Note: If you want to break down the main problem in a different way, please feel free to revise the sub problems above.*\n", + "\n", + "Now in the workspace below, tackle each of the sub problems using the iterative problem solving workflow. Insert cells as necessary to write your codes and explain your steps." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
CustomerIDamount_spent
01234677183.60
1123474310.00
2123481797.24
3123491757.55
412350334.40
.........
433418280180.60
43351828180.82
433618282178.05
4337182832094.88
4338182871837.28
\n", + "

4339 rows × 2 columns

\n", + "
" + ], + "text/plain": [ + " CustomerID amount_spent\n", + "0 12346 77183.60\n", + "1 12347 4310.00\n", + "2 12348 1797.24\n", + "3 12349 1757.55\n", + "4 12350 334.40\n", + "... ... ...\n", + "4334 18280 180.60\n", + "4335 18281 80.82\n", + "4336 18282 178.05\n", + "4337 18283 2094.88\n", + "4338 18287 1837.28\n", + "\n", + "[4339 rows x 2 columns]" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your code here\n", + "\n", + "customer_spending = orders.groupby('CustomerID')['amount_spent'].sum().reset_index()\n", + "customer_spending" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "lower_quantile_preferred_customers = customer_spending['amount_spent'].quantile(0.75)\n", + "upper_quantile_preferred_customers = customer_spending['amount_spent'].quantile(0.95)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "quantile_range_preferred_customers = customer_spending[\n", + " (customer_spending['amount_spent'] >= lower_quantile_preferred_customers) &\n", + " (customer_spending['amount_spent'] < upper_quantile_preferred_customers)\n", + "]" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
CustomerIDamount_spent
1123474310.00
2123481797.24
3123491757.55
5123522506.04
9123562811.43
.........
4319182592338.60
4320182602643.20
4328182723078.58
4337182832094.88
4338182871837.28
\n", + "

868 rows × 2 columns

\n", + "
" + ], + "text/plain": [ + " CustomerID amount_spent\n", + "1 12347 4310.00\n", + "2 12348 1797.24\n", + "3 12349 1757.55\n", + "5 12352 2506.04\n", + "9 12356 2811.43\n", + "... ... ...\n", + "4319 18259 2338.60\n", + "4320 18260 2643.20\n", + "4328 18272 3078.58\n", + "4337 18283 2094.88\n", + "4338 18287 1837.28\n", + "\n", + "[868 rows x 2 columns]" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "quantile_range_preferred_customers" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "lower_quantile_vip_customers = customer_spending['amount_spent'].quantile(0.95)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "quantile_range_vip_customers = customer_spending[\n", + " (customer_spending['amount_spent'] >= lower_quantile_vip_customers)]" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
CustomerIDamount_spent
01234677183.60
10123576207.67
12123596372.58
501240911072.67
5512415124914.53
.........
4207181098052.97
4229181398438.34
4253181727561.68
4292182236484.54
4298182297276.90
\n", + "

217 rows × 2 columns

\n", + "
" + ], + "text/plain": [ + " CustomerID amount_spent\n", + "0 12346 77183.60\n", + "10 12357 6207.67\n", + "12 12359 6372.58\n", + "50 12409 11072.67\n", + "55 12415 124914.53\n", + "... ... ...\n", + "4207 18109 8052.97\n", + "4229 18139 8438.34\n", + "4253 18172 7561.68\n", + "4292 18223 6484.54\n", + "4298 18229 7276.90\n", + "\n", + "[217 rows x 2 columns]" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "quantile_range_vip_customers" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "def get_customer_type(customer_id):\n", + " if customer_id in quantile_range_vip_customers['CustomerID'].values:\n", + " return \"VIP\"\n", + " elif customer_id in quantile_range_preferred_customers['CustomerID'].values:\n", + " return 'Preferred'\n", + " else:\n", + " return \"Normal\"" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "orders['Customer type'] = orders['CustomerID'].apply(get_customer_type)" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Unnamed: 0InvoiceNoStockCodeyearmonthdayhourDescriptionQuantityInvoiceDateUnitPriceCustomerIDCountryamount_spentCustomer type
0053636585123A20101238white hanging heart t-light holder62010-12-01 08:26:002.5517850United Kingdom15.30Preferred
115363657105320101238white metal lantern62010-12-01 08:26:003.3917850United Kingdom20.34Preferred
2253636584406B20101238cream cupid hearts coat hanger82010-12-01 08:26:002.7517850United Kingdom22.00Preferred
3353636584029G20101238knitted union flag hot water bottle62010-12-01 08:26:003.3917850United Kingdom20.34Preferred
4453636584029E20101238red woolly hottie white heart.62010-12-01 08:26:003.3917850United Kingdom20.34Preferred
\n", + "
" + ], + "text/plain": [ + " Unnamed: 0 InvoiceNo StockCode year month day hour \\\n", + "0 0 536365 85123A 2010 12 3 8 \n", + "1 1 536365 71053 2010 12 3 8 \n", + "2 2 536365 84406B 2010 12 3 8 \n", + "3 3 536365 84029G 2010 12 3 8 \n", + "4 4 536365 84029E 2010 12 3 8 \n", + "\n", + " Description Quantity InvoiceDate \\\n", + "0 white hanging heart t-light holder 6 2010-12-01 08:26:00 \n", + "1 white metal lantern 6 2010-12-01 08:26:00 \n", + "2 cream cupid hearts coat hanger 8 2010-12-01 08:26:00 \n", + "3 knitted union flag hot water bottle 6 2010-12-01 08:26:00 \n", + "4 red woolly hottie white heart. 6 2010-12-01 08:26:00 \n", + "\n", + " UnitPrice CustomerID Country amount_spent Customer type \n", + "0 2.55 17850 United Kingdom 15.30 Preferred \n", + "1 3.39 17850 United Kingdom 20.34 Preferred \n", + "2 2.75 17850 United Kingdom 22.00 Preferred \n", + "3 3.39 17850 United Kingdom 20.34 Preferred \n", + "4 3.39 17850 United Kingdom 20.34 Preferred " + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "orders.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + "
Unnamed: 0InvoiceNoStockCodeyearmonthdayhourDescriptionQuantityInvoiceDateUnitPriceCustomerIDCountryamount_spentCustomer type
26265363702272820101238alarm clock bakelike pink242010-12-01 08:45:003.7512583France90.0VIP
27275363702272720101238alarm clock bakelike red242010-12-01 08:45:003.7512583France90.0VIP
28285363702272620101238alarm clock bakelike green122010-12-01 08:45:003.7512583France45.0VIP
29295363702172420101238panda and bunnies sticker sheet122010-12-01 08:45:000.8512583France10.2VIP
30305363702188320101238stars gift tape242010-12-01 08:45:000.6512583France15.6VIP
................................................
397883541868581584850382011125126 chocolate love heart t-lights482011-12-09 12:25:001.8513777United Kingdom88.8VIP
39790554189058158622061201112512large cake stand hanging strawbery82011-12-09 12:49:002.9513113United Kingdom23.6VIP
39790654189158158623275201112512set of 3 hanging owls ollie beak242011-12-09 12:49:001.2513113United Kingdom30.0VIP
39790754189258158621217201112512red retrospot round cake tins242011-12-09 12:49:008.9513113United Kingdom214.8VIP
39790854189358158620685201112512doormat red retrospot102011-12-09 12:49:007.0813113United Kingdom70.8VIP
\n", + "

104484 rows × 15 columns

\n", + "
" + ], + "text/plain": [ + " Unnamed: 0 InvoiceNo StockCode year month day hour \\\n", + "26 26 536370 22728 2010 12 3 8 \n", + "27 27 536370 22727 2010 12 3 8 \n", + "28 28 536370 22726 2010 12 3 8 \n", + "29 29 536370 21724 2010 12 3 8 \n", + "30 30 536370 21883 2010 12 3 8 \n", + "... ... ... ... ... ... ... ... \n", + "397883 541868 581584 85038 2011 12 5 12 \n", + "397905 541890 581586 22061 2011 12 5 12 \n", + "397906 541891 581586 23275 2011 12 5 12 \n", + "397907 541892 581586 21217 2011 12 5 12 \n", + "397908 541893 581586 20685 2011 12 5 12 \n", + "\n", + " Description Quantity InvoiceDate \\\n", + "26 alarm clock bakelike pink 24 2010-12-01 08:45:00 \n", + "27 alarm clock bakelike red 24 2010-12-01 08:45:00 \n", + "28 alarm clock bakelike green 12 2010-12-01 08:45:00 \n", + "29 panda and bunnies sticker sheet 12 2010-12-01 08:45:00 \n", + "30 stars gift tape 24 2010-12-01 08:45:00 \n", + "... ... ... ... \n", + "397883 6 chocolate love heart t-lights 48 2011-12-09 12:25:00 \n", + "397905 large cake stand hanging strawbery 8 2011-12-09 12:49:00 \n", + "397906 set of 3 hanging owls ollie beak 24 2011-12-09 12:49:00 \n", + "397907 red retrospot round cake tins 24 2011-12-09 12:49:00 \n", + "397908 doormat red retrospot 10 2011-12-09 12:49:00 \n", + "\n", + " UnitPrice CustomerID Country amount_spent Customer type \n", + "26 3.75 12583 France 90.0 VIP \n", + "27 3.75 12583 France 90.0 VIP \n", + "28 3.75 12583 France 45.0 VIP \n", + "29 0.85 12583 France 10.2 VIP \n", + "30 0.65 12583 France 15.6 VIP \n", + "... ... ... ... ... ... 
\n", + "397883 1.85 13777 United Kingdom 88.8 VIP \n", + "397905 2.95 13113 United Kingdom 23.6 VIP \n", + "397906 1.25 13113 United Kingdom 30.0 VIP \n", + "397907 8.95 13113 United Kingdom 214.8 VIP \n", + "397908 7.08 13113 United Kingdom 70.8 VIP \n", + "\n", + "[104484 rows x 15 columns]" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "orders[orders[\"Customer type\"] == 'VIP']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we'll leave it to you to solve Q2 & Q3, which you can leverage from your solution for Q1:\n", + "\n", + "## Q2: How to identify which country has the most VIP Customers?" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [], + "source": [ + "# your code here\n", + "\n", + "vip_customers = orders[orders['Customer type'] == 'VIP']\n", + "\n", + "vip_counts_by_country = vip_customers.groupby('Country').size().reset_index(name='VIP Count')\n", + "\n", + "most_vip_country = vip_counts_by_country.sort_values(by='VIP Count', ascending=False).iloc[0]" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Country United Kingdom\n", + "VIP Count 84185\n", + "Name: 17, dtype: object" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "most_vip_country" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Q3: How to identify which country has the most VIP+Preferred Customers combined?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [], + "source": [ + "# your code here\n", + "\n", + "vip_preferred_customers = orders[(orders['Customer type'] == 'VIP') | (orders['Customer type'] == 'Preferred')]\n", + "\n", + "vip_preferred_counts_by_country = vip_preferred_customers.groupby('Country').size().reset_index(name='VIP+Preferred Count')\n", + "\n", + "most_vip_preferred_country = vip_preferred_counts_by_country.sort_values(by='VIP+Preferred Count', ascending=False).iloc[0]" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Country United Kingdom\n", + "VIP+Preferred Count 221635\n", + "Name: 26, dtype: object" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "most_vip_preferred_country" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.6" + }, + "toc": { + "base_numbering": 1, + "nav_menu": {}, + "number_sections": true, + "sideBar": true, + "skip_h1_title": false, + "title_cell": "Table of Contents", + "title_sidebar": "Contents", + "toc_cell": false, + "toc_position": {}, + "toc_section_display": true, + "toc_window_display": false + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}