From b215994d0cd8ae44f76730317c90fa97bbe220a1 Mon Sep 17 00:00:00 2001 From: Alex Husiev Date: Tue, 20 Aug 2024 22:04:38 +0200 Subject: [PATCH 1/2] Alex --- your-code/challenge-1.ipynb | 2502 +++++++++++++++++++++++++++++++---- your-code/challenge-2.ipynb | 947 ++++++++++--- your-code/challenge-3.ipynb | 877 ++++++++++-- 3 files changed, 3708 insertions(+), 618 deletions(-) diff --git a/your-code/challenge-1.ipynb b/your-code/challenge-1.ipynb index cd674cb..32c9df7 100644 --- a/your-code/challenge-1.ipynb +++ b/your-code/challenge-1.ipynb @@ -1,276 +1,2226 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Challenge 1\n", - "\n", - "In this challenge you will be working on **Pokemon**. You will answer a series of questions in order to practice dataframe calculation, aggregation, and transformation.\n", - "\n", - "![Pokemon](../images/pokemon.jpg)\n", - "\n", - "Follow the instructions below and enter your code.\n", - "\n", - "#### Import all required libraries." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# import libraries" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Import data set.\n", - "\n", - "Read the dataset `pokemon.csv` into a dataframe called `pokemon`.\n", - "\n", - "*Data set attributed to [Alberto Barradas](https://www.kaggle.com/abcsds/pokemon/)*" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# import dataset" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Print first 10 rows of `pokemon`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "When you look at a data set, you often wonder what each column means. 
Some open-source data sets provide descriptions of the data set. In many cases, data descriptions are extremely useful for data analysts to perform work efficiently and successfully.\n", - "\n", - "For the `Pokemon.csv` data set, fortunately, the owner provided descriptions which you can see [here](https://www.kaggle.com/abcsds/pokemon/home). For your convenience, we are including the descriptions below. Read the descriptions and understand what each column means. This knowledge is helpful in your work with the data.\n", - "\n", - "| Column | Description |\n", - "| --- | --- |\n", - "| # | ID for each pokemon |\n", - "| Name | Name of each pokemon |\n", - "| Type 1 | Each pokemon has a type, this determines weakness/resistance to attacks |\n", - "| Type 2 | Some pokemon are dual type and have 2 |\n", - "| Total | A general guide to how strong a pokemon is |\n", - "| HP | Hit points, or health, defines how much damage a pokemon can withstand before fainting |\n", - "| Attack | The base modifier for normal attacks (eg. Scratch, Punch) |\n", - "| Defense | The base damage resistance against normal attacks |\n", - "| SP Atk | Special attack, the base modifier for special attacks (e.g. fire blast, bubble beam) |\n", - "| SP Def | The base damage resistance against special attacks |\n", - "| Speed | Determines which pokemon attacks first each round |\n", - "| Generation | Number of generation |\n", - "| Legendary | True if Legendary Pokemon False if not |" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Obtain the distinct values across `Type 1` and `Type 2`.\n", - "\n", - "Exctract all the values in `Type 1` and `Type 2`. Then create an array containing the distinct values across both fields." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Cleanup `Name` that contain \"Mega\".\n", - "\n", - "If you have checked out the pokemon names carefully enough, you should have found there are junk texts in the pokemon names which contain \"Mega\". We want to clean up the pokemon names. For instance, \"VenusaurMega Venusaur\" should be \"Mega Venusaur\", and \"CharizardMega Charizard X\" should be \"Mega Charizard X\"." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here\n", - "\n", - "\n", - "# test transformed data\n", - "pokemon.head(10)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Create a new column called `A/D Ratio` whose value equals to `Attack` devided by `Defense`.\n", - "\n", - "For instance, if a pokemon has the Attack score 49 and Defense score 49, the corresponding `A/D Ratio` is 49/49=1." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Identify the pokemon with the highest `A/D Ratio`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Identify the pokemon with the lowest A/D Ratio." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Create a new column called `Combo Type` whose value combines `Type 1` and `Type 2`.\n", - "\n", - "Rules:\n", - "\n", - "* If both `Type 1` and `Type 2` have valid values, the `Combo Type` value should contain both values in the form of ` `. For example, if `Type 1` value is `Grass` and `Type 2` value is `Poison`, `Combo Type` will be `Grass-Poison`.\n", - "\n", - "* If `Type 1` has valid value but `Type 2` is not, `Combo Type` will be the same as `Type 1`. For example, if `Type 1` is `Fire` whereas `Type 2` is `NaN`, `Combo Type` will be `Fire`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Identify the pokemon whose `A/D Ratio` are among the top 5." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### For the 5 pokemon printed above, aggregate `Combo Type` and use a list to store the unique values.\n", - "\n", - "Your end product is a list containing the distinct `Combo Type` values of the 5 pokemon with the highest `A/D Ratio`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### For each of the `Combo Type` values obtained from the previous question, calculate the mean scores of all numeric fields across all pokemon.\n", - "\n", - "Your output should look like below:\n", - "\n", - "![Aggregate](../images/aggregated-mean.png)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.9" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "ZJH9wO3-iq7h" + }, + "source": [ + "# Challenge 1\n", + "\n", + "In this challenge you will be working on **Pokemon**. You will answer a series of questions in order to practice dataframe calculation, aggregation, and transformation.\n", + "\n", + "![Pokemon](../images/pokemon.jpg)\n", + "\n", + "Follow the instructions below and enter your code.\n", + "\n", + "#### Import all required libraries." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "id": "ersihvOoiq7i" + }, + "outputs": [], + "source": [ + "# import libraries\n", + "import pandas as pd\n", + "import numpy as np" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TqPgYOzriq7j" + }, + "source": [ + "#### Import data set.\n", + "\n", + "Read the dataset `pokemon.csv` into a dataframe called `pokemon`.\n", + "\n", + "*Data set attributed to [Alberto Barradas](https://www.kaggle.com/abcsds/pokemon/)*" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "id": "BDB9p0wsiq7j" + }, + "outputs": [], + "source": [ + "# import dataset\n", + "pokemon = pd.read_csv('Pokemon.csv')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dpDVxx5giq7j" + }, + "source": [ + "#### Print first 10 rows of `pokemon`." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 432 + }, + "id": "6lkleR-2iq7k", + "outputId": "c62526f8-1da9-410c-9a40-3aae2740c0c2" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " # Name Type 1 Type 2 Total HP Attack Defense \\\n", + "0 1 Bulbasaur Grass Poison 318 45 49 49 \n", + "1 2 Ivysaur Grass Poison 405 60 62 63 \n", + "2 3 Venusaur Grass Poison 525 80 82 83 \n", + "3 3 VenusaurMega Venusaur Grass Poison 625 80 100 123 \n", + "4 4 Charmander Fire NaN 309 39 52 43 \n", + "5 5 Charmeleon Fire NaN 405 58 64 58 \n", + "6 6 Charizard Fire Flying 534 78 84 78 \n", + "7 6 CharizardMega Charizard X Fire Dragon 634 78 130 111 \n", + "8 6 CharizardMega Charizard Y Fire Flying 634 78 104 78 \n", + "9 7 Squirtle Water NaN 314 44 48 65 \n", + "\n", + " Sp. Atk Sp. 
Def Speed Generation Legendary \n", + "0 65 65 45 1 False \n", + "1 80 80 60 1 False \n", + "2 100 100 80 1 False \n", + "3 122 120 80 1 False \n", + "4 60 50 65 1 False \n", + "5 80 65 80 1 False \n", + "6 109 85 100 1 False \n", + "7 130 85 100 1 False \n", + "8 159 115 100 1 False \n", + "9 50 64 43 1 False " + ], + "text/html": [ + "\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "variable_name": "pokemon", + "summary": "{\n \"name\": \"pokemon\",\n \"rows\": 800,\n \"fields\": [\n {\n \"column\": \"#\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 208,\n \"min\": 1,\n \"max\": 721,\n \"num_unique_values\": 721,\n \"samples\": [\n 260,\n 659,\n 78\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 800,\n \"samples\": [\n \"Hydreigon\",\n \"Beheeyem\",\n \"Growlithe\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Type 1\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 18,\n \"samples\": [\n \"Grass\",\n \"Fire\",\n \"Fairy\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Type 2\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 18,\n \"samples\": [\n \"Poison\",\n \"Flying\",\n \"Steel\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Total\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 119,\n \"min\": 180,\n \"max\": 780,\n \"num_unique_values\": 200,\n \"samples\": [\n 700,\n 349,\n 505\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"HP\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 25,\n \"min\": 1,\n \"max\": 255,\n \"num_unique_values\": 94,\n \"samples\": [\n 106,\n 81,\n 170\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Attack\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 32,\n \"min\": 5,\n \"max\": 190,\n \"num_unique_values\": 111,\n \"samples\": [\n 79,\n 63,\n 52\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Defense\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 31,\n \"min\": 5,\n \"max\": 230,\n 
\"num_unique_values\": 103,\n \"samples\": [\n 20,\n 88,\n 23\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Sp. Atk\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 32,\n \"min\": 10,\n \"max\": 194,\n \"num_unique_values\": 105,\n \"samples\": [\n 58,\n 150,\n 160\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Sp. Def\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 27,\n \"min\": 20,\n \"max\": 230,\n \"num_unique_values\": 92,\n \"samples\": [\n 154,\n 45,\n 44\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Speed\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 29,\n \"min\": 5,\n \"max\": 180,\n \"num_unique_values\": 108,\n \"samples\": [\n 113,\n 50,\n 100\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Generation\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1,\n \"min\": 1,\n \"max\": 6,\n \"num_unique_values\": 6,\n \"samples\": [\n 1,\n 2,\n 6\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Legendary\",\n \"properties\": {\n \"dtype\": \"boolean\",\n \"num_unique_values\": 2,\n \"samples\": [\n true,\n false\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 6 + } + ], + "source": [ + "# your code here\n", + "pokemon.head(10)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Wv2otovyiq7k" + }, + "source": [ + "When you look at a data set, you often wonder what each column means. Some open-source data sets provide descriptions of the data set. 
In many cases, data descriptions are extremely useful for data analysts to perform work efficiently and successfully.\n", + "\n", + "For the `Pokemon.csv` data set, fortunately, the owner provided descriptions which you can see [here](https://www.kaggle.com/abcsds/pokemon/home). For your convenience, we are including the descriptions below. Read the descriptions and understand what each column means. This knowledge is helpful in your work with the data.\n", + "\n", + "| Column | Description |\n", + "| --- | --- |\n", + "| # | ID for each pokemon |\n", + "| Name | Name of each pokemon |\n", + "| Type 1 | Each pokemon has a type, which determines weakness/resistance to attacks |\n", + "| Type 2 | Some pokemon are dual type and have a second type |\n", + "| Total | A general guide to how strong a pokemon is |\n", + "| HP | Hit points, or health, defines how much damage a pokemon can withstand before fainting |\n", + "| Attack | The base modifier for normal attacks (e.g. Scratch, Punch) |\n", + "| Defense | The base damage resistance against normal attacks |\n", + "| Sp. Atk | Special attack, the base modifier for special attacks (e.g. fire blast, bubble beam) |\n", + "| Sp. Def | The base damage resistance against special attacks |\n", + "| Speed | Determines which pokemon attacks first each round |\n", + "| Generation | Number of the generation the pokemon belongs to |\n", + "| Legendary | True if the pokemon is Legendary, False if not |" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RlXsLBvtiq7k" + }, + "source": [ + "#### Obtain the distinct values across `Type 1` and `Type 2`.\n", + "\n", + "Extract all the values in `Type 1` and `Type 2`. Then create an array containing the distinct values across both fields."
+ ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "id": "JG5l4Rpliq7k", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "3c4dc63c-8a76-43ea-9212-646f6daa7959" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "['Grass' 'Fire' 'Water' 'Bug' 'Normal' 'Poison' 'Electric' 'Ground'\n", + " 'Fairy' 'Fighting' 'Psychic' 'Rock' 'Ghost' 'Ice' 'Dragon' 'Dark' 'Steel'\n", + " 'Flying']\n" + ] + } + ], + "source": [ + "# your code here\n", + "# Extract the values from 'Type 1' and 'Type 2'\n", + "type1_values = pokemon['Type 1'].dropna().unique() # Dropping NaN values\n", + "type2_values = pokemon['Type 2'].dropna().unique() # Dropping NaN values\n", + "\n", + "# Combine both arrays\n", + "combined_types = pd.Series(list(type1_values) + list(type2_values))\n", + "\n", + "# Find unique values across both columns\n", + "distinct_values = combined_types.unique()\n", + "\n", + "print(distinct_values)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6g7B6aWniq7l" + }, + "source": [ + "#### Clean up `Name` values that contain \"Mega\".\n", + "\n", + "If you have looked at the pokemon names carefully, you will have noticed junk text in the names that contain \"Mega\". We want to clean up these names. For instance, \"VenusaurMega Venusaur\" should be \"Mega Venusaur\", and \"CharizardMega Charizard X\" should be \"Mega Charizard X\"."
+ ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 432 + }, + "id": "LbbU4AhNiq7l", + "outputId": "0c480a8c-735a-466b-904c-b1b8489d0738" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " # Name Type 1 Type 2 Total HP Attack Defense \\\n", + "0 1 Bulbasaur Grass Poison 318 45 49 49 \n", + "1 2 Ivysaur Grass Poison 405 60 62 63 \n", + "2 3 Venusaur Grass Poison 525 80 82 83 \n", + "3 3 Venusaur Venusaur Grass Poison 625 80 100 123 \n", + "4 4 Charmander Fire NaN 309 39 52 43 \n", + "5 5 Charmeleon Fire NaN 405 58 64 58 \n", + "6 6 Charizard Fire Flying 534 78 84 78 \n", + "7 6 Charizard Charizard X Fire Dragon 634 78 130 111 \n", + "8 6 Charizard Charizard Y Fire Flying 634 78 104 78 \n", + "9 7 Squirtle Water NaN 314 44 48 65 \n", + "\n", + " Sp. Atk Sp. Def Speed Generation Legendary \n", + "0 65 65 45 1 False \n", + "1 80 80 60 1 False \n", + "2 100 100 80 1 False \n", + "3 122 120 80 1 False \n", + "4 60 50 65 1 False \n", + "5 80 65 80 1 False \n", + "6 109 85 100 1 False \n", + "7 130 85 100 1 False \n", + "8 159 115 100 1 False \n", + "9 50 64 43 1 False " + ], + "text/html": [ + "\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "variable_name": "pokemon", + "summary": "{\n \"name\": \"pokemon\",\n \"rows\": 800,\n \"fields\": [\n {\n \"column\": \"#\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 208,\n \"min\": 1,\n \"max\": 721,\n \"num_unique_values\": 721,\n \"samples\": [\n 260,\n 659,\n 78\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 800,\n \"samples\": [\n \"Hydreigon\",\n \"Beheeyem\",\n \"Growlithe\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Type 1\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 18,\n \"samples\": [\n \"Grass\",\n \"Fire\",\n \"Fairy\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Type 2\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 18,\n \"samples\": [\n \"Poison\",\n \"Flying\",\n \"Steel\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Total\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 119,\n \"min\": 180,\n \"max\": 780,\n \"num_unique_values\": 200,\n \"samples\": [\n 700,\n 349,\n 505\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"HP\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 25,\n \"min\": 1,\n \"max\": 255,\n \"num_unique_values\": 94,\n \"samples\": [\n 106,\n 81,\n 170\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Attack\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 32,\n \"min\": 5,\n \"max\": 190,\n \"num_unique_values\": 111,\n \"samples\": [\n 79,\n 63,\n 52\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Defense\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 31,\n \"min\": 5,\n \"max\": 230,\n 
\"num_unique_values\": 103,\n \"samples\": [\n 20,\n 88,\n 23\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Sp. Atk\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 32,\n \"min\": 10,\n \"max\": 194,\n \"num_unique_values\": 105,\n \"samples\": [\n 58,\n 150,\n 160\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Sp. Def\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 27,\n \"min\": 20,\n \"max\": 230,\n \"num_unique_values\": 92,\n \"samples\": [\n 154,\n 45,\n 44\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Speed\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 29,\n \"min\": 5,\n \"max\": 180,\n \"num_unique_values\": 108,\n \"samples\": [\n 113,\n 50,\n 100\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Generation\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1,\n \"min\": 1,\n \"max\": 6,\n \"num_unique_values\": 6,\n \"samples\": [\n 1,\n 2,\n 6\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Legendary\",\n \"properties\": {\n \"dtype\": \"boolean\",\n \"num_unique_values\": 2,\n \"samples\": [\n true,\n false\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 18 + } + ], + "source": [ + "# your code here\n", + "# Drop the junk prefix before 'Mega ', e.g. 'VenusaurMega Venusaur' -> 'Mega Venusaur'\n", + "pokemon['Name'] = pokemon['Name'].str.replace(r'^.*(?=Mega )', '', regex=True)\n", + "\n", + "\n", + "# test transformed data\n", + "pokemon.head(10)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "SaN8Yf2hiq7l" + }, + "source": [ + "#### Create a new column called `A/D Ratio` whose value equals `Attack` divided by `Defense`.\n", + "\n", + "For instance, if a pokemon has an Attack score of 49 and a Defense score of 49, the corresponding `A/D Ratio` is 49/49=1."
+ ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "id": "JjkB_N3giq7l", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 261 + }, + "outputId": "025c3d37-baba-4e2c-a0a4-43e9e2973ebb" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " # Name Type 1 Type 2 Total HP Attack Defense Sp. Atk \\\n", + "0 1 Bulbasaur Grass Poison 318 45 49 49 65 \n", + "1 2 Ivysaur Grass Poison 405 60 62 63 80 \n", + "2 3 Venusaur Grass Poison 525 80 82 83 100 \n", + "3 3 Venusaur Venusaur Grass Poison 625 80 100 123 122 \n", + "4 4 Charmander Fire NaN 309 39 52 43 60 \n", + "\n", + " Sp. Def Speed Generation Legendary A/D Ratio \n", + "0 65 45 1 False 1.000000 \n", + "1 80 60 1 False 0.984127 \n", + "2 100 80 1 False 0.987952 \n", + "3 120 80 1 False 0.813008 \n", + "4 50 65 1 False 1.209302 " + ], + "text/html": [ + "\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "variable_name": "pokemon", + "summary": "{\n \"name\": \"pokemon\",\n \"rows\": 800,\n \"fields\": [\n {\n \"column\": \"#\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 208,\n \"min\": 1,\n \"max\": 721,\n \"num_unique_values\": 721,\n \"samples\": [\n 260,\n 659,\n 78\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 800,\n \"samples\": [\n \"Hydreigon\",\n \"Beheeyem\",\n \"Growlithe\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Type 1\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 18,\n \"samples\": [\n \"Grass\",\n \"Fire\",\n \"Fairy\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Type 2\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 18,\n \"samples\": [\n \"Poison\",\n \"Flying\",\n \"Steel\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Total\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 119,\n \"min\": 180,\n \"max\": 780,\n \"num_unique_values\": 200,\n \"samples\": [\n 700,\n 349,\n 505\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"HP\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 25,\n \"min\": 1,\n \"max\": 255,\n \"num_unique_values\": 94,\n \"samples\": [\n 106,\n 81,\n 170\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Attack\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 32,\n \"min\": 5,\n \"max\": 190,\n \"num_unique_values\": 111,\n \"samples\": [\n 79,\n 63,\n 52\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Defense\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 31,\n \"min\": 5,\n \"max\": 230,\n 
\"num_unique_values\": 103,\n \"samples\": [\n 20,\n 88,\n 23\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Sp. Atk\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 32,\n \"min\": 10,\n \"max\": 194,\n \"num_unique_values\": 105,\n \"samples\": [\n 58,\n 150,\n 160\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Sp. Def\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 27,\n \"min\": 20,\n \"max\": 230,\n \"num_unique_values\": 92,\n \"samples\": [\n 154,\n 45,\n 44\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Speed\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 29,\n \"min\": 5,\n \"max\": 180,\n \"num_unique_values\": 108,\n \"samples\": [\n 113,\n 50,\n 100\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Generation\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1,\n \"min\": 1,\n \"max\": 6,\n \"num_unique_values\": 6,\n \"samples\": [\n 1,\n 2,\n 6\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Legendary\",\n \"properties\": {\n \"dtype\": \"boolean\",\n \"num_unique_values\": 2,\n \"samples\": [\n true,\n false\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"A/D Ratio\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.5526044976174346,\n \"min\": 0.043478260869565216,\n \"max\": 9.0,\n \"num_unique_values\": 399,\n \"samples\": [\n 1.0684931506849316,\n 1.6428571428571428\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 20 + } + ], + "source": [ + "# your code here\n", + "pokemon['A/D Ratio'] = pokemon['Attack'] / pokemon['Defense']\n", + "# Guard against division by zero: turn infinite ratios into NaN (assigned back, not chained inplace)\n", + "pokemon['A/D Ratio'] = pokemon['A/D Ratio'].replace([np.inf, -np.inf], np.nan)\n", + "\n", + "pokemon.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": 
"Uz9MHtx-iq7l" + }, + "source": [ + "#### Identify the pokemon with the highest `A/D Ratio`." + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "id": "7jDoVyQGiq7l", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 135 + }, + "outputId": "db4f0d5b-d442-4027-b94f-8b67a472dc43" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " # Name Type 1 Type 2 Total HP Attack Defense \\\n", + "429 386 DeoxysAttack Forme Psychic NaN 600 50 180 20 \n", + "\n", + " Sp. Atk Sp. Def Speed Generation Legendary A/D Ratio \n", + "429 180 20 150 3 True 9.0 " + ], + "text/html": [ + "\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "variable_name": "max_ad_pokemon", + "repr_error": "0" + } + }, + "metadata": {}, + "execution_count": 22 + } + ], + "source": [ + "# your code here\n", + "max_ad_ratio = pokemon['A/D Ratio'].max()\n", + "max_ad_pokemon = pokemon[pokemon['A/D Ratio'] == max_ad_ratio] # Get the row with the maximum value\n", + "max_ad_pokemon" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ObK6aFTmiq7l" + }, + "source": [ + "#### Identify the pokemon with the lowest A/D Ratio." + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": { + "id": "ZsEiGv_tiq7m", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 118 + }, + "outputId": "5229f242-64a4-467b-ac4c-90f8d0572a81" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " # Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def \\\n", + "230 213 Shuckle Bug Rock 505 20 10 230 10 230 \n", + "\n", + " Speed Generation Legendary A/D Ratio \n", + "230 5 2 False 0.043478 " + ], + "text/html": [ + "\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "variable_name": "min_ad_pokemon", + "summary": "{\n \"name\": \"min_ad_pokemon\",\n \"rows\": 1,\n \"fields\": [\n {\n \"column\": \"#\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": null,\n \"min\": 213,\n \"max\": 213,\n \"num_unique_values\": 1,\n \"samples\": [\n 213\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"Shuckle\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Type 1\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"Bug\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Type 2\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"Rock\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Total\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": null,\n \"min\": 505,\n \"max\": 505,\n \"num_unique_values\": 1,\n \"samples\": [\n 505\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"HP\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": null,\n \"min\": 20,\n \"max\": 20,\n \"num_unique_values\": 1,\n \"samples\": [\n 20\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Attack\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": null,\n \"min\": 10,\n \"max\": 10,\n \"num_unique_values\": 1,\n \"samples\": [\n 10\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Defense\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": null,\n \"min\": 230,\n \"max\": 230,\n \"num_unique_values\": 1,\n \"samples\": [\n 230\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 
\"Sp. Atk\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": null,\n \"min\": 10,\n \"max\": 10,\n \"num_unique_values\": 1,\n \"samples\": [\n 10\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Sp. Def\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": null,\n \"min\": 230,\n \"max\": 230,\n \"num_unique_values\": 1,\n \"samples\": [\n 230\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Speed\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": null,\n \"min\": 5,\n \"max\": 5,\n \"num_unique_values\": 1,\n \"samples\": [\n 5\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Generation\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": null,\n \"min\": 2,\n \"max\": 2,\n \"num_unique_values\": 1,\n \"samples\": [\n 2\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Legendary\",\n \"properties\": {\n \"dtype\": \"boolean\",\n \"num_unique_values\": 1,\n \"samples\": [\n false\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"A/D Ratio\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": null,\n \"min\": 0.043478260869565216,\n \"max\": 0.043478260869565216,\n \"num_unique_values\": 1,\n \"samples\": [\n 0.043478260869565216\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 23 + } + ], + "source": [ + "# your code here\n", + "min_ad_ratio = pokemon['A/D Ratio'].min()\n", + "min_ad_pokemon = pokemon[pokemon['A/D Ratio'] == min_ad_ratio]\n", + "min_ad_pokemon" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BnV7JdgWiq7m" + }, + "source": [ + "#### Create a new column called `Combo Type` whose value combines `Type 1` and `Type 2`.\n", + "\n", + "Rules:\n", + "\n", + "* If both `Type 1` and `Type 2` have valid values, the `Combo Type` value should contain both values in 
the form of `<Type 1>-<Type 2>`. For example, if `Type 1` value is `Grass` and `Type 2` value is `Poison`, `Combo Type` will be `Grass-Poison`.\n", + "\n", + "* If `Type 1` has a valid value but `Type 2` does not, `Combo Type` will be the same as `Type 1`. For example, if `Type 1` is `Fire` whereas `Type 2` is `NaN`, `Combo Type` will be `Fire`." + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": { + "id": "rinbnnyQiq7m", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "ea874234-c6fc-4a08-8c0e-aaad6b27931b" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + " # Name Type 1 Type 2 Total HP Attack Defense Sp. Atk \\\n", + "0 1 Bulbasaur Grass Poison 318 45 49 49 65 \n", + "1 2 Ivysaur Grass Poison 405 60 62 63 80 \n", + "2 3 Venusaur Grass Poison 525 80 82 83 100 \n", + "3 3 Venusaur Venusaur Grass Poison 625 80 100 123 122 \n", + "4 4 Charmander Fire NaN 309 39 52 43 60 \n", + "\n", + " Sp. Def Speed Generation Legendary A/D Ratio Combo Type \n", + "0 65 45 1 False 1.000000 Grass-Poison \n", + "1 80 60 1 False 0.984127 Grass-Poison \n", + "2 100 80 1 False 0.987952 Grass-Poison \n", + "3 120 80 1 False 0.813008 Grass-Poison \n", + "4 50 65 1 False 1.209302 Fire \n" + ] + } + ], + "source": [ + "# Define a function to combine Type 1 and Type 2\n", + "def create_combo_type(row):\n", + " if pd.notna(row['Type 1']) and pd.notna(row['Type 2']):\n", + " return f\"{row['Type 1']}-{row['Type 2']}\"\n", + " elif pd.notna(row['Type 1']):\n", + " return row['Type 1']\n", + " else:\n", + " return pd.NA\n", + "\n", + "# Apply the function to create the Combo Type column\n", + "pokemon['Combo Type'] = pokemon.apply(create_combo_type, axis=1)\n", + "\n", + "# Display the first few rows to check the new column\n", + "print(pokemon.head())\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "nPUrTA38iq7m" + }, + "source": [ + "#### Identify the pokemon whose `A/D Ratio` is among the top 5."
+ ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": { + "id": "3djH1vebiq7m", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "d06eaad2-62ce-4082-d9b8-1f5cce683630" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + " # Name Type 1 Type 2 Total HP Attack Defense \\\n", + "429 386 DeoxysAttack Forme Psychic NaN 600 50 180 20 \n", + "347 318 Carvanha Water Dark 305 45 90 20 \n", + "19 15 Beedrill Beedrill Bug Poison 495 65 150 40 \n", + "453 408 Cranidos Rock NaN 350 67 125 40 \n", + "348 319 Sharpedo Water Dark 460 70 120 40 \n", + "\n", + " Sp. Atk Sp. Def Speed Generation Legendary A/D Ratio Combo Type \n", + "429 180 20 150 3 True 9.000 Psychic \n", + "347 65 20 65 3 False 4.500 Water-Dark \n", + "19 15 80 145 1 False 3.750 Bug-Poison \n", + "453 30 30 58 4 False 3.125 Rock \n", + "348 95 40 95 3 False 3.000 Water-Dark \n" + ] + } + ], + "source": [ + "# your code here\n", + "# Sort the DataFrame by A/D Ratio in descending order\n", + "top_5_ad_ratio = pokemon.sort_values(by='A/D Ratio', ascending=False).head(5)\n", + "\n", + "# Display the details of the top 5 Pokémon with the highest A/D Ratio\n", + "print(top_5_ad_ratio)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e5pinfG0iq7m" + }, + "source": [ + "#### For the 5 pokemon printed above, aggregate `Combo Type` and use a list to store the unique values.\n", + "\n", + "Your end product is a list containing the distinct `Combo Type` values of the 5 pokemon with the highest `A/D Ratio`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": { + "id": "6f1jGudPiq7m", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "fd6e4737-a2fe-4f86-c3e1-217f235ab9ed" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "['Psychic', 'Water-Dark', 'Bug-Poison', 'Rock']\n" + ] + } + ], + "source": [ + "# your code here\n", + "# Step 1: Sort the DataFrame by A/D Ratio in descending order and get the top 5\n", + "top_5_ad_ratio = pokemon.sort_values(by='A/D Ratio', ascending=False).head(5)\n", + "\n", + "# Step 2: Extract the Combo Type values from these top 5 Pokémon\n", + "combo_types = top_5_ad_ratio['Combo Type']\n", + "\n", + "# Step 3: Get the unique Combo Type values\n", + "unique_combo_types = combo_types.unique().tolist()\n", + "\n", + "# Display the list of unique Combo Type values\n", + "print(unique_combo_types)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bdJYQcj2iq7m" + }, + "source": [ + "#### For each of the `Combo Type` values obtained from the previous question, calculate the mean scores of all numeric fields across all pokemon.\n", + "\n", + "Your output should look like below:\n", + "\n", + "![Aggregate](../images/aggregated-mean.png)" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": { + "id": "VbWc_YgHiq7n", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "028d1602-cf9d-42c4-d342-beef932cad9e" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + " # Total HP Attack Defense \\\n", + "Psychic 381.973684 464.552632 72.552632 64.947368 67.236842 \n", + "Water-Dark 347.666667 493.833333 69.166667 120.000000 65.166667 \n", + "Bug-Poison 199.166667 347.916667 53.750000 68.333333 58.083333 \n", + "Rock 410.111111 409.444444 67.111111 103.333333 107.222222 \n", + "\n", + " Sp. Atk Sp. 
Def Speed Generation Legendary A/D Ratio \n", + "Psychic 98.552632 82.394737 78.868421 3.342105 0.236842 1.164196 \n", + "Water-Dark 88.833333 63.500000 87.166667 3.166667 0.000000 2.291949 \n", + "Bug-Poison 42.500000 59.333333 65.916667 2.333333 0.000000 1.315989 \n", + "Rock 40.555556 58.333333 32.888889 3.888889 0.111111 1.260091 \n" + ] + } + ], + "source": [ + "# your code here\n", + "# Define the unique Combo Type values obtained previously\n", + "combo_types = ['Psychic', 'Water-Dark', 'Bug-Poison', 'Rock']\n", + "\n", + "# Initialize an empty dictionary to store the mean scores for each Combo Type\n", + "mean_scores_by_combo_type = {}\n", + "\n", + "# Loop through each unique Combo Type\n", + "for combo_type in combo_types:\n", + " # Filter the DataFrame by the current Combo Type\n", + " filtered_pokemon = pokemon[pokemon['Combo Type'] == combo_type]\n", + "\n", + " # Calculate the mean of all numeric fields for this Combo Type\n", + " mean_scores = filtered_pokemon.mean(numeric_only=True)\n", + "\n", + " # Store the mean scores in the dictionary with Combo Type as the key\n", + " mean_scores_by_combo_type[combo_type] = mean_scores\n", + "\n", + "# Convert the dictionary to a DataFrame for better readability\n", + "mean_scores_df = pd.DataFrame(mean_scores_by_combo_type).T\n", + "\n", + "# Display the resulting DataFrame\n", + "print(mean_scores_df)\n", + "\n", + "# Note: the `#` column above is the mean of the pokemon ID column for each Combo Type,\n", + "# not a count of how many pokemon were averaged.\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + }, + "colab": { + "provenance": [] + } + }, +
"nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file diff --git a/your-code/challenge-2.ipynb b/your-code/challenge-2.ipynb index d347731..8194e68 100644 --- a/your-code/challenge-2.ipynb +++ b/your-code/challenge-2.ipynb @@ -1,195 +1,752 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Challenge 2\n", - "\n", - "In this challenge we will continue working with the `Pokemon` dataset. We will attempt solving a slightly more complex problem in which we will practice the iterative data analysis process you leaned in [this video](https://www.youtube.com/watch?v=xOomNicqbkk).\n", - "\n", - "The problem statement is as follows:\n", - "\n", - "**You are at a Pokemon black market planning to buy a Pokemon for battle. All Pokemon are sold at the same price and you can only afford to buy one. You cannot choose which specific Pokemon to buy. However, you can specify the type of the Pokemon - one type that exists in either `Type 1` or `Type 2`. Which type should you choose in order to maximize your chance of receiving a good Pokemon?**\n", - "\n", - "To remind you about the 3 steps of iterative data analysis, they are:\n", - "\n", - "1. Setting Expectations\n", - "1. Collecting Information\n", - "1. Reacting to Data / Revising Expectations\n", - "\n", - "Following the iterative process, we'll guide you in completing the challenge." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "## Problem Solving Iteration 1\n", - "\n", - "In this iteration we'll analyze the problem and identify the breakthrough. The original question statement is kind of vague because we don't know what a *good pokemon* really means as represented in the data. We'll start by understanding the dataset and see if we can find some insights." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Import libraries\n", - "import numpy as np\n", - "import pandas as pd" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true - }, - "outputs": [], - "source": [ - "# Importing the dataset" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "From the data it seems whether a pokemon is good depends on its abilities as represented in the fields of `HP`, `Attack`, `Defense`, `Sp. Atk`, `Sp. Def`, `Speed`, and `Total`. We are not sure about `Generation` and `Legendary` because they are not necessarily the decisive factors of the pokemon abilities.\n", - "\n", - "But `HP`, `Attack`, `Defense`, `Sp. Atk`, `Sp. Def`, `Speed`, and `Total` are a lot of fields! If we look at them all at once it's very complicated. This isn't Mission Impossible but it's ideal that we tackle this kind of problem after we learn Machine Learning (which you will do in Module 3). For now, is there a way to consolidate the fields we need to look into?\n", - "\n", - "Fortunately there seems to be a way. It appears the `Total` field is computed based on the other 6 fields. But we need to prove our theory. If we can approve there is a formula to compute `Total` based on the other 6 abilities, we only need to look into `Total`.\n", - "\n", - "We have the following expectation now:\n", - "\n", - "#### The `Total` field is computed based on `HP`, `Attack`, `Defense`, `Sp. Atk`, `Sp. Def`, and `Speed`.\n", - "\n", - "We need to collect the following information:\n", - "\n", - "* **What is the formula to compute `Total`?**\n", - "* **Does the formula work for all pokemon?**\n", - "\n", - "In the cell below, make a hypothesis on how `Total` is computed and test your hypothesis." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Problem Solving Iteration 2\n", - "\n", - "Now that we have consolidated the abilities fields, we can update the problem statement. The new problem statement is:\n", - "\n", - "### Which pokemon type is most likely to have the highest `Total` value?\n", - "\n", - "In the updated problem statement, we assume there is a certain relationship between the `Total` and the pokemon type. But we have two *type* fields (`Type 1` and `Type 2`) that have string values. In data analysis, string fields have to be transformed to numerical format in order to be analyzed. \n", - "\n", - "In addition, keep in mind that `Type 1` always has a value but `Type 2` is sometimes empty (having the `NaN` value). Also, the pokemon type we choose may be either in `Type 1` or `Type 2`.\n", - "\n", - "Now our expectation is:\n", - "\n", - "#### `Type 1` and `Type 2` string variables need to be converted to numerical variables in order to identify the relationship between `Total` and the pokemon type.\n", - "\n", - "The information we need to collect is:\n", - "\n", - "#### How to convert two string variables to numerical?\n", - "\n", - "Let's address the first question first. You can use a method called **One Hot Encoding** which is frequently used in machine learning to encode categorical string variables to numerical. The idea is to gather all the possible string values in a categorical field and create a numerical field for each unique string value. Each of those numerical fields uses `1` and `0` to indicate whether the data record has the corresponding categorical value. A detailed explanation of One Hot Encoding can be found in [this article](https://hackernoon.com/what-is-one-hot-encoding-why-and-when-do-you-have-to-use-it-e3c6186d008f). 
You will formally learn it in Module 3.\n", - "\n", - "For instance, if a pokemon has `Type 1` as `Poison` and `Type 2` as `Fire`, then its `Poison` and `Fire` fields are `1` whereas all other fields are `0`. If a pokemon has `Type 1` as `Water` and `Type 2` as `NaN`, then its `Water` field is `1` whereas all other fields are `0`.\n", - "\n", - "#### In the next cell, use One Hot Encoding to encode `Type 1` and `Type 2`. Use the pokemon type values as the names of the numerical fields you create.\n", - "\n", - "The new numerical variables you create should look like below:\n", - "\n", - "![One Hot Encoding](../images/one-hot-encoding.png)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Problem Solving Iteration 3\n", - "\n", - "Now we have encoded the pokemon types, we will identify the relationship between `Total` and the encoded fields. Our expectation is:\n", - "\n", - "#### There are relationships between `Total` and the encoded pokemon type variables and we need to identify the correlations.\n", - "\n", - "The information we need to collect is:\n", - "\n", - "#### How to identify the relationship between `Total` and the encoded pokemon type fields?\n", - "\n", - "There are multiple ways to answer this question. The easiest way is to use correlation. In the cell below, calculate the correlation of `Total` to each of the encoded fields. Rank the correlations and identify the #1 pokemon type that is most likely to have the highest `Total`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Bonus Question\n", - "\n", - "Say now you can choose both `Type 1` and `Type 2` of the pokemon. 
In order to receive the best pokemon, which types will you choose?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.9" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "esVToYY6cYvM" + }, + "source": [ + "# Challenge 2\n", + "\n", + "In this challenge we will continue working with the `Pokemon` dataset. We will attempt solving a slightly more complex problem in which we will practice the iterative data analysis process you learned in [this video](https://www.youtube.com/watch?v=xOomNicqbkk).\n", + "\n", + "The problem statement is as follows:\n", + "\n", + "**You are at a Pokemon black market planning to buy a Pokemon for battle. All Pokemon are sold at the same price and you can only afford to buy one. You cannot choose which specific Pokemon to buy. However, you can specify the type of the Pokemon - one type that exists in either `Type 1` or `Type 2`. Which type should you choose in order to maximize your chance of receiving a good Pokemon?**\n", + "\n", + "To remind you about the 3 steps of iterative data analysis, they are:\n", + "\n", + "1. Setting Expectations\n", + "1. Collecting Information\n", + "1. Reacting to Data / Revising Expectations\n", + "\n", + "Following the iterative process, we'll guide you in completing the challenge."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WwpYrqJlcYvO" + }, + "source": [ + "\n", + "## Problem Solving Iteration 1\n", + "\n", + "In this iteration we'll analyze the problem and identify the breakthrough. The original question statement is kind of vague because we don't know what a *good pokemon* really means as represented in the data. We'll start by understanding the dataset and see if we can find some insights." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "id": "7GJEEU4icYvP" + }, + "outputs": [], + "source": [ + "# Import libraries\n", + "import numpy as np\n", + "import pandas as pd" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "scrolled": true, + "colab": { + "base_uri": "https://localhost:8080/", + "height": 206 + }, + "id": "jPmX42ZDcYvP", + "outputId": "61ad24b0-0d5d-428a-801a-aa42f0b30597" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " # Name Type 1 Type 2 Total HP Attack Defense \\\n", + "0 1 Bulbasaur Grass Poison 318 45 49 49 \n", + "1 2 Ivysaur Grass Poison 405 60 62 63 \n", + "2 3 Venusaur Grass Poison 525 80 82 83 \n", + "3 3 VenusaurMega Venusaur Grass Poison 625 80 100 123 \n", + "4 4 Charmander Fire NaN 309 39 52 43 \n", + "\n", + " Sp. Atk Sp. Def Speed Generation Legendary \n", + "0 65 65 45 1 False \n", + "1 80 80 60 1 False \n", + "2 100 100 80 1 False \n", + "3 122 120 80 1 False \n", + "4 60 50 65 1 False " + ], + "text/html": [ + "\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "variable_name": "pokemon", + "summary": "{\n \"name\": \"pokemon\",\n \"rows\": 800,\n \"fields\": [\n {\n \"column\": \"#\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 208,\n \"min\": 1,\n \"max\": 721,\n \"num_unique_values\": 721,\n \"samples\": [\n 260,\n 659,\n 78\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 800,\n \"samples\": [\n \"Hydreigon\",\n \"Beheeyem\",\n \"Growlithe\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Type 1\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 18,\n \"samples\": [\n \"Grass\",\n \"Fire\",\n \"Fairy\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Type 2\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 18,\n \"samples\": [\n \"Poison\",\n \"Flying\",\n \"Steel\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Total\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 119,\n \"min\": 180,\n \"max\": 780,\n \"num_unique_values\": 200,\n \"samples\": [\n 700,\n 349,\n 505\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"HP\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 25,\n \"min\": 1,\n \"max\": 255,\n \"num_unique_values\": 94,\n \"samples\": [\n 106,\n 81,\n 170\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Attack\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 32,\n \"min\": 5,\n \"max\": 190,\n \"num_unique_values\": 111,\n \"samples\": [\n 79,\n 63,\n 52\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Defense\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 31,\n \"min\": 5,\n \"max\": 230,\n 
\"num_unique_values\": 103,\n \"samples\": [\n 20,\n 88,\n 23\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Sp. Atk\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 32,\n \"min\": 10,\n \"max\": 194,\n \"num_unique_values\": 105,\n \"samples\": [\n 58,\n 150,\n 160\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Sp. Def\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 27,\n \"min\": 20,\n \"max\": 230,\n \"num_unique_values\": 92,\n \"samples\": [\n 154,\n 45,\n 44\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Speed\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 29,\n \"min\": 5,\n \"max\": 180,\n \"num_unique_values\": 108,\n \"samples\": [\n 113,\n 50,\n 100\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Generation\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1,\n \"min\": 1,\n \"max\": 6,\n \"num_unique_values\": 6,\n \"samples\": [\n 1,\n 2,\n 6\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Legendary\",\n \"properties\": {\n \"dtype\": \"boolean\",\n \"num_unique_values\": 2,\n \"samples\": [\n true,\n false\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 9 + } + ], + "source": [ + "# Importing the dataset\n", + "pokemon = pd.read_csv('Pokemon.csv')\n", + "pokemon.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cJo3THM1cYvQ" + }, + "source": [ + "From the data it seems whether a pokemon is good depends on its abilities as represented in the fields of `HP`, `Attack`, `Defense`, `Sp. Atk`, `Sp. Def`, `Speed`, and `Total`. We are not sure about `Generation` and `Legendary` because they are not necessarily the decisive factors of the pokemon abilities.\n", + "\n", + "But `HP`, `Attack`, `Defense`, `Sp. Atk`, `Sp. 
Def`, `Speed`, and `Total` are a lot of fields! If we look at them all at once it's very complicated. This isn't Mission Impossible but it's ideal that we tackle this kind of problem after we learn Machine Learning (which you will do in Module 3). For now, is there a way to consolidate the fields we need to look into?\n", + "\n", + "Fortunately there seems to be a way. It appears the `Total` field is computed based on the other 6 fields. But we need to prove our theory. If we can prove there is a formula to compute `Total` based on the other 6 abilities, we only need to look into `Total`.\n", + "\n", + "We have the following expectation now:\n", + "\n", + "#### The `Total` field is computed based on `HP`, `Attack`, `Defense`, `Sp. Atk`, `Sp. Def`, and `Speed`.\n", + "\n", + "We need to collect the following information:\n", + "\n", + "* **What is the formula to compute `Total`?**\n", + "* **Does the formula work for all pokemon?**\n", + "\n", + "In the cell below, make a hypothesis on how `Total` is computed and test your hypothesis." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "LWWwiuQRcYvQ", + "outputId": "f55075d3-4790-496d-dc4b-27e8adac39e9" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + " Name Total Calculated Total Confirmed\n", + "0 Bulbasaur 318 318 True\n", + "1 Ivysaur 405 405 True\n", + "2 Venusaur 525 525 True\n", + "3 VenusaurMega Venusaur 625 625 True\n", + "4 Charmander 309 309 True\n", + ".. ... ... ... 
...\n", + "795 Diancie 600 600 True\n", + "796 DiancieMega Diancie 700 700 True\n", + "797 HoopaHoopa Confined 600 600 True\n", + "798 HoopaHoopa Unbound 680 680 True\n", + "799 Volcanion 600 600 True\n", + "\n", + "[800 rows x 4 columns]\n" + ] + } + ], + "source": [ + "# your code here\n", + "pokemon_df = pd.read_csv('Pokemon.csv')\n", + "\n", + "# Calculate Total as the sum of the six ability fields\n", + "pokemon_df['Calculated Total'] = (\n", + " pokemon_df['HP'] +\n", + " pokemon_df['Attack'] +\n", + " pokemon_df['Defense'] +\n", + " pokemon_df['Sp. Atk'] +\n", + " pokemon_df['Sp. Def'] +\n", + " pokemon_df['Speed']\n", + ")\n", + "\n", + "# Compare the calculated Total with the provided Total\n", + "pokemon_df['Confirmed'] = pokemon_df['Total'] == pokemon_df['Calculated Total']\n", + "\n", + "# Display the DataFrame with the results\n", + "print(pokemon_df[['Name', 'Total', 'Calculated Total', 'Confirmed']])\n", + "# The hypothesis is confirmed for the rows shown: Total equals the sum of the six ability fields" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HlmmHpngcYvQ" + }, + "source": [ + "## Problem Solving Iteration 2\n", + "\n", + "Now that we have consolidated the abilities fields, we can update the problem statement. The new problem statement is:\n", + "\n", + "### Which pokemon type is most likely to have the highest `Total` value?\n", + "\n", + "In the updated problem statement, we assume there is a certain relationship between the `Total` and the pokemon type. But we have two *type* fields (`Type 1` and `Type 2`) that have string values. In data analysis, string fields have to be transformed to numerical format in order to be analyzed.\n", + "\n", + "In addition, keep in mind that `Type 1` always has a value but `Type 2` is sometimes empty (having the `NaN` value). 
Also, the pokemon type we choose may be either in `Type 1` or `Type 2`.\n", + "\n", + "Now our expectation is:\n", + "\n", + "#### `Type 1` and `Type 2` string variables need to be converted to numerical variables in order to identify the relationship between `Total` and the pokemon type.\n", + "\n", + "The information we need to collect is:\n", + "\n", + "#### How to convert two string variables to numerical?\n", + "\n", + "Let's address the first question first. You can use a method called **One Hot Encoding** which is frequently used in machine learning to encode categorical string variables to numerical. The idea is to gather all the possible string values in a categorical field and create a numerical field for each unique string value. Each of those numerical fields uses `1` and `0` to indicate whether the data record has the corresponding categorical value. A detailed explanation of One Hot Encoding can be found in [this article](https://hackernoon.com/what-is-one-hot-encoding-why-and-when-do-you-have-to-use-it-e3c6186d008f). You will formally learn it in Module 3.\n", + "\n", + "For instance, if a pokemon has `Type 1` as `Poison` and `Type 2` as `Fire`, then its `Poison` and `Fire` fields are `1` whereas all other fields are `0`. If a pokemon has `Type 1` as `Water` and `Type 2` as `NaN`, then its `Water` field is `1` whereas all other fields are `0`.\n", + "\n", + "#### In the next cell, use One Hot Encoding to encode `Type 1` and `Type 2`. 
Use the pokemon type values as the names of the numerical fields you create.\n", + "\n", + "The new numerical variables you create should look like below:\n", + "\n", + "![One Hot Encoding](../images/one-hot-encoding.png)" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "CvQne7hqcYvR", + "outputId": "cad53197-d8e7-4dde-eab1-97b5d5c0f630" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + " # Name Total HP Attack Defense Sp. Atk Sp. Def \\\n", + "0 1 Bulbasaur 318 45 49 49 65 65 \n", + "1 2 Ivysaur 405 60 62 63 80 80 \n", + "2 3 Venusaur 525 80 82 83 100 100 \n", + "3 3 VenusaurMega Venusaur 625 80 100 123 122 120 \n", + "4 4 Charmander 309 39 52 43 60 50 \n", + "\n", + " Speed Generation ... Type2_Ghost Type2_Grass Type2_Ground Type2_Ice \\\n", + "0 45 1 ... False False False False \n", + "1 60 1 ... False False False False \n", + "2 80 1 ... False False False False \n", + "3 80 1 ... False False False False \n", + "4 65 1 ... 
False False False False \n", + "\n", + " Type2_Normal Type2_Poison Type2_Psychic Type2_Rock Type2_Steel \\\n", + "0 False True False False False \n", + "1 False True False False False \n", + "2 False True False False False \n", + "3 False True False False False \n", + "4 False False False False False \n", + "\n", + " Type2_Water \n", + "0 False \n", + "1 False \n", + "2 False \n", + "3 False \n", + "4 False \n", + "\n", + "[5 rows x 49 columns]\n" + ] + } + ], + "source": [ + "# your code here\n", + "type_1_encoded = pd.get_dummies(pokemon_df['Type 1'], prefix='Type1')\n", + "type_2_encoded = pd.get_dummies(pokemon_df['Type 2'], prefix='Type2')\n", + "\n", + "# Combine encoded features with the original DataFrame\n", + "pokemon_encoded_df = pd.concat([pokemon_df, type_1_encoded, type_2_encoded], axis=1)\n", + "\n", + "# Drop the original Type 1 and Type 2 columns\n", + "pokemon_encoded_df.drop(columns=['Type 1', 'Type 2'], inplace=True)\n", + "\n", + "# Display the resulting DataFrame\n", + "print(pokemon_encoded_df.head())" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_ncqBeU7cYvR" + }, + "source": [ + "## Problem Solving Iteration 3\n", + "\n", + "Now we have encoded the pokemon types, we will identify the relationship between `Total` and the encoded fields. Our expectation is:\n", + "\n", + "#### There are relationships between `Total` and the encoded pokemon type variables and we need to identify the correlations.\n", + "\n", + "The information we need to collect is:\n", + "\n", + "#### How to identify the relationship between `Total` and the encoded pokemon type fields?\n", + "\n", + "There are multiple ways to answer this question. The easiest way is to use correlation. In the cell below, calculate the correlation of `Total` to each of the encoded fields. Rank the correlations and identify the #1 pokemon type that is most likely to have the highest `Total`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "vyuArvaGcYvR", + "outputId": "c6784789-90a3-4426-e482-e4f2b15c8bc3" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Correlations with Total:\n", + "Type1_Dragon 0.196532\n", + "Type2_Fighting 0.138726\n", + "Type2_Dragon 0.115240\n", + "Type2_Ice 0.100870\n", + "Type1_Psychic 0.094364\n", + "Type1_Steel 0.082000\n", + "Type2_Psychic 0.076054\n", + "Type2_Fire 0.073234\n", + "Type2_Steel 0.070307\n", + "Type2_Dark 0.065844\n", + "Type2_Flying 0.054048\n", + "Type1_Fire 0.050527\n", + "Type1_Rock 0.037524\n", + "Type1_Flying 0.029504\n", + "Type1_Dark 0.017818\n", + "Type1_Electric 0.016715\n", + "Type2_Ground 0.016486\n", + "Type2_Electric 0.014669\n", + "Type1_Ghost 0.007594\n", + "Type1_Ground 0.004082\n", + "Type2_Rock -0.000512\n", + "Type1_Ice -0.002412\n", + "Type2_Ghost -0.004885\n", + "Type2_Normal -0.013956\n", + "Type1_Water -0.015640\n", + "Type2_Water -0.018800\n", + "Type2_Bug -0.021375\n", + "Type2_Fairy -0.024606\n", + "Type1_Fairy -0.026948\n", + "Type1_Fighting -0.029086\n", + "Type1_Grass -0.036057\n", + "Type2_Grass -0.039224\n", + "Type1_Poison -0.057123\n", + "Type2_Poison -0.067837\n", + "Type1_Normal -0.104150\n", + "Type1_Bug -0.143957\n", + "dtype: float64\n" + ] + } + ], + "source": [ + "# Calculate correlation of Total with encoded Type 1 and Type 2 fields\n", + "type_correlations = pokemon_encoded_df[pokemon_encoded_df.columns[pokemon_encoded_df.columns.str.startswith(('Type1', 'Type2'))]].corrwith(pokemon_encoded_df['Total'])\n", + "\n", + "# Sort correlations in descending order\n", + "sorted_type_correlations = type_correlations.sort_values(ascending=False)\n", + "\n", + "print(\"Correlations with Total:\")\n", + "print(sorted_type_correlations)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xk6cLpL3cYvR" + }, + "source": [ + "# Bonus 
Question\n", + "\n", + "Say now you can choose both `Type 1` and `Type 2` of the pokemon. In order to receive the best pokemon, which types will you choose?" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "bDST7tC5cYvR", + "outputId": "f8e9aecf-e226-4ae8-a278-9b5291ab5077" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\n", + "#1 Pokemon Type with the Highest Total:\n", + "Type1_Dragon\n" + ] + } + ], + "source": [ + "# your code here\n", + "print(\"\\n#1 Pokemon Type with the Highest Total:\")\n", + "print(sorted_type_correlations.index[0]) # Index 0 corresponds to the highest correlation" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + }, + "colab": { + "provenance": [] + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file diff --git a/your-code/challenge-3.ipynb b/your-code/challenge-3.ipynb index a42a586..bab1d6e 100644 --- a/your-code/challenge-3.ipynb +++ b/your-code/challenge-3.ipynb @@ -1,147 +1,730 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Challenge 3\n", - "\n", - "In this challenge we will work on the `Orders` data set. In your work you will apply the thinking process and workflow we showed you in Challenge 2.\n", - "\n", - "You are serving as a Business Intelligence Analyst at the headquarter of an international fashion goods chain store. 
Your boss today asked you to do two things for her:\n", - "\n", - "**First, identify two groups of customers from the data set.** The first group is **VIP Customers** whose **aggregated expenses** at your global chain stores are **above the 95th percentile** (aka. 0.95 quantile). The second group is **Preferred Customers** whose **aggregated expenses** are **between the 75th and 95th percentile**.\n", - "\n", - "**Second, identify which country has the most of your VIP customers, and which country has the most of your VIP+Preferred Customers combined.**" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Q1: How to identify VIP & Preferred Customers?\n", - "\n", - "We start by importing all the required libraries:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# import required libraries\n", - "import numpy as np\n", - "import pandas as pd" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Next, extract and import `Orders` dataset into a dataframe variable called `orders`. Print the head of `orders` to overview the data:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---\n", - "\n", - "\"Identify VIP and Preferred Customers\" is the non-technical goal of your boss. 
You need to translate that goal into technical languages that data analysts use:\n", - "\n", - "## How to label customers whose aggregated `amount_spent` is in a given quantile range?\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We break down the main problem into several sub problems:\n", - "\n", - "#### Sub Problem 1: How to aggregate the `amount_spent` for unique customers?\n", - "\n", - "#### Sub Problem 2: How to select customers whose aggregated `amount_spent` is in a given quantile range?\n", - "\n", - "#### Sub Problem 3: How to label selected customers as \"VIP\" or \"Preferred\"?\n", - "\n", - "*Note: If you want to break down the main problem in a different way, please feel free to revise the sub problems above.*\n", - "\n", - "Now in the workspace below, tackle each of the sub problems using the iterative problem solving workflow. Insert cells as necessary to write your codes and explain your steps." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now we'll leave it to you to solve Q2 & Q3, which you can leverage from your solution for Q1:\n", - "\n", - "## Q2: How to identify which country has the most VIP Customers?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Q3: How to identify which country has the most VIP+Preferred Customers combined?" 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# your code here" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.9" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "ZR7acNcgkjuA" + }, + "source": [ + "# Challenge 3\n", + "\n", + "In this challenge we will work on the `Orders` data set. In your work you will apply the thinking process and workflow we showed you in Challenge 2.\n", + "\n", + "You are serving as a Business Intelligence Analyst at the headquarter of an international fashion goods chain store. Your boss today asked you to do two things for her:\n", + "\n", + "**First, identify two groups of customers from the data set.** The first group is **VIP Customers** whose **aggregated expenses** at your global chain stores are **above the 95th percentile** (aka. 0.95 quantile). 
The second group is **Preferred Customers** whose **aggregated expenses** are **between the 75th and 95th percentile**.\n", + "\n", + "**Second, identify which country has the most of your VIP customers, and which country has the most of your VIP+Preferred Customers combined.**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "l0Z_R27BkjuD" + }, + "source": [ + "## Q1: How to identify VIP & Preferred Customers?\n", + "\n", + "We start by importing all the required libraries:" + ] + }, + { + "cell_type": "code", + "execution_count": 111, + "metadata": { + "id": "7ezhgq1EkjuD" + }, + "outputs": [], + "source": [ + "# import required libraries\n", + "import numpy as np\n", + "import pandas as pd" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jaabUVvtkjuE" + }, + "source": [ + "Next, extract and import `Orders` dataset into a dataframe variable called `orders`. Print the head of `orders` to overview the data:" + ] + }, + { + "cell_type": "code", + "execution_count": 112, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 310 + }, + "id": "zHCNYQ0xkjuE", + "outputId": "fcd4663d-d5db-4f2b-ee11-54412a941413" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " Unnamed: 0 InvoiceNo StockCode year month day hour \\\n", + "0 0 536365 85123A 2010 12 3 8 \n", + "1 1 536365 71053 2010 12 3 8 \n", + "2 2 536365 84406B 2010 12 3 8 \n", + "3 3 536365 84029G 2010 12 3 8 \n", + "4 4 536365 84029E 2010 12 3 8 \n", + "\n", + " Description Quantity InvoiceDate \\\n", + "0 white hanging heart t-light holder 6 2010-12-01 08:26:00 \n", + "1 white metal lantern 6 2010-12-01 08:26:00 \n", + "2 cream cupid hearts coat hanger 8 2010-12-01 08:26:00 \n", + "3 knitted union flag hot water bottle 6 2010-12-01 08:26:00 \n", + "4 red woolly hottie white heart. 
6 2010-12-01 08:26:00 \n", + "\n", + " UnitPrice CustomerID Country amount_spent \n", + "0 2.55 17850 United Kingdom 15.30 \n", + "1 3.39 17850 United Kingdom 20.34 \n", + "2 2.75 17850 United Kingdom 22.00 \n", + "3 3.39 17850 United Kingdom 20.34 \n", + "4 3.39 17850 United Kingdom 20.34 " + ], + "text/html": [ + "\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "variable_name": "orders" + } + }, + "metadata": {}, + "execution_count": 112 + } + ], + "source": [ + "# your code here\n", + "orders = pd.read_csv('Orders.zip')\n", + "orders.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "T-o3NHBHkjuE" + }, + "source": [ + "---\n", + "\n", + "\"Identify VIP and Preferred Customers\" is the non-technical goal of your boss. You need to translate that goal into technical languages that data analysts use:\n", + "\n", + "## How to label customers whose aggregated `amount_spent` is in a given quantile range?\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aEPrlDj4kjuF" + }, + "source": [ + "We break down the main problem into several sub problems:\n", + "\n", + "#### Sub Problem 1: How to aggregate the `amount_spent` for unique customers?\n", + "\n", + "#### Sub Problem 2: How to select customers whose aggregated `amount_spent` is in a given quantile range?\n", + "\n", + "#### Sub Problem 3: How to label selected customers as \"VIP\" or \"Preferred\"?\n", + "\n", + "*Note: If you want to break down the main problem in a different way, please feel free to revise the sub problems above.*\n", + "\n", + "Now in the workspace below, tackle each of the sub problems using the iterative problem solving workflow. Insert cells as necessary to write your codes and explain your steps." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 113, + "metadata": { + "id": "bYu23AZdkjuF" + }, + "outputs": [], + "source": [ + "# your code here\n", + "# Sub Problem 1: How to aggregate the amount_spent for unique customers?\n", + "customer_total = orders.groupby('CustomerID').agg({'amount_spent':'sum'})\n", + "\n", + "# Sub Problem 2: How to select customers whose aggregated amount_spent is in a given quantile range?\n", + "vip_threshold = customer_total['amount_spent'].quantile(0.95)\n", + "preferred_threshold = customer_total['amount_spent'].quantile(0.75)\n", + "\n", + "vip_customers = customer_total[customer_total['amount_spent'] > vip_threshold]\n", + "preferred_customers = customer_total[(customer_total['amount_spent'] > preferred_threshold) & (customer_total['amount_spent'] <= vip_threshold)]\n", + "\n", + "# Sub Problem 3: How to label selected customers as \"VIP\" or \"Preferred\"?\n", + "customer_total['customer_label'] = 'Normal'\n", + "customer_total.loc[customer_total.index.isin(vip_customers.index), 'customer_label'] = 'VIP'\n", + "customer_total.loc[customer_total.index.isin(preferred_customers.index), 'customer_label'] = 'Preferred'" + ] + }, + { + "cell_type": "code", + "source": [], + "metadata": { + "id": "Knvj7C3C3Fq5" + }, + "execution_count": 113, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Upvo1qK6kjuF" + }, + "source": [ + "Now we'll leave it to you to solve Q2 & Q3, which you can leverage from your solution for Q1:\n", + "\n", + "## Q2: How to identify which country has the most VIP Customers?" 
+ ] + }, + { + "source": [ + "# Filter the DataFrame to include only VIP customers\n", + "vip_customers_df = orders[orders['CustomerID'].isin(vip_customers.index)]\n", + "\n", + "# Count unique VIP customers by country\n", + "vip_customers_by_country = vip_customers_df.groupby('Country')['CustomerID'].nunique()\n", + "\n", + "# Print VIP customer counts by country\n", + "print(\"VIP customers by country:\")\n", + "print(vip_customers_by_country)\n", + "\n", + "# Identify and print the country with the most VIP customers\n", + "print(f'A country with the most VIP customers is: {vip_customers_by_country.idxmax()} with {vip_customers_by_country.max()} customers')\n" + ], + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "collapsed": true, + "id": "cbqhACL-nP8V", + "outputId": "71f5e65d-7e20-42ad-aed8-316f97c88e98" + }, + "execution_count": 119, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "VIP customers by country:\n", + "Country\n", + "Australia 2\n", + "Belgium 1\n", + "Channel Islands 1\n", + "Cyprus 1\n", + "Denmark 1\n", + "EIRE 2\n", + "Finland 1\n", + "France 9\n", + "Germany 10\n", + "Japan 2\n", + "Netherlands 1\n", + "Norway 1\n", + "Portugal 2\n", + "Singapore 1\n", + "Spain 2\n", + "Sweden 1\n", + "Switzerland 3\n", + "United Kingdom 177\n", + "Name: CustomerID, dtype: int64\n", + "A country with the most VIP customers is: United Kingdom with 177 customers\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9e9Jc_mnkjuG" + }, + "source": [ + "## Q3: How to identify which country has the most VIP+Preferred Customers combined?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 120, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "S1g_S834kjuG", + "outputId": "8e267f9b-3f79-4ab9-fcd6-333a41ed00d2" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Preferred customers by country:\n", + "Country\n", + "Australia 2\n", + "Austria 3\n", + "Belgium 11\n", + "Canada 1\n", + "Channel Islands 3\n", + "Cyprus 3\n", + "Denmark 2\n", + "EIRE 1\n", + "Finland 4\n", + "France 20\n", + "Germany 29\n", + "Greece 1\n", + "Iceland 1\n", + "Israel 2\n", + "Italy 5\n", + "Japan 2\n", + "Lebanon 1\n", + "Malta 1\n", + "Norway 6\n", + "Poland 1\n", + "Portugal 5\n", + "Spain 7\n", + "Sweden 1\n", + "Switzerland 6\n", + "United Kingdom 755\n", + "Name: CustomerID, dtype: int64\n", + "A country with the most preferred customers is: United Kingdom with 755 customers\n" + ] + } + ], + "source": [ + "# Filter the DataFrame to include only Preferred customers\n", + "preferred_customers_df = orders[orders['CustomerID'].isin(preferred_customers.index)]\n", + "\n", + "# Count unique Preferred customers by country\n", + "preferred_customers_by_country = preferred_customers_df.groupby('Country')['CustomerID'].nunique()\n", + "\n", + "# Print Preferred customer counts by country\n", + "print(\"Preferred customers by country:\")\n", + "print(preferred_customers_by_country)\n", + "\n", + "# Identify and print the country with the most Preferred customers\n", + "print(f'A country with the most preferred customers is: {preferred_customers_by_country.idxmax()} with {preferred_customers_by_country.max()} customers')\n" + ] + }, + { + "cell_type": "code", + "source": [ + "# Both the VIP and the Preferred maximum fall in the United Kingdom, so we can just sum the two values\n", + "most_vip_and_preferred_customers = vip_customers_by_country.max() + preferred_customers_by_country.max()\n", + "print(f'The total number of most VIP and preferred 
customers is: {most_vip_and_preferred_customers}')" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "_QzHe2IMwWCG", + "outputId": "bf78965a-9796-4799-9cba-1063d1eb6148" + }, + "execution_count": 121, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "The total number of most VIP and preferred customers is: 932\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "# Extract counts for the United Kingdom\n", + "uk_vip_count = vip_customers_by_country.get('United Kingdom', 0)\n", + "uk_preferred_count = preferred_customers_by_country.get('United Kingdom', 0)\n", + "\n", + "# Calculate the total for the United Kingdom\n", + "total_uk_customers = uk_vip_count + uk_preferred_count\n", + "\n", + "print(f'The total number of VIP and Preferred customers in the United Kingdom is: {total_uk_customers}')\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "MnM3Tvf0PIvh", + "outputId": "c1895a3d-d2b3-4833-ba14-f133ed2845f5" + }, + "execution_count": 122, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "The total number of VIP and Preferred customers in the United Kingdom is: 932\n" + ] + } + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + }, + "colab": { + "provenance": [] + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file From 0cb56d495c57ef1909e1a37f5e17707ae9c79074 Mon Sep 17 00:00:00 2001 From: Alex Husiev Date: Sat, 24 Aug 2024 06:44:49 +0200 Subject: [PATCH 2/2] Alex --- your-code/challenge-2.ipynb | 1353 ++++++++++++++--- your-code/challenge-3.ipynb | 2735 
+++++++++++++++++++++++++++++++++-- 2 files changed, 3735 insertions(+), 353 deletions(-) diff --git a/your-code/challenge-2.ipynb b/your-code/challenge-2.ipynb index 8194e68..b6ed1d0 100644 --- a/your-code/challenge-2.ipynb +++ b/your-code/challenge-2.ipynb @@ -37,7 +37,7 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 3, "metadata": { "id": "7GJEEU4icYvP" }, @@ -50,7 +50,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 4, "metadata": { "scrolled": true, "colab": { @@ -58,7 +58,7 @@ "height": 206 }, "id": "jPmX42ZDcYvP", - "outputId": "61ad24b0-0d5d-428a-801a-aa42f0b30597" + "outputId": "e8550267-51da-4527-f30b-7e4227791ad1" }, "outputs": [ { @@ -81,7 +81,7 @@ ], "text/html": [ "\n", - "
\n", + "
\n", "
\n", "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
NameTotalCalculated TotalConfirmed
0Bulbasaur318318True
1Ivysaur405405True
2Venusaur525525True
3VenusaurMega Venusaur625625True
4Charmander309309True
...............
795Diancie600600True
796DiancieMega Diancie700700True
797HoopaHoopa Confined600600True
798HoopaHoopa Unbound680680True
799Volcanion600600True
\n", + "

800 rows × 4 columns

\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "summary": "{\n \"name\": \"#The Hypothesis is CONFIRMED!!! YAY!\",\n \"rows\": 800,\n \"fields\": [\n {\n \"column\": \"Name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 800,\n \"samples\": [\n \"Hydreigon\",\n \"Beheeyem\",\n \"Growlithe\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Total\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 119,\n \"min\": 180,\n \"max\": 780,\n \"num_unique_values\": 200,\n \"samples\": [\n 700,\n 349,\n 505\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Calculated Total\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 119,\n \"min\": 180,\n \"max\": 780,\n \"num_unique_values\": 200,\n \"samples\": [\n 700,\n 349,\n 505\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Confirmed\",\n \"properties\": {\n \"dtype\": \"boolean\",\n \"num_unique_values\": 1,\n \"samples\": [\n true\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {} + } + ], + "source": [ + "# your code here\n", + "pokemon_df = pd.read_csv('Pokemon.csv')\n", + "\n", + "# Calculate Total based on the formula\n", + "pokemon_df['Calculated Total'] = (\n", + " pokemon_df['HP'] +\n", + " pokemon_df['Attack'] +\n", + " pokemon_df['Defense'] +\n", + " pokemon_df['Sp. Atk'] +\n", + " pokemon_df['Sp. Def'] +\n", + " pokemon_df['Speed']\n", + ")\n", + "\n", + "# Compare the calculated Total with the provided Total\n", + "pokemon_df['Confirmed'] = pokemon_df['Total'] == pokemon_df['Calculated Total']\n", + "\n", + "# Display the DataFrame with the results\n", + "display(pokemon_df[['Name', 'Total', 'Calculated Total', 'Confirmed']])\n", + "#The Hypothesis is CONFIRMED!!! YAY!" 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HlmmHpngcYvQ" + }, + "source": [ + "## Problem Solving Iteration 2\n", + "\n", + "Now that we have consolidated the abilities fields, we can update the problem statement. The new problem statement is:\n", + "\n", + "### Which pokemon type is most likely to have the highest `Total` value?\n", + "\n", + "In the updated problem statement, we assume there is a certain relationship between the `Total` and the pokemon type. But we have two *type* fields (`Type 1` and `Type 2`) that have string values. In data analysis, string fields have to be transformed to numerical format in order to be analyzed.\n", + "\n", + "In addition, keep in mind that `Type 1` always has a value but `Type 2` is sometimes empty (having the `NaN` value). Also, the pokemon type we choose may be either in `Type 1` or `Type 2`.\n", + "\n", + "Now our expectation is:\n", + "\n", + "#### `Type 1` and `Type 2` string variables need to be converted to numerical variables in order to identify the relationship between `Total` and the pokemon type.\n", + "\n", + "The information we need to collect is:\n", + "\n", + "#### How to convert two string variables to numerical?\n", + "\n", + "Let's address the first question first. You can use a method called **One Hot Encoding** which is frequently used in machine learning to encode categorical string variables to numerical. The idea is to gather all the possible string values in a categorical field and create a numerical field for each unique string value. Each of those numerical fields uses `1` and `0` to indicate whether the data record has the corresponding categorical value. A detailed explanation of One Hot Encoding can be found in [this article](https://hackernoon.com/what-is-one-hot-encoding-why-and-when-do-you-have-to-use-it-e3c6186d008f). 
You will formally learn it in Module 3.\n", + "\n", + "For instance, if a pokemon has `Type 1` as `Poison` and `Type 2` as `Fire`, then its `Poison` and `Fire` fields are `1` whereas all other fields are `0`. If a pokemon has `Type 1` as `Water` and `Type 2` as `NaN`, then its `Water` field is `1` whereas all other fields are `0`.\n", + "\n", + "#### In the next cell, use One Hot Encoding to encode `Type 1` and `Type 2`. Use the pokemon type values as the names of the numerical fields you create.\n", + "\n", + "The new numerical variables you create should look like below:\n", + "\n", + "![One Hot Encoding](../images/one-hot-encoding.png)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 424 + }, + "id": "CvQne7hqcYvR", + "outputId": "04b62a2c-be0b-4a28-a1c5-110bf42362bd" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " Bug Dark Dragon Electric Fairy Fighting Fire Flying Ghost \\\n", + "0 False False False False False False False False False \n", + "1 False False False False False False False False False \n", + "2 False False False False False False False False False \n", + "3 False False False False False False False False False \n", + "4 False False False False False False True False False \n", + ".. ... ... ... ... ... ... ... ... ... 
\n", + "795 False False False False True False False False False \n", + "796 False False False False True False False False False \n", + "797 False False False False False False False False True \n", + "798 False True False False False False False False False \n", + "799 False False False False False False True False False \n", + "\n", + " Grass Ground Ice Normal Poison Psychic Rock Steel Water \n", + "0 True False False False True False False False False \n", + "1 True False False False True False False False False \n", + "2 True False False False True False False False False \n", + "3 True False False False True False False False False \n", + "4 False False False False False False False False False \n", + ".. ... ... ... ... ... ... ... ... ... \n", + "795 False False False False False False True False False \n", + "796 False False False False False False True False False \n", + "797 False False False False False True False False False \n", + "798 False False False False False True False False False \n", + "799 False False False False False False False False True \n", + "\n", + "[800 rows x 18 columns]" + ], + "text/html": [ + "\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "variable_name": "pokemon_types", + "summary": "{\n \"name\": \"pokemon_types\",\n \"rows\": 800,\n \"fields\": [\n {\n \"column\": \"Bug\",\n \"properties\": {\n \"dtype\": \"boolean\",\n \"num_unique_values\": 2,\n \"samples\": [\n true,\n false\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Dark\",\n \"properties\": {\n \"dtype\": \"boolean\",\n \"num_unique_values\": 2,\n \"samples\": [\n true,\n false\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Dragon\",\n \"properties\": {\n \"dtype\": \"boolean\",\n \"num_unique_values\": 2,\n \"samples\": [\n true,\n false\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Electric\",\n \"properties\": {\n \"dtype\": \"boolean\",\n \"num_unique_values\": 2,\n \"samples\": [\n true,\n false\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Fairy\",\n \"properties\": {\n \"dtype\": \"boolean\",\n \"num_unique_values\": 2,\n \"samples\": [\n true,\n false\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Fighting\",\n \"properties\": {\n \"dtype\": \"boolean\",\n \"num_unique_values\": 2,\n \"samples\": [\n true,\n false\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Fire\",\n \"properties\": {\n \"dtype\": \"boolean\",\n \"num_unique_values\": 2,\n \"samples\": [\n true,\n false\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Flying\",\n \"properties\": {\n \"dtype\": \"boolean\",\n \"num_unique_values\": 2,\n \"samples\": [\n true,\n false\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Ghost\",\n \"properties\": {\n \"dtype\": \"boolean\",\n \"num_unique_values\": 2,\n \"samples\": [\n true,\n false\n ],\n \"semantic_type\": \"\",\n 
\"description\": \"\"\n }\n },\n {\n \"column\": \"Grass\",\n \"properties\": {\n \"dtype\": \"boolean\",\n \"num_unique_values\": 2,\n \"samples\": [\n false,\n true\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Ground\",\n \"properties\": {\n \"dtype\": \"boolean\",\n \"num_unique_values\": 2,\n \"samples\": [\n true,\n false\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Ice\",\n \"properties\": {\n \"dtype\": \"boolean\",\n \"num_unique_values\": 2,\n \"samples\": [\n true,\n false\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Normal\",\n \"properties\": {\n \"dtype\": \"boolean\",\n \"num_unique_values\": 2,\n \"samples\": [\n true,\n false\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Poison\",\n \"properties\": {\n \"dtype\": \"boolean\",\n \"num_unique_values\": 2,\n \"samples\": [\n false,\n true\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Psychic\",\n \"properties\": {\n \"dtype\": \"boolean\",\n \"num_unique_values\": 2,\n \"samples\": [\n true,\n false\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Rock\",\n \"properties\": {\n \"dtype\": \"boolean\",\n \"num_unique_values\": 2,\n \"samples\": [\n true,\n false\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Steel\",\n \"properties\": {\n \"dtype\": \"boolean\",\n \"num_unique_values\": 2,\n \"samples\": [\n true,\n false\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Water\",\n \"properties\": {\n \"dtype\": \"boolean\",\n \"num_unique_values\": 2,\n \"samples\": [\n true,\n false\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 14 } ], "source": [ @@ -597,7 +1472,8 @@ "pokemon_encoded_df.drop(columns=['Type 1', 'Type 2'], 
inplace=True)\n", "\n", "# Display the resulting DataFrame\n", - "print(pokemon_encoded_df.head())" + "pokemon_types = pd.get_dummies(pokemon['Type 1'])+pd.get_dummies(pokemon['Type 2'])\n", + "pokemon_types" ] }, { @@ -621,69 +1497,156 @@ }, { "cell_type": "code", - "execution_count": 26, + "execution_count": 18, "metadata": { "colab": { - "base_uri": "https://localhost:8080/" + "base_uri": "https://localhost:8080/", + "height": 649 }, "id": "vyuArvaGcYvR", - "outputId": "c6784789-90a3-4426-e482-e4f2b15c8bc3" + "outputId": "28329195-5174-4903-cf7d-a5aa9b513ef5" }, "outputs": [ { - "output_type": "stream", - "name": "stdout", - "text": [ - "Correlations with Total:\n", - "Type1_Dragon 0.196532\n", - "Type2_Fighting 0.138726\n", - "Type2_Dragon 0.115240\n", - "Type2_Ice 0.100870\n", - "Type1_Psychic 0.094364\n", - "Type1_Steel 0.082000\n", - "Type2_Psychic 0.076054\n", - "Type2_Fire 0.073234\n", - "Type2_Steel 0.070307\n", - "Type2_Dark 0.065844\n", - "Type2_Flying 0.054048\n", - "Type1_Fire 0.050527\n", - "Type1_Rock 0.037524\n", - "Type1_Flying 0.029504\n", - "Type1_Dark 0.017818\n", - "Type1_Electric 0.016715\n", - "Type2_Ground 0.016486\n", - "Type2_Electric 0.014669\n", - "Type1_Ghost 0.007594\n", - "Type1_Ground 0.004082\n", - "Type2_Rock -0.000512\n", - "Type1_Ice -0.002412\n", - "Type2_Ghost -0.004885\n", - "Type2_Normal -0.013956\n", - "Type1_Water -0.015640\n", - "Type2_Water -0.018800\n", - "Type2_Bug -0.021375\n", - "Type2_Fairy -0.024606\n", - "Type1_Fairy -0.026948\n", - "Type1_Fighting -0.029086\n", - "Type1_Grass -0.036057\n", - "Type2_Grass -0.039224\n", - "Type1_Poison -0.057123\n", - "Type2_Poison -0.067837\n", - "Type1_Normal -0.104150\n", - "Type1_Bug -0.143957\n", - "dtype: float64\n" - ] + "output_type": "execute_result", + "data": { + "text/plain": [ + "Dragon 0.229705\n", + "Psychic 0.124688\n", + "Steel 0.109703\n", + "Fire 0.078726\n", + "Fighting 0.077786\n", + "Ice 0.060248\n", + "Flying 0.059383\n", + "Dark 0.056154\n", + "Rock 
0.032731\n", + "Electric 0.020971\n", + "Ground 0.015060\n", + "Ghost 0.003641\n", + "Water -0.021665\n", + "Fairy -0.036698\n", + "Grass -0.052592\n", + "Poison -0.090441\n", + "Normal -0.105331\n", + "Bug -0.145781\n", + "Name: Total, dtype: float64" + ], + "text/html": [ + "
" + ] + }, + "metadata": {}, + "execution_count": 18 } ], "source": [ - "# Calculate correlation of Total with encoded Type 1 and Type 2 fields\n", - "type_correlations = pokemon_encoded_df[pokemon_encoded_df.columns[pokemon_encoded_df.columns.str.startswith(('Type1', 'Type2'))]].corrwith(pokemon_encoded_df['Total'])\n", + "# Step 1: Add the 'Total' column to pokemon_types\n", + "pokemon_types['Total'] = pokemon_df['Total']\n", "\n", - "# Sort correlations in descending order\n", - "sorted_type_correlations = type_correlations.sort_values(ascending=False)\n", + "# Step 2: Calculate the correlation of 'Total' with each type column\n", + "correlations = pokemon_types.corr()['Total'].drop('Total')\n", "\n", - "print(\"Correlations with Total:\")\n", - "print(sorted_type_correlations)\n" + "# Step 3: Sort the correlations in descending order\n", + "sorted_correlations = correlations.sort_values(ascending=False)\n", + "\n", + "# Display the sorted correlations\n", + "sorted_correlations" ] }, { @@ -699,29 +1662,27 @@ }, { "cell_type": "code", - "execution_count": 27, + "execution_count": 17, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "bDST7tC5cYvR", - "outputId": "f8e9aecf-e226-4ae8-a278-9b5291ab5077" + "outputId": "fe336075-2f0a-4c90-d5c0-f8bc07e4688e" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ - "\n", - "#1 Pokemon Type with the Highest Total:\n", - "Type1_Dragon\n" + "The Pokémon type most likely to have the highest Total is: Dragon\n" ] } ], "source": [ - "# your code here\n", - "print(\"\\n#1 Pokemon Type with the Highest Total:\")\n", - "print(sorted_type_correlations.index[0]) # Index 0 corresponds to the highest correlation" + "# Identify the #1 Pokémon type with the highest correlation to 'Total'\n", + "top_type = sorted_correlations.idxmax()\n", + "print(f\"The Pokémon type most likely to have the highest Total is: {top_type}\")" ] } ], diff --git a/your-code/challenge-3.ipynb 
b/your-code/challenge-3.ipynb index bab1d6e..a092d13 100644 --- a/your-code/challenge-3.ipynb +++ b/your-code/challenge-3.ipynb @@ -30,7 +30,7 @@ }, { "cell_type": "code", - "execution_count": 111, + "execution_count": 35, "metadata": { "id": "7ezhgq1EkjuD" }, @@ -52,14 +52,14 @@ }, { "cell_type": "code", - "execution_count": 112, + "execution_count": 36, "metadata": { "colab": { "base_uri": "https://localhost:8080/", - "height": 310 + "height": 206 }, "id": "zHCNYQ0xkjuE", - "outputId": "fcd4663d-d5db-4f2b-ee11-54412a941413" + "outputId": "bba22535-8bb9-45cc-b087-89d7b2974ff5" }, "outputs": [ { @@ -89,7 +89,7 @@ ], "text/html": [ "\n", - "
\n", + "
\n", "
\n", "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "variable_name": "customer_total", + "summary": "{\n \"name\": \"customer_total\",\n \"rows\": 4339,\n \"fields\": [\n {\n \"column\": \"CustomerID\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1721,\n \"min\": 12346,\n \"max\": 18287,\n \"num_unique_values\": 4339,\n \"samples\": [\n 17785,\n 14317,\n 15977\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"amount_spent\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 8988.248381377658,\n \"min\": 0.0,\n \"max\": 280206.02,\n \"num_unique_values\": 4249,\n \"samples\": [\n 1048.85,\n 80.7,\n 104.35\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 37 + } + ], + "source": [ + "# your code here\n", + "# Sub Problem 1: How to aggregate the amount_spent for unique customers?\n", + "customer_total = orders.groupby('CustomerID').agg({'amount_spent':'sum'})\n", + "customer_total.head()" + ] + }, + { + "cell_type": "code", + "source": [ + "# Sub Problem 2: How to select customers whose aggregated amount_spent is in a given quantile range?\n", + "vip_threshold = customer_total['amount_spent'].quantile(0.95)\n", + "preferred_threshold = customer_total['amount_spent'].quantile(0.75)\n", + "#verifying\n", + "vip_threshold, preferred_threshold" + ], + "metadata": { + "id": "Knvj7C3C3Fq5", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "e3bd3f49-5a03-4942-82c3-45f6c5eeab9c" + }, + "execution_count": 38, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(5840.181999999982, 1661.64)" + ] + }, + "metadata": {}, + "execution_count": 38 + } + ] + }, + { + "cell_type": "code", + "source": [ + "vip_customers = customer_total[customer_total['amount_spent'] > vip_threshold]\n", + "preferred_customers = customer_total[(customer_total['amount_spent'] > 
preferred_threshold) & (customer_total['amount_spent'] <= vip_threshold)]\n", + "#verifying\n", + "display(vip_customers.head())\n", + "display(preferred_customers.head())" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 458 + }, + "id": "aW0u-n72nk6m", + "outputId": "8ea35c4e-4871-41b7-df31-8768d333f31e" + }, + "execution_count": 39, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + " amount_spent\n", + "CustomerID \n", + "12346 77183.60\n", + "12357 6207.67\n", + "12359 6372.58\n", + "12409 11072.67\n", + "12415 124914.53" + ], + "text/html": [ + "\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "summary": "{\n \"name\": \"display(preferred_customers\",\n \"rows\": 5,\n \"fields\": [\n {\n \"column\": \"CustomerID\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 32,\n \"min\": 12346,\n \"max\": 12415,\n \"num_unique_values\": 5,\n \"samples\": [\n 12357,\n 12415,\n 12359\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"amount_spent\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 53781.94297920679,\n \"min\": 6207.67,\n \"max\": 124914.53,\n \"num_unique_values\": 5,\n \"samples\": [\n 6207.67,\n 124914.53,\n 6372.58\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + " amount_spent\n", + "CustomerID \n", + "12347 4310.00\n", + "12348 1797.24\n", + "12349 1757.55\n", + "12352 2506.04\n", + "12356 2811.43" + ], + "text/html": [ + "\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "summary": "{\n \"name\": \"display(preferred_customers\",\n \"rows\": 5,\n \"fields\": [\n {\n \"column\": \"CustomerID\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 3,\n \"min\": 12347,\n \"max\": 12356,\n \"num_unique_values\": 5,\n \"samples\": [\n 12348,\n 12356,\n 12349\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"amount_spent\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1039.8477058059993,\n \"min\": 1757.55,\n \"max\": 4310.0,\n \"num_unique_values\": 5,\n \"samples\": [\n 1797.24,\n 2811.43,\n 1757.55\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {} + } + ] + }, + { + "cell_type": "code", + "source": [ + "# Sub Problem 3: How to label selected customers as \"VIP\" or \"Preferred\"?\n", + "customer_total['customer_label'] = 'Normal'\n", + "customer_total.loc[customer_total.index.isin(vip_customers.index), 'customer_label'] = 'VIP'\n", + "customer_total.loc[customer_total.index.isin(preferred_customers.index), 'customer_label'] = 'Preferred'\n", + "#verifying\n", + "customer_total.head()" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 238 + }, + "id": "3fhRcb7wmBKV", + "outputId": "cc2b734f-316e-4353-8173-4cf5586aa641" + }, + "execution_count": 40, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " amount_spent customer_label\n", + "CustomerID \n", + "12346 77183.60 VIP\n", + "12347 4310.00 Preferred\n", + "12348 1797.24 Preferred\n", + "12349 1757.55 Preferred\n", + "12350 334.40 Normal" + ], + "text/html": [ + "\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "variable_name": "customer_total", + "summary": "{\n \"name\": \"customer_total\",\n \"rows\": 4339,\n \"fields\": [\n {\n \"column\": \"CustomerID\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1721,\n \"min\": 12346,\n \"max\": 18287,\n \"num_unique_values\": 4339,\n \"samples\": [\n 17785,\n 14317,\n 15977\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"amount_spent\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 8988.248381377658,\n \"min\": 0.0,\n \"max\": 280206.02,\n \"num_unique_values\": 4249,\n \"samples\": [\n 1048.85,\n 80.7,\n 104.35\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"customer_label\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"VIP\",\n \"Preferred\",\n \"Normal\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 40 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Upvo1qK6kjuF" + }, + "source": [ + "Now we'll leave it to you to solve Q2 & Q3, which you can do by building on your solution for Q1:\n", + "\n", + "## Q2: How to identify which country has the most VIP Customers?" 
+ ] + }, + { + "source": [ + "# Filter the DataFrame to include only VIP customers\n", + "vip_customers_df = orders[orders['CustomerID'].isin(vip_customers.index)]\n", + "vip_customers_df.tail()" + ], + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 206 + }, + "collapsed": true, + "id": "cbqhACL-nP8V", + "outputId": "f23db9e1-b63a-4d3b-c17d-ea49ea4ff9a4" + }, + "execution_count": 50, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " Unnamed: 0 InvoiceNo StockCode year month day hour \\\n", + "397883 541868 581584 85038 2011 12 5 12 \n", + "397905 541890 581586 22061 2011 12 5 12 \n", + "397906 541891 581586 23275 2011 12 5 12 \n", + "397907 541892 581586 21217 2011 12 5 12 \n", + "397908 541893 581586 20685 2011 12 5 12 \n", + "\n", + " Description Quantity InvoiceDate \\\n", + "397883 6 chocolate love heart t-lights 48 2011-12-09 12:25:00 \n", + "397905 large cake stand hanging strawbery 8 2011-12-09 12:49:00 \n", + "397906 set of 3 hanging owls ollie beak 24 2011-12-09 12:49:00 \n", + "397907 red retrospot round cake tins 24 2011-12-09 12:49:00 \n", + "397908 doormat red retrospot 10 2011-12-09 12:49:00 \n", + "\n", + " UnitPrice CustomerID Country amount_spent \n", + "397883 1.85 13777 United Kingdom 88.8 \n", + "397905 2.95 13113 United Kingdom 23.6 \n", + "397906 1.25 13113 United Kingdom 30.0 \n", + "397907 8.95 13113 United Kingdom 214.8 \n", + "397908 7.08 13113 United Kingdom 70.8 " + ], + "text/html": [ + "\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "repr_error": "0" + } + }, + "metadata": {}, + "execution_count": 50 + } + ] + }, + { + "cell_type": "code", + "source": [ + "# Count unique VIP customers by country\n", + "vip_customers_by_country = vip_customers_df.groupby('Country')['CustomerID'].nunique()\n", + "vip_customers_by_country.head(20)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 680 + }, + "id": "uUBmsNLeoHnf", + "outputId": "33208639-d6f1-43c1-99d0-9dc121ccade0" + }, + "execution_count": 58, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "Country\n", + "Australia 2\n", + "Belgium 1\n", + "Channel Islands 1\n", + "Cyprus 1\n", + "Denmark 1\n", + "EIRE 2\n", + "Finland 1\n", + "France 9\n", + "Germany 10\n", + "Japan 2\n", + "Netherlands 1\n", + "Norway 1\n", + "Portugal 2\n", + "Singapore 1\n", + "Spain 2\n", + "Sweden 1\n", + "Switzerland 3\n", + "United Kingdom 177\n", + "Name: CustomerID, dtype: int64" + ], + "text/html": [ + "
" + ] + }, + "metadata": {}, + "execution_count": 58 + } + ] + }, + { + "cell_type": "code", + "source": [ + "# The country with the most VIPs:\n", + "print(f'A country with the most VIP customers is: {vip_customers_by_country.idxmax()} with {vip_customers_by_country.max()} customers')" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "hDQ9rXqToIH-", + "outputId": "4e8e7228-24ed-4d3e-df7a-295d0aa6abba" + }, + "execution_count": 59, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "A country with the most VIP customers is: United Kingdom with 177 customers\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9e9Jc_mnkjuG" + }, + "source": [ + "## Q3: How to identify which country has the most VIP+Preferred Customers combined?" + ] + }, + { + "cell_type": "code", + "execution_count": 62, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 424 + }, + "id": "S1g_S834kjuG", + "outputId": "fb3a305c-3c6d-4140-bd4e-a0851e3aec54" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " Unnamed: 0 InvoiceNo StockCode year month day hour \\\n", + "0 0 536365 85123A 2010 12 3 8 \n", + "1 1 536365 71053 2010 12 3 8 \n", + "2 2 536365 84406B 2010 12 3 8 \n", + "3 3 536365 84029G 2010 12 3 8 \n", + "4 4 536365 84029E 2010 12 3 8 \n", + "... ... ... ... ... ... ... ... 
\n", + "397900 541885 581585 21684 2011 12 5 12 \n", + "397901 541886 581585 22398 2011 12 5 12 \n", + "397902 541887 581585 23328 2011 12 5 12 \n", + "397903 541888 581585 23145 2011 12 5 12 \n", + "397904 541889 581585 22466 2011 12 5 12 \n", + "\n", + " Description Quantity InvoiceDate \\\n", + "0 white hanging heart t-light holder 6 2010-12-01 08:26:00 \n", + "1 white metal lantern 6 2010-12-01 08:26:00 \n", + "2 cream cupid hearts coat hanger 8 2010-12-01 08:26:00 \n", + "3 knitted union flag hot water bottle 6 2010-12-01 08:26:00 \n", + "4 red woolly hottie white heart. 6 2010-12-01 08:26:00 \n", + "... ... ... ... \n", + "397900 small medina stamped metal bowl 12 2011-12-09 12:31:00 \n", + "397901 magnets pack of 4 swallows 12 2011-12-09 12:31:00 \n", + "397902 set 6 school milk bottles in crate 4 2011-12-09 12:31:00 \n", + "397903 zinc t-light holder star large 12 2011-12-09 12:31:00 \n", + "397904 fairy tale cottage night light 12 2011-12-09 12:31:00 \n", + "\n", + " UnitPrice CustomerID Country amount_spent \n", + "0 2.55 17850 United Kingdom 15.30 \n", + "1 3.39 17850 United Kingdom 20.34 \n", + "2 2.75 17850 United Kingdom 22.00 \n", + "3 3.39 17850 United Kingdom 20.34 \n", + "4 3.39 17850 United Kingdom 20.34 \n", + "... ... ... ... ... \n", + "397900 0.85 15804 United Kingdom 10.20 \n", + "397901 0.39 15804 United Kingdom 4.68 \n", + "397902 3.75 15804 United Kingdom 15.00 \n", + "397903 0.95 15804 United Kingdom 11.40 \n", + "397904 1.95 15804 United Kingdom 23.40 \n", + "\n", + "[151781 rows x 14 columns]" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
        Unnamed: 0  InvoiceNo  StockCode  year  month  day  hour  Description                          Quantity  InvoiceDate          UnitPrice  CustomerID  Country         amount_spent
0       0           536365     85123A     2010  12     3    8     white hanging heart t-light holder   6         2010-12-01 08:26:00  2.55       17850       United Kingdom  15.30
1       1           536365     71053      2010  12     3    8     white metal lantern                  6         2010-12-01 08:26:00  3.39       17850       United Kingdom  20.34
2       2           536365     84406B     2010  12     3    8     cream cupid hearts coat hanger       8         2010-12-01 08:26:00  2.75       17850       United Kingdom  22.00
3       3           536365     84029G     2010  12     3    8     knitted union flag hot water bottle  6         2010-12-01 08:26:00  3.39       17850       United Kingdom  20.34
4       4           536365     84029E     2010  12     3    8     red woolly hottie white heart.       6         2010-12-01 08:26:00  3.39       17850       United Kingdom  20.34
...     ...         ...        ...        ...   ...    ...  ...   ...                                  ...       ...                  ...        ...         ...             ...
397900  541885      581585     21684      2011  12     5    12    small medina stamped metal bowl      12        2011-12-09 12:31:00  0.85       15804       United Kingdom  10.20
397901  541886      581585     22398      2011  12     5    12    magnets pack of 4 swallows           12        2011-12-09 12:31:00  0.39       15804       United Kingdom  4.68
397902  541887      581585     23328      2011  12     5    12    set 6 school milk bottles in crate   4         2011-12-09 12:31:00  3.75       15804       United Kingdom  15.00
397903  541888      581585     23145      2011  12     5    12    zinc t-light holder star large       12        2011-12-09 12:31:00  0.95       15804       United Kingdom  11.40
397904  541889      581585     22466      2011  12     5    12    fairy tale cottage night light       12        2011-12-09 12:31:00  1.95       15804       United Kingdom  23.40
\n", + "

151781 rows × 14 columns

\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "variable_name": "preferred_customers_df" + } + }, + "metadata": {}, + "execution_count": 62 + } + ], "source": [ - "# your code here\n", - "# Sub Problem 1: How to aggregate the amount_spent for unique customers?\n", - "customer_total = orders.groupby('CustomerID').agg({'amount_spent':'sum'})\n", - "\n", - "# Sub Problem 2: How to select customers whose aggregated amount_spent is in a given quantile range?\n", - "vip_threshold = customer_total['amount_spent'].quantile(0.95)\n", - "preferred_threshold = customer_total['amount_spent'].quantile(0.75)\n", - "\n", - "vip_customers = customer_total[customer_total['amount_spent'] > vip_threshold]\n", - "preferred_customers = customer_total[(customer_total['amount_spent'] > preferred_threshold) & (customer_total['amount_spent'] <= vip_threshold)]\n", - "\n", - "# Sub Problem 3: How to label selected customers as \"VIP\" or \"Preferred\"?\n", - "customer_total['customer_label'] = 'Normal'\n", - "customer_total.loc[customer_total.index.isin(vip_customers.index), 'customer_label'] = 'VIP'\n", - "customer_total.loc[customer_total.index.isin(preferred_customers.index), 'customer_label'] = 'Preferred'" + "# Filter the DataFrame to include only Preferred customers\n", + "preferred_customers_df = orders[orders['CustomerID'].isin(preferred_customers.index)]\n", + "preferred_customers_df" ] }, { "cell_type": "code", - "source": [], - "metadata": { - "id": "Knvj7C3C3Fq5" - }, - "execution_count": 113, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Upvo1qK6kjuF" - }, - "source": [ - "Now we'll leave it to you to solve Q2 & Q3, which you can leverage from your solution for Q1:\n", - "\n", - "## Q2: How to identify which country has the most VIP Customers?" 
- ] - }, - { "source": [ - "# Filter the DataFrame to include only VIP customers\n", - "vip_customers_df = orders[orders['CustomerID'].isin(vip_customers.index)]\n", - "\n", - "# Count unique VIP customers by country\n", - "vip_customers_by_country = vip_customers_df.groupby('Country')['CustomerID'].nunique()\n", - "\n", - "# Print VIP customer counts by country\n", - "print(\"VIP customers by country:\")\n", - "print(vip_customers_by_country)\n", + "# Count unique Preferred customers by country\n", + "preferred_customers_by_country = preferred_customers_df.groupby('Country')['CustomerID'].nunique()\n", "\n", - "# A country with the mist VIPs:\n", - "print(f'A country with the most VIP customers is: {vip_customers_by_country.idxmax()} with {vip_customers_by_country.max()} customers')\n" + "# Print Preferred customer counts by country\n", + "preferred_customers_by_country" ], - "cell_type": "code", "metadata": { "colab": { - "base_uri": "https://localhost:8080/" + "base_uri": "https://localhost:8080/", + "height": 899 }, - "collapsed": true, - "id": "cbqhACL-nP8V", - "outputId": "71f5e65d-7e20-42ad-aed8-316f97c88e98" + "id": "T4Hd_8zxql5K", + "outputId": "d5217fbd-a4f6-4add-fb89-08ca28a0c7a3" }, - "execution_count": 119, + "execution_count": 63, "outputs": [ { - "output_type": "stream", - "name": "stdout", - "text": [ - "VIP customers by country:\n", - "Country\n", - "Australia 2\n", - "Belgium 1\n", - "Channel Islands 1\n", - "Cyprus 1\n", - "Denmark 1\n", - "EIRE 2\n", - "Finland 1\n", - "France 9\n", - "Germany 10\n", - "Japan 2\n", - "Netherlands 1\n", - "Norway 1\n", - "Portugal 2\n", - "Singapore 1\n", - "Spain 2\n", - "Sweden 1\n", - "Switzerland 3\n", - "United Kingdom 177\n", - "Name: CustomerID, dtype: int64\n", - "A country with the most VIP customers is: United Kingdom with 177 customers\n" - ] + "output_type": "execute_result", + "data": { + "text/plain": [ + "Country\n", + "Australia 2\n", + "Austria 3\n", + "Belgium 11\n", + "Canada 1\n", + "Channel 
Islands 3\n", + "Cyprus 3\n", + "Denmark 2\n", + "EIRE 1\n", + "Finland 4\n", + "France 20\n", + "Germany 29\n", + "Greece 1\n", + "Iceland 1\n", + "Israel 2\n", + "Italy 5\n", + "Japan 2\n", + "Lebanon 1\n", + "Malta 1\n", + "Norway 6\n", + "Poland 1\n", + "Portugal 5\n", + "Spain 7\n", + "Sweden 1\n", + "Switzerland 6\n", + "United Kingdom 755\n", + "Name: CustomerID, dtype: int64" + ], + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Country          CustomerID
Australia        2
Austria          3
Belgium          11
Canada           1
Channel Islands  3
Cyprus           3
Denmark          2
EIRE             1
Finland          4
France           20
Germany          29
Greece           1
Iceland          1
Israel           2
Italy            5
Japan            2
Lebanon          1
Malta            1
Norway           6
Poland           1
Portugal         5
Spain            7
Sweden           1
Switzerland      6
United Kingdom   755
\n", + "

" + ] + }, + "metadata": {}, + "execution_count": 63 } ] }, - { - "cell_type": "markdown", - "metadata": { - "id": "9e9Jc_mnkjuG" - }, - "source": [ - "## Q3: How to identify which country has the most VIP+Preferred Customers\n", - "\n", - "* List item\n", - "* List item\n", - "\n", - "combined?" - ] - }, { "cell_type": "code", - "execution_count": 120, + "source": [ + "# Identify and print the country with the most Preferred customers\n", + "print(f'A country with the most preferred customers is: {preferred_customers_by_country.idxmax()} with {preferred_customers_by_country.max()} customers')" + ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, - "id": "S1g_S834kjuG", - "outputId": "8e267f9b-3f79-4ab9-fcd6-333a41ed00d2" + "id": "TodeG89sqmEZ", + "outputId": "cd321113-ea11-44b1-eb2a-01b782de6922" }, + "execution_count": 64, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ - "Preferred customers by country:\n", - "Country\n", - "Australia 2\n", - "Austria 3\n", - "Belgium 11\n", - "Canada 1\n", - "Channel Islands 3\n", - "Cyprus 3\n", - "Denmark 2\n", - "EIRE 1\n", - "Finland 4\n", - "France 20\n", - "Germany 29\n", - "Greece 1\n", - "Iceland 1\n", - "Israel 2\n", - "Italy 5\n", - "Japan 2\n", - "Lebanon 1\n", - "Malta 1\n", - "Norway 6\n", - "Poland 1\n", - "Portugal 5\n", - "Spain 7\n", - "Sweden 1\n", - "Switzerland 6\n", - "United Kingdom 755\n", - "Name: CustomerID, dtype: int64\n", "A country with the most preferred customers is: United Kingdom with 755 customers\n" ] } - ], - "source": [ - "# Filter the DataFrame to include only Preferred customers\n", - "preferred_customers_df = orders[orders['CustomerID'].isin(preferred_customers.index)]\n", - "\n", - "# Count unique Preferred customers by country\n", - "preferred_customers_by_country = preferred_customers_df.groupby('Country')['CustomerID'].nunique()\n", - "\n", - "# Print Preferred customer counts by country\n", - "print(\"Preferred customers by country:\")\n", - 
"print(preferred_customers_by_country)\n", - "\n", - "# Identify and print the country with the most Preferred customers\n", - "print(f'A country with the most preferred customers is: {preferred_customers_by_country.idxmax()} with {preferred_customers_by_country.max()} customers')\n" ] }, { "cell_type": "code", "source": [ - "#seeing that both VIP and preferred customers are mostly located in United Kingdom we can jsut sun up the values\n", + "#seeing that both VIP and preferred customers are mostly located in United Kingdom we can jsut suь up the values\n", "most_vip_and_preferred_customers = vip_customers_by_country.max() + preferred_customers_by_country.max()\n", "print(f'The total number of most VIP and preferred customers is: {most_vip_and_preferred_customers}')" ], @@ -659,9 +3077,9 @@ "base_uri": "https://localhost:8080/" }, "id": "_QzHe2IMwWCG", - "outputId": "bf78965a-9796-4799-9cba-1063d1eb6148" + "outputId": "0dc6aa97-1f31-4410-cc5a-bdf046cef381" }, - "execution_count": 121, + "execution_count": 65, "outputs": [ { "output_type": "stream", @@ -675,6 +3093,7 @@ { "cell_type": "code", "source": [ + "#checking the VIPs & Preferred and sum em up\n", "# Extract counts for the United Kingdom\n", "uk_vip_count = vip_customers_by_country.get('United Kingdom', 0)\n", "uk_preferred_count = preferred_customers_by_country.get('United Kingdom', 0)\n", @@ -682,16 +3101,18 @@ "# Calculate the total for the United Kingdom\n", "total_uk_customers = uk_vip_count + uk_preferred_count\n", "\n", - "print(f'The total number of VIP and Preferred customers in the United Kingdom is: {total_uk_customers}')\n" + "print(f'The total number of VIP and Preferred customers in the United Kingdom is: {total_uk_customers}')\n", + "\n", + "#seems correct, 932 = 932" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "MnM3Tvf0PIvh", - "outputId": "c1895a3d-d2b3-4833-ba14-f133ed2845f5" + "outputId": "1daf5426-4696-48dc-9036-dbcd171da9df" }, - "execution_count": 122, + 
"execution_count": 66, "outputs": [ { "output_type": "stream",