diff --git a/module-1/lab-resolving-git-conflicts/your-code/about-me.md b/module-1/lab-resolving-git-conflicts/your-code/about-me.md index 30a999d..9c3d7f9 100755 --- a/module-1/lab-resolving-git-conflicts/your-code/about-me.md +++ b/module-1/lab-resolving-git-conflicts/your-code/about-me.md @@ -1,7 +1,3 @@ Lorem ipsum dolor sit amet, consectetur adipiscing elit. Quisque viverra laoreet lorem et dapibus. Integer auctor dignissim egestas. Ut id purus neque. Pellentesque imperdiet lacus in libero laoreet, at tempus felis tristique. Cras fermentum erat a dui vulputate gravida. Nulla aliquet nisi interdum nulla pretium, ac vestibulum diam congue. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Phasellus lacus risus, sodales vitae viverra quis, maximus ac ipsum. Sed consequat viverra mattis. Curabitur iaculis varius mollis. -Ut porttitor iaculis tellus bibendum euismod. Morbi porta, ante nec tempus porta, felis mi faucibus lacus, sed tristique purus nunc sed est. Aenean pulvinar urna ut lacus interdum aliquam. Pellentesque sit amet magna accumsan, sagittis metus a, volutpat velit. Mauris vitae ex vehicula, posuere nisi sed, sagittis nunc. Ut scelerisque, mi non tristique tristique, mi enim luctus nunc, eu mattis sem quam auctor nunc. Donec lobortis tellus eget blandit ultricies. Vivamus euismod metus eget leo blandit, at malesuada magna efficitur. Praesent sodales faucibus mi, ullamcorper ultrices orci. Vivamus maximus malesuada massa, nec placerat leo feugiat vel. Nam vitae eleifend enim. Nullam interdum ipsum velit, vitae faucibus lectus blandit euismod. - -Suspendisse ut malesuada ex. Nulla ultricies nisl et nisi rhoncus sollicitudin. Vestibulum maximus iaculis ligula, nec commodo nunc ullamcorper nec. Duis quis condimentum sapien. Cras vestibulum interdum felis eu auctor. Quisque semper, magna at dapibus faucibus, felis risus semper ligula, id aliquam lectus ligula vel nisi. In hac habitasse platea dictumst. 
Donec arcu sapien, suscipit ac dictum et, imperdiet id tortor. Maecenas ornare sodales interdum. Mauris dictum felis eu eros vestibulum cursus. Phasellus accumsan, turpis ut malesuada sollicitudin, augue leo venenatis ante, vel convallis tellus diam sit amet lacus. Aenean eu mauris eros. Praesent ante lacus, gravida sit amet tellus nec, laoreet ultrices lacus. Integer commodo semper vestibulum. Fusce felis massa, consectetur facilisis rutrum nec, pulvinar et nisi. - -Morbi fermentum ultricies tortor, vehicula ultrices eros elementum a. Duis ornare aliquam facilisis. Proin aliquam tincidunt odio vitae dignissim. Sed malesuada lacinia massa, nec blandit urna auctor elementum. Duis auctor non tortor in consequat. Mauris id vestibulum risus. In eget erat sed lacus efficitur viverra sed eu est. Aliquam interdum consequat molestie. Aliquam metus nisi, blandit non semper ut, blandit vel leo. Cras dictum turpis erat, sed iaculis ligula facilisis dapibus. Aliquam posuere dignissim fermentum. Praesent at neque sit amet lectus ornare iaculis. Curabitur id urna quis lorem varius ultrices eu sit amet sapien. Curabitur maximus volutpat suscipit. Proin imperdiet elementum lacus a eleifend. Sed tempor lacus posuere diam vehicula iaculis. +I REMOVED 3 TEXTS TO DO SOMETHING IN THIS README! 
I, SAÚL ROMERO, DID THIS; YESTERDAY, THURSDAY APRIL 9, I WON THIRD PLACE IN THE KAHOOT (I'M WRITING THIS SO YOU BELIEVE I DID IT, HEHE) diff --git a/module-2/lab-subsetting-and-descriptive-stats/your-code/.ipynb_checkpoints/main-checkpoint.ipynb b/module-2/lab-subsetting-and-descriptive-stats/your-code/.ipynb_checkpoints/main-checkpoint.ipynb new file mode 100755 index 0000000..1c160cc --- /dev/null +++ b/module-2/lab-subsetting-and-descriptive-stats/your-code/.ipynb_checkpoints/main-checkpoint.ipynb @@ -0,0 +1,2179 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Before you start :\n", + " - These exercises are related to the Subsetting and Descriptive Stats lessons.\n", + " - Keep in mind that you need to use some of the functions you learned in the previous lessons.\n", + " - All datasets are provided in the `your-code` folder of this lab.\n", + " - Elaborate your codes and outputs as much as you can.\n", + " - Try your best to answer the questions and complete the tasks and most importantly enjoy the process!!!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Import all the libraries that are necessary" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# import libraries here\n", + "import pandas as pd\n", + "import numpy as np " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Challenge 1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### In this challenge we will use the `Temp_States.csv` file. \n", + "\n", + "#### First import it into a data frame called `temp`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "# your answer here\n", + "temp = pd.read_csv('Temp_States.csv')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Print `temp`" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " City State Temperature\n", + "0 NYC New York 19.444444\n", + "1 Albany New York 9.444444\n", + "2 Buffalo New York 3.333333\n", + "3 Hartford Connecticut 17.222222\n", + "4 Bridgeport Connecticut 14.444444\n", + "5 Treton New Jersey 22.222222\n", + "6 Newark New Jersey 20.000000" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Explore the data types of the Temp dataframe. What type of data do we have? Comment your result." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "City object\n", + "State object\n", + "Temperature float64\n", + "dtype: object" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here\n", + "temp.dtypes\n", + "#los tipos de datos que tenemos son object y float64" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Select the rows where state is New York" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " City State Temperature\n", + "0 NYC New York 19.444444\n", + "1 Albany New York 9.444444\n", + "2 Buffalo New York 3.333333\n" + ] + } + ], + "source": [ + "# your answer here\n", + "NY = temp[temp['State']=='New York']\n", + "print(NY)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### What is the average of the temperature of cities in New York?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "10.74074074074074" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here\n", + "NY.Temperature.mean()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### We want to know the cities and states with a temperature above 15 degrees Celsius" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " City State Temperature\n", + "0 NYC New York 19.444444\n", + "3 Hartford Connecticut 17.222222\n", + "5 Treton New Jersey 22.222222\n", + "6 Newark New Jersey 20.000000" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here\n", + "city = temp[temp['Temperature'] > 15]\n", + "city" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Now, return only the cities that have a temperature above 15 degress Celcius" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 NYC\n", + "3 Hartford\n", + "5 Treton\n", + "6 Newark\n", + "Name: City, dtype: object" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here\n", + "city = temp[temp['Temperature'] > 15]\n", + "city.City" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### We want to know which cities have a temperature above 15 degrees Celcius and below 20 degrees Celcius\n", + "\n", + "*Hint: First write the condition then select the rows.*" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " City State Temperature\n", + "0 NYC New York 19.444444\n", + "3 Hartford Connecticut 17.222222" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here\n", + "city2 = temp[(temp['Temperature'] > 15) & (temp['Temperature'] < 20)]\n", + "city2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Find the mean and the standard deviation of the temperature of each state.\n", + "\n", + "*Hint: Use functions from Data Manipulation lesson*" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " State Sts dev\n", + "0 Connecticut 1.964186\n", + "1 New Jersey 1.571348\n", + "2 New York 8.133404" + ] + }, + "execution_count": 35, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here\n", + "std = temp.groupby('State', as_index=False).agg({'Temperature':[np.std]})\n", + "std.columns = ['State', 'Sts dev']\n", + "std\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": true + }, + "source": [ + "# Challenge 2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Load the `employee.csv` file into a DataFrame. Call the dataframe `employee`" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [], + "source": [ + "# your answer here\n", + "employee = pd.read_csv('Employee.csv')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Explore the data types of the Temp dataframe. Comment your results" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "RangeIndex: 9 entries, 0 to 8\n", + "Data columns (total 7 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 Name 9 non-null object\n", + " 1 Department 9 non-null object\n", + " 2 Education 9 non-null object\n", + " 3 Gender 9 non-null object\n", + " 4 Title 9 non-null object\n", + " 5 Years 9 non-null int64 \n", + " 6 Salary 9 non-null int64 \n", + "dtypes: int64(2), object(5)\n", + "memory usage: 632.0+ bytes\n" + ] + } + ], + "source": [ + "# your answer here\n", + "employee.info()" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Years Salary\n", + "count 9.000000 9.000000\n", + "mean 4.111111 48.888889\n", + "std 2.803767 16.541194\n", + "min 1.000000 30.000000\n", + "25% 2.000000 35.000000\n", + "50% 3.000000 55.000000\n", + "75% 7.000000 60.000000\n", + "max 8.000000 70.000000" + ] + }, + "execution_count": 42, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "employee.describe()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Show visually the frequency distribution (histogram) of the employee dataset. In few words describe these histograms?" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[,\n", + " ]],\n", + " dtype=object)" + ] + }, + "execution_count": 43, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAXoAAAEICAYAAABRSj9aAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAdBElEQVR4nO3dfZRcdZ3n8ffHEB4MCmiclglIcM2ssmQFJgu4uBpEMbAMmT3r7IRFBY4eZh1UdKIrsmdlRM8MjqKuiihqBBVBB1CzEIWskEFHQQgiAQISkZHEKEggEGDQxs/+cW+w6NTDrerqquqbz+ucOl11H799H759+1f3d7+yTURE1Nczhh1ARERMrST6iIiaS6KPiKi5JPqIiJpLoo+IqLkk+oiImkuinyYk3SPp1cOOIyKmnyT6AZP0ckk/kLRZ0iZJ/yzpPww7roipJukrkr44YdgrJT0gac9hxbU9SKIfIEnPBi4HPgk8B5gDvB94YgrXucNULTuiS6cCR0l6DYCknYHPAUttb5zswnOst5ZEP1h/AmD7IttP2n7c9lW2b5H0byRdXV7d/EbShZJ2b7YQSQdL+qGkhyRtlPQpSTs2jLekUyTdBdwl6RxJZ09YxnJJ75zS3zaige0HgLcB50maBZwB/Ay4o/wv9yFJP5G0cOs8kk6StFbSI5LulvRXDeMWSlov6T2SfgV8UdJsSZeXy9ok6XuStvs8t91vgAH7KfCkpAskHSVpj4ZxAv4e+GPgJcDewN+2WM6TwDuB2cDLgCOAv54wzZ8DhwD7ARcAx2094CXNBl4NfLUPv1NEZbb/EbgJuAg4GfgfwBXAByn+y30XcKmk55Wz3AccAzwbOAn4mKSDGhb5/HK+fcrlLQXWA88DxoDTge3+OS9J9ANk+2Hg5RQH3ueA+8sr6zHb62yvtP2E7fuBjwKvbLGc1bavsz1u+x7gs02m/Xvbm8r/Gn4EbKb4gwCwBFhl+9f9/y0jOvpr4FXAmRTH4grbK2z/3vZK4EbgaADbV9j+mQv/BFwF/KeGZf0eOKM8bx4HfgfsCexj+3e2v+c80CuJftBsr7V9ou29gP0pruA/LmlM0sWSNkh6GPgKxRX7NiT9Sfnv6a/Kaf+uybT3Tvh8AfD68v3rgS/363eK6EZ5gfEb4DaKK/G/KJtaHpL0EMXF0J4A5X++15XNMA9R/AFoPNbvt/2vDZ8/DKwDriqbek4bxO806pLoh8j2HcD5FAn/7yiu9OfbfjZFMlaLWc8F7gDmldOe3mTaiVcxXwEWS3opRdPQN/vxO0RM0r3Al23v3vCaZfssSTsBlwIfAcZs7w6s4OnH+tOOc9uP2F5q+4XAscDfSDqC7VwS/QBJerGkpZL2Kj/vDRwHXAc8C9gCbJY0B3h3m0U9C3gY2CLpxcBbOq3b9nrgBoor+UvLf3Mjhu0rwJ9Jeq2kGZJ2Lr9k3QvYEdgJuB8Yl3QUcGS7hUk6RtKLJImiufJJiuad7VoS/WA9QvEF6fWSHqVI8LdSfIH0fuAgioPzCuCyNst5F/Dfy+V9DvhaxfVfAMwnzTYxImzfCyym+K/0foor/HcDz7D9CPB24OvAgxTH/PIOi5wH/D+Ki6YfAp+2fc3URD99KN9TbD8kvYLiCmqffEEVsf3IFf12QtJMig4rn0+Sj9i+JNFvByS9BHiI4k6Gjw85nIgYsDTdRETUXK7oIyJqbiQfAjR79mzPnTt32GE09eijjzJr1qxhhzF0o74dVq9e/Rvbz+s85WgYxDE/qvsscXWnVVztjvmRTPRz587lxhtvHHYYTa1atYqFCxcOO4yhG/XtIOlfhh1DNwZxzI/qPktc3WkVV7tjPk03ERE1l0QfEVFzSfQRETWXRB8RUXNJ9BERNZdEHxFRcx0TvaS9JV0j6XZJt0k6tck0kvQJSesk3dJY6kvSCZLuKl8n9PsXiBik8jG6Pyprm94m6f1NptlJ0tfK8+F6SXM
HH2nEH1S5j36cokr7TZKeBayWtNL27Q3THEXxeNB5FI/hPRc4RNJzKAoAL6AoELBa0nLbD/b1t4gYnCeAV9neUj4o7vuSvm37uoZp3gQ8aPtFkpYAHwL+chjBRkCFK3rbG23fVL5/BFgLzJkw2WLgS2Vdx+uA3SXtCbwWWFnWLn0QWAks6utvEDFA5TG+pfw4s3xNfGDUYopn/wNcAhxRFsKIGIquesaW/4IeCFw/YdQcnl6jdH05rNXwZss+maKKO2NjY6xatWqbadZs2NxNuJMyf85uTYdv2bKlaWzTQS/bb1Dbodd92yq+qSRpBrAaeBFwju2W54PtcUmbgedS1EltXE7HY75Xzbbn2C7wyQu/1XKeYWxLGN1zqk5xVU70knalqN/4DtsPdxdaZ7bPA84DWLBggZt18T3xtCv6vdqW7jl+2/XD6HaLrqKX7Teo7dDrvm0V31Sy/SRwgKTdgW9I2t/2rT0sp+Mx36tm23Pp/HHOXtP6lB/GtoTRPafqFFelu27KtshLgQttNytxtwHYu+HzXuWwVsMjpj3bDwHXsG1z5FPHvaQdgN2ABwYbXcQfVLnrRsAXgLW2P9pisuXAG8u7bw4FNtveCFwJHClpD0l7UBT2vbJPsUcMnKTnlVfySNoFeA1wx4TJlgNb7zB7HXB1qnrFMFVpujkMeAOwRtLN5bDTgRcA2P4MsAI4GlgHPAacVI7bJOkDwA3lfGfa3tS/8CMGbk/ggrKd/hnA121fLulM4EbbyykujL4saR2wCVgyvHAjKiR6298H2t4xUF6tnNJi3DJgWU/RRYwY27dQ3JAwcfj7Gt7/K/AXg4wrop30jI2IqLkk+oiImkuij4iouST6iIiaS6KPiKi5JPqIiJpLoo+IqLkk+oiImkuij4iouST6iIiaS6KPiKi5JPqIiJpLoo+IqLkk+oiImkuij4iouST6iIia61h4RNIy4BjgPtv7Nxn/buD4huW9BHheWV3qHuAR4Elg3PaCfgUeERHVVLmiP59tix8/xfaHbR9g+wDgvcA/TSgXeHg5Pkk+ImIIOiZ629dS1L2s4jjgoklFFBERfdW3NnpJz6S48r+0YbCBqyStlnRyv9YVERHVdWyj78KfAf88odnm5bY3SPojYKWkO8r/ELZR/iE4GWBsbIxVq1ZtM83S+eN9DLe9ZusH2LJlS8txo66X7Teo7dDrvp2u+yJikPqZ6JcwodnG9oby532SvgEcDDRN9LbPA84DWLBggRcuXLjNNCeedkUfw23vnuO3XT8UiaVZbNNBL9tvUNuh133bKr6I+IO+NN1I2g14JfCthmGzJD1r63vgSODWfqwvIiKqq3J75UXAQmC2pPXAGcBMANufKSf7L8BVth9tmHUM+Iakrev5qu3v9C/0iIioomOit31chWnOp7gNs3HY3cBLew0sIiL6Iz1jI7ogaW9J10i6XdJtkk5tMs1CSZsl3Vy+3jeMWCO26ueXsRHbg3Fgqe2byu+gVktaafv2CdN9z/YxQ4gvYhu5oo/ogu2Ntm8q3z8CrAXmDDeqiPZyRR/RI0lzgQOB65uMfpmknwC/BN5l+7Ym83fsO9KrZv0SxnZp319hWH0SRrVvSp3iSqKP6IGkXSl6gb/D9sMTRt8E7GN7i6SjgW8C8yYuo0rfkV4165ewdP44Z69pfcoPq0/CqPZNqVNcabqJ6JKkmRRJ/kLbl00cb/th21vK9yuAmZJmDzjMiKck0Ud0QUXHkC8Aa21/tMU0zy+nQ9LBFOfZA4OLMuLp0nQT0Z3DgDcAayTdXA47HXgBPNWJ8HXAWySNA48DS2x7GMFGQBJ9RFdsfx9Qh2k+BXxqMBFFdJamm4iImkuij4iouST6iIiaS6KPiKi5JPqIiJpLoo+IqLkk+oiImkuij4iouY6JXtIySfdJalrvtV2RBUmLJN0paZ2k0/oZeEREVFPliv58YFGHab5n+4DydSaApBnAOcBRwH7AcZL2m0ywERHRvY6
J3va1wKYeln0wsM723bZ/C1wMLO5hORERMQn9etZNsyILc4B7G6ZZDxzSagFVijC0K5rQb60e7D+qxQiq6GX7DWo79Lpvp+u+iBikfiT6SkUWOqlShKFZMYWp0qoIw6gWI6iil+03qO3Q674dVrGMiOlk0nfdtCmysAHYu2HSvcphERExQJNO9G2KLNwAzJO0r6QdgSXA8smuLyIiutOx6UbSRcBCYLak9cAZwEzoWGRhXNJbgSuBGcCyZgWSIyJianVM9LaP6zC+ZZGFsilnRW+hRUREP6RnbEREzSXRR0TUXBJ9RETNJdFHRNRcEn1ERM0l0UdE1FwSfUREzSXRR0TUXBJ9RETNJdFHdEHS3pKukXS7pNskndpkGkn6RFlZ7RZJBw0j1oit+vU8+ojtxTiw1PZNkp4FrJa00vbtDdMcRfGo7nkUNRjOpU0thoipliv6iC7Y3mj7pvL9I8BaiiI7jRYDX3LhOmB3SXsOONSIp+SKPqJHkuYCBwLXTxjVrLraHGDjhPk7VlXrVbOKXWO7tK/kNaxqXaNatW0Qca3ZsLnrefbdbUbXcSXRR/RA0q7ApcA7bD/cyzKqVFXrVbOKXUvnj3P2mtan/LCqdY1q1bZBxNVLZbXzF83qOq403UR0SdJMiiR/oe3LmkyS6moxUpLoI7pQVlP7ArDW9kdbTLYceGN5982hwGbbG1tMGzHl0nQT0Z3DgDcAayTdXA47HXgBPFV1bQVwNLAOeAw4aQhxRjylSinBZcAxwH22928y/njgPYCAR4C32P5JOe6ectiTwLjtBf0LPWLwbH+f4lhvN42BUwYTUURnVZpuzgcWtRn/c+CVtucDH6D8cqnB4bYPSJKPiBiOKjVjry1vI2s1/gcNH6+j+OIpIiJGRL/b6N8EfLvhs4GrJBn4bHk7WVNV7iludw9wv7W6T3VU7/mtopftN6jt0Ou+na77ImKQ+pboJR1Okehf3jD45bY3SPojYKWkO2xf22z+KvcU93LPaa9a3VM8qvf8VtHL9hvUduh13w7r3u+I6aQvt1dK+vfA54HFth/YOtz2hvLnfcA3gIP7sb6IiKhu0ole0guAy4A32P5pw/BZ5UOfkDQLOBK4dbLri4iI7lS5vfIiYCEwW9J64AxgJjx1z/D7gOcCny76kjx1G+UY8I1y2A7AV21/Zwp+h4iIaKPKXTfHdRj/ZuDNTYbfDby099AiIqIf8giEiIiaS6KPiKi5JPqIiJpLoo+IqLkk+oiImkuij4iouST6iIiaS6KPiKi5JPqIiJpLoo+IqLkk+oiImkuij4iouST6iIiaS6KPiKi5JPqIiJpLoo+IqLkk+oiImquU6CUtk3SfpKY1X1X4hKR1km6RdFDDuBMk3VW+TuhX4BHDUOFcWChps6Sby9f7Bh1jxERVr+jPBxa1GX8UMK98nQycCyDpORQ1Zg8BDgbOkLRHr8FGjIDzaX8uAHzP9gHl68wBxBTRVqVEb/taYFObSRYDX3LhOmB3SXsCrwVW2t5k+0FgJZ1PkoiRVeFciBg5sl1tQmkucLnt/ZuMuxw4y/b3y8/fBd4DLAR2tv3Bcvj/Bh63/ZEmyziZ4r8BxsbG/vTiiy/eJoY1GzZXirUf5s/ZrenwLVu2sOuuuw4sjn7q5/Yb2wV+/Xjzca22XTu9xtZqXYcffvhq2wt6WmgHHc6FhcClwHrgl8C7bN/WYjkdj/leNdue7fYZ9Lbf+mFUz6lBxNXLcb/vbjOaxtXumN+h+9Cmhu3zgPMAFixY4IULF24zzYmnXTGweO45ftv1A6xatYpmsU0H/dx+S+ePc/aa5odPq23XTq+x9bKuKXYTsI/tLZKOBr5J0aS5jSrHfK+abc92+wyGty1H9ZwaRFy9HPfnL5rVdVz9uutmA7B3w+e9ymGthkfUku2HbW8p368AZkqaPeSwYjvXr0S/HHhjeffNocBm2xuBK4EjJe1Rfgl7ZDksopYkPV+SyvcHU5xjDww3qtj
eVWq6kXQRRXv7bEnrKe6kmQlg+zPACuBoYB3wGHBSOW6TpA8AN5SLOtN2vsiKaavCufA64C2SxoHHgSWu+kVYxBSplOhtH9dhvIFTWoxbBizrPrSI0VPhXPgU8KkBhRNRSXrGRkTUXBJ9RETNJdFHRNRcEn1ERM0l0UdE1FwSfUREzSXRR0TUXBJ9RETNJdFHRNRcEn1ERM0l0UdE1FwSfUREzSXRR0TUXBJ9RETNJdFHRNRcEn1ERM0l0UdE1FylRC9pkaQ7Ja2TdFqT8R+TdHP5+qmkhxrGPdkwbnk/g4+IiM46lhKUNAM4B3gNsB64QdJy27dvncb2OxumfxtwYMMiHrd9QP9CjoiIblS5oj8YWGf7btu/BS4GFreZ/jjgon4EFxERk1elOPgc4N6Gz+uBQ5pNKGkfYF/g6obBO0u6ERgHzrL9zRbzngycDDA2NsaqVau2mWbp/PEK4fZHs/UDbNmypeW4UdfP7Te2S+vl9bJ9eo1tuu6LiEGqkui7sQS4xPaTDcP2sb1B0guBqyWtsf2ziTPaPg84D2DBggVeuHDhNgs/8bQr+hxua/ccv+36oUgszWKbDvq5/ZbOH+fsNc0Pn1bbrp1eY+tlXRHbmypNNxuAvRs+71UOa2YJE5ptbG8of94NrOLp7fcRETHFqiT6G4B5kvaVtCNFMt/m7hlJLwb2AH7YMGwPSTuV72cDhwG3T5w3IiKmTsemG9vjkt4KXAnMAJbZvk3SmcCNtrcm/SXAxbbdMPtLgM9K+j3FH5WzGu/WiYiIqVepjd72CmDFhGHvm/D5b5vM9wNg/iTii4iISUrP2IguSFom6T5Jt7YYL0mfKDsX3iLpoEHHGDFREn1Ed84HFrUZfxQwr3ydDJw7gJgi2kqij+iC7WuBTW0mWQx8yYXrgN0l7TmY6CKa6/d99BHbu2YdDOcAGydOWKWT4JoNm3sKYmmTb8badXID+OSF3+p6PfPn7Nb1PBN/p7Fdqq27l3VNRjedI/u5nzrppdNmEn3EkAy6k2C7Tm696kfnuKpxDbpzXDedIwfZmfP8RbO67rSZppuI/uqmg2HEQCTRR/TXcuCN5d03hwKbbW/TbBMxSGm6ieiCpIuAhcBsSeuBM4CZALY/Q9Hf5GhgHfAYcNJwIo34gyT6iC7YPq7DeAOnDCiciErSdBMRUXNJ9BERNZdEHxFRc0n0ERE1l0QfEVFzSfQRETWXRB8RUXNJ9BERNVcp0UtaJOnOspjCaU3Gnyjpfkk3l683N4w7QdJd5euEfgYfERGddewZK2kGcA7wGopHrt4gaXmT2q9fs/3WCfM+h6KL+ALAwOpy3gf7En1ERHRU5Yr+YGCd7btt/xa4mKK4QhWvBVba3lQm95W0r84TERF9VuVZN80KKRzSZLr/KukVwE+Bd9q+t8W8c5qtpEoRhnZFE/qt1YP9e3no/6jo5/ZrV8Sil+3Ta2zTdV9EDFK/Hmr2f4GLbD8h6a+AC4BXdbOAQRdh6KRVkYNuihGMmkEVsehHMYqqBl2MImI6qtJ007GQgu0HbD9Rfvw88KdV542IiKlVJdHfAMyTtK+kHYElFMUVnjKh+PGxwNry/ZXAkZL2kLQHcGQ5LCIiBqRj043tcUlvpUjQM4Bltm+TdCZwo+3lwNslHQuMA5uAE8t5N0n6AMUfC4AzbW+agt8jIiJaqNRGb3sFReWcxmHva3j/XuC9LeZdBiybRIwRETEJ6RkbEVFzSfQRETWXRB8RUXNJ9BERNZdEHxFRc0n0ERE1l0QfEVFzSfQRETWXRB/RpckU4okYhn49vTJiuzCZQjwRw5Ir+ojuTKYQT8RQ5Io+ojuTKcTzNIMuttOuWEyv+lFkpmpcgy4y002RoUEWReql+FESfUT/VSrEM+hiO+2KxfSqH0VmqsY16CIz3RQZGmRRpPMXzeq6+FGabiK6M5lCPBFDkUQf0Z3JFOKJGIo03UR0YTKFeCKGJYk+okuTKcQTMQyVmm4qdBD
5G0m3S7pF0ncl7dMw7smGjiPLJ84bERFTq+MVfcUOIj8GFth+TNJbgH8A/rIc97jtA/ocd0REVFTlir5jBxHb19h+rPx4HcWdCBERMQKqtNFX7SCy1ZuAbzd83lnSjRRfTJ1l+5vNZhp055FOWnVI6KWzwqgYVOebfnSiqWq67ouIQerrl7GSXg8sAF7ZMHgf2xskvRC4WtIa2z+bOO+gO4900qpzRjedKEbNoDrf9KMTTVWD7kQTMR1Vabrp2EEEQNKrgf8FHNvQWQTbG8qfdwOrgAMnEW9ERHSpSqKv0kHkQOCzFEn+vobhe0jaqXw/GzgMmPiUv4iImEIdm24qdhD5MLAr8I+SAH5h+1jgJcBnJf2e4o/KWU0e5xoREVOoUht9hQ4ir24x3w+A+ZMJMCIiJifPuomIqLkk+oiImkuij4iouST6iIiaS6KPiKi5JPqIiJpLoo+IqLkk+oiImkuij4iouST6iIiaS6KPiKi5JPqIiJpLoo+IqLkk+oiImkuij4iouST6iIiaS6KPiKi5Sole0iJJd0paJ+m0JuN3kvS1cvz1kuY2jHtvOfxOSa/tX+gRwzGZ8yFiGDomekkzgHOAo4D9gOMk7TdhsjcBD9p+EfAx4EPlvPtRFBP/d8Ai4NPl8iKmpcmcDxHDUuWK/mBgne27bf8WuBhYPGGaxcAF5ftLgCNUVAlfDFxs+wnbPwfWlcuLmK4mcz5EDEWV4uBzgHsbPq8HDmk1je1xSZuB55bDr5sw75xmK5F0MnBy+XGLpDsrxDZl1PoabDbwm8FFMpre3mY7tNl2fddmXftM0Soncz48bXsN+phvt8961Y99XTWuQR5XpZE81w//UMu4Wh7zVRL9QNg+Dzhv2HF0IulG2wuGHcewZTtM3qCP+VHdZ4mrO73EVaXpZgOwd8PnvcphTaeRtAOwG/BAxXkjppPJnA8RQ1El0d8AzJO0r6QdKb5cXT5hmuXACeX71wFX23Y5fEl5F8K+wDzgR/0JPWIoJnM+RAxFx6abso3xrcCVwAxgme3bJJ0J3Gh7OfAF4MuS1gGbKA5+yum+DtwOjAOn2H5yin6XQRn55qUB2S63w2TOhxEwqvsscXWn67iUC42IiHpLz9iIiJpLoo+IqLkk+g4kzZD0Y0mXl5/3Lbu1ryu7ue847BgHQdLuki6RdIektZJeJuk5klZKuqv8ucew44xtSdpb0jWSbpd0m6RThx3TVhPPr1HR7HgfgZjeWe6/WyVdJGnnqvMm0Xd2KrC24fOHgI+V3dsfpOjuvj34P8B3bL8YeCnFNjkN+K7tecB3y88xesaBpbb3Aw4FTmny2IZhmXh+jYpmx/vQSJoDvB1YYHt/ihsBKn/Jn0TfhqS9gP8MfL78LOBVFN3aoejm/ufDiW5wJO0GvILibhJs/9b2Qzy9q/92sS2mI9sbbd9Uvn+EImk17aE+SBPPr1HR5ngfth2AXcq+Gc8Efll1xiT69j4O/E/g9+Xn5wIP2R4vP7d8pEPN7AvcD3yx/Df785JmAWO2N5bT/AoYG1qEUUn5JM0DgeuHGwmw7fk1Klod70NjewPwEeAXwEZgs+2rqs6fRN+CpGOA+2yvHnYsI2AH4CDgXNsHAo8yoZmm7BCUe3VHmKRdgUuBd9h+eMixjPL51fF4H7Ty+6/FFH+E/hiYJen1VedPom/tMOBYSfdQPKHwVRTtdruX/zrB9vNIh/XAettbrwIvoTgRfi1pT4Dy531Dii86kDSTIslfaPuyYcdDk/NL0leGG9JTWh3vw/Rq4Oe277f9O+Ay4D9WnTmJvgXb77W9l+25FF96XG37eOAaim7tUHRz/9aQQhwY278C7pX0b8tBR1D0dm7s6r9dbIvpqPxu6QvAWtsfHXY80PL8qnyFOpXaHO/D9AvgUEnPLPfnEXTxBfHIPL1yGnkPcLGkDwI/pvzCZjvwNuDC8nbSu4GTKC4Uvi7pTcC/AP9tiPFFa4cBbwDWSLq5HHa67RV
DjGnUNTveh8b29ZIuAW6iuIvqx3TxKIQ8AiEioubSdBMRUXNJ9BERNZdEHxFRc0n0ERE1l0QfEVFzSfQRETWXRB8RUXP/H03vhiiqg486AAAAAElFTkSuQmCC\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# your answer here\n", + "freqd = employee.hist()\n", + "freqd" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### What's the average salary in this company?" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "48.888888888888886" + ] + }, + "execution_count": 44, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here\n", + "employee.Salary.mean()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### What's the highest salary?" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "70" + ] + }, + "execution_count": 45, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here\n", + "employee.Salary.max()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### What's the lowest salary?" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "30" + ] + }, + "execution_count": 46, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here\n", + "employee.Salary.min()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Who are the employees with the lowest salary?" + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Name Department Education Gender Title Years Salary\n", + "1 Maria IT Master F analyst 2 30\n", + "2 David HR Master M analyst 2 30" + ] + }, + "execution_count": 50, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here\n", + "lowest = employee[employee['Salary'] == employee['Salary'].min()]\n", + "lowest" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Could you give all the information about an employee called David?" + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Name Department Education Gender Title Years Salary\n", + "2 David HR Master M analyst 2 30" + ] + }, + "execution_count": 51, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here\n", + "\n", + "employee[employee['Name'] == 'David']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Could you give only David's salary?" + ] + }, + { + "cell_type": "code", + "execution_count": 52, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2 30\n", + "Name: Salary, dtype: int64" + ] + }, + "execution_count": 52, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here\n", + "employee[employee['Name'] == 'David']['Salary']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Print all the rows where job title is associate" + ] + }, + { + "cell_type": "code", + "execution_count": 53, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Name Department Education Gender Title Years Salary\n", + "4 Samuel Sales Master M associate 3 55\n", + "5 Eva Sales Bachelor F associate 2 55\n", + "7 Pedro IT Phd M associate 7 60" + ] + }, + "execution_count": 53, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here\n", + "employee[employee['Title'] == 'associate']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Print the first 3 rows of your dataframe\n", + "\n", + "##### Tip : There are 2 ways to do it. Do it both ways" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Name Department Education Gender Title Years Salary\n", + "0 Jose IT Bachelor M analyst 1 35\n", + "1 Maria IT Master F analyst 2 30\n", + "2 David HR Master M analyst 2 30" + ] + }, + "execution_count": 54, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here- 1 method\n", + "employee.head(3)" + ] + }, + { + "cell_type": "code", + "execution_count": 56, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Name Department Education Gender Title Years Salary\n", + "0 Jose IT Bachelor M analyst 1 35\n", + "1 Maria IT Master F analyst 2 30\n", + "2 David HR Master M analyst 2 30" + ] + }, + "execution_count": 56, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here- 2nd method\n", + "employee.iloc[:3:]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Find the employees who's title is associate and the salary above 55?" + ] + }, + { + "cell_type": "code", + "execution_count": 57, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Name Department Education Gender Title Years Salary\n", + "7 Pedro IT Phd M associate 7 60" + ] + }, + "execution_count": 57, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here\n", + "employee[(employee['Title'] == 'associate') & (employee['Salary'] > 55)]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Group the employees based on their number of years of employment. What are the average salaries in each group?" + ] + }, + { + "cell_type": "code", + "execution_count": 58, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Years Avrg Salary\n", + "0 1 35.000000\n", + "1 2 38.333333\n", + "2 3 55.000000\n", + "3 4 35.000000\n", + "4 7 60.000000\n", + "5 8 70.000000" + ] + }, + "execution_count": 58, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here\n", + "empl = employee.groupby('Years', as_index=False).agg({'Salary': np.mean})\n", + "empl.columns = ['Years', 'Avrg Salary']\n", + "empl\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### What is the average Salary per title?" + ] + }, + { + "cell_type": "code", + "execution_count": 59, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Title Avrg Salary\n", + "0 VP 70.000000\n", + "1 analyst 32.500000\n", + "2 associate 56.666667" + ] + }, + "execution_count": 59, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here\n", + "sal = employee.groupby('Title', as_index=False).agg({'Salary': np.mean})\n", + "sal.columns = ['Title', 'Avrg Salary']\n", + "sal" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Show a visual summary of the data using boxplot. What are the first and third quartiles? Comment your results.\n", + "##### * Hint : Quantiles vs Quartiles*\n", + "##### - `In Probability and Statistics, quantiles are cut points dividing the range of a probability distribution into continuous intervals with equal probabilities. When division is into four parts the values of the variate corresponding to 25%, 50% and 75% of the total distribution are called quartiles.`" + ] + }, + { + "cell_type": "code", + "execution_count": 60, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 60, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": 
"[base64-encoded PNG omitted: boxplot drawn by employee.boxplot()]\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# draw boxplot here\n", + "employee.boxplot()" + ] + }, + { + "cell_type": "code", + "execution_count": 61, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Years 2.0\n", + "Salary 35.0\n", + "Name: 25%, dtype: float64" + ] + }, + "execution_count": 61, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# print first quartile here\n", + "employee.describe().loc['25%']" + ] + }, + { + "cell_type": "code", + "execution_count": 62, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Years 7.0\n", + "Salary 60.0\n", + "Name: 75%, dtype: float64" + ] + }, + "execution_count": 62, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# print third quartile here\n", + "employee.describe().loc['75%']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Is the mean salary per gender different?" + ] + }, + { + "cell_type": "code", + "execution_count": 64, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Gender Average Salary\n", + "0 F 47.5\n", + "1 M 50.0" + ] + }, + "execution_count": 64, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here\n", + "salger = employee.groupby('Gender', as_index=False).agg({'Salary':np.mean})\n", + "salger.columns = ['Gender', 'Average Salary']\n", + "salger" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Find the minimum, mean and the maximum of all numeric columns for each Department.\n", + "\n", + "##### Hint: Use functions from Data Manipulation lesson" + ] + }, + { + "cell_type": "code", + "execution_count": 70, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Salary.min Salary.mean Salary.max Years.min Years.mean \\\n", + "Department \n", + "HR 30 45.00 70 2 4.666667 \n", + "IT 30 48.75 70 1 4.500000 \n", + "Sales 55 55.00 55 2 2.500000 \n", + "\n", + " Years.max \n", + "Department \n", + "HR 8 \n", + "IT 8 \n", + "Sales 3 " + ] + }, + "execution_count": 70, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here\n", + "stats = employee.groupby('Department').agg({'Salary':[np.min, np.mean, np.max],\n", + " 'Years':[np.min, np.mean, np.max]})\n", + "# flatten the (column, statistic) MultiIndex; 'Department' stays as the index\n", + "stats.columns = ['Salary.min', 'Salary.mean', 'Salary.max',\n", + " 'Years.min', 'Years.mean', 'Years.max']\n", + "stats" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Bonus Question\n", + "\n", + "#### For each department, compute the difference between the maximal salary and the minimal salary.\n", + "\n", + "##### * Hint: try using `agg` or `apply` and `lambda`*" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [], + "source": [ + "# your answer here" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": true + }, + "source": [ + "# Challenge 3" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Open the Orders.csv dataset. Name your dataset orders" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# your answer here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Explore your dataset by looking at the data types and the summary statistics. 
Comment your results" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# your answer here" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# your answer here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### What is the average Purchase Price?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# your answer here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### What were the highest and lowest purchase prices? " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# your answer here" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# your answer here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Select all the customers we have in Spain" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# your answer here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### How many customers do we have in Spain?\n", + "##### Hint : Use value_counts()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# your answer here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Select all the customers who have bought more than 50 items ?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# your answer here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Select orders from Spain that are above 50 items" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# your answer here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Select all free orders" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# your answer here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Select all orders that are 'lunch bag'\n", + "#### Hint: Use string functions" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# your answer here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Select all orders that are made in 2011 and are 'lunch bag' " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# your answer here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Show the frequency distribution of the amount spent in Spain." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# your answer here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Select all orders made in the month of August" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# your answer here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Select how many orders are made by countries in the month of August\n", + "##### Hint: Use value_counts()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# your answer here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### What's the average amount of money spent by country" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# your answer here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### What's the most expensive item?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# your answer here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### What was the average amount spent per year ?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# your answer here" + ] + } + ], + "metadata": { + "anaconda-cloud": {}, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 1 +} diff --git a/module-2/lab-subsetting-and-descriptive-stats/your-code/main.ipynb b/module-2/lab-subsetting-and-descriptive-stats/your-code/main.ipynb index 6a2cca3..1c160cc 100755 --- a/module-2/lab-subsetting-and-descriptive-stats/your-code/main.ipynb +++ b/module-2/lab-subsetting-and-descriptive-stats/your-code/main.ipynb @@ -22,12 +22,12 @@ { "cell_type": "code", "execution_count": 1, - "metadata": { - "collapsed": true - }, + "metadata": {}, "outputs": [], "source": [ - "# import libraries here" + "# import libraries here\n", + "import pandas as pd\n", + "import numpy as np " ] }, { @@ -49,12 +49,11 @@ { "cell_type": "code", "execution_count": 2, - "metadata": { - "collapsed": true - }, + "metadata": {}, "outputs": [], "source": [ - "# your answer here\n" + "# your answer here\n", + "temp = pd.read_csv('Temp_States.csv')" ] }, { @@ -66,10 +65,101 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 3, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " City State Temperature\n", + "0 NYC New York 19.444444\n", + "1 Albany New York 9.444444\n", + "2 Buffalo New York 3.333333\n", + "3 Hartford Connecticut 17.222222\n", + "4 Bridgeport Connecticut 14.444444\n", + "5 Treton New Jersey 22.222222\n", + "6 Newark New Jersey 20.000000" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp" + ] }, { "cell_type": "markdown", @@ -80,11 +170,27 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 5, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "City object\n", + "State object\n", + "Temperature float64\n", + "dtype: object" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ - "# your answer here\n" + "# your answer here\n", + "temp.dtypes\n", + "# the data types we have are object and float64" ] }, { @@ -96,11 +202,24 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 9, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " City State Temperature\n", + "0 NYC New York 19.444444\n", + "1 Albany New York 9.444444\n", + "2 Buffalo New York 3.333333\n" + ] + } + ], "source": [ - "# your answer here" + "# your answer here\n", + "NY = temp[temp['State']=='New York']\n", + "print(NY)" ] }, { @@ -112,11 +231,23 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 10, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "10.74074074074074" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ - "# your answer here\n" + "# your answer here\n", + "NY.Temperature.mean()" ] }, { @@ -128,11 +259,81 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 14, "metadata": {}, - "outputs": [], - "source": [ - "# your answer here\n" 
+ "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " City State Temperature\n", + "0 NYC New York 19.444444\n", + "3 Hartford Connecticut 17.222222\n", + "5 Treton New Jersey 22.222222\n", + "6 Newark New Jersey 20.000000" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here\n", + "city = temp[temp['Temperature'] > 15]\n", + "city" ] }, { @@ -144,11 +345,28 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 15, "metadata": {}, - "outputs": [], - "source": [ - "# your answer here" + "outputs": [ + { + "data": { + "text/plain": [ + "0 NYC\n", + "3 Hartford\n", + "5 Treton\n", + "6 Newark\n", + "Name: City, dtype: object" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here\n", + "city = temp[temp['Temperature'] > 15]\n", + "city.City" ] }, { @@ -162,11 +380,67 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 28, "metadata": {}, - "outputs": [], - "source": [ - "# your answer here\n" + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " City State Temperature\n", + "0 NYC New York 19.444444\n", + "3 Hartford Connecticut 17.222222" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here\n", + "city2 = temp[(temp['Temperature'] > 15) & (temp['Temperature'] < 20)]\n", + "city2" ] }, { @@ -180,11 +454,72 @@ }, { "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "# your answer here\n" + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " State Std dev\n", + "0 Connecticut 1.964186\n", + "1 New Jersey 1.571348\n", + "2 New York 8.133404" + ] + }, + "execution_count": 35, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here\n", + "std = temp.groupby('State', as_index=False).agg({'Temperature':[np.std]})\n", + "std.columns = ['State', 'Std dev']\n", + "std\n", + "\n" ] }, { @@ -205,13 +540,12 @@ }, { "cell_type": "code", - "execution_count": 11, - "metadata": { - "collapsed": true - }, + "execution_count": 37, + "metadata": {}, "outputs": [], "source": [ - "# your answer here" + "# your answer here\n", + "employee = pd.read_csv('Employee.csv')" ] }, { @@ -223,12 +557,137 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": 38, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "<class 'pandas.core.frame.DataFrame'>\n", + "RangeIndex: 9 entries, 0 to 8\n", + "Data columns (total 7 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 Name 9 non-null object\n", + " 1 Department 9 non-null object\n", + " 2 Education 9 non-null object\n", + " 3 Gender 9 non-null object\n", + " 4 Title 9 non-null object\n", + " 5 Years 9 non-null int64 \n", + " 6 Salary 9 non-null int64 \n", + "dtypes: int64(2), object(5)\n", + "memory usage: 632.0+ bytes\n" + ] + } + ], + "source": [ + "# your answer here\n", + "employee.info()" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Years Salary\n", + "count 9.000000 9.000000\n", + "mean 4.111111 48.888889\n", + "std 2.803767 16.541194\n", + "min 1.000000 30.000000\n", + "25% 2.000000 35.000000\n", + "50% 3.000000 55.000000\n", + "75% 7.000000 60.000000\n", + "max 8.000000 70.000000" + ] + }, + "execution_count": 42, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "employee.describe()" + ] + }, + { + "cell_type": "code", + "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "# your answer here\n" - ] + "source": [] }, { "cell_type": "markdown", @@ -239,11 +698,38 @@ }, { "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "# your answer here" + "execution_count": 43, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[,\n", + " ]],\n", + " dtype=object)" + ] + }, + "execution_count": 43, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAXoAAAEICAYAAABRSj9aAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAdBElEQVR4nO3dfZRcdZ3n8ffHEB4MCmiclglIcM2ssmQFJgu4uBpEMbAMmT3r7IRFBY4eZh1UdKIrsmdlRM8MjqKuiihqBBVBB1CzEIWskEFHQQgiAQISkZHEKEggEGDQxs/+cW+w6NTDrerqquqbz+ucOl11H799H759+1f3d7+yTURE1Nczhh1ARERMrST6iIiaS6KPiKi5JPqIiJpLoo+IqLkk+oiImkuinyYk3SPp1cOOIyKmnyT6AZP0ckk/kLRZ0iZJ/yzpPww7roipJukrkr44YdgrJT0gac9hxbU9SKIfIEnPBi4HPgk8B5gDvB94YgrXucNULTuiS6cCR0l6DYCknYHPAUttb5zswnOst5ZEP1h/AmD7IttP2n7c9lW2b5H0byRdXV7d/EbShZJ2b7YQSQdL+qGkhyRtlPQpSTs2jLekUyTdBdwl6RxJZ09YxnJJ75zS3zaige0HgLcB50maBZwB/Ay4o/wv9yFJP5G0cOs8kk6StFbSI5LulvRXDeMWSlov6T2SfgV8UdJsSZeXy9ok6XuStvs8t91vgAH7KfCkpAskHSVpj4ZxAv4e+GPgJcDewN+2WM6TwDuB2cDLgCOAv54wzZ8DhwD7ARcAx2094CXNBl4NfLUPv1NEZbb/EbgJuAg4GfgfwBXAByn+y30XcKmk55Wz3AccAzwbOAn4mKSDGhb5/HK+fcrlLQXWA88DxoDTge3+OS9J9ANk+2Hg5RQH3ueA+8sr6zHb62yvtP2E7fuBjwKvbLGc1bavsz1u+x7gs02m/Xvbm8r/Gn4EbKb4gwCwBFhl+9f9/y0jOvpr4FXAmRTH4grbK2z/3vZK4EbgaADbV9j+mQv/BFwF/KeGZf0eOKM8bx4HfgfsCexj+3e2v+c80CuJftBsr7V9ou29gP0pruA/LmlM0sWSNkh6GPgKxRX7NiT9Sfnv6a/Kaf+uybT3Tvh8AfD68v3rgS/363eK6EZ5gfEb4DaKK/G/KJtaHpL0EMXF0J4A5X++15XNMA9R/AFoPNbvt/2vDZ8/DKwDriqbek4bxO806pLoh8j2HcD5FAn/7yiu9OfbfjZFMlaLWc8F7gDmldOe3mTaiVcxXwEWS3opRdPQN/vxO0RM0r3Al23v3vCaZfssSTsBlwIfAcZs7w6s4OnH+tOOc9uP2F5q+4XAscDfSDqC7VwS/QBJerGkpZL2Kj/vDRwHXAc8C9gCbJY0B3h3m0U9C3gY2CLpxcBbOq3b9nrgBoor+UvLf3Mjhu0rwJ9Jeq2kGZJ2Lr9k3QvYEdgJuB8Yl3QUcGS7hUk6RtKLJImiufJJiuad7VoS/WA9QvEF6fWSHqVI8LdSfIH0fuAgioPzCuCyNst5F/Dfy+V9DvhaxfVfAMwnzTYxImzfCyym+K/0foor/HcDz7D9CPB24OvAgxTH/PIOi5wH/D+Ki6YfAp+2fc3URD99KN9TbD8kvYLiCmqffEEVsf3IFf12QtJMig4rn0+Sj9i+JNFvByS9BHiI4k6Gjw85nIgYsDTdRETUXK7oIyJqbiQfAjR79mzPnTt32GE09eijjzJr1qxhhzF0o74dVq9e/Rvbz+s85WgYxDE/qvsscXWnVVztjvmRTPRz587lxhtvHHYYTa1atYqFCxcOO4yhG/XtIOlfhh1DNwZxzI/qPktc3WkVV7tjPk03ERE1l0QfEVFzSfQRETWXRB8RUXNJ9BERNZdEHxFRcx0TvaS9JV0j6XZJt0k6tck0kvQJSesk3dJY6kvSCZLuKl8n9PsXiBik8jG6Pyprm94m6f1NptlJ0tfK8+F6SXM
[base64-encoded PNG data for the histogram output omitted]\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# your answer here\n", + "freqd = employee.hist()\n", + "freqd" ] }, { @@ -255,11 +741,23 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 44, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "48.888888888888886" + ] + }, + "execution_count": 44, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ - "# your answer here" + "# your answer here\n", + "employee.Salary.mean()" ] }, { @@ -271,11 +769,23 @@ }, { "cell_type": "code", - "execution_count": 33, + "execution_count": 45, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "70" + ] + }, + "execution_count": 45, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ - "# your answer here" + "# your answer here\n", + "employee.Salary.max()" ] }, { @@ -287,11 +797,23 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 46, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "30" + ] + }, + "execution_count": 46, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ - "# your answer here\n" + "# your answer here\n", + "employee.Salary.min()" ] }, { @@ -303,11 +825,79 @@ }, { "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [], - "source": [ - "# your answer here" + "execution_count": 50, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
NameDepartmentEducationGenderTitleYearsSalary
1MariaITMasterFanalyst230
2DavidHRMasterManalyst230
\n", + "
" + ], + "text/plain": [ + " Name Department Education Gender Title Years Salary\n", + "1 Maria IT Master F analyst 2 30\n", + "2 David HR Master M analyst 2 30" + ] + }, + "execution_count": 50, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here\n", + "lowest = employee[employee['Salary'] == employee['Salary'].min()]\n", + "lowest" ] }, { @@ -319,11 +909,68 @@ }, { "cell_type": "code", - "execution_count": 17, - "metadata": {}, - "outputs": [], - "source": [ - "# your answer here\n" + "execution_count": 51, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
NameDepartmentEducationGenderTitleYearsSalary
2DavidHRMasterManalyst230
\n", + "
" + ], + "text/plain": [ + " Name Department Education Gender Title Years Salary\n", + "2 David HR Master M analyst 2 30" + ] + }, + "execution_count": 51, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here\n", + "\n", + "employee[employee['Name'] == 'David']" ] }, { @@ -335,11 +982,24 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": 52, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "2 30\n", + "Name: Salary, dtype: int64" + ] + }, + "execution_count": 52, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ - "# your answer here\n" + "# your answer here\n", + "employee[employee['Name'] == 'David']['Salary']" ] }, { @@ -351,11 +1011,89 @@ }, { "cell_type": "code", - "execution_count": 19, - "metadata": {}, - "outputs": [], - "source": [ - "# your answer here" + "execution_count": 53, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
NameDepartmentEducationGenderTitleYearsSalary
4SamuelSalesMasterMassociate355
5EvaSalesBachelorFassociate255
7PedroITPhdMassociate760
\n", + "
" + ], + "text/plain": [ + " Name Department Education Gender Title Years Salary\n", + "4 Samuel Sales Master M associate 3 55\n", + "5 Eva Sales Bachelor F associate 2 55\n", + "7 Pedro IT Phd M associate 7 60" + ] + }, + "execution_count": 53, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here\n", + "employee[employee['Title'] == 'associate']" ] }, { @@ -369,20 +1107,176 @@ }, { "cell_type": "code", - "execution_count": 20, - "metadata": {}, - "outputs": [], - "source": [ - "# your answer here- 1 method\n" + "execution_count": 54, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
NameDepartmentEducationGenderTitleYearsSalary
0JoseITBachelorManalyst135
1MariaITMasterFanalyst230
2DavidHRMasterManalyst230
\n", + "
" + ], + "text/plain": [ + " Name Department Education Gender Title Years Salary\n", + "0 Jose IT Bachelor M analyst 1 35\n", + "1 Maria IT Master F analyst 2 30\n", + "2 David HR Master M analyst 2 30" + ] + }, + "execution_count": 54, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here- 1 method\n", + "employee.head(3)" ] }, { "cell_type": "code", - "execution_count": 21, - "metadata": {}, - "outputs": [], - "source": [ - "# your answer here- 2nd method\n" + "execution_count": 56, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
NameDepartmentEducationGenderTitleYearsSalary
0JoseITBachelorManalyst135
1MariaITMasterFanalyst230
2DavidHRMasterManalyst230
\n", + "
" + ], + "text/plain": [ + " Name Department Education Gender Title Years Salary\n", + "0 Jose IT Bachelor M analyst 1 35\n", + "1 Maria IT Master F analyst 2 30\n", + "2 David HR Master M analyst 2 30" + ] + }, + "execution_count": 56, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here- 2nd method\n", + "employee.iloc[:3:]" ] }, { @@ -394,9 +1288,64 @@ }, { "cell_type": "code", - "execution_count": 22, - "metadata": {}, - "outputs": [], + "execution_count": 57, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
NameDepartmentEducationGenderTitleYearsSalary
7PedroITPhdMassociate760
\n", + "
" + ], + "text/plain": [ + " Name Department Education Gender Title Years Salary\n", + "7 Pedro IT Phd M associate 7 60" + ] + }, + "execution_count": 57, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "# your answer here\n" ] @@ -410,11 +1359,89 @@ }, { "cell_type": "code", - "execution_count": 25, - "metadata": {}, - "outputs": [], - "source": [ - "# your answer here" + "execution_count": 58, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
YearsAvrg Salary
0135.000000
1238.333333
2355.000000
3435.000000
4760.000000
5870.000000
\n", + "
" + ], + "text/plain": [ + " Years Avrg Salary\n", + "0 1 35.000000\n", + "1 2 38.333333\n", + "2 3 55.000000\n", + "3 4 35.000000\n", + "4 7 60.000000\n", + "5 8 70.000000" + ] + }, + "execution_count": 58, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here\n", + "empl = employee.groupby('Years', as_index=False).agg({'Salary': np.mean})\n", + "empl.columns = ['Years', 'Avrg Salary']\n", + "empl\n" ] }, { @@ -426,11 +1453,71 @@ }, { "cell_type": "code", - "execution_count": 23, - "metadata": {}, - "outputs": [], - "source": [ - "# your answer here" + "execution_count": 59, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
TitleAvrg Salary
0VP70.000000
1analyst32.500000
2associate56.666667
\n", + "
" + ], + "text/plain": [ + " Title Avrg Salary\n", + "0 VP 70.000000\n", + "1 analyst 32.500000\n", + "2 associate 56.666667" + ] + }, + "execution_count": 59, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here\n", + "sal = employee.groupby('Title', as_index=False).agg({'Salary': np.mean})\n", + "sal.columns = ['Title', 'Avrg Salary']\n", + "sal" ] }, { @@ -444,29 +1531,81 @@ }, { "cell_type": "code", - "execution_count": 26, - "metadata": {}, - "outputs": [], - "source": [ - "# draw boxplot here" + "execution_count": 60, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 60, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXAAAAD4CAYAAAD1jb0+AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAQYklEQVR4nO3df4xlZX3H8fdHFgJZkR9ib7ZgXRIpirUgTqnW/pgVMSimS1KL2MZuDcm2jTU0bVK3NbGltc0Skypt9Y9V1G39BSIE6ppVsu5ta9Ogu4rKDxVEiOCyK5a1zGoV9Ns/5qyMszM7987MnZln9/1Kbu45z3nOnO/cffLZk2fOuSdVhSSpPU9Z7gIkSfNjgEtSowxwSWqUAS5JjTLAJalRq5byYKeddlqtXbt2KQ95RDtw4ACrV69e7jKkQzg2F9fu3bsfqapnTG9f0gBfu3Ytu3btWspDHtH6/T7j4+PLXYZ0CMfm4krywEztTqFIUqMMcElqlAEuSY0ywCWpUQa4JDVqzgBPcnaS26e8/jfJnyQ5NcmtSe7p3k9ZioIlSZPmDPCq+mpVnVdV5wEvBL4H3ARsAnZU1VnAjm5dkrREhp1CuRD4elU9AKwHtnbtW4FLF7MwSdLhDXsjz+XAh7vlXlXt6ZYfBnoz7ZBkI7ARoNfr0e/351GmZjIxMeHnqWW1bt26offZuXPnCCo5OmXQBzokOQ74FvC8qtqbZH9VnTxl+6NVddh58LGxsfJOzMXj3W5aqdZu2sb9my9Z7jKOGEl2V9XY9PZhplBeAXy+qvZ263uTrOl++Bpg38LLlCQNapgAfy1PTp8A3AJs6JY3ADcvVlGSpLkNFOBJVgMXATdOad4MXJTkHuBl3bokaYkM9EfMqjoAPH1a23eYvCpFkrQMvBNTkhplgEtSowxwSWqUAS5JjTLAJalRBrgkNcoAl6RGGeCS1CgDXJIaZYBLUqMMcElqlAEuSY0ywCWpUQa4JDXKAJekRhngktQoA1ySGmWAS1KjDHBJapQBLkmNGvSp9CcnuSHJV5LcneTFSU5NcmuSe7r3U0ZdrCTpSYOegV8DbK+q5wDnAncDm4AdVXUWsKNblyQtkTkDPMlJwK8D1wJU1Q+raj+wHtjaddsKXDqqIiVJh1o1QJ8zgW8D70tyLrAbuBLoVdWers/DQG+mnZNsBDYC9Ho9+v3+QmtWZ2Jiws9TI/e
[base64-encoded PNG data for the boxplot output omitted]\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# draw boxplot here\n", + "employee.boxplot()" ] }, { "cell_type": "code", - "execution_count": 27, + "execution_count": 61, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "Years 2.0\n", + "Salary 35.0\n", + "Name: 25%, dtype: float64" + ] + }, + "execution_count": 61, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ - "# print first quartile here" + "# print first quartile here\n", + "employee.describe().loc['25%']" ] }, { "cell_type": "code", - "execution_count": 28, + "execution_count": 62, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "Years 7.0\n", + "Salary 60.0\n", + "Name: 75%, dtype: float64" + ] + }, + "execution_count": 62, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ - "# print third quartile here" + "# print third quartile here\n", + "employee.describe().loc['75%']" ] }, { @@ -478,11 +1617,65 @@ }, { "cell_type": "code", - "execution_count": 29, - "metadata": {}, - "outputs": [], - "source": [ - "# your answer here" + "execution_count": 64, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
GenderAverage Salary
0F47.5
1M50.0
\n", + "
" + ], + "text/plain": [ + " Gender Average Salary\n", + "0 F 47.5\n", + "1 M 50.0" + ] + }, + "execution_count": 64, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here\n", + "salger = employee.groupby('Gender', as_index=False).agg({'Salary':np.mean})\n", + "salger.columns = ['Gender', 'Average Salary']\n", + "salger" ] }, { @@ -496,11 +1689,108 @@ }, { "cell_type": "code", - "execution_count": 30, - "metadata": {}, - "outputs": [], - "source": [ - "# your answer here" + "execution_count": 70, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
DepartmentSalari.minSalary.meanSalary.maxYears.minYears.meanYears.max
Department
HR3045.007024.6666678
IT3048.757014.5000008
Sales5555.005522.5000003
\n", + "
" + ], + "text/plain": [ + " Department Salari.min Salary.mean Salary.maxYears.min \\\n", + "Department \n", + "HR 30 45.00 70 2 \n", + "IT 30 48.75 70 1 \n", + "Sales 55 55.00 55 2 \n", + "\n", + " Years.mean Years.max \n", + "Department \n", + "HR 4.666667 8 \n", + "IT 4.500000 8 \n", + "Sales 2.500000 3 " + ] + }, + "execution_count": 70, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# your answer here\n", + "stats = employee.groupby('Department').agg({'Salary':[np.min, np.mean, np.max],\n", + " 'Years':[np.min, np.mean, np.max]})\n", + "stats.columns = ['Department',\n", + " 'Salari.min', 'Salary.mean', 'Salary.max'\n", + " 'Years.min', 'Years.mean', 'Years.max']\n", + "stats" ] }, { @@ -881,7 +2171,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.6" + "version": "3.6.9" } }, "nbformat": 4, diff --git a/module-2/python-bi-project/BI-PROJECT/.ipynb_checkpoints/BIPROJECTsromero-checkpoint.ipynb b/module-2/python-bi-project/BI-PROJECT/.ipynb_checkpoints/BIPROJECTsromero-checkpoint.ipynb new file mode 100644 index 0000000..4ce867a --- /dev/null +++ b/module-2/python-bi-project/BI-PROJECT/.ipynb_checkpoints/BIPROJECTsromero-checkpoint.ipynb @@ -0,0 +1,1759 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np \n", + "import pandas as pd \n", + "import pymongo\n", + "from pymongo import MongoClient\n", + "# Importing libraries" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "# Connecting to MongoDB\n", + "client = MongoClient('mongodb://localhost:27017')" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "# Accessing the companies database\n", + "db=client.companies\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [
+ "# Storing the results of the data-filtering queries in a cursor\n", + "curr_ekchuah = db.companies.find ({'$and': [\n", + " {'category_code': 'ecommerce'},\\\n", + " {'offices': {'$not': {'$size':0}}},\\\n", + " {'founded_year': {'$lt':2012}}\n", + " ]},\\\n", + " {'_id':0,'name':1, 'offices':1, 'founded_year':1, 'products':1}).sort('ipo.valuation_amount',1)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "<class 'pandas.core.frame.DataFrame'>\n", + "RangeIndex: 464 entries, 0 to 463\n", + "Data columns (total 4 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 name 464 non-null object\n", + " 1 founded_year 464 non-null int64 \n", + " 2 products 464 non-null object\n", + " 3 offices 464 non-null object\n", + "dtypes: int64(1), object(3)\n", + "memory usage: 14.6+ KB\n" + ] + } + ], + "source": [ + "# Converting to a DataFrame\n", + "df=pd.DataFrame(curr_ekchuah)\n", + "df.info()" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
namefounded_yearproductsoffices
0Wize2006[][{'description': '', 'address1': '1110 Burling...
1Zlio2005[{'name': 'Zlio', 'permalink': 'zlio'}][{'description': '', 'address1': '55 rue Serva...
2TheFind2006[{'name': 'TheFind.com', 'permalink': 'thefind...[{'description': '', 'address1': '310 Villa St...
3Zazzle1999[{'name': 'Zazzle', 'permalink': 'zazzle-com'}][{'description': '', 'address1': '1800 Seaport...
4Kaboodle2005[{'name': 'Kaboodle', 'permalink': 'kaboodle'}][{'description': '', 'address1': '640 W. Calif...
...............
459Lift Media2008[][{'description': 'HQ', 'address1': '332 Pine S...
460Nokaut2006[][{'description': 'Grupa Nokaut HQ', 'address1'...
461Shutterfly1999[][{'description': '', 'address1': '2800 Bridge ...
462Higher One2000[][{'description': 'HQ', 'address1': '25 Science...
463Amazon1994[{'name': 'Amazon EC2', 'permalink': 'amazon-e...[{'description': None, 'address1': '1200 12th ...
\n", + "

464 rows × 4 columns

\n", + "
" + ], + "text/plain": [ + " name founded_year \\\n", + "0 Wize 2006 \n", + "1 Zlio 2005 \n", + "2 TheFind 2006 \n", + "3 Zazzle 1999 \n", + "4 Kaboodle 2005 \n", + ".. ... ... \n", + "459 Lift Media 2008 \n", + "460 Nokaut 2006 \n", + "461 Shutterfly 1999 \n", + "462 Higher One 2000 \n", + "463 Amazon 1994 \n", + "\n", + " products \\\n", + "0 [] \n", + "1 [{'name': 'Zlio', 'permalink': 'zlio'}] \n", + "2 [{'name': 'TheFind.com', 'permalink': 'thefind... \n", + "3 [{'name': 'Zazzle', 'permalink': 'zazzle-com'}] \n", + "4 [{'name': 'Kaboodle', 'permalink': 'kaboodle'}] \n", + ".. ... \n", + "459 [] \n", + "460 [] \n", + "461 [] \n", + "462 [] \n", + "463 [{'name': 'Amazon EC2', 'permalink': 'amazon-e... \n", + "\n", + " offices \n", + "0 [{'description': '', 'address1': '1110 Burling... \n", + "1 [{'description': '', 'address1': '55 rue Serva... \n", + "2 [{'description': '', 'address1': '310 Villa St... \n", + "3 [{'description': '', 'address1': '1800 Seaport... \n", + "4 [{'description': '', 'address1': '640 W. Calif... \n", + ".. ... \n", + "459 [{'description': 'HQ', 'address1': '332 Pine S... \n", + "460 [{'description': 'Grupa Nokaut HQ', 'address1'... \n", + "461 [{'description': '', 'address1': '2800 Bridge ... \n", + "462 [{'description': 'HQ', 'address1': '25 Science... \n", + "463 [{'description': None, 'address1': '1200 12th ... \n", + "\n", + "[464 rows x 4 columns]" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "new_df = pd.DataFrame(index=[], columns=df.columns)\n", + "for _, i in df.iterrows():\n", + " flattened_d = [dict(i.to_dict(), offices=c) for c in i.offices]\n", + " new_df = new_df.append(flattened_d )" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
| name | founded_year | products | offices
0 | Wize | 2006 | [] | {'description': '', 'address1': '1110 Burlinga...
0 | Zlio | 2005 | [{'name': 'Zlio', 'permalink': 'zlio'}] | {'description': '', 'address1': '55 rue Servan...
0 | TheFind | 2006 | [{'name': 'TheFind.com', 'permalink': 'thefind... | {'description': '', 'address1': '310 Villa Str...
0 | Zazzle | 1999 | [{'name': 'Zazzle', 'permalink': 'zazzle-com'}] | {'description': '', 'address1': '1800 Seaport'...
0 | Kaboodle | 2005 | [{'name': 'Kaboodle', 'permalink': 'kaboodle'}] | {'description': '', 'address1': '640 W. Califo...
... | ... | ... | ... | ...
0 | Lift Media | 2008 | [] | {'description': 'HQ', 'address1': '332 Pine St...
0 | Nokaut | 2006 | [] | {'description': 'Grupa Nokaut HQ', 'address1':...
0 | Shutterfly | 1999 | [] | {'description': '', 'address1': '2800 Bridge P...
0 | Higher One | 2000 | [] | {'description': 'HQ', 'address1': '25 Science ...
0 | Amazon | 1994 | [{'name': 'Amazon EC2', 'permalink': 'amazon-e... | {'description': None, 'address1': '1200 12th A...
\n", + "

551 rows × 4 columns

\n", + "
" + ], + "text/plain": [ + " name founded_year \\\n", + "0 Wize 2006 \n", + "0 Zlio 2005 \n", + "0 TheFind 2006 \n", + "0 Zazzle 1999 \n", + "0 Kaboodle 2005 \n", + ".. ... ... \n", + "0 Lift Media 2008 \n", + "0 Nokaut 2006 \n", + "0 Shutterfly 1999 \n", + "0 Higher One 2000 \n", + "0 Amazon 1994 \n", + "\n", + " products \\\n", + "0 [] \n", + "0 [{'name': 'Zlio', 'permalink': 'zlio'}] \n", + "0 [{'name': 'TheFind.com', 'permalink': 'thefind... \n", + "0 [{'name': 'Zazzle', 'permalink': 'zazzle-com'}] \n", + "0 [{'name': 'Kaboodle', 'permalink': 'kaboodle'}] \n", + ".. ... \n", + "0 [] \n", + "0 [] \n", + "0 [] \n", + "0 [] \n", + "0 [{'name': 'Amazon EC2', 'permalink': 'amazon-e... \n", + "\n", + " offices \n", + "0 {'description': '', 'address1': '1110 Burlinga... \n", + "0 {'description': '', 'address1': '55 rue Servan... \n", + "0 {'description': '', 'address1': '310 Villa Str... \n", + "0 {'description': '', 'address1': '1800 Seaport'... \n", + "0 {'description': '', 'address1': '640 W. Califo... \n", + ".. ... \n", + "0 {'description': 'HQ', 'address1': '332 Pine St... \n", + "0 {'description': 'Grupa Nokaut HQ', 'address1':... \n", + "0 {'description': '', 'address1': '2800 Bridge P... \n", + "0 {'description': 'HQ', 'address1': '25 Science ... \n", + "0 {'description': None, 'address1': '1200 12th A... 
\n", + "\n", + "[551 rows x 4 columns]" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "new_df" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "# Prepare the columns that hold nested data: flatten products\n", + "new_df1 = pd.DataFrame(index=[], columns=new_df.columns)\n", + "for _, i in new_df.iterrows():\n", + " flattened_d = [dict(i.to_dict(), products=c) for c in i.products]\n", + " new_df1 = new_df1.append(flattened_d )\n" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "new_df1 = new_df1.reset_index()" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "RangeIndex: 472 entries, 0 to 471\n", + "Data columns (total 5 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 index 472 non-null int64 \n", + " 1 name 472 non-null object\n", + " 2 founded_year 472 non-null object\n", + " 3 products 472 non-null object\n", + " 4 offices 472 non-null object\n", + "dtypes: int64(1), object(4)\n", + "memory usage: 18.6+ KB\n" + ] + } + ], + "source": [ + "new_df1.info()" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/sromero/.local/lib/python3.6/site-packages/ipykernel_launcher.py:4: FutureWarning: pandas.io.json.json_normalize is deprecated, use pandas.json_normalize instead\n", + " after removing the cwd from sys.path.\n" + ] + } + ], + "source": [ + "from pandas.io.json import json_normalize\n", + "# Unnest offices with json_normalize\n", + "\n", + "new_df2=json_normalize(new_df1['offices'])" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + 
"text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
| description | address1 | address2 | zip_code | city | state_code | country_code | latitude | longitude
0 | | 55 rue Servan | | 75011 | Paris | None | FRA | 48.862825 | 2.381836
1 | | 310 Villa Street | | 94041 | Mountain View | CA | USA | 37.391502 | -122.073463
2 | | 1800 Seaport | | 94063 | Redwood City | CA | USA | 37.510854 | -122.201356
3 | | 640 W. California Ave. | Suite 220 | 94041 | Sunnyvale | CA | USA | 37.382162 | -122.036301
4 | Vast San Francisco | 560 Sutter St. | Suite 300 | 94108 | San Francisco | CA | USA | 37.787183 | -122.397759
... | ... | ... | ... | ... | ... | ... | ... | ... | ...
467 | None | 1200 12th Ave | S # 1200 | 98144 | Seattle | WA | USA | 47.592300 | -122.317295
468 | None | 1200 12th Ave | S # 1200 | 98144 | Seattle | WA | USA | 47.592300 | -122.317295
469 | None | 1200 12th Ave | S # 1200 | 98144 | Seattle | WA | USA | 47.592300 | -122.317295
470 | None | 1200 12th Ave | S # 1200 | 98144 | Seattle | WA | USA | 47.592300 | -122.317295
471 | None | 1200 12th Ave | S # 1200 | 98144 | Seattle | WA | USA | 47.592300 | -122.317295
\n", + "

472 rows × 9 columns

\n", + "
" + ], + "text/plain": [ + " description address1 address2 zip_code \\\n", + "0 55 rue Servan 75011 \n", + "1 310 Villa Street 94041 \n", + "2 1800 Seaport 94063 \n", + "3 640 W. California Ave. Suite 220 94041 \n", + "4 Vast San Francisco 560 Sutter St. Suite 300 94108 \n", + ".. ... ... ... ... \n", + "467 None 1200 12th Ave S # 1200 98144 \n", + "468 None 1200 12th Ave S # 1200 98144 \n", + "469 None 1200 12th Ave S # 1200 98144 \n", + "470 None 1200 12th Ave S # 1200 98144 \n", + "471 None 1200 12th Ave S # 1200 98144 \n", + "\n", + " city state_code country_code latitude longitude \n", + "0 Paris None FRA 48.862825 2.381836 \n", + "1 Mountain View CA USA 37.391502 -122.073463 \n", + "2 Redwood City CA USA 37.510854 -122.201356 \n", + "3 Sunnyvale CA USA 37.382162 -122.036301 \n", + "4 San Francisco CA USA 37.787183 -122.397759 \n", + ".. ... ... ... ... ... \n", + "467 Seattle WA USA 47.592300 -122.317295 \n", + "468 Seattle WA USA 47.592300 -122.317295 \n", + "469 Seattle WA USA 47.592300 -122.317295 \n", + "470 Seattle WA USA 47.592300 -122.317295 \n", + "471 Seattle WA USA 47.592300 -122.317295 \n", + "\n", + "[472 rows x 9 columns]" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "new_df2\n", + "# This DataFrame now holds the unnested offices data" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "new_df2 = new_df2.drop(columns='description')\n" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "# Join the DataFrames\n", + "dfmap = pd.concat([new_df1, new_df2], axis=1, sort=False)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
| index | name | founded_year | products | offices | address1 | address2 | zip_code | city | state_code | country_code | latitude | longitude
0 | 0 | Zlio | 2005 | {'name': 'Zlio', 'permalink': 'zlio'} | {'description': '', 'address1': '55 rue Servan... | 55 rue Servan | | 75011 | Paris | None | FRA | 48.862825 | 2.381836
1 | 0 | TheFind | 2006 | {'name': 'TheFind.com', 'permalink': 'thefind-... | {'description': '', 'address1': '310 Villa Str... | 310 Villa Street | | 94041 | Mountain View | CA | USA | 37.391502 | -122.073463
2 | 0 | Zazzle | 1999 | {'name': 'Zazzle', 'permalink': 'zazzle-com'} | {'description': '', 'address1': '1800 Seaport'... | 1800 Seaport | | 94063 | Redwood City | CA | USA | 37.510854 | -122.201356
3 | 0 | Kaboodle | 2005 | {'name': 'Kaboodle', 'permalink': 'kaboodle'} | {'description': '', 'address1': '640 W. Califo... | 640 W. California Ave. | Suite 220 | 94041 | Sunnyvale | CA | USA | 37.382162 | -122.036301
4 | 0 | Vast | 2001 | {'name': 'Vast', 'permalink': 'vast'} | {'description': 'Vast San Francisco', 'address... | 560 Sutter St. | Suite 300 | 94108 | San Francisco | CA | USA | 37.787183 | -122.397759
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
467 | 36 | Amazon | 1994 | {'name': 'Exchange.com', 'permalink': 'exchang... | {'description': None, 'address1': '1200 12th A... | 1200 12th Ave | S # 1200 | 98144 | Seattle | WA | USA | 47.592300 | -122.317295
468 | 37 | Amazon | 1994 | {'name': 'Bookpages', 'permalink': 'bookpages'} | {'description': None, 'address1': '1200 12th A... | 1200 12th Ave | S # 1200 | 98144 | Seattle | WA | USA | 47.592300 | -122.317295
469 | 38 | Amazon | 1994 | {'name': 'Amazon Route 53', 'permalink': 'amaz... | {'description': None, 'address1': '1200 12th A... | 1200 12th Ave | S # 1200 | 98144 | Seattle | WA | USA | 47.592300 | -122.317295
470 | 39 | Amazon | 1994 | {'name': 'Kindle Fire', 'permalink': 'kindle-f... | {'description': None, 'address1': '1200 12th A... | 1200 12th Ave | S # 1200 | 98144 | Seattle | WA | USA | 47.592300 | -122.317295
471 | 40 | Amazon | 1994 | {'name': 'Kindle Touch', 'permalink': 'kindle-... | {'description': None, 'address1': '1200 12th A... | 1200 12th Ave | S # 1200 | 98144 | Seattle | WA | USA | 47.592300 | -122.317295
\n", + "

472 rows × 13 columns

\n", + "
" + ], + "text/plain": [ + " index name founded_year \\\n", + "0 0 Zlio 2005 \n", + "1 0 TheFind 2006 \n", + "2 0 Zazzle 1999 \n", + "3 0 Kaboodle 2005 \n", + "4 0 Vast 2001 \n", + ".. ... ... ... \n", + "467 36 Amazon 1994 \n", + "468 37 Amazon 1994 \n", + "469 38 Amazon 1994 \n", + "470 39 Amazon 1994 \n", + "471 40 Amazon 1994 \n", + "\n", + " products \\\n", + "0 {'name': 'Zlio', 'permalink': 'zlio'} \n", + "1 {'name': 'TheFind.com', 'permalink': 'thefind-... \n", + "2 {'name': 'Zazzle', 'permalink': 'zazzle-com'} \n", + "3 {'name': 'Kaboodle', 'permalink': 'kaboodle'} \n", + "4 {'name': 'Vast', 'permalink': 'vast'} \n", + ".. ... \n", + "467 {'name': 'Exchange.com', 'permalink': 'exchang... \n", + "468 {'name': 'Bookpages', 'permalink': 'bookpages'} \n", + "469 {'name': 'Amazon Route 53', 'permalink': 'amaz... \n", + "470 {'name': 'Kindle Fire', 'permalink': 'kindle-f... \n", + "471 {'name': 'Kindle Touch', 'permalink': 'kindle-... \n", + "\n", + " offices \\\n", + "0 {'description': '', 'address1': '55 rue Servan... \n", + "1 {'description': '', 'address1': '310 Villa Str... \n", + "2 {'description': '', 'address1': '1800 Seaport'... \n", + "3 {'description': '', 'address1': '640 W. Califo... \n", + "4 {'description': 'Vast San Francisco', 'address... \n", + ".. ... \n", + "467 {'description': None, 'address1': '1200 12th A... \n", + "468 {'description': None, 'address1': '1200 12th A... \n", + "469 {'description': None, 'address1': '1200 12th A... \n", + "470 {'description': None, 'address1': '1200 12th A... \n", + "471 {'description': None, 'address1': '1200 12th A... \n", + "\n", + " address1 address2 zip_code city state_code \\\n", + "0 55 rue Servan 75011 Paris None \n", + "1 310 Villa Street 94041 Mountain View CA \n", + "2 1800 Seaport 94063 Redwood City CA \n", + "3 640 W. California Ave. Suite 220 94041 Sunnyvale CA \n", + "4 560 Sutter St. Suite 300 94108 San Francisco CA \n", + ".. ... ... ... ... ... 
\n", + "467 1200 12th Ave S # 1200 98144 Seattle WA \n", + "468 1200 12th Ave S # 1200 98144 Seattle WA \n", + "469 1200 12th Ave S # 1200 98144 Seattle WA \n", + "470 1200 12th Ave S # 1200 98144 Seattle WA \n", + "471 1200 12th Ave S # 1200 98144 Seattle WA \n", + "\n", + " country_code latitude longitude \n", + "0 FRA 48.862825 2.381836 \n", + "1 USA 37.391502 -122.073463 \n", + "2 USA 37.510854 -122.201356 \n", + "3 USA 37.382162 -122.036301 \n", + "4 USA 37.787183 -122.397759 \n", + ".. ... ... ... \n", + "467 USA 47.592300 -122.317295 \n", + "468 USA 47.592300 -122.317295 \n", + "469 USA 47.592300 -122.317295 \n", + "470 USA 47.592300 -122.317295 \n", + "471 USA 47.592300 -122.317295 \n", + "\n", + "[472 rows x 13 columns]" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dfmap" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "dfmap = dfmap.drop(columns='offices')" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "dfmap = dfmap.drop(columns='index')" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "dfmap = dfmap.drop(columns='state_code')" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
| name | founded_year | products | address1 | address2 | zip_code | city | country_code | latitude | longitude
0 | Zlio | 2005 | {'name': 'Zlio', 'permalink': 'zlio'} | 55 rue Servan | | 75011 | Paris | FRA | 48.862825 | 2.381836
1 | TheFind | 2006 | {'name': 'TheFind.com', 'permalink': 'thefind-... | 310 Villa Street | | 94041 | Mountain View | USA | 37.391502 | -122.073463
2 | Zazzle | 1999 | {'name': 'Zazzle', 'permalink': 'zazzle-com'} | 1800 Seaport | | 94063 | Redwood City | USA | 37.510854 | -122.201356
3 | Kaboodle | 2005 | {'name': 'Kaboodle', 'permalink': 'kaboodle'} | 640 W. California Ave. | Suite 220 | 94041 | Sunnyvale | USA | 37.382162 | -122.036301
4 | Vast | 2001 | {'name': 'Vast', 'permalink': 'vast'} | 560 Sutter St. | Suite 300 | 94108 | San Francisco | USA | 37.787183 | -122.397759
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
467 | Amazon | 1994 | {'name': 'Exchange.com', 'permalink': 'exchang... | 1200 12th Ave | S # 1200 | 98144 | Seattle | USA | 47.592300 | -122.317295
468 | Amazon | 1994 | {'name': 'Bookpages', 'permalink': 'bookpages'} | 1200 12th Ave | S # 1200 | 98144 | Seattle | USA | 47.592300 | -122.317295
469 | Amazon | 1994 | {'name': 'Amazon Route 53', 'permalink': 'amaz... | 1200 12th Ave | S # 1200 | 98144 | Seattle | USA | 47.592300 | -122.317295
470 | Amazon | 1994 | {'name': 'Kindle Fire', 'permalink': 'kindle-f... | 1200 12th Ave | S # 1200 | 98144 | Seattle | USA | 47.592300 | -122.317295
471 | Amazon | 1994 | {'name': 'Kindle Touch', 'permalink': 'kindle-... | 1200 12th Ave | S # 1200 | 98144 | Seattle | USA | 47.592300 | -122.317295
\n", + "

472 rows × 10 columns

\n", + "
" + ], + "text/plain": [ + " name founded_year products \\\n", + "0 Zlio 2005 {'name': 'Zlio', 'permalink': 'zlio'} \n", + "1 TheFind 2006 {'name': 'TheFind.com', 'permalink': 'thefind-... \n", + "2 Zazzle 1999 {'name': 'Zazzle', 'permalink': 'zazzle-com'} \n", + "3 Kaboodle 2005 {'name': 'Kaboodle', 'permalink': 'kaboodle'} \n", + "4 Vast 2001 {'name': 'Vast', 'permalink': 'vast'} \n", + ".. ... ... ... \n", + "467 Amazon 1994 {'name': 'Exchange.com', 'permalink': 'exchang... \n", + "468 Amazon 1994 {'name': 'Bookpages', 'permalink': 'bookpages'} \n", + "469 Amazon 1994 {'name': 'Amazon Route 53', 'permalink': 'amaz... \n", + "470 Amazon 1994 {'name': 'Kindle Fire', 'permalink': 'kindle-f... \n", + "471 Amazon 1994 {'name': 'Kindle Touch', 'permalink': 'kindle-... \n", + "\n", + " address1 address2 zip_code city country_code \\\n", + "0 55 rue Servan 75011 Paris FRA \n", + "1 310 Villa Street 94041 Mountain View USA \n", + "2 1800 Seaport 94063 Redwood City USA \n", + "3 640 W. California Ave. Suite 220 94041 Sunnyvale USA \n", + "4 560 Sutter St. Suite 300 94108 San Francisco USA \n", + ".. ... ... ... ... ... \n", + "467 1200 12th Ave S # 1200 98144 Seattle USA \n", + "468 1200 12th Ave S # 1200 98144 Seattle USA \n", + "469 1200 12th Ave S # 1200 98144 Seattle USA \n", + "470 1200 12th Ave S # 1200 98144 Seattle USA \n", + "471 1200 12th Ave S # 1200 98144 Seattle USA \n", + "\n", + " latitude longitude \n", + "0 48.862825 2.381836 \n", + "1 37.391502 -122.073463 \n", + "2 37.510854 -122.201356 \n", + "3 37.382162 -122.036301 \n", + "4 37.787183 -122.397759 \n", + ".. ... ... 
\n", + "467 47.592300 -122.317295 \n", + "468 47.592300 -122.317295 \n", + "469 47.592300 -122.317295 \n", + "470 47.592300 -122.317295 \n", + "471 47.592300 -122.317295 \n", + "\n", + "[472 rows x 10 columns]" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dfmap" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "RangeIndex: 472 entries, 0 to 471\n", + "Data columns (total 10 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 name 472 non-null object \n", + " 1 founded_year 472 non-null object \n", + " 2 products 472 non-null object \n", + " 3 address1 467 non-null object \n", + " 4 address2 463 non-null object \n", + " 5 zip_code 468 non-null object \n", + " 6 city 471 non-null object \n", + " 7 country_code 472 non-null object \n", + " 8 latitude 314 non-null float64\n", + " 9 longitude 314 non-null float64\n", + "dtypes: float64(2), object(8)\n", + "memory usage: 37.0+ KB\n" + ] + } + ], + "source": [ + "dfmap.info()" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [], + "source": [ + "# Fill missing values with 'Unknown'\n", + "dfmap[['address1', 'address2', 'zip_code']] = dfmap[['address1', 'address2', 'zip_code']].fillna('Unknown')" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [], + "source": [ + "# Fill missing values with 0.0\n", + "dfmap[['longitude', 'latitude']] = dfmap[['longitude', 'latitude']].fillna(0.0)" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "RangeIndex: 472 entries, 0 to 471\n", + "Data columns (total 10 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 name 
472 non-null object \n", + " 1 founded_year 472 non-null object \n", + " 2 products 472 non-null object \n", + " 3 address1 472 non-null object \n", + " 4 address2 472 non-null object \n", + " 5 zip_code 472 non-null object \n", + " 6 city 471 non-null object \n", + " 7 country_code 472 non-null object \n", + " 8 latitude 472 non-null float64\n", + " 9 longitude 472 non-null float64\n", + "dtypes: float64(2), object(8)\n", + "memory usage: 37.0+ KB\n" + ] + } + ], + "source": [ + "dfmap.info()" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Create the collection in MongoDB\n", + "db.datos_mapa.insert_many(dfmap.to_dict('records'))" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'principal_2dsphere'" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Add a 2dsphere index\n", + "db.datos_mapa.create_index ([('principal', '2dsphere')])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 48.862825\n", + "1 37.391502\n", + "2 37.510854\n", + "3 37.382162\n", + "4 37.787183\n", + " ... 
\n", + "467 47.592300\n", + "468 47.592300\n", + "469 47.592300\n", + "470 47.592300\n", + "471 47.592300\n", + "Name: latitude, Length: 472, dtype: float64" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dfmap.latitude" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [], + "source": [ + "# Get the coordinates that will be used in the maps\n", + "dire = dfmap[['latitude','longitude']]\n", + "dire1= dire.values.tolist()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [], + "source": [ + "import folium \n", + "from folium import plugins\n" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# First map\n", + "# Heat map showing where e-commerce companies are present;\n", + "# presence is highest in the USA and part of Europe.\n", + "mapa_ek= folium.Map([38.500000, -98.000000], zoom_start=3)\n", + "data = dfmap[['latitude', 'longitude']].values\n", + "mapa_ek.add_child(plugins.HeatMap(data, radius=15))\n" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [], + "source": [ + "mapa_ek.save('mapa1.html')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Second map: one marker per e-commerce office location\n", + "\n", + "map_ek2 = folium.Map(\n", + " location=[38.500000, -98.000000],\n", + " tiles='Stamen Terrain',\n", + " zoom_start=4)\n", + "for point in range(0,len(dire1)):\n", + " folium.Marker(dire1[point], popup=dfmap['name'][point]).add_to(map_ek2)\n", + "map_ek2 " + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [], + "source": [ + "map_ek2.save('mapa2.html')" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Third map: e-commerce office locations around London\n", + "map_ek3 = folium.Map(\n", + " location=[51.5024412,-0.0189649],\n", + " zoom_start=7)\n", + "for point in range(0,len(dire1)):\n", + " folium.Marker(dire1[point], popup=dfmap['name'][point]).add_to(map_ek3)\n", + " \n", + "map_ek3" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [], + "source": [ + "map_ek3.save('mapa3.html')" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 36, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Mark a candidate zone for the co-working location\n", + "map_ek4 = folium.Map(\n", + " location=[51.509865,-0.118092],\n", + " zoom_start=10)\n", + "\n", + "folium.CircleMarker(\n", + " location=[51.509865,-0.118092],\n", + " radius=70,\n", + " popup='Laurelhurst Park',\n", + " color='#3186cc',\n", + " fill=True,\n", + " fill_color='#3186cc'\n", + " ).add_to(map_ek4)\n", + "icon=folium.Icon(color='white', icon='car', icon_color=\"blue\", prefix='fa')\n", + "for point in range(0,len(dire1)):\n", + " folium.Marker(dire1[point], popup=dfmap['name'][point]).add_to(map_ek4)\n", + "\n", + "\n", + "map_ek4 " + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [], + "source": [ + "map_ek4.save('mapa4.html')" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 38, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# The red circle marks the co-working location\n", + "map_ek5 = folium.Map(\n", + " location=[51.509865,-0.118092],\n", + " zoom_start=10)\n", + "\n", + "folium.CircleMarker(\n", + " location=[51.509865,-0.118092],\n", + " radius=70,\n", + " popup='Laurelhurst Park',\n", + " color='#3186cc',\n", + " fill=True,\n", + " fill_color='#3186cc'\n", + " ).add_to(map_ek5)\n", + "\n", + "folium.Circle(\n", + " location=[51.5024412,-0.0189649],\n", + " radius=200,\n", + " popup='The Waterfront',\n", + " color='crimson',\n", + " fill=False,\n", + ").add_to(map_ek5)\n", + "\n", + "folium.Marker([51.503827, 0.049267], popup= dfmap['name'][point],\n", + " icon=folium.Icon(color='red',icon='plane')).add_to(map_ek5)\n", + "map_ek5" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [], + "source": [ + "map_ek5.save('mapa5.html')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/module-2/python-bi-project/BI-PROJECT/BIPROJECTsromero.ipynb b/module-2/python-bi-project/BI-PROJECT/BIPROJECTsromero.ipynb new file mode 100644 index 0000000..bf8e355 --- /dev/null +++ b/module-2/python-bi-project/BI-PROJECT/BIPROJECTsromero.ipynb @@ -0,0 +1,1754 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np \n", + "import pandas as pd \n", 
+ "import pymongo\n", + "from pymongo import MongoClient\n", + "# Import libraries" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "# Connect to MongoDB\n", + "client = MongoClient('mongodb://localhost:27017')" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "# Select the companies database\n", + "db=client.companies\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "# Store a cursor with the filtered query results\n", + "curr_ekchuah = db.companies.find ({'$and': [\n", + " {'category_code': 'ecommerce'},\\\n", + " {'offices': {'$not': {'$size':0}}},\\\n", + " {'founded_year': {'$lt':2012}}\n", + " ]},\\\n", + " {'_id':0,'name':1, 'offices':1, 'founded_year':1, 'products':1}).sort('ipo.valuation_amount',1)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "RangeIndex: 464 entries, 0 to 463\n", + "Data columns (total 4 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 name 464 non-null object\n", + " 1 founded_year 464 non-null int64 \n", + " 2 products 464 non-null object\n", + " 3 offices 464 non-null object\n", + "dtypes: int64(1), object(3)\n", + "memory usage: 14.6+ KB\n" + ] + } + ], + "source": [ + "# Convert to a DataFrame\n", + "df=pd.DataFrame(curr_ekchuah)\n", + "df.info()" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " name founded_year \\\n", + "0 Wize 2006 \n", + "1 Zlio 2005 \n", + "2 TheFind 2006 \n", + "3 Zazzle 1999 \n", + "4 Kaboodle 2005 \n", + ".. ... ... \n", + "459 Lift Media 2008 \n", + "460 Nokaut 2006 \n", + "461 Shutterfly 1999 \n", + "462 Higher One 2000 \n", + "463 Amazon 1994 \n", + "\n", + " products \\\n", + "0 [] \n", + "1 [{'name': 'Zlio', 'permalink': 'zlio'}] \n", + "2 [{'name': 'TheFind.com', 'permalink': 'thefind... \n", + "3 [{'name': 'Zazzle', 'permalink': 'zazzle-com'}] \n", + "4 [{'name': 'Kaboodle', 'permalink': 'kaboodle'}] \n", + ".. ... \n", + "459 [] \n", + "460 [] \n", + "461 [] \n", + "462 [] \n", + "463 [{'name': 'Amazon EC2', 'permalink': 'amazon-e... \n", + "\n", + " offices \n", + "0 [{'description': '', 'address1': '1110 Burling... \n", + "1 [{'description': '', 'address1': '55 rue Serva... \n", + "2 [{'description': '', 'address1': '310 Villa St... \n", + "3 [{'description': '', 'address1': '1800 Seaport... \n", + "4 [{'description': '', 'address1': '640 W. Calif... \n", + ".. ... \n", + "459 [{'description': 'HQ', 'address1': '332 Pine S... \n", + "460 [{'description': 'Grupa Nokaut HQ', 'address1'... \n", + "461 [{'description': '', 'address1': '2800 Bridge ... \n", + "462 [{'description': 'HQ', 'address1': '25 Science... \n", + "463 [{'description': None, 'address1': '1200 12th ... \n", + "\n", + "[464 rows x 4 columns]" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "new_df = pd.DataFrame(index=[], columns=df.columns)\n", + "for _, i in df.iterrows():\n", + " flattened_d = [dict(i.to_dict(), offices=c) for c in i.offices]\n", + " new_df = new_df.append(flattened_d )" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " name founded_year \\\n", + "0 Wize 2006 \n", + "0 Zlio 2005 \n", + "0 TheFind 2006 \n", + "0 Zazzle 1999 \n", + "0 Kaboodle 2005 \n", + ".. ... ... \n", + "0 Lift Media 2008 \n", + "0 Nokaut 2006 \n", + "0 Shutterfly 1999 \n", + "0 Higher One 2000 \n", + "0 Amazon 1994 \n", + "\n", + " products \\\n", + "0 [] \n", + "0 [{'name': 'Zlio', 'permalink': 'zlio'}] \n", + "0 [{'name': 'TheFind.com', 'permalink': 'thefind... \n", + "0 [{'name': 'Zazzle', 'permalink': 'zazzle-com'}] \n", + "0 [{'name': 'Kaboodle', 'permalink': 'kaboodle'}] \n", + ".. ... \n", + "0 [] \n", + "0 [] \n", + "0 [] \n", + "0 [] \n", + "0 [{'name': 'Amazon EC2', 'permalink': 'amazon-e... \n", + "\n", + " offices \n", + "0 {'description': '', 'address1': '1110 Burlinga... \n", + "0 {'description': '', 'address1': '55 rue Servan... \n", + "0 {'description': '', 'address1': '310 Villa Str... \n", + "0 {'description': '', 'address1': '1800 Seaport'... \n", + "0 {'description': '', 'address1': '640 W. Califo... \n", + ".. ... \n", + "0 {'description': 'HQ', 'address1': '332 Pine St... \n", + "0 {'description': 'Grupa Nokaut HQ', 'address1':... \n", + "0 {'description': '', 'address1': '2800 Bridge P... \n", + "0 {'description': 'HQ', 'address1': '25 Science ... \n", + "0 {'description': None, 'address1': '1200 12th A... 
\n", + "\n", + "[551 rows x 4 columns]" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "new_df" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "#Expand the nested products lists (one row per product) before un-nesting offices\n", + "new_df1 = pd.DataFrame(index=[], columns=new_df.columns)\n", + "for _, i in new_df.iterrows():\n", + " flattened_d = [dict(i.to_dict(), products=c) for c in i.products]\n", + " new_df1 = new_df1.append(flattened_d )\n" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "new_df1 = new_df1.reset_index()" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "RangeIndex: 472 entries, 0 to 471\n", + "Data columns (total 5 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 index 472 non-null int64 \n", + " 1 name 472 non-null object\n", + " 2 founded_year 472 non-null object\n", + " 3 products 472 non-null object\n", + " 4 offices 472 non-null object\n", + "dtypes: int64(1), object(4)\n", + "memory usage: 18.6+ KB\n" + ] + } + ], + "source": [ + "new_df1.info()" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/sromero/.local/lib/python3.6/site-packages/ipykernel_launcher.py:4: FutureWarning: pandas.io.json.json_normalize is deprecated, use pandas.json_normalize instead\n", + " after removing the cwd from sys.path.\n" + ] + } + ], + "source": [ + "from pandas.io.json import json_normalize\n", + "#Un-nest offices with json_normalize\n", + "\n", + "new_df2=json_normalize(new_df1['offices'])" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + 
"text/html": [ + "
" + ], + "text/plain": [ + " description address1 address2 zip_code \\\n", + "0 55 rue Servan 75011 \n", + "1 310 Villa Street 94041 \n", + "2 1800 Seaport 94063 \n", + "3 640 W. California Ave. Suite 220 94041 \n", + "4 Vast San Francisco 560 Sutter St. Suite 300 94108 \n", + ".. ... ... ... ... \n", + "467 None 1200 12th Ave S # 1200 98144 \n", + "468 None 1200 12th Ave S # 1200 98144 \n", + "469 None 1200 12th Ave S # 1200 98144 \n", + "470 None 1200 12th Ave S # 1200 98144 \n", + "471 None 1200 12th Ave S # 1200 98144 \n", + "\n", + " city state_code country_code latitude longitude \n", + "0 Paris None FRA 48.862825 2.381836 \n", + "1 Mountain View CA USA 37.391502 -122.073463 \n", + "2 Redwood City CA USA 37.510854 -122.201356 \n", + "3 Sunnyvale CA USA 37.382162 -122.036301 \n", + "4 San Francisco CA USA 37.787183 -122.397759 \n", + ".. ... ... ... ... ... \n", + "467 Seattle WA USA 47.592300 -122.317295 \n", + "468 Seattle WA USA 47.592300 -122.317295 \n", + "469 Seattle WA USA 47.592300 -122.317295 \n", + "470 Seattle WA USA 47.592300 -122.317295 \n", + "471 Seattle WA USA 47.592300 -122.317295 \n", + "\n", + "[472 rows x 9 columns]" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "new_df2\n", + "#En este df ya tenemos la información desanidada de offices" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "new_df2 = new_df2.drop(columns='description')\n" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "#Juntando los df\n", + "dfmap = pd.concat([new_df1, new_df2], axis=1, sort=False)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " index name founded_year \\\n", + "0 0 Zlio 2005 \n", + "1 0 TheFind 2006 \n", + "2 0 Zazzle 1999 \n", + "3 0 Kaboodle 2005 \n", + "4 0 Vast 2001 \n", + ".. ... ... ... \n", + "467 36 Amazon 1994 \n", + "468 37 Amazon 1994 \n", + "469 38 Amazon 1994 \n", + "470 39 Amazon 1994 \n", + "471 40 Amazon 1994 \n", + "\n", + " products \\\n", + "0 {'name': 'Zlio', 'permalink': 'zlio'} \n", + "1 {'name': 'TheFind.com', 'permalink': 'thefind-... \n", + "2 {'name': 'Zazzle', 'permalink': 'zazzle-com'} \n", + "3 {'name': 'Kaboodle', 'permalink': 'kaboodle'} \n", + "4 {'name': 'Vast', 'permalink': 'vast'} \n", + ".. ... \n", + "467 {'name': 'Exchange.com', 'permalink': 'exchang... \n", + "468 {'name': 'Bookpages', 'permalink': 'bookpages'} \n", + "469 {'name': 'Amazon Route 53', 'permalink': 'amaz... \n", + "470 {'name': 'Kindle Fire', 'permalink': 'kindle-f... \n", + "471 {'name': 'Kindle Touch', 'permalink': 'kindle-... \n", + "\n", + " offices \\\n", + "0 {'description': '', 'address1': '55 rue Servan... \n", + "1 {'description': '', 'address1': '310 Villa Str... \n", + "2 {'description': '', 'address1': '1800 Seaport'... \n", + "3 {'description': '', 'address1': '640 W. Califo... \n", + "4 {'description': 'Vast San Francisco', 'address... \n", + ".. ... \n", + "467 {'description': None, 'address1': '1200 12th A... \n", + "468 {'description': None, 'address1': '1200 12th A... \n", + "469 {'description': None, 'address1': '1200 12th A... \n", + "470 {'description': None, 'address1': '1200 12th A... \n", + "471 {'description': None, 'address1': '1200 12th A... \n", + "\n", + " address1 address2 zip_code city state_code \\\n", + "0 55 rue Servan 75011 Paris None \n", + "1 310 Villa Street 94041 Mountain View CA \n", + "2 1800 Seaport 94063 Redwood City CA \n", + "3 640 W. California Ave. Suite 220 94041 Sunnyvale CA \n", + "4 560 Sutter St. Suite 300 94108 San Francisco CA \n", + ".. ... ... ... ... ... 
\n", + "467 1200 12th Ave S # 1200 98144 Seattle WA \n", + "468 1200 12th Ave S # 1200 98144 Seattle WA \n", + "469 1200 12th Ave S # 1200 98144 Seattle WA \n", + "470 1200 12th Ave S # 1200 98144 Seattle WA \n", + "471 1200 12th Ave S # 1200 98144 Seattle WA \n", + "\n", + " country_code latitude longitude \n", + "0 FRA 48.862825 2.381836 \n", + "1 USA 37.391502 -122.073463 \n", + "2 USA 37.510854 -122.201356 \n", + "3 USA 37.382162 -122.036301 \n", + "4 USA 37.787183 -122.397759 \n", + ".. ... ... ... \n", + "467 USA 47.592300 -122.317295 \n", + "468 USA 47.592300 -122.317295 \n", + "469 USA 47.592300 -122.317295 \n", + "470 USA 47.592300 -122.317295 \n", + "471 USA 47.592300 -122.317295 \n", + "\n", + "[472 rows x 13 columns]" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dfmap" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "dfmap = dfmap.drop(columns='offices')" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "dfmap = dfmap.drop(columns='index')" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "dfmap = dfmap.drop(columns='state_code')" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " name founded_year products \\\n", + "0 Zlio 2005 {'name': 'Zlio', 'permalink': 'zlio'} \n", + "1 TheFind 2006 {'name': 'TheFind.com', 'permalink': 'thefind-... \n", + "2 Zazzle 1999 {'name': 'Zazzle', 'permalink': 'zazzle-com'} \n", + "3 Kaboodle 2005 {'name': 'Kaboodle', 'permalink': 'kaboodle'} \n", + "4 Vast 2001 {'name': 'Vast', 'permalink': 'vast'} \n", + ".. ... ... ... \n", + "467 Amazon 1994 {'name': 'Exchange.com', 'permalink': 'exchang... \n", + "468 Amazon 1994 {'name': 'Bookpages', 'permalink': 'bookpages'} \n", + "469 Amazon 1994 {'name': 'Amazon Route 53', 'permalink': 'amaz... \n", + "470 Amazon 1994 {'name': 'Kindle Fire', 'permalink': 'kindle-f... \n", + "471 Amazon 1994 {'name': 'Kindle Touch', 'permalink': 'kindle-... \n", + "\n", + " address1 address2 zip_code city country_code \\\n", + "0 55 rue Servan 75011 Paris FRA \n", + "1 310 Villa Street 94041 Mountain View USA \n", + "2 1800 Seaport 94063 Redwood City USA \n", + "3 640 W. California Ave. Suite 220 94041 Sunnyvale USA \n", + "4 560 Sutter St. Suite 300 94108 San Francisco USA \n", + ".. ... ... ... ... ... \n", + "467 1200 12th Ave S # 1200 98144 Seattle USA \n", + "468 1200 12th Ave S # 1200 98144 Seattle USA \n", + "469 1200 12th Ave S # 1200 98144 Seattle USA \n", + "470 1200 12th Ave S # 1200 98144 Seattle USA \n", + "471 1200 12th Ave S # 1200 98144 Seattle USA \n", + "\n", + " latitude longitude \n", + "0 48.862825 2.381836 \n", + "1 37.391502 -122.073463 \n", + "2 37.510854 -122.201356 \n", + "3 37.382162 -122.036301 \n", + "4 37.787183 -122.397759 \n", + ".. ... ... 
\n", + "467 47.592300 -122.317295 \n", + "468 47.592300 -122.317295 \n", + "469 47.592300 -122.317295 \n", + "470 47.592300 -122.317295 \n", + "471 47.592300 -122.317295 \n", + "\n", + "[472 rows x 10 columns]" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dfmap" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "RangeIndex: 472 entries, 0 to 471\n", + "Data columns (total 10 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 name 472 non-null object \n", + " 1 founded_year 472 non-null object \n", + " 2 products 472 non-null object \n", + " 3 address1 467 non-null object \n", + " 4 address2 463 non-null object \n", + " 5 zip_code 468 non-null object \n", + " 6 city 471 non-null object \n", + " 7 country_code 472 non-null object \n", + " 8 latitude 314 non-null float64\n", + " 9 longitude 314 non-null float64\n", + "dtypes: float64(2), object(8)\n", + "memory usage: 37.0+ KB\n" + ] + } + ], + "source": [ + "dfmap.info()" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [], + "source": [ + "#Fill null values with Unknown\n", + "dfmap[['address1', 'address2', 'zip_code']] = dfmap[['address1', 'address2', 'zip_code']].fillna('Unknown')" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [], + "source": [ + "#Fill null values with 0.0\n", + "dfmap[['longitude', 'latitude']] = dfmap[['longitude', 'latitude']].fillna(0.0)" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "RangeIndex: 472 entries, 0 to 471\n", + "Data columns (total 10 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 name 
472 non-null object \n", + " 1 founded_year 472 non-null object \n", + " 2 products 472 non-null object \n", + " 3 address1 472 non-null object \n", + " 4 address2 472 non-null object \n", + " 5 zip_code 472 non-null object \n", + " 6 city 471 non-null object \n", + " 7 country_code 472 non-null object \n", + " 8 latitude 472 non-null float64\n", + " 9 longitude 472 non-null float64\n", + "dtypes: float64(2), object(8)\n", + "memory usage: 37.0+ KB\n" + ] + } + ], + "source": [ + "dfmap.info()" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "#Create the collection in MongoDB\n", + "db.datos_mapa.insert_many(dfmap.to_dict('records'))" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'principal_2dsphere'" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "#Add a 2dsphere index\n", + "db.datos_mapa.create_index ([('principal', '2dsphere')])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 48.862825\n", + "1 37.391502\n", + "2 37.510854\n", + "3 37.382162\n", + "4 37.787183\n", + " ... 
\n", + "467 47.592300\n", + "468 47.592300\n", + "469 47.592300\n", + "470 47.592300\n", + "471 47.592300\n", + "Name: latitude, Length: 472, dtype: float64" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dfmap.latitude" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [], + "source": [ + "#Extract the coordinates that will be used in the maps\n", + "dire = dfmap[['latitude','longitude']]\n", + "dire1= dire.values.tolist()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [], + "source": [ + "import folium \n", + "from folium import plugins\n" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "#Primer mapa\n", + "#Realicé este mapa para con la finalidad de visualidar los lugares con presencia\n", + "#de e-commerce y observamos mayor presencia en USA y una parte de Europa.\n", + "mapa_ek= folium.Map([38.500000, -98.000000], zoom_start=3)\n", + "data = dfmap[['latitude', 'longitude']].values\n", + "mapa_ek.add_child(plugins.HeatMap(data, radius=15))\n" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [], + "source": [ + "mapa_ek.save('mapa1.html')" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "#Se realizó este segundo mapa para identificar ubicaciones de e-commerce\n", + "\n", + "map_ek2 = folium.Map(\n", + " location=[38.500000, -98.000000],\n", + " tiles='Stamen Terrain',\n", + " zoom_start=4)\n", + "for point in range(0,len(dire1)):\n", + " folium.Marker(dire1[point], popup=dfmap['name'][point]).add_to(map_ek2)\n", + "map_ek2 " + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [], + "source": [ + "map_ek2.save('mapa2.html')" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "#Se hizo este mapa para visualizar las ubicaciones de e-commerce en London\n", + "map_ek3 = folium.Map(\n", + " location=[51.5024412,-0.0189649],\n", + " zoom_start=7)\n", + "for point in range(0,len(dire1)):\n", + " folium.Marker(dire1[point], popup=dfmap['name'][point]).add_to(map_ek3)\n", + " \n", + "map_ek3" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [], + "source": [ + "map_ek3.save('mapa3.html')" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 36, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "#Se estabelció una zona determinada como candidata para la ubicación del co-work\n", + "map_ek4 = folium.Map(\n", + " location=[51.509865,-0.118092],\n", + " zoom_start=10)\n", + "\n", + "folium.CircleMarker(\n", + " location=[51.509865,-0.118092],\n", + " radius=70,\n", + " popup='Laurelhurst Park',\n", + " color='#3186cc',\n", + " fill=True,\n", + " fill_color='#3186cc'\n", + " ).add_to(map_ek4)\n", + "icon=folium.Icon(color='white', icon='car', icon_color=\"blue\", prefix='fa')\n", + "for point in range(0,len(dire1)):\n", + " folium.Marker(dire1[point], popup=dfmap['name'][point]).add_to(map_ek4)\n", + "\n", + "\n", + "map_ek4 " + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [], + "source": [ + "map_ek4.save('mapa4.html')" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 38, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "#El circulo rojo indica la ubicación del cowork \n", + "map_ek5 = folium.Map(\n", + " location=[51.509865,-0.118092],\n", + " zoom_start=10)\n", + "\n", + "folium.CircleMarker(\n", + " location=[51.509865,-0.118092],\n", + " radius=70,\n", + " popup='Laurelhurst Park',\n", + " color='#3186cc',\n", + " fill=True,\n", + " fill_color='#3186cc'\n", + " ).add_to(map_ek5)\n", + "\n", + "folium.Circle(\n", + " location=[51.5024412,-0.0189649],\n", + " radius=200,\n", + " popup='The Waterfront',\n", + " color='crimson',\n", + " fill=False,\n", + ").add_to(map_ek5)\n", + "\n", + "folium.Marker([51.503827, 0.049267], popup= dfmap['name'][point],\n", + " icon=folium.Icon(color='red',icon='plane')).add_to(map_ek5)\n", + "map_ek5" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [], + "source": [ + "map_ek5.save('mapa5.html')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/module-2/python-bi-project/BI-PROJECT/README.md b/module-2/python-bi-project/BI-PROJECT/README.md new file mode 100644 index 0000000..2bc908a --- /dev/null +++ b/module-2/python-bi-project/BI-PROJECT/README.md @@ -0,0 +1,32 @@ +# **BI PROJECT** + +Saúl Romero Bárcenas + + + +## **Cliente** + +Ek-Chuah es una app de e-commerce que pertenece al brazo de Ciencia y Tecnología de RUMEC. 
+ +With the launch of the app, RUMEC is interested in establishing itself in a cowork outside Mexico to gain international reach. The client's requests were that it preferably not be located in the USA, that it have sufficient infrastructure for working (desk, chair, window, etc.), that it be close to e-commerce companies, that it fit a budget of $40,000 per month, and that it offer easy transfer from the airport. + + + +## **Method** + +To provide the client with a solution, the following steps were carried out: + +- Data were obtained from the companies data stored in MongoDB. The data were filtered so that we worked only with companies in the e-commerce category that had been established for more than 5 years. +- The data were visualized on heat maps using the 'folium' library in Python, locating all the e-commerce companies. We found that most of these companies are in the USA and parts of Europe. +- Based on the client's requests, we focused on the European e-commerce locations. +- Within the U.K., points of interest were found as candidate cowork locations. Rental prices were looked up for coworks in areas adjacent to the e-commerce companies. +- Servcorp Canary Wharf was judged to be the location to consider. + +## **Conclusions** + +Using a database and Python libraries, a geolocation on a map was carried out taking the client's requests into account. Servcorp Canary Wharf is located in the U.K. (51.5024412,-0.0189649) + + + +**** + diff --git a/module-2/python-bi-project/BI-PROJECT/mapa1.html b/module-2/python-bi-project/BI-PROJECT/mapa1.html new file mode 100644 index 0000000..4d94020 --- /dev/null +++ b/module-2/python-bi-project/BI-PROJECT/mapa1.html @@ -0,0 +1,70 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + \ No newline at end of file diff --git a/module-2/python-bi-project/BI-PROJECT/mapa2.html b/module-2/python-bi-project/BI-PROJECT/mapa2.html new file mode 100644 index 0000000..a9abd62 --- /dev/null +++ b/module-2/python-bi-project/BI-PROJECT/mapa2.html @@ -0,0 +1,9031 @@ + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + \ No newline at end of file diff --git a/module-2/python-bi-project/BI-PROJECT/mapa3.html b/module-2/python-bi-project/BI-PROJECT/mapa3.html new file mode 100644 index 0000000..031c54a --- /dev/null +++ b/module-2/python-bi-project/BI-PROJECT/mapa3.html @@ -0,0 +1,9031 @@ + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + \ No newline at end of file diff --git a/module-2/python-bi-project/BI-PROJECT/mapa4.html b/module-2/python-bi-project/BI-PROJECT/mapa4.html new file mode 100644 index 0000000..3ba132b --- /dev/null +++ b/module-2/python-bi-project/BI-PROJECT/mapa4.html @@ -0,0 +1,9050 @@ + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + \ No newline at end of file diff --git a/module-2/python-bi-project/BI-PROJECT/mapa5.html b/module-2/python-bi-project/BI-PROJECT/mapa5.html new file mode 100644 index 0000000..09e3dad --- /dev/null +++ b/module-2/python-bi-project/BI-PROJECT/mapa5.html @@ -0,0 +1,126 @@ + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + \ No newline at end of file