diff --git a/your-code/main.ipynb b/your-code/main.ipynb index 7900997..ee4b11a 100644 --- a/your-code/main.ipynb +++ b/your-code/main.ipynb @@ -1,169 +1,3584 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 1. Import pandas library" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 2. Import users table:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 3. Rename Id column to userId" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 4. Import posts table:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 5. Rename Id column to postId and OwnerUserId to userId" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 6. Define new dataframes for users and posts with the following selected columns:\n", - " **users columns**: userId, Reputation,Views,UpVotes,DownVotes\n", - " **posts columns**: postId, Score,userId,ViewCount,CommentCount" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 7. Merge both dataframes, users and posts. \n", - "You will need to make a [merge](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html) of posts and users dataframes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 8. How many missing values do you have in your merged dataframe? On which columns?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 9. You will need to make something with missing values. Will you clean or filling them? Explain. \n", - "**Remember** to check the results of your code before passing to the next step" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 10. Adjust the data types in order to avoid future issues. Which ones should be changed? " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "SXCu_w85sWRX" + }, + "source": [ + "#### 1. Import pandas library" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Oazp2DN9sWRY" + }, + "outputs": [], + "source": [ + "import pandas as pd" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Kt_QAYhOsWRZ" + }, + "source": [ + "#### 2. Import users table:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 347 + }, + "id": "br1mjAO8sWRZ", + "outputId": "19d0c501-cb2a-4951-80da-a251cad2d524" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " Id Reputation CreationDate DisplayName LastAccessDate \\\n", + "0 -1 1 2010-07-19 06:55:26 Community 2010-07-19 06:55:26 \n", + "1 2 101 2010-07-19 14:01:36 Geoff Dalgas 2013-11-12 22:07:23 \n", + "2 3 101 2010-07-19 15:34:50 Jarrod Dixon 2014-08-08 06:42:58 \n", + "3 4 101 2010-07-19 19:03:27 Emmett 2014-01-02 09:31:02 \n", + "4 5 6792 2010-07-19 19:03:57 Shane 2014-08-13 00:23:47 \n", + "\n", + " WebsiteUrl Location \\\n", + "0 http://meta.stackexchange.com/ on the server farm \n", + "1 http://stackoverflow.com Corvallis, OR \n", + "2 http://stackoverflow.com New York, NY \n", + "3 http://minesweeperonline.com San Francisco, CA \n", + "4 http://www.statalgo.com New York, NY \n", + "\n", + " AboutMe Views UpVotes \\\n", + "0
Hi, I'm not really a person.
\\r\\n\\r\\n... 0 5007 \n", + "1
Developer on the StackOverflow team. Find ... 25 3 \n", + "2
\\r\\n\\r\\n... 11 0 \n", + "4
Quantitative researcher focusing on statist... 1145 662 \n", + "\n", + " DownVotes AccountId Age ProfileImageUrl \n", + "0 1920 -1 NaN NaN \n", + "1 0 2 37.0 NaN \n", + "2 0 3 35.0 NaN \n", + "3 0 1998 28.0 http://i.stack.imgur.com/d1oHX.jpg \n", + "4 5 54503 35.0 NaN " + ], + "text/html": [ + "\n", + "
| \n", + " | Id | \n", + "Reputation | \n", + "CreationDate | \n", + "DisplayName | \n", + "LastAccessDate | \n", + "WebsiteUrl | \n", + "Location | \n", + "AboutMe | \n", + "Views | \n", + "UpVotes | \n", + "DownVotes | \n", + "AccountId | \n", + "Age | \n", + "ProfileImageUrl | \n", + "
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | \n", + "-1 | \n", + "1 | \n", + "2010-07-19 06:55:26 | \n", + "Community | \n", + "2010-07-19 06:55:26 | \n", + "http://meta.stackexchange.com/ | \n", + "on the server farm | \n", + "<p>Hi, I'm not really a person.</p>\\r\\n\\r\\n<p>... | \n", + "0 | \n", + "5007 | \n", + "1920 | \n", + "-1 | \n", + "NaN | \n", + "NaN | \n", + "
| 1 | \n", + "2 | \n", + "101 | \n", + "2010-07-19 14:01:36 | \n", + "Geoff Dalgas | \n", + "2013-11-12 22:07:23 | \n", + "http://stackoverflow.com | \n", + "Corvallis, OR | \n", + "<p>Developer on the StackOverflow team. Find ... | \n", + "25 | \n", + "3 | \n", + "0 | \n", + "2 | \n", + "37.0 | \n", + "NaN | \n", + "
| 2 | \n", + "3 | \n", + "101 | \n", + "2010-07-19 15:34:50 | \n", + "Jarrod Dixon | \n", + "2014-08-08 06:42:58 | \n", + "http://stackoverflow.com | \n", + "New York, NY | \n", + "<p><a href=\"http://blog.stackoverflow.com/2009... | \n", + "22 | \n", + "19 | \n", + "0 | \n", + "3 | \n", + "35.0 | \n", + "NaN | \n", + "
| 3 | \n", + "4 | \n", + "101 | \n", + "2010-07-19 19:03:27 | \n", + "Emmett | \n", + "2014-01-02 09:31:02 | \n", + "http://minesweeperonline.com | \n", + "San Francisco, CA | \n", + "<p>currently at a startup in SF</p>\\r\\n\\r\\n<p>... | \n", + "11 | \n", + "0 | \n", + "0 | \n", + "1998 | \n", + "28.0 | \n", + "http://i.stack.imgur.com/d1oHX.jpg | \n", + "
| 4 | \n", + "5 | \n", + "6792 | \n", + "2010-07-19 19:03:57 | \n", + "Shane | \n", + "2014-08-13 00:23:47 | \n", + "http://www.statalgo.com | \n", + "New York, NY | \n", + "<p>Quantitative researcher focusing on statist... | \n", + "1145 | \n", + "662 | \n", + "5 | \n", + "54503 | \n", + "35.0 | \n", + "NaN | \n", + "
Hi, I'm not really a person.
\\r\\n\\r\\n... 0 5007 \n", + "1
Developer on the StackOverflow team. Find ... 25 3 \n", + "2
\\r\\n\\r\\n... 11 0 \n", + "4
Quantitative researcher focusing on statist... 1145 662 \n", + "\n", + " DownVotes AccountId Age ProfileImageUrl \n", + "0 1920 -1 NaN NaN \n", + "1 0 2 37.0 NaN \n", + "2 0 3 35.0 NaN \n", + "3 0 1998 28.0 http://i.stack.imgur.com/d1oHX.jpg \n", + "4 5 54503 35.0 NaN " + ], + "text/html": [ + "\n", + "
| \n", + " | userId | \n", + "Reputation | \n", + "CreationDate | \n", + "DisplayName | \n", + "LastAccessDate | \n", + "WebsiteUrl | \n", + "Location | \n", + "AboutMe | \n", + "Views | \n", + "UpVotes | \n", + "DownVotes | \n", + "AccountId | \n", + "Age | \n", + "ProfileImageUrl | \n", + "
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | \n", + "-1 | \n", + "1 | \n", + "2010-07-19 06:55:26 | \n", + "Community | \n", + "2010-07-19 06:55:26 | \n", + "http://meta.stackexchange.com/ | \n", + "on the server farm | \n", + "<p>Hi, I'm not really a person.</p>\\r\\n\\r\\n<p>... | \n", + "0 | \n", + "5007 | \n", + "1920 | \n", + "-1 | \n", + "NaN | \n", + "NaN | \n", + "
| 1 | \n", + "2 | \n", + "101 | \n", + "2010-07-19 14:01:36 | \n", + "Geoff Dalgas | \n", + "2013-11-12 22:07:23 | \n", + "http://stackoverflow.com | \n", + "Corvallis, OR | \n", + "<p>Developer on the StackOverflow team. Find ... | \n", + "25 | \n", + "3 | \n", + "0 | \n", + "2 | \n", + "37.0 | \n", + "NaN | \n", + "
| 2 | \n", + "3 | \n", + "101 | \n", + "2010-07-19 15:34:50 | \n", + "Jarrod Dixon | \n", + "2014-08-08 06:42:58 | \n", + "http://stackoverflow.com | \n", + "New York, NY | \n", + "<p><a href=\"http://blog.stackoverflow.com/2009... | \n", + "22 | \n", + "19 | \n", + "0 | \n", + "3 | \n", + "35.0 | \n", + "NaN | \n", + "
| 3 | \n", + "4 | \n", + "101 | \n", + "2010-07-19 19:03:27 | \n", + "Emmett | \n", + "2014-01-02 09:31:02 | \n", + "http://minesweeperonline.com | \n", + "San Francisco, CA | \n", + "<p>currently at a startup in SF</p>\\r\\n\\r\\n<p>... | \n", + "11 | \n", + "0 | \n", + "0 | \n", + "1998 | \n", + "28.0 | \n", + "http://i.stack.imgur.com/d1oHX.jpg | \n", + "
| 4 | \n", + "5 | \n", + "6792 | \n", + "2010-07-19 19:03:57 | \n", + "Shane | \n", + "2014-08-13 00:23:47 | \n", + "http://www.statalgo.com | \n", + "New York, NY | \n", + "<p>Quantitative researcher focusing on statist... | \n", + "1145 | \n", + "662 | \n", + "5 | \n", + "54503 | \n", + "35.0 | \n", + "NaN | \n", + "
How should I elicit prior distributions fro... 8.0 \n", + "1
In many different statistical methods there... 24.0 \n", + "2
What are some valuable Statistical Analysis... 18.0 \n", + "3
I have two groups of data. Each with a dif... 23.0 \n", + "4
The R-project
\\n\\n\n",
+ " 5 rows × 21 columns How should I elicit prior distributions fro... 8.0 \n",
+ "1 8198.0 In many different statistical methods there... 24.0 \n",
+ "2 3613.0 What are some valuable Statistical Analysis... 18.0 \n",
+ "3 5224.0 I have two groups of data. Each with a dif... 23.0 \n",
+ "4 NaN The R-project \n",
+ " 5 rows × 21 columns In many different statistical methods there... 24.0 \n",
+ "1 3613.0 What are some valuable Statistical Analysis... 18.0 \n",
+ "2 5224.0 I have two groups of data. Each with a dif... 23.0 \n",
+ "3 NaN The R-project Last year, I read a blog post from Developer on the StackOverflow team. Find ... \n",
+ "1 New York, NY \\r\\n\\r\\n ... \n",
+ "3 New York, NY Quantitative researcher focusing on statist... \n",
+ "4 District of Columbia 5 rows × 35 columns In many different statistical methods there... 24.0 \n",
+ "1 3613.0 What are some valuable Statistical Analysis... 18.0 \n",
+ "2 5224.0 I have two groups of data. Each with a dif... 23.0 \n",
+ "3 0.0 The R-project Last year, I read a blog post from you can use the matlab codes for svm and co... 19966.0 \n",
+ "28537 0.0 I use If I understand your question correctly, yo... 2020.0 \n",
+ "28539 0.0 Doesn't really help you with your question,... 19914.0 \n",
+ "28540 116.0 I have 10 vectors each having 100,000 point... 19968.0 \n",
+ "\n",
+ " LasActivityDate Title \\\n",
+ "0 2012-11-12 09:21:54 What is normality? \n",
+ "1 2013-05-27 14:48:36 What are some valuable Statistical Analysis op... \n",
+ "2 2010-09-08 03:00:19 Assessing the significance of differences in d... \n",
+ "3 2010-07-19 19:21:15 0 \n",
+ "4 2014-05-29 03:54:31 The Two Cultures: statistics vs. machine learn... \n",
+ "... ... ... \n",
+ "28536 2013-01-23 09:00:01 0 \n",
+ "28537 2013-01-23 13:13:30 0 \n",
+ "28538 2013-01-23 09:16:44 0 \n",
+ "28539 2013-01-23 09:36:07 0 \n",
+ "28540 2013-02-22 11:23:54 are data sets obtained from a Normal distribut... \n",
+ "\n",
+ " ... LastAccessDate WebsiteUrl \\\n",
+ "0 ... 2013-11-12 22:07:23 http://stackoverflow.com \n",
+ "1 ... 2014-08-08 06:42:58 http://stackoverflow.com \n",
+ "2 ... 2014-01-02 09:31:02 http://minesweeperonline.com \n",
+ "3 ... 2014-08-13 00:23:47 http://www.statalgo.com \n",
+ "4 ... 2014-08-07 19:49:44 http://www.harlan.harris.name \n",
+ "... ... ... ... \n",
+ "28536 ... 2014-07-15 14:53:00 0 \n",
+ "28537 ... 2014-07-05 05:27:26 0 \n",
+ "28538 ... 2014-07-13 14:47:33 0 \n",
+ "28539 ... 2014-06-26 07:56:53 0 \n",
+ "28540 ... 2014-06-27 14:00:15 0 \n",
+ "\n",
+ " Location \\\n",
+ "0 Corvallis, OR \n",
+ "1 New York, NY \n",
+ "2 San Francisco, CA \n",
+ "3 New York, NY \n",
+ "4 District of Columbia \n",
+ "... ... \n",
+ "28536 Kharagpur, India \n",
+ "28537 0 \n",
+ "28538 0 \n",
+ "28539 0 \n",
+ "28540 0 \n",
+ "\n",
+ " AboutMe Views UpVotes \\\n",
+ "0 Developer on the StackOverflow team. Find ... 25 3 \n",
+ "1 ... 11 0 \n",
+ "3 Quantitative researcher focusing on statist... 1145 662 \n",
+ "4 I am a Research scholar in IIT kharagpur. I... 0 0 \n",
+ "28537 0 0 0 \n",
+ "28538 0 0 0 \n",
+ "28539 0 1 0 \n",
+ "28540 0 9 0 \n",
+ "\n",
+ " DownVotes AccountId Age \\\n",
+ "0 0 2 37.0 \n",
+ "1 0 3 35.0 \n",
+ "2 0 1998 28.0 \n",
+ "3 5 54503 35.0 \n",
+ "4 0 46050 41.0 \n",
+ "... ... ... ... \n",
+ "28536 0 2272095 27.0 \n",
+ "28537 0 4609206 0.0 \n",
+ "28538 0 4524448 0.0 \n",
+ "28539 0 4609233 0.0 \n",
+ "28540 0 2766774 0.0 \n",
+ "\n",
+ " ProfileImageUrl \n",
+ "0 0 \n",
+ "1 0 \n",
+ "2 http://i.stack.imgur.com/d1oHX.jpg \n",
+ "3 0 \n",
+ "4 0 \n",
+ "... ... \n",
+ "28536 https://www.gravatar.com/avatar/cb64ec23c43128... \n",
+ "28537 https://www.gravatar.com/avatar/?s=128&d=ident... \n",
+ "28538 https://www.gravatar.com/avatar/d292896af36243... \n",
+ "28539 https://www.gravatar.com/avatar/80f9e492c63a22... \n",
+ "28540 https://www.gravatar.com/avatar/fe4cdbbd14f7ce... \n",
+ "\n",
+ "[28541 rows x 35 columns]"
+ ],
+ "text/html": [
+ "\n",
+ " 28541 rows × 35 columns In many different statistical methods there... 24.0 \n",
+ "1 What are some valuable Statistical Analysis... 18.0 \n",
+ "2 I have two groups of data. Each with a dif... 23.0 \n",
+ "3 The R-project Last year, I read a blog post from Developer on the StackOverflow team. Find ... \n",
+ "1 New York, NY \\r\\n\\r\\n ... \n",
+ "3 New York, NY Quantitative researcher focusing on statist... \n",
+ "4 District of Columbia 5 rows × 35 columns\n",
+ " \n",
+ "
\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " Id \n",
+ " PostTypeId \n",
+ " AcceptedAnswerId \n",
+ " CreaionDate \n",
+ " Score \n",
+ " ViewCount \n",
+ " Body \n",
+ " OwnerUserId \n",
+ " LasActivityDate \n",
+ " Title \n",
+ " ... \n",
+ " AnswerCount \n",
+ " CommentCount \n",
+ " FavoriteCount \n",
+ " LastEditorUserId \n",
+ " LastEditDate \n",
+ " CommunityOwnedDate \n",
+ " ParentId \n",
+ " ClosedDate \n",
+ " OwnerDisplayName \n",
+ " LastEditorDisplayName \n",
+ " \n",
+ " \n",
+ " 0 \n",
+ " 1 \n",
+ " 1 \n",
+ " 15.0 \n",
+ " 2010-07-19 19:12:12 \n",
+ " 23 \n",
+ " 1278.0 \n",
+ " <p>How should I elicit prior distributions fro... \n",
+ " 8.0 \n",
+ " 2010-09-15 21:08:26 \n",
+ " Eliciting priors from experts \n",
+ " ... \n",
+ " 5.0 \n",
+ " 1 \n",
+ " 14.0 \n",
+ " NaN \n",
+ " NaN \n",
+ " NaN \n",
+ " NaN \n",
+ " NaN \n",
+ " NaN \n",
+ " NaN \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " 2 \n",
+ " 1 \n",
+ " 59.0 \n",
+ " 2010-07-19 19:12:57 \n",
+ " 22 \n",
+ " 8198.0 \n",
+ " <p>In many different statistical methods there... \n",
+ " 24.0 \n",
+ " 2012-11-12 09:21:54 \n",
+ " What is normality? \n",
+ " ... \n",
+ " 7.0 \n",
+ " 1 \n",
+ " 8.0 \n",
+ " 88.0 \n",
+ " 2010-08-07 17:56:44 \n",
+ " NaN \n",
+ " NaN \n",
+ " NaN \n",
+ " NaN \n",
+ " NaN \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " 3 \n",
+ " 1 \n",
+ " 5.0 \n",
+ " 2010-07-19 19:13:28 \n",
+ " 54 \n",
+ " 3613.0 \n",
+ " <p>What are some valuable Statistical Analysis... \n",
+ " 18.0 \n",
+ " 2013-05-27 14:48:36 \n",
+ " What are some valuable Statistical Analysis op... \n",
+ " ... \n",
+ " 19.0 \n",
+ " 4 \n",
+ " 36.0 \n",
+ " 183.0 \n",
+ " 2011-02-12 05:50:03 \n",
+ " 2010-07-19 19:13:28 \n",
+ " NaN \n",
+ " NaN \n",
+ " NaN \n",
+ " NaN \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " 4 \n",
+ " 1 \n",
+ " 135.0 \n",
+ " 2010-07-19 19:13:31 \n",
+ " 13 \n",
+ " 5224.0 \n",
+ " <p>I have two groups of data. Each with a dif... \n",
+ " 23.0 \n",
+ " 2010-09-08 03:00:19 \n",
+ " Assessing the significance of differences in d... \n",
+ " ... \n",
+ " 5.0 \n",
+ " 2 \n",
+ " 2.0 \n",
+ " NaN \n",
+ " NaN \n",
+ " NaN \n",
+ " NaN \n",
+ " NaN \n",
+ " NaN \n",
+ " NaN \n",
+ " \n",
+ " \n",
+ " \n",
+ "4 \n",
+ " 5 \n",
+ " 2 \n",
+ " NaN \n",
+ " 2010-07-19 19:14:43 \n",
+ " 81 \n",
+ " NaN \n",
+ " <p>The R-project</p>\\n\\n<p><a href=\"http://www... \n",
+ " 23.0 \n",
+ " 2010-07-19 19:21:15 \n",
+ " NaN \n",
+ " ... \n",
+ " NaN \n",
+ " 3 \n",
+ " NaN \n",
+ " 23.0 \n",
+ " 2010-07-19 19:21:15 \n",
+ " 2010-07-19 19:14:43 \n",
+ " 3.0 \n",
+ " NaN \n",
+ " NaN \n",
+ " NaN \n",
+ " \n",
+ " \n",
+ "
\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " postId \n",
+ " PostTypeId \n",
+ " AcceptedAnswerId \n",
+ " CreaionDate \n",
+ " Score \n",
+ " ViewCount \n",
+ " Body \n",
+ " userId \n",
+ " LasActivityDate \n",
+ " Title \n",
+ " ... \n",
+ " AnswerCount \n",
+ " CommentCount \n",
+ " FavoriteCount \n",
+ " LastEditorUserId \n",
+ " LastEditDate \n",
+ " CommunityOwnedDate \n",
+ " ParentId \n",
+ " ClosedDate \n",
+ " OwnerDisplayName \n",
+ " LastEditorDisplayName \n",
+ " \n",
+ " \n",
+ " 0 \n",
+ " 1 \n",
+ " 1 \n",
+ " 15.0 \n",
+ " 2010-07-19 19:12:12 \n",
+ " 23 \n",
+ " 1278.0 \n",
+ " <p>How should I elicit prior distributions fro... \n",
+ " 8.0 \n",
+ " 2010-09-15 21:08:26 \n",
+ " Eliciting priors from experts \n",
+ " ... \n",
+ " 5.0 \n",
+ " 1 \n",
+ " 14.0 \n",
+ " NaN \n",
+ " NaN \n",
+ " NaN \n",
+ " NaN \n",
+ " NaN \n",
+ " NaN \n",
+ " NaN \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " 2 \n",
+ " 1 \n",
+ " 59.0 \n",
+ " 2010-07-19 19:12:57 \n",
+ " 22 \n",
+ " 8198.0 \n",
+ " <p>In many different statistical methods there... \n",
+ " 24.0 \n",
+ " 2012-11-12 09:21:54 \n",
+ " What is normality? \n",
+ " ... \n",
+ " 7.0 \n",
+ " 1 \n",
+ " 8.0 \n",
+ " 88.0 \n",
+ " 2010-08-07 17:56:44 \n",
+ " NaN \n",
+ " NaN \n",
+ " NaN \n",
+ " NaN \n",
+ " NaN \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " 3 \n",
+ " 1 \n",
+ " 5.0 \n",
+ " 2010-07-19 19:13:28 \n",
+ " 54 \n",
+ " 3613.0 \n",
+ " <p>What are some valuable Statistical Analysis... \n",
+ " 18.0 \n",
+ " 2013-05-27 14:48:36 \n",
+ " What are some valuable Statistical Analysis op... \n",
+ " ... \n",
+ " 19.0 \n",
+ " 4 \n",
+ " 36.0 \n",
+ " 183.0 \n",
+ " 2011-02-12 05:50:03 \n",
+ " 2010-07-19 19:13:28 \n",
+ " NaN \n",
+ " NaN \n",
+ " NaN \n",
+ " NaN \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " 4 \n",
+ " 1 \n",
+ " 135.0 \n",
+ " 2010-07-19 19:13:31 \n",
+ " 13 \n",
+ " 5224.0 \n",
+ " <p>I have two groups of data. Each with a dif... \n",
+ " 23.0 \n",
+ " 2010-09-08 03:00:19 \n",
+ " Assessing the significance of differences in d... \n",
+ " ... \n",
+ " 5.0 \n",
+ " 2 \n",
+ " 2.0 \n",
+ " NaN \n",
+ " NaN \n",
+ " NaN \n",
+ " NaN \n",
+ " NaN \n",
+ " NaN \n",
+ " NaN \n",
+ " \n",
+ " \n",
+ " \n",
+ "4 \n",
+ " 5 \n",
+ " 2 \n",
+ " NaN \n",
+ " 2010-07-19 19:14:43 \n",
+ " 81 \n",
+ " NaN \n",
+ " <p>The R-project</p>\\n\\n<p><a href=\"http://www... \n",
+ " 23.0 \n",
+ " 2010-07-19 19:21:15 \n",
+ " NaN \n",
+ " ... \n",
+ " NaN \n",
+ " 3 \n",
+ " NaN \n",
+ " 23.0 \n",
+ " 2010-07-19 19:21:15 \n",
+ " 2010-07-19 19:14:43 \n",
+ " 3.0 \n",
+ " NaN \n",
+ " NaN \n",
+ " NaN \n",
+ " \\r\\n
\n",
+ " \n",
+ "
\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " postId \n",
+ " PostTypeId \n",
+ " AcceptedAnswerId \n",
+ " CreaionDate \n",
+ " Score \n",
+ " ViewCount \n",
+ " Body \n",
+ " userId_x \n",
+ " LasActivityDate \n",
+ " Title \n",
+ " ... \n",
+ " LastAccessDate \n",
+ " WebsiteUrl \n",
+ " Location \n",
+ " AboutMe \n",
+ " Views \n",
+ " UpVotes \n",
+ " DownVotes \n",
+ " AccountId \n",
+ " Age \n",
+ " ProfileImageUrl \n",
+ " \n",
+ " \n",
+ " 0 \n",
+ " 2 \n",
+ " 1 \n",
+ " 59.0 \n",
+ " 2010-07-19 19:12:57 \n",
+ " 22 \n",
+ " 8198.0 \n",
+ " <p>In many different statistical methods there... \n",
+ " 24.0 \n",
+ " 2012-11-12 09:21:54 \n",
+ " What is normality? \n",
+ " ... \n",
+ " 2013-11-12 22:07:23 \n",
+ " http://stackoverflow.com \n",
+ " Corvallis, OR \n",
+ " <p>Developer on the StackOverflow team. Find ... \n",
+ " 25 \n",
+ " 3 \n",
+ " 0 \n",
+ " 2 \n",
+ " 37.0 \n",
+ " NaN \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " 3 \n",
+ " 1 \n",
+ " 5.0 \n",
+ " 2010-07-19 19:13:28 \n",
+ " 54 \n",
+ " 3613.0 \n",
+ " <p>What are some valuable Statistical Analysis... \n",
+ " 18.0 \n",
+ " 2013-05-27 14:48:36 \n",
+ " What are some valuable Statistical Analysis op... \n",
+ " ... \n",
+ " 2014-08-08 06:42:58 \n",
+ " http://stackoverflow.com \n",
+ " New York, NY \n",
+ " <p><a href=\"http://blog.stackoverflow.com/2009... \n",
+ " 22 \n",
+ " 19 \n",
+ " 0 \n",
+ " 3 \n",
+ " 35.0 \n",
+ " NaN \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " 4 \n",
+ " 1 \n",
+ " 135.0 \n",
+ " 2010-07-19 19:13:31 \n",
+ " 13 \n",
+ " 5224.0 \n",
+ " <p>I have two groups of data. Each with a dif... \n",
+ " 23.0 \n",
+ " 2010-09-08 03:00:19 \n",
+ " Assessing the significance of differences in d... \n",
+ " ... \n",
+ " 2014-01-02 09:31:02 \n",
+ " http://minesweeperonline.com \n",
+ " San Francisco, CA \n",
+ " <p>currently at a startup in SF</p>\\r\\n\\r\\n<p>... \n",
+ " 11 \n",
+ " 0 \n",
+ " 0 \n",
+ " 1998 \n",
+ " 28.0 \n",
+ " http://i.stack.imgur.com/d1oHX.jpg \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " 5 \n",
+ " 2 \n",
+ " NaN \n",
+ " 2010-07-19 19:14:43 \n",
+ " 81 \n",
+ " NaN \n",
+ " <p>The R-project</p>\\n\\n<p><a href=\"http://www... \n",
+ " 23.0 \n",
+ " 2010-07-19 19:21:15 \n",
+ " NaN \n",
+ " ... \n",
+ " 2014-08-13 00:23:47 \n",
+ " http://www.statalgo.com \n",
+ " New York, NY \n",
+ " <p>Quantitative researcher focusing on statist... \n",
+ " 1145 \n",
+ " 662 \n",
+ " 5 \n",
+ " 54503 \n",
+ " 35.0 \n",
+ " NaN \n",
+ " \n",
+ " \n",
+ " \n",
+ "4 \n",
+ " 6 \n",
+ " 1 \n",
+ " NaN \n",
+ " 2010-07-19 19:14:44 \n",
+ " 152 \n",
+ " 29229.0 \n",
+ " <p>Last year, I read a blog post from <a href=... \n",
+ " 5.0 \n",
+ " 2014-05-29 03:54:31 \n",
+ " The Two Cultures: statistics vs. machine learn... \n",
+ " ... \n",
+ " 2014-08-07 19:49:44 \n",
+ " http://www.harlan.harris.name \n",
+ " District of Columbia \n",
+ " <ul>\\r\\n<li>PhD in CS/AI/Machine Learning/Cogn... \n",
+ " 114 \n",
+ " 47 \n",
+ " 0 \n",
+ " 46050 \n",
+ " 41.0 \n",
+ " NaN \n",
+ " \\r\\n
\n",
+ " \n",
+ "
\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " postId \n",
+ " PostTypeId \n",
+ " AcceptedAnswerId \n",
+ " CreaionDate \n",
+ " Score \n",
+ " ViewCount \n",
+ " Body \n",
+ " userId_x \n",
+ " LasActivityDate \n",
+ " Title \n",
+ " ... \n",
+ " LastAccessDate \n",
+ " WebsiteUrl \n",
+ " Location \n",
+ " AboutMe \n",
+ " Views \n",
+ " UpVotes \n",
+ " DownVotes \n",
+ " AccountId \n",
+ " Age \n",
+ " ProfileImageUrl \n",
+ " \n",
+ " \n",
+ " 0 \n",
+ " 2 \n",
+ " 1 \n",
+ " 59.0 \n",
+ " 2010-07-19 19:12:57 \n",
+ " 22 \n",
+ " 8198.0 \n",
+ " <p>In many different statistical methods there... \n",
+ " 24.0 \n",
+ " 2012-11-12 09:21:54 \n",
+ " What is normality? \n",
+ " ... \n",
+ " 2013-11-12 22:07:23 \n",
+ " http://stackoverflow.com \n",
+ " Corvallis, OR \n",
+ " <p>Developer on the StackOverflow team. Find ... \n",
+ " 25 \n",
+ " 3 \n",
+ " 0 \n",
+ " 2 \n",
+ " 37.0 \n",
+ " 0 \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " 3 \n",
+ " 1 \n",
+ " 5.0 \n",
+ " 2010-07-19 19:13:28 \n",
+ " 54 \n",
+ " 3613.0 \n",
+ " <p>What are some valuable Statistical Analysis... \n",
+ " 18.0 \n",
+ " 2013-05-27 14:48:36 \n",
+ " What are some valuable Statistical Analysis op... \n",
+ " ... \n",
+ " 2014-08-08 06:42:58 \n",
+ " http://stackoverflow.com \n",
+ " New York, NY \n",
+ " <p><a href=\"http://blog.stackoverflow.com/2009... \n",
+ " 22 \n",
+ " 19 \n",
+ " 0 \n",
+ " 3 \n",
+ " 35.0 \n",
+ " 0 \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " 4 \n",
+ " 1 \n",
+ " 135.0 \n",
+ " 2010-07-19 19:13:31 \n",
+ " 13 \n",
+ " 5224.0 \n",
+ " <p>I have two groups of data. Each with a dif... \n",
+ " 23.0 \n",
+ " 2010-09-08 03:00:19 \n",
+ " Assessing the significance of differences in d... \n",
+ " ... \n",
+ " 2014-01-02 09:31:02 \n",
+ " http://minesweeperonline.com \n",
+ " San Francisco, CA \n",
+ " <p>currently at a startup in SF</p>\\r\\n\\r\\n<p>... \n",
+ " 11 \n",
+ " 0 \n",
+ " 0 \n",
+ " 1998 \n",
+ " 28.0 \n",
+ " http://i.stack.imgur.com/d1oHX.jpg \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " 5 \n",
+ " 2 \n",
+ " 0.0 \n",
+ " 2010-07-19 19:14:43 \n",
+ " 81 \n",
+ " 0.0 \n",
+ " <p>The R-project</p>\\n\\n<p><a href=\"http://www... \n",
+ " 23.0 \n",
+ " 2010-07-19 19:21:15 \n",
+ " 0 \n",
+ " ... \n",
+ " 2014-08-13 00:23:47 \n",
+ " http://www.statalgo.com \n",
+ " New York, NY \n",
+ " <p>Quantitative researcher focusing on statist... \n",
+ " 1145 \n",
+ " 662 \n",
+ " 5 \n",
+ " 54503 \n",
+ " 35.0 \n",
+ " 0 \n",
+ " \n",
+ " \n",
+ " 4 \n",
+ " 6 \n",
+ " 1 \n",
+ " 0.0 \n",
+ " 2010-07-19 19:14:44 \n",
+ " 152 \n",
+ " 29229.0 \n",
+ " <p>Last year, I read a blog post from <a href=... \n",
+ " 5.0 \n",
+ " 2014-05-29 03:54:31 \n",
+ " The Two Cultures: statistics vs. machine learn... \n",
+ " ... \n",
+ " 2014-08-07 19:49:44 \n",
+ " http://www.harlan.harris.name \n",
+ " District of Columbia \n",
+ " <ul>\\r\\n<li>PhD in CS/AI/Machine Learning/Cogn... \n",
+ " 114 \n",
+ " 47 \n",
+ " 0 \n",
+ " 46050 \n",
+ " 41.0 \n",
+ " 0 \n",
+ " \n",
+ " \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " \n",
+ " \n",
+ " 28536 \n",
+ " 48321 \n",
+ " 2 \n",
+ " 0.0 \n",
+ " 2013-01-23 09:00:01 \n",
+ " 0 \n",
+ " 0.0 \n",
+ " <p>you can use the matlab codes for svm and co... \n",
+ " 19966.0 \n",
+ " 2013-01-23 09:00:01 \n",
+ " 0 \n",
+ " ... \n",
+ " 2014-07-15 14:53:00 \n",
+ " 0 \n",
+ " Kharagpur, India \n",
+ " <p>I am a Research scholar in IIT kharagpur. I... \n",
+ " 0 \n",
+ " 0 \n",
+ " 0 \n",
+ " 2272095 \n",
+ " 27.0 \n",
+ " https://www.gravatar.com/avatar/cb64ec23c43128... \n",
+ " \n",
+ " \n",
+ " 28537 \n",
+ " 48322 \n",
+ " 2 \n",
+ " 0.0 \n",
+ " 2013-01-23 09:09:34 \n",
+ " 3 \n",
+ " 0.0 \n",
+ " <p>I use <a href=\"http://www.gnu.org/software/... \n",
+ " 892.0 \n",
+ " 2013-01-23 13:13:30 \n",
+ " 0 \n",
+ " ... \n",
+ " 2014-07-05 05:27:26 \n",
+ " 0 \n",
+ " 0 \n",
+ " 0 \n",
+ " 0 \n",
+ " 0 \n",
+ " 0 \n",
+ " 4609206 \n",
+ " 0.0 \n",
+ " https://www.gravatar.com/avatar/?s=128&d=ident... \n",
+ " \n",
+ " \n",
+ " 28538 \n",
+ " 48323 \n",
+ " 2 \n",
+ " 0.0 \n",
+ " 2013-01-23 09:16:44 \n",
+ " 1 \n",
+ " 0.0 \n",
+ " <p>If I understand your question correctly, yo... \n",
+ " 2020.0 \n",
+ " 2013-01-23 09:16:44 \n",
+ " 0 \n",
+ " ... \n",
+ " 2014-07-13 14:47:33 \n",
+ " 0 \n",
+ " 0 \n",
+ " 0 \n",
+ " 0 \n",
+ " 0 \n",
+ " 0 \n",
+ " 4524448 \n",
+ " 0.0 \n",
+ " https://www.gravatar.com/avatar/d292896af36243... \n",
+ " \n",
+ " \n",
+ " 28539 \n",
+ " 48324 \n",
+ " 2 \n",
+ " 0.0 \n",
+ " 2013-01-23 09:36:07 \n",
+ " 3 \n",
+ " 0.0 \n",
+ " <p>Doesn't really help you with your question,... \n",
+ " 19914.0 \n",
+ " 2013-01-23 09:36:07 \n",
+ " 0 \n",
+ " ... \n",
+ " 2014-06-26 07:56:53 \n",
+ " 0 \n",
+ " 0 \n",
+ " 0 \n",
+ " 1 \n",
+ " 0 \n",
+ " 0 \n",
+ " 4609233 \n",
+ " 0.0 \n",
+ " https://www.gravatar.com/avatar/80f9e492c63a22... \n",
+ " \n",
+ " \n",
+ " \n",
+ "28540 \n",
+ " 48325 \n",
+ " 1 \n",
+ " 0.0 \n",
+ " 2013-01-23 09:44:07 \n",
+ " -1 \n",
+ " 116.0 \n",
+ " <p>I have 10 vectors each having 100,000 point... \n",
+ " 19968.0 \n",
+ " 2013-02-22 11:23:54 \n",
+ " are data sets obtained from a Normal distribut... \n",
+ " ... \n",
+ " 2014-06-27 14:00:15 \n",
+ " 0 \n",
+ " 0 \n",
+ " 0 \n",
+ " 9 \n",
+ " 0 \n",
+ " 0 \n",
+ " 2766774 \n",
+ " 0.0 \n",
+ " https://www.gravatar.com/avatar/fe4cdbbd14f7ce... \n",
+ " \\r\\n
\n",
+ " \n",
+ "
\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " postId \n",
+ " PostTypeId \n",
+ " AcceptedAnswerId \n",
+ " CreaionDate \n",
+ " Score \n",
+ " ViewCount \n",
+ " Body \n",
+ " userId_x \n",
+ " LasActivityDate \n",
+ " Title \n",
+ " ... \n",
+ " LastAccessDate \n",
+ " WebsiteUrl \n",
+ " Location \n",
+ " AboutMe \n",
+ " Views \n",
+ " UpVotes \n",
+ " DownVotes \n",
+ " AccountId \n",
+ " Age \n",
+ " ProfileImageUrl \n",
+ " \n",
+ " \n",
+ " 0 \n",
+ " 2 \n",
+ " 1 \n",
+ " 59.0 \n",
+ " 2010-07-19 19:12:57 \n",
+ " 22 \n",
+ " 8198.0 \n",
+ " <p>In many different statistical methods there... \n",
+ " 24.0 \n",
+ " 2012-11-12 09:21:54 \n",
+ " What is normality? \n",
+ " ... \n",
+ " 2013-11-12 22:07:23 \n",
+ " http://stackoverflow.com \n",
+ " Corvallis, OR \n",
+ " <p>Developer on the StackOverflow team. Find ... \n",
+ " 25 \n",
+ " 3 \n",
+ " 0 \n",
+ " 2 \n",
+ " 37.0 \n",
+ " NaN \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " 3 \n",
+ " 1 \n",
+ " 5.0 \n",
+ " 2010-07-19 19:13:28 \n",
+ " 54 \n",
+ " 3613.0 \n",
+ " <p>What are some valuable Statistical Analysis... \n",
+ " 18.0 \n",
+ " 2013-05-27 14:48:36 \n",
+ " What are some valuable Statistical Analysis op... \n",
+ " ... \n",
+ " 2014-08-08 06:42:58 \n",
+ " http://stackoverflow.com \n",
+ " New York, NY \n",
+ " <p><a href=\"http://blog.stackoverflow.com/2009... \n",
+ " 22 \n",
+ " 19 \n",
+ " 0 \n",
+ " 3 \n",
+ " 35.0 \n",
+ " NaN \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " 4 \n",
+ " 1 \n",
+ " 135.0 \n",
+ " 2010-07-19 19:13:31 \n",
+ " 13 \n",
+ " 5224.0 \n",
+ " <p>I have two groups of data. Each with a dif... \n",
+ " 23.0 \n",
+ " 2010-09-08 03:00:19 \n",
+ " Assessing the significance of differences in d... \n",
+ " ... \n",
+ " 2014-01-02 09:31:02 \n",
+ " http://minesweeperonline.com \n",
+ " San Francisco, CA \n",
+ " <p>currently at a startup in SF</p>\\r\\n\\r\\n<p>... \n",
+ " 11 \n",
+ " 0 \n",
+ " 0 \n",
+ " 1998 \n",
+ " 28.0 \n",
+ " http://i.stack.imgur.com/d1oHX.jpg \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " 5 \n",
+ " 2 \n",
+ " NaN \n",
+ " 2010-07-19 19:14:43 \n",
+ " 81 \n",
+ " NaN \n",
+ " <p>The R-project</p>\\n\\n<p><a href=\"http://www... \n",
+ " 23.0 \n",
+ " 2010-07-19 19:21:15 \n",
+ " NaN \n",
+ " ... \n",
+ " 2014-08-13 00:23:47 \n",
+ " http://www.statalgo.com \n",
+ " New York, NY \n",
+ " <p>Quantitative researcher focusing on statist... \n",
+ " 1145 \n",
+ " 662 \n",
+ " 5 \n",
+ " 54503 \n",
+ " 35.0 \n",
+ " NaN \n",
+ " \n",
+ " \n",
+ " \n",
+ "4 \n",
+ " 6 \n",
+ " 1 \n",
+ " NaN \n",
+ " 2010-07-19 19:14:44 \n",
+ " 152 \n",
+ " 29229.0 \n",
+ " <p>Last year, I read a blog post from <a href=... \n",
+ " 5.0 \n",
+ " 2014-05-29 03:54:31 \n",
+ " The Two Cultures: statistics vs. machine learn... \n",
+ " ... \n",
+ " 2014-08-07 19:49:44 \n",
+ " http://www.harlan.harris.name \n",
+ " District of Columbia \n",
+ " <ul>\\r\\n<li>PhD in CS/AI/Machine Learning/Cogn... \n",
+ " 114 \n",
+ " 47 \n",
+ " 0 \n",
+ " 46050 \n",
+ " 41.0 \n",
+ " NaN \n",
+ "