From a6a7320d7d23120f4a1a4de2ce9fc027036b0c6a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Gon=C3=A7alo=20Nobre?= <66629614+Goncalo-Nobre@users.noreply.github.com> Date: Sun, 2 Aug 2020 18:42:28 +0100 Subject: [PATCH 1/2] =?UTF-8?q?[lab-reading-stats-concepts]=20Gon=C3=A7alo?= =?UTF-8?q?=20Nobre?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- lab-reading-stats-concepts.ipynb | 72 ++++++++++++++++++++++++++++++++ 1 file changed, 72 insertions(+) create mode 100644 lab-reading-stats-concepts.ipynb diff --git a/lab-reading-stats-concepts.ipynb b/lab-reading-stats-concepts.ipynb new file mode 100644 index 0000000..8e216fa --- /dev/null +++ b/lab-reading-stats-concepts.ipynb @@ -0,0 +1,72 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Challenge 1: What is the difference between expected value and mean?\n", + "\n", + "Expected value is the long-run average value of a random variable over a large number of repetitions of an experiment. For a discrete random variable (one whose possible outcomes are countable), we calculate it as a sum in which each term is a possible value of the random variable multiplied by the probability of that value.\n", + "\n", + "The mean is the average of a set of numbers. To find the mean of a data set, add up all of the numbers in the set and divide that total by how many numbers the set contains.\n", + "\n", + "A practical approach results in a frequency distribution and a mean value; a theoretical approach results in a probability distribution and an expected value.\n", + "As the number of observations grows, the sample mean approaches the expected value (the law of large numbers)." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Challenge 2: What is the \"problem\" in science with p-values?\n", + "\n", + "Researchers have been warned that a statistically non-significant result does not ‘prove’ the null hypothesis (the hypothesis that there is no difference between groups or no effect of a treatment on some measured outcome). Nor do statistically significant results ‘prove’ some other hypothesis.\n", + "\n", + "As the article notes, we should never conclude there is ‘no difference’ or ‘no association’ just because a P value is larger than a threshold such as 0.05 or, equivalently, because a confidence interval includes zero. Neither should we conclude that two studies conflict because one had a statistically significant result and the other did not.\n", + "\n", + "The problem in science is that bucketing results into ‘statistically significant’ and ‘statistically non-significant’ makes people think that results sorted this way are categorically different, which may or may not be the case." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Challenge 3: Applying testing to a specific case: A/B testing.\n", + "\n", + "Netflix example:\n", + "\n", + "Netflix knows that if you don’t capture a member’s attention within 90 seconds, that member will likely lose interest and move on to another activity.\n", + "Through various studies, they found that members look at the artwork first and then decide whether to look at additional details, and they wanted to capitalize on this.\n", + "Broadly, Netflix’s A/B testing philosophy is about building incrementally, using data to drive decisions, and failing fast.\n", + "\n", + "They first experimented with a single title, 'The Short Game', changing the image from user to user and measuring engagement with the title for each variant: click-through rate, aggregate play duration, fraction of plays with short duration, fraction of content viewed (how far a member got through a movie or series), etc.\n", + "\n", + "Then they expanded to a two-way multi-cell explore-exploit test, where they measured engagement with each artwork for a set of titles (the \"explore\" phase).\n", + "Finally, in the \"exploit\" phase, the test served the most engaging artwork (from the explore test) to future users to see whether it improved aggregate streaming hours.\n", + "\n", + "In my opinion this is very useful, since we can A/B test almost anything at any time and get real data to help us make decisions.\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.6" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} From 079de87a0b70096b6aa6768a328824d3a92cbf34 Mon Sep 17 00:00:00 2001 From: 
=?UTF-8?q?Gon=C3=A7alo=20Nobre?= <66629614+Goncalo-Nobre@users.noreply.github.com> Date: Sun, 2 Aug 2020 18:43:56 +0100 Subject: [PATCH 2/2] Create lab-reading-stats-concepts-checkpoint.ipynb --- .../lab-reading-stats-concepts-checkpoint.ipynb | 6 ++++++ 1 file changed, 6 insertions(+) create mode 100644 .ipynb_checkpoints/lab-reading-stats-concepts-checkpoint.ipynb diff --git a/.ipynb_checkpoints/lab-reading-stats-concepts-checkpoint.ipynb b/.ipynb_checkpoints/lab-reading-stats-concepts-checkpoint.ipynb new file mode 100644 index 0000000..7fec515 --- /dev/null +++ b/.ipynb_checkpoints/lab-reading-stats-concepts-checkpoint.ipynb @@ -0,0 +1,6 @@ +{ + "cells": [], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 4 +}