<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta name="description" content="">
<meta name="author" content="">
<title>muoten.github.io: Data (&) Science (&) ...</title>
<!-- Bootstrap core CSS -->
<link href="vendor/bootstrap/css/bootstrap.min.css" rel="stylesheet">
<!-- Custom fonts for this template -->
<link href="https://fonts.googleapis.com/css?family=Saira+Extra+Condensed:500,700" rel="stylesheet">
<link href="https://fonts.googleapis.com/css?family=Muli:400,400i,800,800i" rel="stylesheet">
<link href="vendor/fontawesome-free/css/all.min.css" rel="stylesheet">
<!-- Custom styles for this template -->
<link href="css/resume.min.css" rel="stylesheet">
</head>
<body id="page-top">
<nav class="navbar navbar-expand-lg navbar-dark bg-primary fixed-top" id="sideNav">
<a class="navbar-brand js-scroll-trigger" href="#page-top">
<span class="d-block d-lg-none">Data (&) Science (&)...</span>
<span class="d-none d-lg-block">
<img class="img-fluid img-profile rounded-circle mx-auto mb-2" src="img/profile.png" alt="Profile picture or avatar">
</span>
</a>
<button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarSupportedContent" aria-controls="navbarSupportedContent" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div class="collapse navbar-collapse" id="navbarSupportedContent">
<ul class="navbar-nav">
<li class="nav-item">
<a class="nav-link js-scroll-trigger" href="#about">About</a>
</li>
<li class="nav-item">
<a class="nav-link js-scroll-trigger" href="#uncool">Uncool Data Science:</a>
</li>
<li>
<ul>
<li class="nav-item">
<a class="nav-link js-scroll-trigger" href="#causal-inference">Causal inference</a>
</li>
<li class="nav-item">
<a class="nav-link js-scroll-trigger" href="#linear-models">Linear models</a>
</li>
</ul>
</li>
<li class="nav-item">
<a class="nav-link js-scroll-trigger" href="#covid19">COVID-19:</a>
</li>
<li>
<ul>
<li class="nav-item">
<a class="nav-link js-scroll-trigger" href="#covid19-estimates">Context, analysis & simulations</a>
</li>
<li class="nav-item">
<a class="nav-link js-scroll-trigger" href="#covid19-data">References</a>
</li>
</ul>
</li>
<li class="nav-item">
<a class="nav-link js-scroll-trigger" href="#others">Other topics:</a>
</li>
<li>
<ul>
<li class="nav-item">
<a class="nav-link js-scroll-trigger" href="#audio2vec">Music Embeddings & Fingerprints</a>
</li>
<li class="nav-item">
<a class="nav-link js-scroll-trigger" href="#computervision">Computer Vision</a>
</li>
<li class="nav-item">
<a class="nav-link js-scroll-trigger" href="#hacks">Hackathons/Demos</a>
</li>
</ul>
</li>
</ul>
</div>
</nav>
<div class="container-fluid p-0">
<section class="resume-section p-3 p-lg-5 d-flex align-items-center" id="about">
<div class="w-100">
<h4 style="font-family:'Arial Narrow';margin-bottom: 1.5rem">Data <small>(&)</small> Science <small>(&)</small> ...
</h4>
<p class="lead mb-1">personal projects and recommended links, considering that...
</p>
<ul>
<li>
<i>"essentially all models are wrong, but some are useful".</i> <a href="https://rss.onlinelibrary.wiley.com/doi/pdf/10.1111/j.1740-9713.2010.00442.x">Box G.</a> et al.
</li>
<li>
<i>"explicit better than implicit". <a href="https://www.python.org/dev/peps/pep-0020/">The Zen of Python</a> et al.
(with <a href="https://resources.bibblio.org/hubfs/share/2018-01-24-RecSysLDN-Ravelin.pdf">
exceptions</a>).
</i>
</li>
</ul>
<p class="lead mb-5">I expect models being wrong, but their assumptions explicit.
</p>
<p><i>Last update</i>: <a href="#audio2vec">Music embeddings & fingerprints</a><sup><small>(15/09/2024)</small></sup></p>
<p>And previously...</p>
<ul>
<li><a href="#causal-discovery-2">Causal wars: assumptions strike back</a><!--<sup><small>(17/06/2021)</small></sup>--></li>
<li><a href="#linear-models">Regression to linear regression</a><!--<sup><small>(2/08/2020)</small></sup>--></li>
<li><a href="#others">...</a></li>
</ul>
<!--<p>Moreover <a href="#covid19">COVID-19 sections</a>, with:
</p>
<ul>
<li>
Context on
<a href="#covid19-estimates-cases">confirmed cases vs estimates</a><sup><small>(17/05/2020)</small></sup>, <a href="#covid19-estimates-deaths">mortality</a><i><sup><small>(10/04/2020)</small></sup></i>
</li>
<li><i>
<a href="#covid19-seroprevalence-bias">seroprevalence bias analysis</a><sup><small>(21/05/2020)</small></sup>,
<a href="#covid19-data">data sources</a><sup><small>(21/05/2020)</small></sup>
</i>
</li>
</ul>
-->
<p>Feedback welcome!
</p>
<div class="social-icons">
<a href="https://www.linkedin.com/in/enrique-otero-muras-0aab1a133">
<i class="fab fa-linkedin-in"></i>
</a>
<a href="https://github.com/muoten">
<i class="fab fa-github"></i>
</a>
<a href="https://twitter.com/eoteromuras">
<i class="fab fa-twitter"></i>
</a>
</div>
<p><br/>
<i><small>Content layout based on
<a href="https://github.com/BlackrockDigital/startbootstrap-resume">startbootstrap-resume</a>.
Custom charts generated with <a href="https://plotly.com/python/plotly-express/">Plotly Express</a>.
</small>
</i>
</p>
</div>
</section>
<section class="resume-section p-3 p-lg-5 d-flex align-items-center" id="uncool">
<div class="w-100">
<div class="resume-item d-flex flex-column flex-md-row justify-content-between mb-5">
<div class="resume-content">
<h3 class="mb-3">Uncool Data Science</h3>
<div id="causal-inference">
<div id="causal-discovery-2" class="subheading mb-3">Causal Wars: episode II (or V?) - Assumptions Strike Back
</div>
<h6>(Written 13/12/2020, updated 17/06/2021. 10 min read)</h6>
<p>After my first (somewhat disappointing) adventures in automatic
<a href="#causal-discovery">causal discovery with the CDT package</a>, I return to the topic with new experiments, trying to get a better understanding of the following:
</p>
<ul>
<li>
the theoretical and practical <strong>limits</strong> of <strong>automatic causal discovery</strong> (also known as <i>structure learning</i>).
</li>
<li>
its relation with main <strong>assumptions</strong> required to estimate causal effects from observational data, with and without <strong>graphical models</strong>
</li>
<li>
and the involved <strong>methods</strong> and specifics for the <strong>bivariate example</strong>.
</li>
</ul>
<p>While there is some consensus on the need for assumptions to infer causality from observational data,
things get tricky in the details.
Researchers from different fields hold diverse opinions on the preferable
ways to express the requirements and limitations of the different methodologies: languages, notations, diagrams...
For instance, Lauritzen, in <a href="https://onlinelibrary.wiley.com/doi/10.1111/j.1467-9469.2004.03-200A.x">Discussion in Causality</a>,
identifies up to 4 different formalisms or <i>causal languages</i>:
structural equations, graphical models, counterfactual random variables and potential outcomes.
</p>
<p>It is also debatable whether causality can be inferred from data by some automated algorithm. According to Judea Pearl:
</p>
<blockquote style='display: block;margin-top: 1em;margin-bottom: 1em;margin-left: 40px;margin-right: 40px; border:1px; content: open-quote;'>
<i>"[...] machine learning systems, based
only on associations, are prevented
from reasoning about (novel) actions,
experiments, and causal explanations."</i>
</blockquote>
<p>
Pearl postulates a
<strong>three-level hierarchy</strong>, or "ladder of causation", where <i>"questions at level i (i =
1, 2, 3) can be answered only if information from level j (j>=i) is available".</i>
</p>
<figure>
<img src="img/7_tools_causal_inference_preprint.png" alt="7 tools of causal inference" style="border:1px solid black; width:80%;margin:auto;display:block">
<figcaption style="text-align:center" >
<small>Source: <a href="https://ftp.cs.ucla.edu/pub/stat_ser/r481.pdf">
The Seven Tools of Causal Inference, with Reflections on Machine Learning, by Judea Pearl
</a>
</small>
</figcaption>
</figure>
<p>
The logical consequence of Pearl's postulates is that automatic causal discovery based on observational data is not feasible in general.
In other words, as Peters et al state in <a href="https://library.oapen.org/bitstream/handle/20.500.12657/26040/11283.pdf">
Elements of Causal Inference</a>: "there is an ill-posed-ness due to the fact that
even complete knowledge of an observational distribution usually does not determine the underlying causal model".
Or, according to <a href="https://arxiv.org/abs/1904.02826">Maclaren et al</a>, causal estimators may be unstable, and
"lack of stability implies that [...] an achievable statistical estimation target may prove impossible".
But <i>could automatic causal discovery be epistemically impossible in theory, yet useful in practice under certain circumstances?</i>
In other words, is it possible to infer causality from observations, an algorithm and some <i>soft</i> assumptions
that do not explicitly predefine a causal model?
</p>
<div class="subheading mb-2">May the graphical models be with you
</div>
<p>
To answer the question properly we should, first of all, disambiguate the different meanings
and goals of "causal inference", and even what a (causal) "model" is.
Language ambiguity is a source of misunderstandings, even among the most brilliant people. Take, as an example,
Andrew Gelman's reply to Pearl's controversial statements in The Book of Why:
<blockquote style='display: block;margin-top: 1em;margin-bottom: 1em;margin-left: 40px;margin-right: 40px; border:1px; content: open-quote;'>
<i>"Pearl and Mackenzie write that statistics “became a model-blind data reduction enterprise.”
Hey! What the hell are you talking about?? I’m a statistician, I’ve been doing statistics for 30 years,
working in areas ranging from politics to toxicology. “Model-blind data reduction”? That’s just bullshit.
We use models all the time"</i>
</blockquote>
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">“The Book of Why” by Pearl and Mackenzie <a href="https://t.co/hEMzlDa4wU">https://t.co/hEMzlDa4wU</a></p>— Andrew Gelman (@StatModeling) <a href="https://twitter.com/StatModeling/status/1082639137780498432?ref_src=twsrc%5Etfw">January 8, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js"></script>
<figure>
<img src="img/sabres.gif" alt="the empire strikes back gif" style="border:1px solid black; width:80%;margin:auto;display:block">
<figcaption style="text-align:center" >
<small>Source: <a href="http://unfunnynerdtangent.com/2018/10/40-for-40-16-the-empire-strikes-back/">unfunnynerdtangent.com
</a>
</small>
</figcaption>
</figure>
<p>It's probably not obvious that the <strong>"models"</strong> Pearl refers to are "causal models",
preferably <strong>causal diagrams</strong>,
or any type of model that <strong>encodes causal information</strong> in a transparent and testable way,
as <i><a href="https://fabiandablander.com/r/Causal-Inference.html#structural-causal-models">
Structural Causal Models</a></i> (SCMs) also do via assignment functions.
</p>
<p>
There are several types of probabilistic graphical models that express sets of conditional independence assumptions via graph structure, including Directed Acyclic Graphs (DAGs),
known as <i>Bayesian networks</i>.
DAGs can describe every conditional dependence and independence between the represented variables
if the <a href="https://library.oapen.org/bitstream/handle/20.500.12657/26040/11283.pdf#page=118">Markov and Faithfulness</a> hypotheses hold.
And they benefit from expressiveness and <a href="https://arxiv.org/pdf/1304.1505.pdf">d-separation</a>, a criterion that involves checking
whether a set of vertices Z blocks all connections of a certain type between X and Y in a graph G,
and reduces statistical independencies to connectivity in graphs.
</p>
<p>
But we still need stronger assumptions beyond the Markov condition, faithfulness and d-separation to have a <a href="https://cdn1.sph.harvard.edu/wp-content/uploads/sites/1268/2021/03/ciwhatif_hernanrobins_30mar21.pdf#page=79">
causal DAG</a>. Whereas (probabilistic) DAGs entail observational distributions, causal DAGs entail interventional distributions.
In <a href="https://www.cmu.edu/dietrich/philosophy/docs/scheines/introtocausalinference.pdf">Scheines's</a> words:
</p>
<blockquote style='display: block;margin-top: 1em;margin-bottom: 1em;margin-left: 40px;margin-right: 40px; border:1px; content: open-quote;'>
<i>"one might take a further step by assuming that when DAGs are interpreted causally the Markov condition and d-separation are in fact the correct connection between causal structure and probabilistic independence.
We call the latter assumption the Causal Markov condition"
</i>
</blockquote>
<p>
Moreover, the term "causal inference" can (ambiguously) refer to either <strong>causal learning</strong> or
<strong>causal reasoning</strong>.
The former is the inference of a causal model from observations or interventions; the latter,
the estimation of outcomes or effects (at individual or population level) based on a predefined causal model.
</p>
<figure>
<img src="img/elements_of_causal_inference_terminology.png" alt="Terminology on Elements of Causal Inference (Peters et al.)" style="border:1px solid black; width:80%;margin:auto;display:block">
<figcaption style="text-align:center" >
<small>Source: <a href="https://library.oapen.org/bitstream/handle/20.500.12657/26040/11283.pdf">Elements of Causal Inference (Peters et al.)
</a>
</small>
</figcaption>
</figure>
<div class="subheading mb-2">identifiability criteria for causal effects from observations
</div>
<p>In relation to causal reasoning, an effect is identifiable if it can be estimated from data, given a set of assumptions.
In case of Average Treatment Effect (ATE), also known as Average Causal Effect (ACE),
the required assumptions for identifiability, to my understanding are:</p>
<ul>
<li>
Conditional <strong>exchangeability</strong>. <i>Perhaps the assumption with the most alternative formulations and ambiguity I have found.
I admit I am still unable to identify the subtle differences between the following terms: ignorability, unconfoundedness, selection on observables,
conditional independence assumption (CIA), exogeneity, causal sufficiency... are they all exchangeable?
Thanks to Miguel Hernán et al's <a href="https://cdn1.sph.harvard.edu/wp-content/uploads/sites/1268/2021/03/ciwhatif_hernanrobins_30mar21.pdf#page=110">
Causal Inference Book
</a>
for shedding some light on this Tower of Babel full of synonyms and meronyms:</i>
<blockquote style='display: block;margin-top: 1em;margin-bottom: 1em;margin-left: 40px;margin-right: 40px; border:1px; content: open-quote;'>
<i>"[...] conditional exchangeability [...] often referred as
“weak ignorability” or “ignorable
treatment assignment” in statistics (Rosenbaum and Rubin, 1983),
“selection on observables” in the
social sciences (Barnow et al.,
1980), and no “ommitted variable
bias” or “exogeneity” in econometrics (Imbens, 2004)"
</i>
</blockquote>
Or these words from <a href="https://clas.ucdenver.edu/marcelo-perraillon/sites/default/files/attached-files/w2_causal_inference_perraillon_0.pdf">
Marcelo Perraillon's lectures</a>:
<blockquote style='display: block;margin-top: 1em;margin-bottom: 1em;margin-left: 40px;margin-right: 40px; border:1px; content: open-quote;'>
<i>"Jargon, jargon, jargon: This assumption comes in many names, the most
common perhaps is no unmeasured confounders. Other names: selection
on observables, exogeneity, conditional independence assumption (CIA),
ignorability"
</i>
</blockquote>
</li>
</ul>
<ul>
<li>
<strong>Consistency</strong> of treatment, and <strong>no interference</strong>.
The combination of consistency and no interference is also known as the
<i>Stable Unit Treatment Value Assumption (SUTVA)</i> in (deterministic) <a href="http://www.stat.columbia.edu/~cook/qr33.pdf">
potential outcomes</a>. See Neal's
<a href="https://www.bradyneal.com/Introduction_to_Causal_Inference-Aug28_2020-Neal.pdf#page=20">Introduction to Causal Inference</a>.
</li>
<li>
<strong>Positivity</strong> (a.k.a. <i>overlap</i>, a.k.a. <i>common support</i>). That is, every value of treatment has a positive probability of occurring in every stratum of the covariates.
</li>
</ul>
<p>
Provided the previous assumptions hold, several identification methods can be used
to convert our causal estimand into a statistical formula that does not contain
any potential-outcome or do-operator notation. The most usual is standardization with the
<strong>adjustment formula</strong>, analogously derived as Robins's <i>g-formula</i>,
Spirtes's <i>manipulated distribution</i> or Pearl's <i>truncated factorization</i>, and closely related to the <i>backdoor criterion</i>.
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">"backdoor adjustment" is more than a formula. It provides two things: 1) an adjustment formula (RHS) and 2) a license (backdoor condition) to apply it. When we compare this condition to the license provided by the g-formula the key difference between the two shines brightly.</p>— Judea Pearl (@yudapearl) <a href="https://twitter.com/yudapearl/status/1024245586214539264?ref_src=twsrc%5Etfw">July 31, 2018</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
Another interrelated identification technique is Inverse Probability Weighting (IPW). Both standardization and IPW allow
generalized estimates over the entire population, so they are also known as <i>G-methods</i>
(where "G" stands for "generalized"), whereas matching or restriction only provide estimates for subsets of our population.
</p>
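<p>A minimal sketch of both G-methods on synthetic data (the data-generating process and effect sizes below are my own, chosen only for illustration): with a single binary confounder Z, standardization averages stratum-specific contrasts weighted by P(Z=z), while IPW reweights each unit by the inverse of its (estimated) treatment probability. Both recover the true effect that the naive difference in means misses:</p>

```python
import random
import statistics

random.seed(0)
n = 50000
# Confounder Z raises both the treatment probability and the outcome; true ATE = 2
z = [random.random() < 0.5 for _ in range(n)]
t = [random.random() < (0.8 if zi else 0.2) for zi in z]
y = [2.0 * ti + 3.0 * zi + random.gauss(0, 1) for ti, zi in zip(t, z)]
data = list(zip(y, t, z))

def mean_y(rows):
    return statistics.fmean(yi for yi, _, _ in rows)

# Naive contrast: biased upward, because treated units have Z=1 more often
naive = (mean_y(r for r in data if r[1])
         - mean_y(r for r in data if not r[1]))

# Standardization (adjustment formula / g-formula):
# ATE = sum_z P(Z=z) * (E[Y|T=1,Z=z] - E[Y|T=0,Z=z])
ate_std = 0.0
for zval in (False, True):
    stratum = [r for r in data if r[2] == zval]
    pz = len(stratum) / n
    diff = (mean_y(r for r in stratum if r[1])
            - mean_y(r for r in stratum if not r[1]))
    ate_std += pz * diff

# Inverse Probability Weighting, with the propensity e(z) estimated per stratum
def propensity(zval):
    stratum = [r for r in data if r[2] == zval]
    return sum(r[1] for r in stratum) / len(stratum)

e = {zval: propensity(zval) for zval in (False, True)}
ate_ipw = (sum(yi * ti / e[zi] for yi, ti, zi in data) / n
           - sum(yi * (1 - ti) / (1 - e[zi]) for yi, ti, zi in data) / n)

print(f"naive: {naive:.2f}, standardized: {ate_std:.2f}, IPW: {ate_ipw:.2f}")
```

<p>Both estimators rely on the assumptions above: exchangeability given Z, consistency, and positivity (every stratum contains both treated and untreated units).</p>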
<p>Anyway, we can use these methods without graphical models, since we require
<a href="https://fabiandablander.com/r/Causal-Inference.html#counterfactuals">
neither counterfactuals nor SCMs</a>, unless we are
interested in individual effects.
<!--So, as stated by <a href="https://arxiv.org/pdf/1907.07271.pdf">Guido Imbens</a>, "the Potential Outcome
framework is more apprehensive about definite answers to such questions that depend delicately on individual-level heterogeneity".
--> However, it's hard to know whether exchangeability holds without the help of causal DAGs and
<i>d-separation</i> to apply the <strong>adjustment criterion</strong> (defined by <a href="https://arxiv.org/abs/1203.3515">Shpitser et al</a> as a generalization of Pearl's <i>backdoor criterion</i>).
Moreover, in Pearl's words:
</p>
<blockquote style='display: block;margin-top: 1em;margin-bottom: 1em;margin-left: 40px;margin-right: 40px; border:1px; content: open-quote;'>
<i>
"it is always possible to replace assumptions made in SCM [Structural Causal Model] with equivalent,
albeit cumbersome assumptions in PO [Potential Outcomes] language,
and eventually come to the correct conclusions." But "rejection of graphs and structural models
leaves investigators with no process-model guidance and, not surprisingly,
it has resulted in a number of blunders which the PO community is not very proud of"
</i>
</blockquote>
<p>
As an example of these "blunders" or difficulties, Rubin, co-creator of the Potential Outcomes framework, in relation to the concept of colliders or M-bias,
claimed that <i>"to avoid conditioning on some observed covariates... is [...] nonscientific ad hockery"</i>.
More details in <a href="https://statmodeling.stat.columbia.edu/2020/01/27/causal-inference-in-ai-expressing-potential-outcomes-in-a-graphical-modeling-framework-that-can-be-fit-using-stan/#comment-1242930">
David Rohde's comment</a> on Gelman's blog and in Rohde et al's article on
<a href="https://gradientinstitute.org/blog/6">Causal Inference with Bayes Rule</a>.
Anyway, as Gelman recognises in the post referred to in the previous section, qualitative models (such as DAGs) are less usual in statisticians' work, and "we need both".
</p>
<p>To be fair, apparently even <a href="https://csss.uw.edu/files/working-papers/2013/wp128.pdf#page=142">Pearl himself made mistakes</a> in his proposal for the unification of DAGs and counterfactuals,
as Richardson et al remarked when they proposed the <a href="https://csss.uw.edu/files/working-papers/2013/wp128.pdf">
Single World Intervention Graphs (SWIGs)</a>, claiming that:
<blockquote style='display: block;margin-top: 1em;margin-bottom: 1em;margin-left: 40px;margin-right: 40px; border:1px; content: open-quote;'>
<i>
We are in strong agreement with Pearl’s basic contention that directed
graphs are a very valuable tool for reasoning about causality, and by extension, potential outcomes. If anything our criticism of the approach to
synthesizing graphs and counterfactuals in (Pearl, 2000, 2009) is that it is
not ‘graphical enough’ [...]
</i>
</blockquote>
<p>Furthermore, there exist several identification methods that do not exploit the assumption of conditional exchangeability to handle confounding.
These methods rely on alternative (also unverifiable) assumptions, for example <i>difference-in-differences</i>, <i>instrumental variables</i> or the <i>front-door criterion</i>.
See Hernán's et al <a href="https://cdn1.sph.harvard.edu/wp-content/uploads/sites/1268/2021/03/ciwhatif_hernanrobins_30mar21.pdf#page=105">Causal Inference: What If</a>
for more details.
</p>
<div class="subheading mb-2">Identifiability of causal structure
</div>
<p>On the other hand, Pearl, apparently a denier of Machine Learning's ability to perform causal reasoning, also opens the door to
(automatic) <strong>causal discovery</strong>. In his <a href="https://ftp.cs.ucla.edu/pub/stat_ser/r481-reprint.pdf">
7 Tools of Causal Inference</a> he proposes
"systematic searches" under "certain circumstances" and "mild assumptions". Besides, years ago he worked with T.S. Verma
on the <a href="https://arxiv.org/pdf/1303.5435.pdf">
Inductive Causation (IC) algorithm</a> for <strong>structure learning</strong>.
And he also refers to the method by Shimizu et al (2006) to discover causal directionality
based on functional decomposition in linear models with non-Gaussian distributions, a method known as
<a href="https://www.jmlr.org/papers/volume7/shimizu06a/shimizu06a.pdf">
LiNGAM: linear non-Gaussian acyclic model for causal discovery.
</a>
</p>
<p>So... let's do it!
</p>
<figure>
<img src="img/yoda.gif" alt="Yoda: do or do not. There is no try" style="border:1px solid black; width:80%;margin:auto;display:block">
<figcaption style="text-align:center" >
<small>Source: <a href="https://tenor.com/es/ver/yoda-do-it-or-not-man-up-your-pick-star-wars-gif-7566902/">tenor.com
</a>
</small>
</figcaption>
</figure>
<p>
I focused on two methods for the bivariate case:
</p>
<ul>
<li><strong>Conditional Distribution Similarity</strong>: it seemed the best candidate from an empirical point of view,
because it performed better than the alternatives in my <a href="#causal-discovery">first quick experiment</a>.
Furthermore, it was also part of the <a href="https://arxiv.org/pdf/1601.06680.pdf">Jarfo model</a> that scored 2nd place in the ChaLearn cause-effect Kaggle challenge in 2013.
This method assumes that "the shape of the conditional distribution <math><mtext>p(Y(X=x))</mtext></math>
tends to be very similar for different values of <math><mi>x</mi></math> if the random variable <math><mi>X</mi></math>
is the cause of <math><mi>Y</mi></math>".
It's related to the principles of <i>(physical) independence of cause and mechanism</i> (ICM) and
<i>algorithmic independence of conditionals</i>.
The latter states that the joint distribution has a shorter description in the true causal direction
than in the anticausal one, in the spirit of Occam's razor. Unfortunately, Kolmogorov complexity is uncomputable,
so the Minimum Description Length in the sense of Kolmogorov "should be considered a philosophical principle
rather than a method". <i>See <a href="https://library.oapen.org/bitstream/handle/20.500.12657/26040/11283.pdf?sequence=1&isAllowed=y">
Elements of Causal Inference</a></i>.
</li>
<li><strong>Post-Nonlinear Causal Model</strong>: it seems the most solid from an analytical point of view.
It even tries to explain the 5 circumstances in which its assumptions don't hold and it would fail.
Published by <a href="https://arxiv.org/pdf/1205.2599.pdf">Zhang et al in 2009</a>, it can be considered a generalization of the so-called
<a href="https://papers.nips.cc/paper/2008/file/f7664060cc52bc6f3d620bcedc94a4b6-Paper.pdf">
additive noise model (ANM)</a> by Hoyer et al. Both are based on <i>a priori restrictions of the model class</i>,
for instance, assuming functions and probability densities three times differentiable.
</li>
</ul>
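<p>To make the conditional-distribution-similarity idea concrete, here is a deliberately simplified score of my own (the CDS implementation in the CDT package rank-normalizes the data and compares full conditional distributions, not just their spread): bin the candidate cause into quantiles and measure how much the spread of the candidate effect varies across bins. The direction with the more stable conditionals is preferred as causal:</p>

```python
import random
import statistics

random.seed(0)
n = 20000
# Ground truth: X -> Y with additive noise; X non-Gaussian (uniform)
x = [random.uniform(-1, 1) for _ in range(n)]
y = [xi + random.gauss(0, 0.3) for xi in x]

def cds_like_score(cause, effect, bins=10):
    """Crude CDS-style score: sort by the candidate cause, split into
    quantile bins, and measure how much the spread of the candidate
    effect varies across bins. Lower score = conditionals more similar
    = more plausible causal direction."""
    pairs = sorted(zip(cause, effect))
    size = len(pairs) // bins
    spreads = [statistics.stdev(e for _, e in pairs[b * size:(b + 1) * size])
               for b in range(bins)]
    return statistics.stdev(spreads)

score_xy = cds_like_score(x, y)  # X -> Y: conditional spreads nearly constant
score_yx = cds_like_score(y, x)  # Y -> X: spreads shrink near the support edges
print(f"score X->Y = {score_xy:.4f}, score Y->X = {score_yx:.4f}")
```

<p>The lower score in the true direction reflects the ICM principle: the mechanism (here, additive noise) does not change with the value of the cause, whereas the anticausal conditionals do.</p>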
<p>
I used the <a href="https://webdav.tuebingen.mpg.de/cause-effect/">Tübingen cause-effect pairs dataset</a>.
Among its 108 pairwise examples, it also contains the relationship between Gross National Income
per capita and life expectancy.
</p>
<figure>
<img src="img/cds_score_example_0.png" alt="Gross National Income vs life expectancy" style="border:1px solid black; width:80%;margin:auto;display:block">
<figcaption style="text-align:center" >
<small>Source: <a href="https://github.com/muoten/causal-discovery-playground/blob/master/automatic_pairwise_causal_discovery_via_cds.ipynb">muoten.github.io
</a>
</small>
</figcaption>
</figure>
<p>Results:
</p>
<ul>
<li>
<a href="https://github.com/muoten/causal-discovery-playground/blob/master/automatic_pairwise_causal_discovery_via_cds.ipynb">Notebook 1: experiments on Conditional Similarity Distribution</a>
</li>
<li>
<a href="https://github.com/muoten/causal-discovery-playground/blob/master/automatic_pairwise_causal_discovery_via_pnl.ipynb">Notebook 2 with experiments on Post-Nonlinear Causal Model</a>
</li>
</ul>
<p>To be continued...
</p>
<figure>
<img src="img/cds_score_example_1.png" alt="conditional distribution similarity score" style="border:1px solid black; width:80%;margin:auto;display:block">
<figcaption style="text-align:center" >
<small>Source: <a href="https://github.com/muoten/causal-discovery-playground/blob/master/automatic_pairwise_causal_discovery_via_cds.ipynb">muoten.github.io
</a>
</small>
</figcaption>
</figure>
<figure>
<img src="img/cds_score_example_2.png" alt="conditional distribution similarity score" style="border:1px solid black; width:80%;margin:auto;display:block">
<figcaption style="text-align:center" >
<small>Source: <a href="https://github.com/muoten/causal-discovery-playground/blob/master/automatic_pairwise_causal_discovery_via_cds.ipynb">muoten.github.io
</a>
</small>
</figcaption>
</figure>
<br/>
<h6>References</h6>
<ul>
<li>Pearl, J. (<a href="https://ftp.cs.ucla.edu/pub/stat_ser/r481.pdf">2018</a>,
<a href="https://dl.acm.org/doi/pdf/10.1145/3241036">2019</a>).
The seven tools of causal inference, with reflections on machine learning. Commun. ACM, 62(3), 54-60.
</li>
<li>
Peters, J., Janzing, D., & Schölkopf, B.
(<a href="https://library.oapen.org/bitstream/handle/20.500.12657/26040/11283.pdf">2017</a>). Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press.
</li>
<li>
Maclaren, O.J., Nicholson, R. (<a href="https://arxiv.org/abs/1904.02826">2019</a>).
What can be estimated? Identifiability, estimability, causal inference and ill-posed inverse problems.
<i>arXiv preprint</i>
</li>
<li>Dablander, F.
(<a href="https://psyarxiv.com/b3fkw">2019</a>). An Introduction to Causal Inference. <i>PsyArXiv preprint</i>
</li>
<li>Geiger, D., Verma, T.S., Pearl, J.
(1989, <a href="https://arxiv.org/abs/1304.1505">2013</a>).
d-Separation: From Theorems to Algorithms.
UAI '89: Proceedings of the Fifth Annual Conference on Uncertainty in Artificial Intelligence. Pages 139-148
</li>
<li>Hernán, M., & Robins, J. (2020, <a href="https://cdn1.sph.harvard.edu/wp-content/uploads/sites/1268/2021/03/ciwhatif_hernanrobins_30mar21.pdf">2021</a>).
Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.
</li>
<li>Scheines, R. (<a href="https://www.cmu.edu/dietrich/philosophy/docs/scheines/introtocausalinference.pdf">1997</a>). An Introduction to Causal Inference. Causality in Crisis? University of Notre Dame Press. Pages 185-200
</li>
<li>
Coca-Perraillon, M. (<a href="https://clas.ucdenver.edu/marcelo-perraillon/content/hsr-week-2-causal-inference">2021</a>). Lectures on Causal Inference at University of Colorado Denver
</li>
<li>Rubin, D.B. (<a href="https://stat.columbia.edu/~cook/qr33.pdf">2003</a>).
Basic concepts of statistical inference for causal effects in experiments and observational studies.
Course material in Quantitative Reasoning.
</li>
<li>Neal, B. (<a href="https://www.bradyneal.com/Introduction_to_Causal_Inference-Aug28_2020-Neal.pdf">2020</a>).
Introduction to Causal Inference from a Machine Learning Perspective.
Course Lecture Notes
</li>
<li>Imbens, G.W. (<a href="https://arxiv.org/abs/1907.07271">2019</a>, 2020). Potential Outcome and Directed Acyclic Graph
Approaches to Causality: Relevance for
Empirical Practice in Economics. Journal of Economic Literature
</li>
<li>Shpitser, I., VanderWeele, T., Robins, J.M. (<a href="https://arxiv.org/abs/1203.3515">2012</a>). On the Validity of Covariate Adjustment for Estimating Causal Effects.
Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI2010).
</li>
<li>Lattimore, F., Rohde, D.
(<a href="https://medium.com/gradient-institute/causal-inference-with-bayes-rule-eed8ae45fb2e">2019</a>).
Causal Inference with Bayes Rule. <i>Post on medium.com</i>
</li>
<li>Richardson, T.S., Robins, J.M.
(<a href="https://csss.uw.edu/files/working-papers/2013/wp128.pdf">2013</a>).
Single World Intervention Graphs (SWIGs): A Unification of the Counterfactual and Graphical Approaches to Causality.
</li>
<li>Verma, T., Pearl, J. (1992, <a href="https://arxiv.org/abs/1303.5435">2013</a>). An Algorithm for Deciding if a Set of Observed Independencies
Has a Causal Explanation. Proceedings of the Eighth Conference on Uncertainty in Artificial Intelligence.
</li>
<li>Shimizu, S., Hoyer, P.O., Hyvärinen, A., Kerminen, A.
(<a href="https://www.jmlr.org/papers/volume7/shimizu06a/shimizu06a.pdf">2006</a>).
A Linear Non-Gaussian Acyclic Model for Causal Discovery. Journal of Machine Learning Research
</li>
<li>Fonollosa, J.A. (<a href="https://arxiv.org/abs/1601.06680">2016</a>, 2019). Conditional Distribution Variability Measures for Causality Detection. Cause Effect Pairs in Machine Learning. The Springer Series on Challenges in Machine Learning
</li>
<li>Zhang, K., Hyvärinen, A.
(2009, <a href="https://arxiv.org/abs/1205.2599">2012</a>).
On the Identifiability of the Post-Nonlinear Causal Model. Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI2009)
</li>
<li>
Hoyer, P.O., Janzing, D., Mooij, J., Peters, J., Schölkopf, B.
(<a href="https://papers.nips.cc/paper/2008/file/f7664060cc52bc6f3d620bcedc94a4b6-Paper.pdf">2008</a>).
Nonlinear causal discovery with additive noise models. Proceedings of the 21st International Conference on Neural Information Processing Systems (NIPS 2008), pages 689–696.
</li>
</ul>
<hr/>
<div id="causal-discovery" class="subheading mb-3">No free lunch in automatic causal discovery</div>
<h6>(Written 20/9/2020, updated 22/10/2020)</h6>
<p>
After reading
<a href="http://is.tuebingen.mpg.de/fileadmin/user_upload/files/publications/ICML2009-Mooij_[0].pdf">
"Regression by dependence minimization and its application to causal inference in additive noise models"</a>
by Mooij et al., even regression models start to seem <i>cool</i>.
Though this kind of regression is not exactly our old, well-known <a href="#linear-models">linear regression</a>, it is based on similar principles.
</p>
<p>Summarizing this approach: they combine regression methods with <strong>independence tests</strong>.
It builds on previous research from the same group, which used
<a href="http://webdav.tuebingen.mpg.de/causality/NIPS2008-Hoyer.pdf">
Gaussian Process Regression
</a> and the
<a href="https://www.jmlr.org/papers/volume6/gretton05a/gretton05a.pdf">
Hilbert-Schmidt Independence Criterion (HSIC)
</a>.
In the new approach, to avoid making additional assumptions about the <strong>distribution of the residuals</strong>,
as Gaussian Process Regression does,
the authors propose using the HSIC estimator directly as the regression "loss function",
measuring dependence between residuals and regressors.
</p>
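<p>The (biased) empirical HSIC estimator is short enough to sketch in numpy. The version below uses Gaussian RBF kernels with the common median-distance bandwidth heuristic; this is a generic illustration of the estimator from Gretton et al., not necessarily the exact configuration used by Mooij et al.:</p>

```python
import numpy as np

def rbf_kernel(v, bandwidth):
    """Gaussian RBF kernel matrix for a 1-D sample."""
    sq_dists = (v[:, None] - v[None, :]) ** 2
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def median_bandwidth(v):
    """Median heuristic: median of the non-zero pairwise distances."""
    dists = np.abs(v[:, None] - v[None, :])
    return np.median(dists[dists > 0])

def hsic(x, y):
    """Biased empirical HSIC = trace(K H L H) / n^2 (Gretton et al., 2005)."""
    n = len(x)
    K = rbf_kernel(x, median_bandwidth(x))
    L = rbf_kernel(y, median_bandwidth(y))
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / n ** 2

rng = np.random.default_rng(0)
x = rng.normal(size=300)
hsic_indep = hsic(x, rng.normal(size=300))          # independent pair
hsic_dep = hsic(x, x + 0.1 * rng.normal(size=300))  # strongly dependent pair
```

<p>On the two synthetic pairs above, the dependent pair should yield a clearly larger HSIC value than the independent one, which is exactly the signal the regression loss exploits.</p>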
<p>Published results seem promising. Still, we have to be cautious: as Judea Pearl says,
no matter how sophisticated an algorithm is,
there is no automatic way to infer causality with certainty from data alone.
Applying these "cool" algorithms doesn't get us "causality" for free.
Some prior <strong>assumptions are required</strong> to move Data Science from the "data fitting" direction towards a "data-intensive Causal Science".
Or, as I'd rephrase it, from being fully data-driven to being model-driven but data-supported.
Or in <a href="https://twitter.com/_miguelhernan/status/897918662514024448?lang=es">
@_MiguelHernan</a>'s words: "draw your assumptions before your conclusions".
</p>
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">Contesting the Soul of Data-Science. Below, an introduction to the Spanish translation of <a href="https://twitter.com/hashtag/Bookofwhy?src=hash&ref_src=twsrc%5Etfw">#Bookofwhy</a>:<a href="https://t.co/gWpMQEIWhX">https://t.co/gWpMQEIWhX</a>. It expresses my conviction that the data-fitting direction taken by “Data Science” is temporary, soon to be replaced by "Data-intensive Causal Science"</p>— Judea Pearl (@yudapearl) <a href="https://twitter.com/yudapearl/status/1280791406894694410?ref_src=twsrc%5Etfw">July 8, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" ></script>
<p>
Moreover, I wonder whether this epistemic limit warned about by Pearl
applies only at some deep level of knowledge, so that we could circumvent it in some practical scenarios.
Much as Heisenberg's Uncertainty Principle applies to the micro-world but
can be ignored with little penalty in macro-world physics (with some
<a href="https://phys.org/news/2013-02-heisenberg-uncertainty-principle-macro.html">exceptions
</a>).
</p>
<p>
Several of the previously cited articles come from the research group led by Bernhard Schölkopf:
<a href="http://webdav.tuebingen.mpg.de/causality/">
Causal Inference at the Max Planck Institute for Intelligent Systems, Tübingen
</a>.
Reading these papers made me feel quite optimistic about the possibilities of this kind of method.
Particularly considering that Schölkopf and his team are aware of Pearl's work,
and still keep presenting advances in their field,
including benchmarks where several of these methods perform <strong>better than random</strong>.
</p>
<p>
So I started by testing some features of the
<a href="https://fentechsolutions.github.io/CausalDiscoveryToolbox/html/index.html">
Causal Discovery Toolbox
</a>, which, in the words of its creators, is an
"open-source Python package including many state-of-the-art causal modeling algorithms
(most of which are imported from R)".
</p>
<p>I found some of the features very interesting, though regarding automatic causal discovery with the
<code>cdt.causality.pairwise</code> implementations, my first results were a little disappointing.
</p>
<figure>
<img src="img/cdt_experiment1.png" alt="causal discovery with cdt and boston dataset" style="border:1px solid black; width:80%;margin:auto;display:block">
<figcaption style="text-align:center" >
<small>Source: <a href="https://github.com/muoten/causal-discovery-playground/blob/master/boston_dataset_causal_inference_playground.ipynb">
https://github.com/muoten/causal-discovery-playground
</a>
</small>
</figcaption>
</figure>
<p>Seemingly, the <i>Additive Noise Model</i> (<code>ANM</code>) from the <code>cdt.causality.pairwise</code> package was unable to
properly determine the cause-and-effect relationship between the CRIM and RAD variables in the
<a href="https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html">
Boston Dataset
</a>.
Considering that <i>CRIM</i> is the "per capita crime rate by town"
and <i>RAD</i> the "index of accessibility to radial highways", it is unlikely that the former is a direct cause of the latter.
The reversed arrow on the generated causal diagram would be much more plausible.
</p>
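<p>As a sanity check of what an additive-noise method is supposed to do, here is a from-scratch toy sketch of the idea (not CDT's <code>ANM</code> implementation): regress in both directions and keep the direction whose residuals are more independent of the regressor, using a biased HSIC estimator as the dependence measure. On synthetic data generated as y = x³ + noise, the true direction should win:</p>

```python
import numpy as np

def hsic(x, y):
    """Biased empirical HSIC with RBF kernels and median-heuristic bandwidths."""
    def gram(v):
        d = np.abs(v[:, None] - v[None, :])
        bw = np.median(d[d > 0])
        return np.exp(-d ** 2 / (2 * bw ** 2))
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(gram(x) @ H @ gram(y) @ H) / n ** 2

def anm_residual_dependence(cause, effect, degree=5):
    """Fit a polynomial regression effect ~ f(cause); return HSIC(residuals, cause)."""
    coeffs = np.polyfit(cause, effect, degree)
    residuals = effect - np.polyval(coeffs, cause)
    return hsic(residuals, cause)

rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, 400)
y = x ** 3 + 0.1 * rng.normal(size=400)  # true direction: x -> y

score_xy = anm_residual_dependence(x, y)  # residuals ~ independent of x
score_yx = anm_residual_dependence(y, x)  # residuals keep structure in y
# lower score = more independent residuals = preferred causal direction
```

<p>With the cubic mechanism above, the backward regression cannot produce residuals independent of its input, so the forward score comes out clearly lower.</p>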
<p>The <code>BivariateFit</code> model also produced the same odd causal direction between <i>CRIM</i>
and <i>RAD</i>.
Meanwhile, the <i>Conditional Distribution Similarity Statistic</i> (<code>CDS</code>),
<i>Information Geometric Causal Inference</i> (<code>IGCI</code>)
and <i>Regression Error based Causal Inference</i> (<code>RECI</code>) produced the more reasonable
opposite direction. Though all of them also struggled with other pairs, such as <i>NOX</i> (nitric oxides concentration)
and <i>DIS</i> (weighted distances to five Boston employment centres),
with a low average rate of success.
</p>
<p>Among these 5 different methods,
only <code>CDS</code> selected the pairwise directions that make the most sense:
distance to radial highways affecting the per capita crime rate (<i>RAD -> CRIM</i>),
and distance to employment centres affecting the nitric oxides concentration (<i>DIS -> NOX</i>).
</p>
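<p>To build some intuition for why CDS can work, here is a toy version of the underlying idea (not Fonollosa's exact statistic): after standardizing both variables, bin the candidate cause and measure how much the spread of the other variable varies across bins. Under an additive-noise mechanism, the spread of the effect given the cause is roughly constant, while the spread of the reverse conditionals is not:</p>

```python
import numpy as np

def conditional_spread_variability(cause, effect, n_bins=10):
    """Std over bins of std(effect | cause-bin); lower = more stable conditionals."""
    cause = (cause - cause.mean()) / cause.std()
    effect = (effect - effect.mean()) / effect.std()
    # quantile bins so each bin holds roughly the same number of points
    edges = np.quantile(cause, np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.digitize(cause, edges[1:-1]), 0, n_bins - 1)
    spreads = [effect[idx == b].std() for b in range(n_bins) if (idx == b).sum() > 1]
    return np.std(spreads)

rng = np.random.default_rng(7)
x = rng.uniform(-1, 1, 500)
y = x ** 2 + 0.05 * rng.normal(size=500)  # true direction: x -> y

var_xy = conditional_spread_variability(x, y)
var_yx = conditional_spread_variability(y, x)
# the direction with the more stable conditional spreads is preferred
```

<p>Here the conditionals of y given x all have roughly the same (noise-sized) spread, while the conditionals of x given y range from narrow to very wide, so the variability score points in the right direction.</p>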
<figure>
<img src="img/cdt_experiment2.png" alt="causal discovery with cdt.causality.pairwise.CDS and boston dataset" style="width:70%;margin:auto;display:block">
<figcaption style="text-align:center" >
<small>Source: <a href="https://github.com/muoten/causal-discovery-playground/blob/master/boston_dataset_causal_inference_playground.ipynb">
https://github.com/muoten/causal-discovery-playground
</a>
</small>
</figcaption>
</figure>
<p>More details in the following
<a href="https://github.com/muoten/causal-discovery-playground/blob/master/boston_dataset_causal_inference_playground.ipynb">
Jupyter Notebook
</a>.
</p>
<p>Curiously, I didn't know the origin of this dataset: a paper by David Harrison Jr. and Daniel L. Rubinfeld,
published in 1978 in the Journal of Environmental Economics and Management,
called
<a href="https://deepblue.lib.umich.edu/handle/2027.42/22636">
Hedonic housing prices and the demand for clean air
</a>.
</p>
<p>
Finally, I tried to apply these methods to a different dataset, also related to health and money,
but not so popular, at least in the Machine Learning field: <i>life expectancy vs health expenditure</i>.
For context, I had been playing around with
<a href="https://github.com/muoten/life-expectancy/blob/master/life_expectancy_vs_health_expenditure_linear_regression.ipynb">
life expectancy vs health expenditure</a> data a few weeks earlier, using
<a href="https://ourworldindata.org/grapher/life-expectancy?time=..&country=~OWID_WRL">Our World in Data</a> and the
WHO's <a href="http://apps.who.int/nha/database/Home/IndicatorsDownload/en">Global Health Expenditure Database (GHED)</a> as data sources.
</p>
<p>
My first surprise when applying the previous 5 methods from <code>cdt.causality.pairwise</code> to this data
was that none of them was able to infer that
<code>health_expenditure_log</code> was directly related to <code>health_expenditure</code>, despite the former being a field calculated by applying the logarithm to the latter.
Anyway, ignoring this point and focusing on the
<code>health_expenditure</code> vs. <code>life_expectancy</code> relationship, only 2 of the 5 methods inferred the direction that I think makes the most sense:
health expenditure as a cause of life expectancy.
More details in the following <a href="https://github.com/muoten/life-expectancy/blob/master/life_expectancy_vs_health_expenditure_pairwise_causal_discovery.ipynb">
notebook
</a>.
</p>
<figure>
<img src="img/life_expectancy_and_health_expenditure_pairwise.png" alt="causal discovery with cdt.causality.pairwise.CDS and life expectancy vs health expenditure data">
<figcaption style="text-align:center" >
<small>Source: <a href="https://github.com/muoten/life-expectancy/blob/master/life_expectancy_vs_health_expenditure_pairwise_causal_discovery.ipynb">
https://github.com/muoten/life-expectancy
</a>
</small>
</figcaption>
</figure>
<p>
Summarizing my first (failed) adventure in automatic causal discovery: 3 causal directions to infer, 5 tested methods,
and only 1 of the 5 succeeded on all 3 of them. By coin flipping or random guessing, each method would have a 1-in-8 chance of getting all three directions right.
Getting 1 of 5 is slightly better than random. Not so bad as to abandon hope,
especially for the <i>Conditional Distribution Similarity Statistic</i> (<code>CDS</code>) implementation,
which in this little experiment turned out to be our best coin flipper. Though not so good as to go deeper into this adventure for the moment.
</p>
<p>Still <i>no free lunch</i>... as usually.</p>
</div>
<br/>
<hr/>
<!--
#############
-->
<div id="linear-models" class="subheading mb-3">Regression to Linear Regression</div>
<h6>(2/8/2020)</h6>
<p>
Sometimes we feel we know the basics and it's time to move on to something else.
If <strong>gradient boosted trees are good enough to fit my tabular data</strong>,
but the <a href="https://xgboost.readthedocs.io/en/latest/">xgboost</a> implementation doesn't seem efficient and accurate enough for my needs,
I could try <a href="https://github.com/microsoft/LightGBM">lightgbm</a> or <a href="https://catboost.ai/">catboost</a>.
Moreover, all of these implementations deal
reasonably well even with unbalanced datasets and collinearity. So <strong>why should I care about</strong> simpler (and older) <strong>regression models?</strong>
</p>
<p>
It's very healthy to never stop learning, and to question everything. But, from a pragmatic and utilitarian point of view...
is it really worth the time to review the fundamentals of linear and logistic regression
in detail?
</p>
<figure>
<img src="img/linear_regression_bro.jpeg" alt="Linear Regression concept in Layman Term" style="border:1px solid black; width:70%;margin:auto;display:block">
<figcaption style="text-align:center" >
<small>Source: https://medium.com/@sarawritezz
<a href="https://medium.com/swlh/this-is-how-i-explain-linear-regression-to-12-year-old-boy-using-sklearn-python-library-d4bb06a649cc">
This is how I explain Linear Regression to 12-Year-old Boy
</a>
</small>
</figcaption>
</figure>
<br/>
<p>
My opinion is that it's not crucial to know the implementation details of algorithms to get reasonably accurate predictions
on your validation and test sets, but... outside Kaggle, accurate predictions are often not the main point.
Our search for "accuracy" often hides the requirement of generating <strong>actionable knowledge</strong>,
that is, getting insights to make useful interventions in the <i>real world</i>.
For this purpose it's preferable to understand our problem in terms of <strong>associations and causality</strong>,
and to be prepared to answer <i>"what if"</i> questions.
</p>
<!--
<p>
Thus... <i>what if</i> I tried to infer causal knowledge about a particular problem by using simple regressions?
</p>
-->
<br/><h4>Life expectancy vs Health expenditure: Linear Regression models for Longitudinal data.</h4>
<blockquote style='display: block;margin-top: 1em;margin-bottom: 1em;margin-left: 40px;margin-right: 40px; border:1px; content: open-quote;'>
<p><i>Sometimes...You can't see the forest for the (Gradient Boosted) trees</i></p>
</blockquote>
<p>
The great performance of <i>xgboost</i> might have kept me from really understanding and valuing
some benefits and relevant aspects of linear models.
To better understand linear regression I've compared different implementations.
My goal was to model the main variables that affect <strong>life expectancy</strong>,
attempting to isolate the effects of the multiple collinearities and interdependences expected between the explanatory variables.
</p>
<p>As this goal was quite ambitious I started by using only
the <strong>health expenditure</strong> per capita as explanatory variable. For this purpose
I obtained the data from the <a href="http://apps.who.int/nha/database/Home/IndicatorsDownload/en">
WHO's Global Health Expenditure Database (GHED)
</a>.
Source for Life Expectancy estimates was
<a href="https://ourworldindata.org/grapher/life-expectancy?time=..&country=~OWID_WRL">
OurWorldInData
</a>. More details in my
<a href="https://github.com/muoten/life-expectancy/blob/master/life_expectancy_vs_health_expenditure_linear_regression.ipynb">
Jupyter Notebook on github
</a>.
</p>
<figure>
<img src="img/health_expenditure_and_life_expectancy.png"
alt="Health Expenditure and Life Expectancy" style="border:1px solid black;width:70%;margin:auto;display:block">
<figcaption style="text-align:center">
<small>Source: <a href="https://github.com/muoten/life-expectancy/blob/master/life_expectancy_vs_health_expenditure_linear_regression.ipynb">
muoten.github.io
</a>. With data from
<a href="https://ourworldindata.org/grapher/life-expectancy?time=..&country=~OWID_WRL">
OurWorldIndata
</a> and
<a href="http://apps.who.int/nha/database/Home/IndicatorsDownload/en">
WHO's Global Health Expenditure Database
</a>
</small>
</figcaption>
</figure>
<br/>
<p>
Both datasets consist of multi-dimensional data involving measurements
over time. This kind of data is usually referred to, in statistics and econometrics,
as <strong>longitudinal</strong> or <strong>panel data</strong>.
</p>
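<p>In pandas, panel data is naturally represented with an (entity, time) MultiIndex. A minimal sketch (the 2017 figure for Spain is the one quoted later in this post; the other rows are made up for illustration):</p>

```python
import pandas as pd

# Panel (longitudinal) data: repeated measurements per country over time.
records = [
    ("ESP", 2016, 83.1),  # illustrative value
    ("ESP", 2017, 83.3),  # value for Spain quoted later in this post
    ("USA", 2016, 78.7),  # illustrative value
    ("USA", 2017, 78.6),  # illustrative value
]
panel = (
    pd.DataFrame(records, columns=["country", "year", "life_expectancy"])
    .set_index(["country", "year"])
    .sort_index()
)

# Entity and time slices come for free with the MultiIndex:
spain = panel.loc["ESP"]                  # all years for one country
year_2017 = panel.xs(2017, level="year")  # all countries for one year
```

<p>This (country, year) structure is exactly what the panel estimators in <code>linearmodels</code> expect as input.</p>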
<p>In Python there are at least 3 different packages we could use to apply linear models to our data.
So I've compared the implementations of the following:
</p>
<ul>
<li><code>scikit-learn</code></li>
<li><code>statsmodels</code></li>
<li><code>linearmodels</code></li>
</ul>
<p>First of all, after preparing and exploring the data, I decided to transform the Health Expenditure variable by applying the natural logarithm.
</p>
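<p>Whichever package we pick, all three are solving the same least-squares problem under the hood. As a sanity check, here is a numpy-only sketch of OLS with an intercept on synthetic data shaped like this relationship (the coefficients below are made up for illustration, not the real estimates):</p>

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for the real data: life expectancy rises with log(expenditure).
health_expenditure = rng.uniform(50, 10000, 300)  # USD per capita (made up)
health_expenditure_log = np.log(health_expenditure)
life_expectancy = 55.0 + 3.0 * health_expenditure_log + rng.normal(0, 1.5, 300)

# OLS with intercept: minimize ||X beta - y||^2 with X = [1, log(expenditure)]
X = np.column_stack([np.ones_like(health_expenditure_log), health_expenditure_log])
beta, *_ = np.linalg.lstsq(X, life_expectancy, rcond=None)
intercept, slope = beta
```

<p>On clean synthetic data the estimated slope and intercept land close to the generating values; <code>scikit-learn</code>, <code>statsmodels</code> and <code>linearmodels</code> would all return (numerically) the same coefficients here, differing mainly in the inference output they report around them.</p>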
<figure>
<img src="img/life_expectancy_and_health_expenditure_log.png"
alt="Health Expenditure and Life Expectancy (log)">
<figcaption style="text-align:center"><small>Source: <a href="https://github.com/muoten/life-expectancy/blob/master/life_expectancy_vs_health_expenditure_linear_regression.ipynb">
muoten.github.io
</a>
</small>
</figcaption>
</figure>
<br/>
<p>Finally we can compare the different implementations of Ordinary Least Squares.
The next figures show the statsmodels implementation, with a constant term (the intercept) included.
It results in a coefficient value (beta) of <math><mn>4.32</mn></math> for the explanatory variable, with a standard error of <math><mn>0.33</mn></math>,
assuming that <i>the covariance matrix of the errors is correctly specified</i>, as indicated in the output.
</p>
<figure>
<img src="img/ols_regression_life_expectancy_vs_health_expenditure_log_metrics.png" style="border:1px solid black;width:70%;margin:auto;display:block"
alt="Ordinary Least Squares (statsmodels): Life Expectancy vs Health Expenditure (log)">
<figcaption style="text-align:center"><small>Source:
<a href="https://github.com/muoten/life-expectancy/blob/master/life_expectancy_vs_health_expenditure_linear_regression.ipynb">
muoten.github.io
</a> Ordinary Least Squares (statsmodels): Life Expectancy vs Health Expenditure (log)
</small>
</figcaption>
</figure>
<br/>
<figure>
<img src="img/ols_regression_life_expectancy_vs_health_expenditure_log.png" style="border:1px solid black;width:70%;margin:auto;display:block"
alt="Ordinary Least Squares (statsmodels): Life Expectancy vs Health Expenditure (log)">
<figcaption style="text-align:center"><small>Source:
<a href="https://github.com/muoten/life-expectancy/blob/master/life_expectancy_vs_health_expenditure_linear_regression.ipynb">
muoten.github.io
</a> Ordinary Least Squares (statsmodels): Life Expectancy vs Health Expenditure (log)
</small>
</figcaption>
</figure>
<br/>
<p>According to our simple model, the life expectancy for Spain in 2017 could be roughly estimated as
<math><mn>79.6</mn></math><math><mo>±</mo><mn>0.8</mn></math> years.
Actual life expectancy in Spain is even higher,
being one of the highest in the world: the value in our dataset for 2017 was <math><mn>83.3</mn></math> years.
Not bad for a simple single-variable model.
</p>
<p>Furthermore, the notebook contains more details about the calculation of the Coefficient of Determination,
and about the relations and differences between different configurations of linear regression models for longitudinal (panel) data.
This first analysis has been limited to Ordinary Least Squares,
with the
<a href="http://statmath.wu.ac.at/~hauser/LVs/FinEtricsQF/FEtrics_Chp5.pdf">Fixed Effects and Random Effects variants
</a>
supported by the
<a href="https://bashtage.github.io/linearmodels/doc/panel/mathematical-formula.html#panel-mathematical-notation">
linearmodels package
</a>, under the names PanelOLS and RandomEffects, respectively.
Other alternatives, such as Lasso (L1) and Ridge (L2) regularization, have been excluded from the scope for the moment.
</p>
<br/>
<h4>Linear regression. Recycling the residuals</h4>
<p>
The residuals in a linear regression model are the differences between the estimates and the true values.
Besides making them as small as possible, it's important to check that these residuals are independent.
</p>
<p>
Non-random patterns in the residuals reveal that our model is not good enough.
Furthermore, in the OLS context, random errors are assumed to produce residuals that are
normally distributed and centered at zero. The residuals should also not be correlated with any other variable, and adjacent residuals should not be correlated with each other (autocorrelation).
<a href="https://blog.minitab.com/blog/adventures-in-statistics-2/why-you-need-to-check-your-residual-plots-for-regression-analysis">
Reference</a>
</p>
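<p>Such checks are easy to script. Below, a minimal numpy-only sketch (on synthetic data, not the life-expectancy model) of two of them, zero mean and lag-1 autocorrelation of the residuals ordered by the regressor: a deliberately misspecified straight-line fit of quadratic data fails the autocorrelation check, while the correctly specified fit passes.</p>

```python
import numpy as np

def residual_diagnostics(x, residuals):
    """Quick OLS residual checks, with residuals ordered by the regressor."""
    r = residuals[np.argsort(x)]
    return {
        "mean": r.mean(),                                   # should be ~0
        "lag1_autocorr": np.corrcoef(r[:-1], r[1:])[0, 1],  # should be ~0
    }

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 200)
y = 2.0 + 0.5 * x ** 2 + rng.normal(0, 1, 200)  # the true relation is quadratic

# Misspecified straight-line fit: sorted residuals keep a systematic pattern
slope, intercept = np.polyfit(x, y, 1)
bad = residual_diagnostics(x, y - (slope * x + intercept))

# Correctly specified quadratic fit: residuals look like plain noise
good = residual_diagnostics(x, y - np.polyval(np.polyfit(x, y, 2), x))
```

<p>Note that the residual mean is close to zero in both cases (OLS with an intercept forces it), which is why the autocorrelation of the ordered residuals is the more informative check here.</p>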
<p>
The Python package <a href="https://www.scikit-yb.org/en/latest/api/regressor/residuals.html">
yellowbrick</a> includes one-line methods to visualize residual plots of a scikit-learn model.
By plotting the residuals for the Life Expectancy vs Health Expenditure (log) LinearRegression model of the previous example,
we can see there is indeed a pattern.
</p>
<figure>
<img src="img/residuals_plot.png" style="border:1px solid black;width:70%;margin:auto;display:block"
alt="Residuals for LinearRegression Model">
<figcaption style="text-align:center"><small>Source: <a href="https://github.com/muoten/life-expectancy/blob/master/life_expectancy_vs_health_expenditure_linear_regression.ipynb">
muoten.github.io</a> Residuals for Life Expectancy vs Health Expenditure (log) LinearRegression Model
</small>
</figcaption>
</figure>
<br/>
</div>
</div>
</div>
</section>
<section class="resume-section p-3 p-lg-5 d-flex justify-content-center" id="covid19">
<div class="w-100">
<h2 class="mb-5">COVID-19</h2>
<div class="resume-item d-flex flex-column flex-md-row justify-content-between mb-5" id="covid19-estimates">
<div class="resume-content">
<h3 class="mb-0">Estimates and Context. Analysis and Simulations. Language matters</h3>
<!--
#############
-->
<br/>
<div class="subheading mb-3" id="covid19-seroprevalence-bias">ENECOVID (II): Test error estimates and (my) simplified bias analysis on preliminary results
</div>
<h6>(Updated 21/05/2020)</h6>
<p>
Tests used to detect antibodies, like other kinds of clinical measures, don't have to be perfect to be useful.
Two characteristics generally define these imperfections: <strong>specificity</strong> (true negative rate) and <strong>sensitivity</strong> (true positive rate, or <i>recall</i>).
That is, they relate to the ability to avoid
<strong>false positives</strong> and <strong>false negatives</strong>, respectively.
</p>
<p>
These two kinds of errors are also known in statistics as <i>type 1</i> and <i>type 2</i> errors.
And there is often some imbalance between their corresponding error rates.
For clinical diagnoses this imbalance can even be desirable, as the two types of errors don't generally have the same impact.
</p>
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">I’m not sure who made this, but I might have to stick above my desk because it’s the only way I seem to be able remember the difference between type 1 and type 2 error! <a href="https://t.co/bmk9DI30Jw">pic.twitter.com/bmk9DI30Jw</a></p>— Dr Ellen Grimås (@EllenGrimas) <a href="https://twitter.com/EllenGrimas/status/1171002176551956480?ref_src=twsrc%5Etfw">September 9, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" ></script>
<p>
But in order to quantify prevalences it's preferable that both errors compensate each other,
or to make posterior adjustments if the imbalance is significant and can't be tuned.
</p>
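<p>A standard posterior adjustment of this kind is the Rogan-Gladen (1978) estimator, which inverts the relation <i>apparent = sensitivity·p + (1 − specificity)·(1 − p)</i>. A minimal sketch, with made-up accuracy figures for illustration:</p>

```python
def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Correct an apparent (test-based) prevalence for imperfect test accuracy.

    Inverts: apparent = sensitivity * p + (1 - specificity) * (1 - p)
    """
    corrected = (apparent_prevalence + specificity - 1) / (sensitivity + specificity - 1)
    return min(max(corrected, 0.0), 1.0)  # clip to the valid [0, 1] range

# Example (hypothetical numbers): a 6% raw positive rate measured with a test
# of 85% sensitivity and 98% specificity corresponds to a lower true prevalence.
estimate = rogan_gladen(0.06, sensitivity=0.85, specificity=0.98)  # ~0.048
```

<p>Note that with a low true prevalence, even a small false-positive rate can dominate the raw count, which is exactly why this kind of correction matters for seroprevalence studies.</p>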
<p>