<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>Duarte, Hu, and Young (2019) JFE Code</title>
<meta name="author" content="Jefferson Duarte, Edwin Hu, and Lance Young" />
<meta name="generator" content="Org Mode" />
<style type="text/css">
  #content { max-width: 60em; margin: auto; }
  .title  { text-align: center;
             margin-bottom: .2em; }
  .subtitle { text-align: center;
              font-size: medium;
              font-weight: bold;
              margin-top:0; }
  .todo   { font-family: monospace; color: red; }
  .done   { font-family: monospace; color: green; }
  .priority { font-family: monospace; color: orange; }
  .tag    { background-color: #eee; font-family: monospace;
            padding: 2px; font-size: 80%; font-weight: normal; }
  .timestamp { color: #bebebe; }
  .timestamp-kwd { color: #5f9ea0; }
  .org-right  { margin-left: auto; margin-right: 0px;  text-align: right; }
  .org-left   { margin-left: 0px;  margin-right: auto; text-align: left; }
  .org-center { margin-left: auto; margin-right: auto; text-align: center; }
  .underline { text-decoration: underline; }
  #postamble p, #preamble p { font-size: 90%; margin: .2em; }
  p.verse { margin-left: 3%; }
  pre {
    border: 1px solid #e6e6e6;
    border-radius: 3px;
    background-color: #f2f2f2;
    padding: 8pt;
    font-family: monospace;
    overflow: auto;
    margin: 1.2em;
  }
  pre.src {
    position: relative;
    overflow: auto;
  }
  pre.src:before {
    display: none;
    position: absolute;
    top: -8px;
    right: 12px;
    padding: 3px;
    color: #555;
    background-color: #f2f2f299;
  }
  pre.src:hover:before { display: inline; margin-top: 14px;}
  /* Languages per Org manual */
  pre.src-asymptote:before { content: 'Asymptote'; }
  pre.src-awk:before { content: 'Awk'; }
  pre.src-authinfo::before { content: 'Authinfo'; }
  pre.src-C:before { content: 'C'; }
  /* pre.src-C++ doesn't work in CSS */
  pre.src-clojure:before { content: 'Clojure'; }
  pre.src-css:before { content: 'CSS'; }
  pre.src-D:before { content: 'D'; }
  pre.src-ditaa:before { content: 'ditaa'; }
  pre.src-dot:before { content: 'Graphviz'; }
  pre.src-calc:before { content: 'Emacs Calc'; }
  pre.src-emacs-lisp:before { content: 'Emacs Lisp'; }
  pre.src-fortran:before { content: 'Fortran'; }
  pre.src-gnuplot:before { content: 'gnuplot'; }
  pre.src-haskell:before { content: 'Haskell'; }
  pre.src-hledger:before { content: 'hledger'; }
  pre.src-java:before { content: 'Java'; }
  pre.src-js:before { content: 'Javascript'; }
  pre.src-latex:before { content: 'LaTeX'; }
  pre.src-ledger:before { content: 'Ledger'; }
  pre.src-lisp:before { content: 'Lisp'; }
  pre.src-lilypond:before { content: 'Lilypond'; }
  pre.src-lua:before { content: 'Lua'; }
  pre.src-matlab:before { content: 'MATLAB'; }
  pre.src-mscgen:before { content: 'Mscgen'; }
  pre.src-ocaml:before { content: 'Objective Caml'; }
  pre.src-octave:before { content: 'Octave'; }
  pre.src-org:before { content: 'Org mode'; }
  pre.src-oz:before { content: 'OZ'; }
  pre.src-plantuml:before { content: 'Plantuml'; }
  pre.src-processing:before { content: 'Processing.js'; }
  pre.src-python:before { content: 'Python'; }
  pre.src-R:before { content: 'R'; }
  pre.src-ruby:before { content: 'Ruby'; }
  pre.src-sass:before { content: 'Sass'; }
  pre.src-scheme:before { content: 'Scheme'; }
  pre.src-screen:before { content: 'Gnu Screen'; }
  pre.src-sed:before { content: 'Sed'; }
  pre.src-sh:before { content: 'shell'; }
  pre.src-sql:before { content: 'SQL'; }
  pre.src-sqlite:before { content: 'SQLite'; }
  /* additional languages in org.el's org-babel-load-languages alist */
  pre.src-forth:before { content: 'Forth'; }
  pre.src-io:before { content: 'IO'; }
  pre.src-J:before { content: 'J'; }
  pre.src-makefile:before { content: 'Makefile'; }
  pre.src-maxima:before { content: 'Maxima'; }
  pre.src-perl:before { content: 'Perl'; }
  pre.src-picolisp:before { content: 'Pico Lisp'; }
  pre.src-scala:before { content: 'Scala'; }
  pre.src-shell:before { content: 'Shell Script'; }
  pre.src-ebnf2ps:before { content: 'ebnf2ps'; }
  /* additional language identifiers per "defun org-babel-execute"
       in ob-*.el */
  pre.src-cpp:before  { content: 'C++'; }
  pre.src-abc:before  { content: 'ABC'; }
  pre.src-coq:before  { content: 'Coq'; }
  pre.src-groovy:before  { content: 'Groovy'; }
  /* additional language identifiers from org-babel-shell-names in
     ob-shell.el: ob-shell is the only babel language using a lambda to put
     the execution function name together. */
  pre.src-bash:before  { content: 'bash'; }
  pre.src-csh:before  { content: 'csh'; }
  pre.src-ash:before  { content: 'ash'; }
  pre.src-dash:before  { content: 'dash'; }
  pre.src-ksh:before  { content: 'ksh'; }
  pre.src-mksh:before  { content: 'mksh'; }
  pre.src-posh:before  { content: 'posh'; }
  /* Additional Emacs modes also supported by the LaTeX listings package */
  pre.src-ada:before { content: 'Ada'; }
  pre.src-asm:before { content: 'Assembler'; }
  pre.src-caml:before { content: 'Caml'; }
  pre.src-delphi:before { content: 'Delphi'; }
  pre.src-html:before { content: 'HTML'; }
  pre.src-idl:before { content: 'IDL'; }
  pre.src-mercury:before { content: 'Mercury'; }
  pre.src-metapost:before { content: 'MetaPost'; }
  pre.src-modula-2:before { content: 'Modula-2'; }
  pre.src-pascal:before { content: 'Pascal'; }
  pre.src-ps:before { content: 'PostScript'; }
  pre.src-prolog:before { content: 'Prolog'; }
  pre.src-simula:before { content: 'Simula'; }
  pre.src-tcl:before { content: 'tcl'; }
  pre.src-tex:before { content: 'TeX'; }
  pre.src-plain-tex:before { content: 'Plain TeX'; }
  pre.src-verilog:before { content: 'Verilog'; }
  pre.src-vhdl:before { content: 'VHDL'; }
  pre.src-xml:before { content: 'XML'; }
  pre.src-nxml:before { content: 'XML'; }
  /* add a generic configuration mode; LaTeX export needs an additional
     (add-to-list 'org-latex-listings-langs '(conf " ")) in .emacs */
  pre.src-conf:before { content: 'Configuration File'; }

  table { border-collapse:collapse; }
  caption.t-above { caption-side: top; }
  caption.t-bottom { caption-side: bottom; }
  td, th { vertical-align:top;  }
  th.org-right  { text-align: center;  }
  th.org-left   { text-align: center;   }
  th.org-center { text-align: center; }
  td.org-right  { text-align: right;  }
  td.org-left   { text-align: left;   }
  td.org-center { text-align: center; }
  dt { font-weight: bold; }
  .footpara { display: inline; }
  .footdef  { margin-bottom: 1em; }
  .figure { padding: 1em; }
  .figure p { text-align: center; }
  .equation-container {
    display: table;
    text-align: center;
    width: 100%;
  }
  .equation {
    vertical-align: middle;
  }
  .equation-label {
    display: table-cell;
    text-align: right;
    vertical-align: middle;
  }
  .inlinetask {
    padding: 10px;
    border: 2px solid gray;
    margin: 10px;
    background: #ffffcc;
  }
  #org-div-home-and-up
   { text-align: right; font-size: 70%; white-space: nowrap; }
  textarea { overflow-x: auto; }
  .linenr { font-size: smaller }
  .code-highlighted { background-color: #ffff00; }
  .org-info-js_info-navigation { border-style: none; }
  #org-info-js_console-label
    { font-size: 10px; font-weight: bold; white-space: nowrap; }
  .org-info-js_search-highlight
    { background-color: #ffff00; color: #000000; font-weight: bold; }
  .org-svg { }
</style>
<style type="text/css">
body { max-width: 120ch !important; }
</style>
<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-67919104-2"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'UA-67919104-2');
</script>
<link rel="stylesheet" href="https://latex.vercel.app/style.css" />
</head>
<body>
<div id="content" class="content">
<h1 class="title">Duarte, Hu, and Young (2019) JFE Code</h1>
<div id="table-of-contents" role="doc-toc">
<h2>Table of Contents</h2>
<div id="text-table-of-contents" role="doc-toc">
<ul>
<li><a href="#pin-code">PIN Code</a></li>
<li><a href="#data">Prepare Data</a></li>
<li><a href="#models">Model code</a>
<ul>
<li><a href="#eo-model"><code>EOModel</code></a></li>
<li><a href="#owr-model"><code>OWRModel</code></a></li>
</ul>
</li>
<li><a href="#estimation">Estimation code</a></li>
</ul>
</div>
</div>
<div id="outline-container-pin-code" class="outline-2">
<h2 id="pin-code">PIN Code</h2>
<div class="outline-text-2" id="text-pin-code">
<p>
<b>Note</b>: This code is provided as-is, and this write-up is for illustrative
purposes. Since the publication of the paper, we have received numerous requests
for code in different languages, so I decided to revisit the code, update it
for Python 3, and make it available to those who are interested in learning how
the estimation works.
</p>

<p>
This code runs on the <a href="https://wrds-www.wharton.upenn.edu/pages/support/the-wrds-cloud/">WRDS Cloud</a>. It prepares the data and
estimates the models of information asymmetry found in <a href="https://www.sciencedirect.com/science/article/pii/S0304405X19301965">Duarte,
Hu, and Young (2019) JFE</a>. Unlike in the paper, the data here are based on the
<a href="https://wrds-web.wharton.upenn.edu/wrds/query_forms/navigation.cfm?navId=524">WRDS Intraday Indicators</a>, but otherwise the variable construction and
filtering are very similar.
</p>

<p>
To make it easier to run your own versions of the code, I&rsquo;ve prepackaged
a Python environment with all of the dependencies needed to estimate the
models. You can find it here:
<a href="https://www.dropbox.com/scl/fi/m3u1i5aoejf7ltoo30tl6/environment.sh?rlkey=s44j5sbqn5m7ri5hlxhk67xlw&amp;st=fbxwbvxu&amp;dl=1">https://www.dropbox.com/scl/fi/m3u1i5aoejf7ltoo30tl6/environment.sh?rlkey=s44j5sbqn5m7ri5hlxhk67xlw&amp;st=fbxwbvxu&amp;dl=1</a>
</p>

<p>
Put this script in your project directory and run the following commands:
</p>

<div class="org-src-container">
<pre class="src src-bash"><span style="color: #ffc777;">chmod</span> a+x environment.sh
<span style="color: #7a88cf;"># </span><span style="color: #7a88cf;">this will install the environment</span>
./environment.sh
<span style="color: #ffc777;">chmod</span> a+x activate.sh
<span style="color: #7a88cf;"># </span><span style="color: #7a88cf;">load the environment</span>
<span style="color: #c099ff;">source</span> ~/activate.sh

<span style="color: #7a88cf;"># </span><span style="color: #7a88cf;">to see if it is working you can try:</span>
<span style="color: #c099ff;">which</span> ipcluster
</pre>
</div>
</div>
</div>
<div id="outline-container-data" class="outline-2">
<h2 id="data">Prepare Data</h2>
<div class="outline-text-2" id="text-data">
<p>
This SAS code constructs the yearly stock-day files necessary to estimate the
various structural models. To save time, I am using various SAS macros that can
be found <a href="https://github.com/edwinhu/sas">here</a>.
</p>

<p>
It requires access to CRSP (for market cap), COMPUSTAT (for book
values), and TAQ&#x2014;specifically the intraday indicators to get daily
order imbalance, volume, and intraday and overnight returns.
</p>
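
<p>
As a rough illustration of the daily order imbalance mentioned above, here is a
minimal Python sketch of the quantity that the SAS code below stores in
<code>y_e</code>: buys minus sells, divided by buys plus sells. The function name
<code>order_imbalance</code> and the sample trade counts are hypothetical and are
not part of the original code. Returning NaN for a day with no classified trades is
my assumption about how to mirror a SAS missing value.
</p>

<div class="org-src-container">
<pre class="src src-python">import math


def order_imbalance(n_buys, n_sells):
    """Return (buys - sells) / (buys + sells), or NaN when it is undefined."""
    # Treat missing inputs (None or NaN) as a missing result.
    for value in (n_buys, n_sells):
        if value is None or math.isnan(value):
            return math.nan
    total = n_buys + n_sells
    # A day with no classified trades has no defined imbalance.
    if total == 0:
        return math.nan
    return (n_buys - n_sells) / total


# Hypothetical stock-day trade counts, for illustration only.
example_days = [(120, 80), (50, 50), (0, 0)]
imbalances = [order_imbalance(buys, sells) for buys, sells in example_days]
print(imbalances)
</pre>
</div>

<p>
By construction the measure lies between -1 (all sells) and 1 (all buys), so it is
comparable across stocks with very different trading activity.
</p>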

<p>
The final file will be <code>out.taqdfx_all6</code>.
</p>
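
<p>
The SAS code below relies on a <code>MERGE_ASOF</code> macro which, per its comment,
merges the most recent observation in dataset B into dataset A. The following is a
minimal Python sketch of that idea for a single stock, not a translation of the
macro. The helper name <code>merge_asof_rows</code>, the list-of-dictionaries layout,
and the sample values are all hypothetical, and I am assuming that an observation
dated on the same day counts as the most recent one; the actual macro may differ.
</p>

<div class="org-src-container">
<pre class="src src-python">from bisect import bisect_right
from datetime import date


def merge_asof_rows(left, right, key="date"):
    """For each row in left, attach the latest right row dated on or before it."""
    right_sorted = sorted(right, key=lambda row: row[key])
    right_keys = [row[key] for row in right_sorted]
    merged = []
    for row in left:
        # Number of right rows dated on or before this left row.
        pos = bisect_right(right_keys, row[key])
        # No earlier right row means nothing is attached.
        match = right_sorted[pos - 1] if pos else {}
        # Values from the left row (including its date) take precedence.
        merged.append({**match, **row})
    return merged


# Hypothetical month-end market caps and a June market cap for one stock.
monthly = [
    {"date": date(2004, 5, 28), "ME": 90.0},
    {"date": date(2004, 6, 30), "ME": 100.0},
    {"date": date(2004, 9, 30), "ME": 110.0},
]
june = [{"date": date(2004, 6, 30), "ME6": 100.0}]

merged = merge_asof_rows(monthly, june)
for row in merged:
    print(row["date"], row["ME"], row.get("ME6"))
</pre>
</div>

<p>
In this sketch the May row has no June value yet, while the June and September rows
both carry the June market cap forward. A real implementation would also match
within each stock identifier rather than assume a single stock.
</p>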

<div class="org-src-container">
<pre class="src src-sas">
<span style="color: #7a88cf;">/*</span><span style="color: #7a88cf;"> this first piece merges CRSP/COMPUSTAT </span><span style="color: #7a88cf;">*/</span>

<span style="color: #ff995e;">%INCLUDE</span> <span style="color: #c3e88d;">"~/git/sas/CC_LINK.sas"</span>;
<span style="color: #82aaff;">%CC_LINK(</span>dsetin=comp.funda,
    dsetout=compx,
    datevar=datadate,
    keep_vars=at lt);

<span style="color: #ff995e;">data</span> crspm6;
    <span style="color: #c099ff;">set</span> crsp.msf;
    <span style="color: #c099ff;">where</span> <span style="color: #82aaff;">month(</span>date)=<span style="color: #ff995e; font-weight: bold;">6</span>;
    ME6=<span style="color: #82aaff;">abs(</span>prc*shrout);
    <span style="color: #c099ff;">keep</span> permno date ME6;
<span style="color: #ff995e;">run;</span>
<span style="color: #ff995e;">data</span> crspm;
    <span style="color: #c099ff;">set</span> crsp.msf;
    ME=<span style="color: #82aaff;">abs(</span>prc*shrout);
    datadate=date;
    <span style="color: #c099ff;">keep</span> permno datadate date ME;
<span style="color: #ff995e;">run;</span>

<span style="color: #7a88cf;">/*</span><span style="color: #7a88cf;"> MERGE_ASOF merges the most recent </span>
<span style="color: #7a88cf;">observation in dataset B into dataset A </span><span style="color: #7a88cf;">*/</span>
<span style="color: #ff995e;">%INCLUDE</span> <span style="color: #c3e88d;">"~/git/sas/MERGE_ASOF.sas"</span>;
<span style="color: #82aaff;">%MERGE_ASOF(</span>a=crspm,b=crspm6,
    merged=crspm2,
    datevar=date,
    num_vars=ME6);
<span style="color: #82aaff;">%MERGE_ASOF(</span>a=crspm2,b=compx,
    merged=crspm3,
    datevar=datadate,
    num_vars=BE ME_COMP at lt gp);
<span style="color: #ff995e;">data</span> crspm3;
    <span style="color: #c099ff;">set</span> crspm3;
    BM = BE/ME6;
    bm_log = <span style="color: #82aaff;">log(</span>BM);
    me_log = <span style="color: #82aaff;">log(</span>ME);
<span style="color: #ff995e;">run;</span>

<span style="color: #ff995e;">proc print</span> <span style="color: #c099ff;">data=</span>crspm3(obs=<span style="color: #ff995e; font-weight: bold;">25</span>) width=min;
    <span style="color: #c099ff;">where</span> permno=<span style="color: #ff995e; font-weight: bold;">11850</span> <span style="color: #c099ff;">and</span> <span style="color: #82aaff;">year(</span>date) <span style="color: #c099ff;">between</span> <span style="color: #ff995e; font-weight: bold;">1993</span> <span style="color: #c099ff;">and</span> <span style="color: #ff995e; font-weight: bold;">2018</span>;
    <span style="color: #c099ff;">var</span> permno date me: bm:;
<span style="color: #ff995e;">run;</span>

<span style="color: #7a88cf;">/*</span><span style="color: #7a88cf;"> This macro creates yearly stock-day files</span>
<span style="color: #7a88cf;">pulling from both master files and then WRDS IID </span>
<span style="color: #7a88cf;">for the second-level TAQ data </span><span style="color: #7a88cf;">*/</span>
<span style="color: #ff995e;">%MACRO</span> TAQ_OWR_GPIN(yyyy=<span style="color: #ff995e; font-weight: bold;">2004</span>);
<span style="color: #ff995e;">data</span> work.mastm_&amp;yyyy. ;
    <span style="color: #c099ff;">set</span> <span style="color: #ff995e;">%if</span> &amp;yyyy &gt; <span style="color: #ff995e; font-weight: bold;">1993</span>
    <span style="color: #ff995e;">%then</span> <span style="color: #ff995e;">%do</span>;
    taq.mast_<span style="color: #82aaff;">%SYSEVALF(</span>&amp;yyyy.-<span style="color: #ff995e; font-weight: bold;">1</span>):
    <span style="color: #ff995e;">%end</span>;
    taq.mast_&amp;yyyy.:
    taq.mast_<span style="color: #82aaff;">%SYSEVALF(</span>&amp;yyyy.+<span style="color: #ff995e; font-weight: bold;">1</span>):;
    SYM_ROOT=<span style="color: #82aaff;">scan(</span><span style="color: #c099ff;">SYMBOL</span>, <span style="color: #ff995e; font-weight: bold;">1</span>, <span style="color: #c3e88d;">' '</span>);
    SYM_SUFFIX=<span style="color: #82aaff;">scan(</span><span style="color: #c099ff;">SYMBOL</span>, <span style="color: #ff995e; font-weight: bold;">2</span>, <span style="color: #c3e88d;">' '</span>);
    DATE=coalesce(FDATE,DATEF);
    <span style="color: #c099ff;">format</span> date yymmdd10.;
<span style="color: #ff995e;">run;</span>
<span style="color: #ff995e;">proc sort</span> <span style="color: #c099ff;">data=</span>work.mastm_&amp;yyyy. NODUPKEY;
    <span style="color: #c099ff;">by</span> <span style="color: #c099ff;">SYMBOL</span> DATE;
<span style="color: #ff995e;">run;</span>
<span style="color: #ff995e;">proc sql</span>;
    create <span style="color: #c099ff;">table</span> work.mastm_crsp_&amp;yyyy. as
    <span style="color: #c099ff;">select</span> a.date, sym_root, sym_suffix, <span style="color: #c099ff;">symbol</span>,
    <span style="color: #82aaff;">substr(</span>coalesce(b.ncusip, b.cusip),<span style="color: #ff995e; font-weight: bold;">1</span>,<span style="color: #ff995e; font-weight: bold;">8</span>) as cusip8,
    a.permno, a.permco, shrcd, exchcd,
    a.prc, a.ret, a.retx, a.shrout, a.vol, c.divamt, c.distcd,
    coalesce(e.SP500,<span style="color: #ff995e; font-weight: bold;">0</span>) as SP500
    from crsp.dsf a
    left join
    crsp.dsenames b
    on a.permno = b.permno
    <span style="color: #c099ff;">and</span> a.date <span style="color: #c099ff;">between</span> b.namedt <span style="color: #c099ff;">and</span> coalesce(b.nameendt, <span style="color: #82aaff;">today(</span>))
    left join
    crsp.dsedist c
    on a.permno = c.permno
    <span style="color: #c099ff;">and</span> a.date = c.paydt
    left join
    (<span style="color: #c099ff;">select</span> distinct cusip, sym_root, sym_suffix, <span style="color: #c099ff;">symbol</span>,
    <span style="color: #82aaff;">min(</span>date) as mindt, <span style="color: #82aaff;">max(</span>date) as maxdt
    from work.mastm_&amp;yyyy.
    group <span style="color: #c099ff;">by</span> cusip, sym_root, sym_suffix, <span style="color: #c099ff;">symbol</span>) d
    on <span style="color: #82aaff;">substr(</span>d.cusip,<span style="color: #ff995e; font-weight: bold;">1</span>,<span style="color: #ff995e; font-weight: bold;">8</span>) = <span style="color: #82aaff;">substr(</span>coalesce(b.ncusip, b.cusip),<span style="color: #ff995e; font-weight: bold;">1</span>,<span style="color: #ff995e; font-weight: bold;">8</span>)
    <span style="color: #c099ff;">and</span> a.date ge d.mindt
    <span style="color: #c099ff;">and</span> a.date le coalesce(d.maxdt,<span style="color: #82aaff;">today(</span>))
    left join
    (<span style="color: #c099ff;">select</span> *, <span style="color: #ff995e; font-weight: bold;">1</span> as SP500 from crsp.dsp500list) e
    on a.permno = e.permno
    <span style="color: #c099ff;">and</span> a.date <span style="color: #c099ff;">between</span> e.start <span style="color: #c099ff;">and</span> e.ending
    <span style="color: #c099ff;">where</span> <span style="color: #82aaff;">year(</span>a.date) = &amp;yyyy.
    <span style="color: #c099ff;">and</span> <span style="color: #c099ff;">symbol</span> is <span style="color: #c099ff;">not</span> <span style="color: #c099ff;">null</span>
    order <span style="color: #c099ff;">by</span> a.date, sym_root, sym_suffix;
<span style="color: #ff995e;">quit;</span>
<span style="color: #ff995e;">proc sort</span> <span style="color: #c099ff;">data=</span>work.mastm_crsp_&amp;yyyy. nodupkey;
    <span style="color: #c099ff;">by</span> date sym_root sym_suffix;
<span style="color: #ff995e;">run;</span>
<span style="color: #ff995e;">proc sort</span> <span style="color: #c099ff;">data=</span>taq.wrds_iid_&amp;yyyy.
    <span style="color: #c099ff;">out=</span>work.wrds_iid_&amp;yyyy.;
    <span style="color: #c099ff;">by</span> date <span style="color: #c099ff;">symbol</span>;
<span style="color: #ff995e;">run;</span>
<span style="color: #ff995e;">data</span> work.taqdf_&amp;yyyy.;
    <span style="color: #c099ff;">length</span> date <span style="color: #ff995e; font-weight: bold;">8</span>;
    <span style="color: #c099ff;">merge</span> work.wrds_iid_&amp;yyyy.(<span style="color: #c099ff;">keep</span>=date <span style="color: #c099ff;">symbol</span>
    buynumtrades_lri sellnumtrades_lri
    FPrice OPrice CPrc: ret_mkt_t
    vwap_m
    SumVolume_m SumVolume_b SumVolume_a)
    work.mastm_crsp_&amp;yyyy.;
    <span style="color: #c099ff;">by</span> date <span style="color: #c099ff;">symbol</span>;
    <span style="color: #7a88cf;">/*</span><span style="color: #7a88cf;"> make names consistent with TAQMSEC </span><span style="color: #7a88cf;">*/</span>
    CCPrc = <span style="color: #82aaff;">abs(</span>coalesce(prc,cprc,cprc2));
    mid_after_open = coalesce((oprice+fprice)/<span style="color: #ff995e; font-weight: bold;">2</span>,oprice,fprice);
    y_e = divide(buynumtrades_lri-sellnumtrades_lri,buynumtrades_lri+sellnumtrades_lri);
    symbol_15=<span style="color: #c099ff;">symbol</span>;
    <span style="color: #c099ff;">rename</span> buynumtrades_lri = n_buys
    sellnumtrades_lri = n_sells
    vwap_m = vw_price_m
    ret_mkt_t = ret_mkt_m
    SumVolume_m = total_vol_m
    SumVolume_b = total_vol_b
    SumVolume_a = total_vol_a;
    <span style="color: #c099ff;">label</span> CCPrc=<span style="color: #c3e88d;">'Closing Price (CRSP or TAQ)'</span> y_e=<span style="color: #c3e88d;">'Order Imbalance (%)'</span>;
<span style="color: #ff995e;">run;</span>
<span style="color: #ff995e;">proc sort</span> <span style="color: #c099ff;">data=</span>work.taqdf_&amp;yyyy. <span style="color: #c099ff;">out=</span>taqdf_&amp;yyyy.x nodupkey;
    <span style="color: #c099ff;">by</span> permno date;
    <span style="color: #c099ff;">where</span> permno &gt; .Z
    <span style="color: #c099ff;">and</span> shrcd in (<span style="color: #ff995e; font-weight: bold;">10</span>,<span style="color: #ff995e; font-weight: bold;">11</span>)
    <span style="color: #c099ff;">and</span> exchcd in (<span style="color: #ff995e; font-weight: bold;">1</span>,<span style="color: #ff995e; font-weight: bold;">2</span>,<span style="color: #ff995e; font-weight: bold;">3</span>,<span style="color: #ff995e; font-weight: bold;">4</span>);
<span style="color: #ff995e;">run;</span>
<span style="color: #ff995e;">%MEND</span>;

<span style="color: #7a88cf;">/*</span><span style="color: #7a88cf;"> This macro creates yearly stock-day files</span>
<span style="color: #7a88cf;">pulling from both master files and then WRDS IID </span>
<span style="color: #7a88cf;">for the millisecond-level TAQ data </span><span style="color: #7a88cf;">*/</span>
<span style="color: #ff995e;">%MACRO</span> TAQM_OWR_GPIN(yyyy=<span style="color: #ff995e; font-weight: bold;">2014</span>);
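<span style="color: #7a88cf;">/*</span><span style="color: #7a88cf;"> SYSDATE9 carries a four-digit year, so the result does not depend on YEARCUTOFF </span><span style="color: #7a88cf;">*/</span>
<span style="color: #ff995e;">%let</span> sysyear= <span style="color: #82aaff;">%sysfunc(year(</span><span style="color: #c3e88d;">"&amp;sysdate9"</span>d));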
<span style="color: #ff995e;">data</span> work.mast1_&amp;yyyy.;
    <span style="color: #c099ff;">length</span> date <span style="color: #ff995e; font-weight: bold;">8</span> sym_root $6 sym_suffix $10 symbol_15 $15;
    <span style="color: #c099ff;">set</span> taqmsec.mastm_<span style="color: #82aaff;">%SYSEVALF(</span>&amp;yyyy.-<span style="color: #ff995e; font-weight: bold;">1</span>):
    taqmsec.mastm_&amp;yyyy.:
    <span style="color: #ff995e;">%if</span> <span style="color: #82aaff;">%SYSEVALF(</span>&amp;yyyy.+<span style="color: #ff995e; font-weight: bold;">1</span>) &lt;= &amp;sysyear. <span style="color: #ff995e;">%then</span> <span style="color: #ff995e;">%do</span>;
    taqmsec.mastm_<span style="color: #82aaff;">%SYSEVALF(</span>&amp;yyyy.+<span style="color: #ff995e; font-weight: bold;">1</span>):
    <span style="color: #ff995e;">%end</span>;;
    SYM_ROOT=<span style="color: #82aaff;">scan(</span>SYMBOL_15, <span style="color: #ff995e; font-weight: bold;">1</span>, <span style="color: #c3e88d;">' '</span>);
    SYM_SUFFIX=<span style="color: #82aaff;">scan(</span>SYMBOL_15, <span style="color: #ff995e; font-weight: bold;">2</span>, <span style="color: #c3e88d;">' '</span>);
    <span style="color: #c099ff;">keep</span> date cusip sym_root sym_suffix symbol_15;
<span style="color: #ff995e;">run;</span>
<span style="color: #ff995e;">data</span> work.mast2_&amp;yyyy. ;
    <span style="color: #c099ff;">length</span> date <span style="color: #ff995e; font-weight: bold;">8</span> sym_root $6 sym_suffix $10 symbol_15 $15;
    <span style="color: #c099ff;">set</span> taq.mast_<span style="color: #82aaff;">%SYSEVALF(</span>&amp;yyyy.-<span style="color: #ff995e; font-weight: bold;">1</span>):
    taq.mast_&amp;yyyy.:
    <span style="color: #ff995e;">%if</span> <span style="color: #82aaff;">%SYSEVALF(</span>&amp;yyyy.+<span style="color: #ff995e; font-weight: bold;">1</span>) &lt;= &amp;sysyear. <span style="color: #ff995e;">%then</span> <span style="color: #ff995e;">%do</span>;
    taq.mast_<span style="color: #82aaff;">%SYSEVALF(</span>&amp;yyyy.+<span style="color: #ff995e; font-weight: bold;">1</span>):
    <span style="color: #ff995e;">%end</span>;;
    SYM_ROOT=<span style="color: #82aaff;">scan(</span><span style="color: #c099ff;">SYMBOL</span>, <span style="color: #ff995e; font-weight: bold;">1</span>, <span style="color: #c3e88d;">' '</span>);
    SYM_SUFFIX=<span style="color: #82aaff;">scan(</span><span style="color: #c099ff;">SYMBOL</span>, <span style="color: #ff995e; font-weight: bold;">2</span>, <span style="color: #c3e88d;">' '</span>);
    DATE=coalesce(DATE,FDATE,DATEF);
    SYMBOL_15=coalescec(SYMBOL_15,<span style="color: #c099ff;">SYMBOL</span>);
    <span style="color: #c099ff;">keep</span> date cusip sym_root sym_suffix symbol_15;
<span style="color: #ff995e;">run;</span>
<span style="color: #ff995e;">data</span> work.mastm_&amp;yyyy.;
    <span style="color: #c099ff;">length</span> date <span style="color: #ff995e; font-weight: bold;">8</span> cusip $12
    sym_root $6 sym_suffix $10 symbol_15 $15;
    <span style="color: #c099ff;">set</span> work.mast1_&amp;yyyy. work.mast2_&amp;yyyy.;
<span style="color: #ff995e;">run;</span>
<span style="color: #ff995e;">proc sort</span> <span style="color: #c099ff;">data=</span>work.mastm_&amp;yyyy. NODUPKEY;
    <span style="color: #c099ff;">by</span> SYM_ROOT SYM_SUFFIX DATE;
<span style="color: #ff995e;">run;</span>
<span style="color: #ff995e;">proc sql</span>;
    create <span style="color: #c099ff;">table</span> work.mastm_crsp_&amp;yyyy. as
    <span style="color: #c099ff;">select</span> a.date, sym_root, sym_suffix, symbol_15,
    <span style="color: #82aaff;">substr(</span>coalesce(b.ncusip, b.cusip),<span style="color: #ff995e; font-weight: bold;">1</span>,<span style="color: #ff995e; font-weight: bold;">8</span>) as cusip8,
    a.permno, a.permco, shrcd, exchcd,
    a.prc, a.ret, a.retx, a.shrout, a.vol, c.divamt, c.distcd,
    coalesce(e.SP500,<span style="color: #ff995e; font-weight: bold;">0</span>) as SP500
    from crsp.dsf a
    left join
    crsp.dsenames b
    on a.permno = b.permno
    <span style="color: #c099ff;">and</span> a.date <span style="color: #c099ff;">between</span> b.namedt <span style="color: #c099ff;">and</span> coalesce(b.nameendt, <span style="color: #82aaff;">today(</span>))
    left join
    crsp.dsedist c
    on a.permno = c.permno
    <span style="color: #c099ff;">and</span> a.date = c.paydt
    left join
    (<span style="color: #c099ff;">select</span> distinct cusip, sym_root, sym_suffix, symbol_15,
    <span style="color: #82aaff;">min(</span>date) as mindt, <span style="color: #82aaff;">max(</span>date) as maxdt
    from work.mastm_&amp;yyyy.
    group <span style="color: #c099ff;">by</span> cusip, sym_root, sym_suffix, symbol_15) d
    on <span style="color: #82aaff;">substr(</span>d.cusip,<span style="color: #ff995e; font-weight: bold;">1</span>,<span style="color: #ff995e; font-weight: bold;">8</span>) = <span style="color: #82aaff;">substr(</span>coalesce(b.ncusip, b.cusip),<span style="color: #ff995e; font-weight: bold;">1</span>,<span style="color: #ff995e; font-weight: bold;">8</span>)
    <span style="color: #c099ff;">and</span> a.date ge d.mindt
    <span style="color: #c099ff;">and</span> a.date le coalesce(d.maxdt,<span style="color: #82aaff;">today(</span>))
    left join
    (<span style="color: #c099ff;">select</span> *, <span style="color: #ff995e; font-weight: bold;">1</span> as SP500 from crsp.dsp500list) e
    on a.permno = e.permno
    <span style="color: #c099ff;">and</span> a.date <span style="color: #c099ff;">between</span> e.start <span style="color: #c099ff;">and</span> e.ending
    <span style="color: #c099ff;">where</span> <span style="color: #82aaff;">year(</span>a.date) = &amp;yyyy.
    <span style="color: #c099ff;">and</span> symbol_15 is <span style="color: #c099ff;">not</span> <span style="color: #c099ff;">null</span>
    order <span style="color: #c099ff;">by</span> a.date, sym_root, sym_suffix;
<span style="color: #ff995e;">quit;</span>
<span style="color: #ff995e;">proc sort</span> <span style="color: #c099ff;">data=</span>work.mastm_crsp_&amp;yyyy. nodupkey;
    <span style="color: #c099ff;">by</span> date sym_root sym_suffix;
<span style="color: #ff995e;">run;</span>
<span style="color: #ff995e;">proc sort</span> <span style="color: #c099ff;">data=</span>taqmsec.wrds_iid_&amp;yyyy.
    <span style="color: #c099ff;">out=</span>work.wrds_iid_&amp;yyyy.;
    <span style="color: #c099ff;">by</span> date sym_root sym_suffix;
<span style="color: #ff995e;">run;</span>
<span style="color: #ff995e;">data</span> work.taqdf_&amp;yyyy.;
    <span style="color: #c099ff;">length</span> date <span style="color: #ff995e; font-weight: bold;">8</span> sym_root $6 sym_suffix $10;
    <span style="color: #c099ff;">merge</span> work.wrds_iid_&amp;yyyy.(<span style="color: #c099ff;">keep</span>=date sym_root sym_suffix
    buynumtrades_lr sellnumtrades_lr oprc cprc ret_mkt_m
    vw_price_m mid_after_open
    total_vol_m total_vol_b total_vol_a)
    work.mastm_crsp_&amp;yyyy.;
    <span style="color: #c099ff;">by</span> date sym_root sym_suffix;
    CCPrc = <span style="color: #82aaff;">abs(</span>coalesce(prc,cprc));
    y_e = divide(buynumtrades_lr-sellnumtrades_lr,buynumtrades_lr+sellnumtrades_lr);
    <span style="color: #c099ff;">rename</span> buynumtrades_lr=n_buys sellnumtrades_lr=n_sells;
    <span style="color: #c099ff;">label</span> CCPrc=<span style="color: #c3e88d;">'Closing Price (CRSP or TAQ)'</span> y_e=<span style="color: #c3e88d;">'Order Imbalance (%)'</span>;
<span style="color: #ff995e;">run;</span>
<span style="color: #ff995e;">proc sort</span> <span style="color: #c099ff;">data=</span>work.taqdf_&amp;yyyy. <span style="color: #c099ff;">out=</span>taqdf_&amp;yyyy.x nodupkey;
    <span style="color: #c099ff;">by</span> permno date;
    <span style="color: #c099ff;">where</span> permno &gt; .Z
    <span style="color: #c099ff;">and</span> shrcd in (<span style="color: #ff995e; font-weight: bold;">10</span>,<span style="color: #ff995e; font-weight: bold;">11</span>)
    <span style="color: #c099ff;">and</span> exchcd in (<span style="color: #ff995e; font-weight: bold;">1</span>,<span style="color: #ff995e; font-weight: bold;">2</span>,<span style="color: #ff995e; font-weight: bold;">3</span>,<span style="color: #ff995e; font-weight: bold;">4</span>);
<span style="color: #ff995e;">run;</span>
<span style="color: #ff995e;">%MEND</span>;

<span style="color: #82aaff;">%TAQ_OWR_GPIN(</span>yyyy=<span style="color: #ff995e; font-weight: bold;">1993</span>);
<span style="color: #82aaff;">%TAQ_OWR_GPIN(</span>yyyy=<span style="color: #ff995e; font-weight: bold;">1994</span>);
<span style="color: #82aaff;">%TAQ_OWR_GPIN(</span>yyyy=<span style="color: #ff995e; font-weight: bold;">1995</span>);
<span style="color: #82aaff;">%TAQ_OWR_GPIN(</span>yyyy=<span style="color: #ff995e; font-weight: bold;">1996</span>);
<span style="color: #82aaff;">%TAQ_OWR_GPIN(</span>yyyy=<span style="color: #ff995e; font-weight: bold;">1997</span>);
<span style="color: #82aaff;">%TAQ_OWR_GPIN(</span>yyyy=<span style="color: #ff995e; font-weight: bold;">1998</span>);
<span style="color: #82aaff;">%TAQ_OWR_GPIN(</span>yyyy=<span style="color: #ff995e; font-weight: bold;">1999</span>);
<span style="color: #82aaff;">%TAQ_OWR_GPIN(</span>yyyy=<span style="color: #ff995e; font-weight: bold;">2000</span>);
<span style="color: #82aaff;">%TAQ_OWR_GPIN(</span>yyyy=<span style="color: #ff995e; font-weight: bold;">2001</span>);
<span style="color: #82aaff;">%TAQ_OWR_GPIN(</span>yyyy=<span style="color: #ff995e; font-weight: bold;">2002</span>);
<span style="color: #82aaff;">%TAQ_OWR_GPIN(</span>yyyy=<span style="color: #ff995e; font-weight: bold;">2003</span>);
<span style="color: #82aaff;">%TAQ_OWR_GPIN(</span>yyyy=<span style="color: #ff995e; font-weight: bold;">2004</span>);
<span style="color: #82aaff;">%TAQ_OWR_GPIN(</span>yyyy=<span style="color: #ff995e; font-weight: bold;">2005</span>);
<span style="color: #82aaff;">%TAQ_OWR_GPIN(</span>yyyy=<span style="color: #ff995e; font-weight: bold;">2006</span>);
<span style="color: #7a88cf;">/*</span><span style="color: #7a88cf;"> NMS Implementation Feb 2007 </span><span style="color: #7a88cf;">*/</span>
<span style="color: #82aaff;">%TAQM_OWR_GPIN(</span>yyyy=<span style="color: #ff995e; font-weight: bold;">2007</span>);
<span style="color: #82aaff;">%TAQM_OWR_GPIN(</span>yyyy=<span style="color: #ff995e; font-weight: bold;">2008</span>);
<span style="color: #82aaff;">%TAQM_OWR_GPIN(</span>yyyy=<span style="color: #ff995e; font-weight: bold;">2009</span>);
<span style="color: #82aaff;">%TAQM_OWR_GPIN(</span>yyyy=<span style="color: #ff995e; font-weight: bold;">2010</span>);
<span style="color: #82aaff;">%TAQM_OWR_GPIN(</span>yyyy=<span style="color: #ff995e; font-weight: bold;">2011</span>);
<span style="color: #82aaff;">%TAQM_OWR_GPIN(</span>yyyy=<span style="color: #ff995e; font-weight: bold;">2012</span>);
<span style="color: #82aaff;">%TAQM_OWR_GPIN(</span>yyyy=<span style="color: #ff995e; font-weight: bold;">2013</span>);
<span style="color: #82aaff;">%TAQM_OWR_GPIN(</span>yyyy=<span style="color: #ff995e; font-weight: bold;">2014</span>);
<span style="color: #82aaff;">%TAQM_OWR_GPIN(</span>yyyy=<span style="color: #ff995e; font-weight: bold;">2015</span>);
<span style="color: #82aaff;">%TAQM_OWR_GPIN(</span>yyyy=<span style="color: #ff995e; font-weight: bold;">2016</span>);
<span style="color: #82aaff;">%TAQM_OWR_GPIN(</span>yyyy=<span style="color: #ff995e; font-weight: bold;">2017</span>);
<span style="color: #82aaff;">%TAQM_OWR_GPIN(</span>yyyy=<span style="color: #ff995e; font-weight: bold;">2018</span>);
<span style="color: #82aaff;">%TAQM_OWR_GPIN(</span>yyyy=<span style="color: #ff995e; font-weight: bold;">2019</span>);

<span style="color: #ff995e;">data</span> taqdfx_all;
    <span style="color: #c099ff;">set</span> taqdf_:;
<span style="color: #ff995e;">run;</span>

<span style="color: #ff995e;">proc sql</span>;
    create <span style="color: #c099ff;">table</span> taqdfx_all1 as
    <span style="color: #c099ff;">select</span> a.*, b.vwretd, b.vwretx
    from taqdfx_all a
    left join crsp.dsiy b
    on a.date = b.caldt
    order <span style="color: #c099ff;">by</span> a.permno, a.date;
<span style="color: #ff995e;">quit;</span>

<span style="color: #7a88cf;">/*</span><span style="color: #7a88cf;"> Compute and adjust OWR variables </span><span style="color: #7a88cf;">*/</span>
<span style="color: #ff995e;">proc printto</span> log=<span style="color: #c3e88d;">'/dev/null'</span>;<span style="color: #ff995e;">run;</span>
<span style="color: #ff995e;">proc expand</span> <span style="color: #c099ff;">data=</span>taqdfx_all1
    <span style="color: #c099ff;">out=</span>taqdfx_all2
    method=none;
    <span style="color: #c099ff;">by</span> permno;
    convert y_e = y_eL1 / transformout = (lag <span style="color: #ff995e; font-weight: bold;">1</span>);
    convert ccprc = CCPrcL1 / transformout = (lag <span style="color: #ff995e; font-weight: bold;">1</span>);
    convert mid_after_open = omF1 / transformout = (lead <span style="color: #ff995e; font-weight: bold;">1</span>);
<span style="color: #ff995e;">run;</span>
<span style="color: #ff995e;">proc printto</span>;<span style="color: #ff995e;">run;</span>
<span style="color: #ff995e;">%put</span> expand &amp;syslast. done;

<span style="color: #ff995e;">data</span> taqdfx_all2;
    <span style="color: #c099ff;">set</span> taqdfx_all2;
    yyyy=<span style="color: #82aaff;">year(</span>date);
    r_d = (vw_price_m-mid_after_open+coalesce(divamt,<span style="color: #ff995e; font-weight: bold;">0</span>))/mid_after_open;
    r_o = (omF1-vw_price_m)/mid_after_open;
<span style="color: #ff995e;">run;</span>

<span style="color: #82aaff;">%MERGE_ASOF(</span>a=taqdfx_all2,b=crspm3,
    merged=taqdfx_all3,
    datevar=date,
    num_vars=bm_log me_log);

<span style="color: #ff995e;">proc printto</span> log=<span style="color: #c3e88d;">'/dev/null'</span>;<span style="color: #ff995e;">run;</span>
<span style="color: #ff995e;">proc reg</span> <span style="color: #c099ff;">data=</span>taqdfx_all3 outest=_beta
    (<span style="color: #c099ff;">drop</span>=_: retx <span style="color: #c099ff;">rename</span>=(Intercept=alpha vwretx=beta)) noprint;
    <span style="color: #c099ff;">by</span> permno yyyy;
    <span style="color: #c099ff;">model</span> retx = vwretx;
<span style="color: #ff995e;">run;</span>
<span style="color: #ff995e;">proc printto</span>;<span style="color: #ff995e;">run;</span>

<span style="color: #ff995e;">data</span> taqdfx_all4;
    <span style="color: #c099ff;">merge</span> taqdfx_all3 _beta;
    <span style="color: #c099ff;">by</span> permno yyyy;
<span style="color: #ff995e;">run;</span>
<span style="color: #ff995e;">proc sort</span> <span style="color: #c099ff;">data=</span>taqdfx_all4 nodupkey;
    <span style="color: #c099ff;">by</span> date permno;
<span style="color: #ff995e;">run;</span>

<span style="color: #ff995e;">proc printto</span> log=<span style="color: #c3e88d;">'/dev/null'</span>;<span style="color: #ff995e;">run;</span>
<span style="color: #ff995e;">proc reg</span> <span style="color: #c099ff;">data=</span>taqdfx_all4 noprint;
      <span style="color: #c099ff;">model</span> r_o r_d = beta me_log bm_log;
      <span style="color: #c099ff;">output</span> <span style="color: #c099ff;">out=</span>_ret_resid(<span style="color: #c099ff;">keep</span>=permno date ur_o ur_d) r=ur_o ur_d;
      <span style="color: #c099ff;">model</span> y_e = y_eL1 me_log;
      <span style="color: #c099ff;">output</span> <span style="color: #c099ff;">out=</span>_oib_resid(<span style="color: #c099ff;">keep</span>=permno date uy_e) r=uy_e;
      <span style="color: #c099ff;">by</span> date;
<span style="color: #ff995e;">run;</span>
<span style="color: #ff995e;">proc printto</span>;<span style="color: #ff995e;">run;</span>

<span style="color: #ff995e;">data</span> taqdfx_all5;
    <span style="color: #c099ff;">merge</span> taqdfx_all4 _ret_resid _oib_resid;
    <span style="color: #c099ff;">by</span> date permno;
<span style="color: #ff995e;">run;</span>

<span style="color: #ff995e;">%INCLUDE</span> <span style="color: #c3e88d;">"~/git/sas/WINSORIZE_TRUNCATE.sas"</span>;
<span style="color: #82aaff;">%WINSORIZE_TRUNCATE(</span>dsetin = taqdfx_all5,
    dsetout = taqdfx_all6,
    byvar = date,
    vars = ur_o ur_d,
    type = W,
    pctl = <span style="color: #ff995e; font-weight: bold;">1</span> <span style="color: #ff995e; font-weight: bold;">99</span>,
    filter = <span style="color: #c099ff;">and</span> exchcd eq <span style="color: #ff995e; font-weight: bold;">1</span>);

<span style="color: #7a88cf;">/*</span><span style="color: #7a88cf;"> Output files </span><span style="color: #7a88cf;">*/</span>
<span style="color: #ff995e;">proc sort</span> <span style="color: #c099ff;">data=</span>taqdfx_all6
    <span style="color: #c099ff;">out=</span>out.taqdfx_all6(<span style="color: #c099ff;">compress=</span>no) nodupkey;
    <span style="color: #c099ff;">by</span> permno date;
<span style="color: #ff995e;">proc sort</span> <span style="color: #c099ff;">data=</span>crspm3
    <span style="color: #c099ff;">out=</span>out.crspm3 nodupkey;
    <span style="color: #c099ff;">by</span> permno date;
<span style="color: #ff995e;">run;</span>
</pre>
</div>
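<p>
The three derived variables in the SAS code above (<code>y_e</code>, <code>r_d</code>, and <code>r_o</code>) can be sketched in plain Python as follows. This is only an illustration: the function names are made up, the numbers are not from the data, and returning <code>None</code> for a zero-trade day is an assumption standing in for a SAS missing value.
</p>

```python
import math

def order_imbalance(n_buys, n_sells):
    """y_e = (B - S) / (B + S); None stands in for a SAS missing value."""
    total = n_buys + n_sells
    if total == 0:
        return None  # assumption: treat a zero-trade day as missing
    return (n_buys - n_sells) / total

def intraday_return(vw_price, mid_open, divamt=None):
    """r_d: return from the opening midquote to vw_price_m, adding back dividends."""
    dividend = 0.0 if divamt is None else divamt  # mirrors coalesce(divamt, 0)
    return (vw_price - mid_open + dividend) / mid_open

def overnight_return(vw_price, mid_open, next_mid_open):
    """r_o: return from vw_price_m to the next day's opening midquote (omF1)."""
    return (next_mid_open - vw_price) / mid_open

# Illustrative numbers only (not taken from the data).
y_e = order_imbalance(60, 40)
r_d = intraday_return(101.0, 100.0, divamt=0.5)
r_o = overnight_return(101.0, 100.0, 102.0)
```

<p>
Note that both returns are scaled by the same opening midquote, as in the SAS data step.
</p>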

<p>
This Python script loads the SAS file and writes it to a <a href="https://www.pytables.org/">PyTables</a> HDF5
file, a data format much better suited to repeated reads, writes,
and queries. This makes parallelization much easier (see
<code>est.py</code>).
</p>

<p>
The last piece shows an example of estimating three of the
models. Given the raw data, we run each model with a single start for XOM in 2015 and
get as output a dictionary of parameter estimates. We&rsquo;ll get into this
later, after going through the model code.
</p>

<div class="org-src-container">
<pre class="src src-python"><span style="color: #c099ff;">import</span> os
<span style="color: #c099ff;">import</span> pandas <span style="color: #c099ff;">as</span> pd
<span style="color: #c099ff;">from</span> importlib <span style="color: #c099ff;">import</span> <span style="color: #c099ff;">reload</span>
os.chdir(<span style="color: #c3e88d;">'/home/nyu/eddyhu/git/pin-code'</span>)
<span style="color: #c099ff;">import</span> eo_model <span style="color: #c099ff;">as</span> eo
<span style="color: #c099ff;">import</span> gpin_model <span style="color: #c099ff;">as</span> gpin
<span style="color: #c099ff;">import</span> owr_model <span style="color: #c099ff;">as</span> owr

<span style="color: #7a88cf;"># </span><span style="color: #7a88cf;">setup data</span>
<span style="color: #ff98a4;">df</span> = pd.read_sas(<span style="color: #c3e88d;">'/scratch/nyu/hue/taqdfx_all6.sas7bdat'</span>)
<span style="color: #ff98a4;">df</span>[<span style="color: #c3e88d;">'yyyy'</span>] = df.yyyy.astype(<span style="color: #c3e88d;">'int'</span>)
<span style="color: #ff98a4;">df</span>[<span style="color: #c3e88d;">'date'</span>] = df.DATE
<span style="color: #ff98a4;">df</span>[<span style="color: #c3e88d;">'permno'</span>] = df.permno.astype(<span style="color: #c3e88d;">'int'</span>)
<span style="color: #ff98a4;">df</span>[<span style="color: #c3e88d;">'ticker'</span>] = df.symbol_15.<span style="color: #c099ff;">str</span>.decode(<span style="color: #c3e88d;">'UTF-8'</span>)
df.set_index(<span style="color: #c3e88d;">'permno yyyy'</span>.split(),inplace=<span style="color: #ff995e;">True</span>)
<span style="color: #ff98a4;">c</span> = df.groupby(level=(<span style="color: #ff995e; font-weight: bold;">0</span>,<span style="color: #ff995e; font-weight: bold;">1</span>))\
    [<span style="color: #c3e88d;">'n_buys n_sells ur_d ur_o uy_e'</span>.split()]\
    .count().<span style="color: #c099ff;">min</span>(axis=<span style="color: #ff995e; font-weight: bold;">1</span>)
c.<span style="color: #ff98a4;">name</span> = <span style="color: #c3e88d;">'count_min'</span>
<span style="color: #ff98a4;">df1</span> = df.join(c)
df1.loc[df1.count_min&gt;=<span style="color: #ff995e; font-weight: bold;">230</span>]\
    [<span style="color: #c3e88d;">'date ticker n_buys n_sells ur_d ur_o uy_e'</span>.split()]\
    .to_hdf(<span style="color: #c3e88d;">'/scratch/nyu/hue/taqdf_1319.h5'</span>,<span style="color: #c3e88d;">'data'</span>,<span style="color: #c099ff;">format</span>=<span style="color: #c3e88d;">'table'</span>)

<span style="color: #ff98a4;">d</span> = pd.read_hdf(<span style="color: #c3e88d;">'/scratch/nyu/hue/taqdf_1319.h5'</span>,where=<span style="color: #c3e88d;">'permno==11850 &amp; yyyy==2015'</span>)

<span style="color: #7a88cf;"># </span><span style="color: #7a88cf;">test run of each model</span>
eo.fit(d.n_buys,d.n_sells,starts=<span style="color: #ff995e; font-weight: bold;">1</span>)
gpin.fit(d.n_buys,d.n_sells,starts=<span style="color: #ff995e; font-weight: bold;">1</span>)
owr.fit(d.uy_e,d.ur_d,d.ur_o,starts=<span style="color: #ff995e; font-weight: bold;">1</span>)
</pre>
</div>
</div>
</div>
<div id="outline-container-models" class="outline-2">
<h2 id="models">Model code</h2>
<div class="outline-text-2" id="text-models">
<p>
The model code includes <code>eo_model.py</code>, <code>dy_model.py</code>, <code>gpin_model.py</code>,
and <code>owr_model.py</code>. These files also rely on some utility files like
<code>common.py</code> and <code>regressions.py</code>.
</p>

<p>
To keep things simple we will start with <code>eo_model.py</code>, as it is the
simplest model and code. The code for <code>dy</code> and <code>gpin</code> is
structurally nearly identical to that for <code>eo</code>, differing only in
parameterization, in how involved the simulations are,
and in the likelihood functions.
</p>

<p>
I will describe <code>owr_model.py</code> in detail as it involves quite a few
optimization tricks.
</p>
</div>
<div id="outline-container-eo-model" class="outline-3">
<h3 id="eo-model"><code>EOModel</code></h3>
<div class="outline-text-3" id="text-eo-model">
<p>
Let&rsquo;s start with the import statements. Because Python is a
general-purpose programming language, we need to import the mathematical
functions that we use, including basics like <code>log</code> and <code>exp</code>.
<code>common.py</code> also imports and defines some helper functions, such as the log
factorial (called as <code>lfact</code> in the code below), which uses the <code>gammaln</code> function from scipy.
</p>

<div class="org-src-container">
<pre class="src src-python"><span style="color: #7a88cf;"># </span><span style="color: #7a88cf;">numpy for matrix algebra</span>
<span style="color: #c099ff;">import</span> numpy <span style="color: #c099ff;">as</span> np
<span style="color: #c099ff;">from</span> numpy <span style="color: #c099ff;">import</span> log, exp

<span style="color: #7a88cf;"># </span><span style="color: #7a88cf;">some scipy special mathematical functions</span>
<span style="color: #c099ff;">from</span> scipy.special <span style="color: #c099ff;">import</span> logsumexp
<span style="color: #c099ff;">from</span> scipy.linalg <span style="color: #c099ff;">import</span> inv

<span style="color: #7a88cf;"># </span><span style="color: #7a88cf;">this is the main optimization library</span>
<span style="color: #c099ff;">import</span> scipy.optimize <span style="color: #c099ff;">as</span> op

<span style="color: #7a88cf;"># </span><span style="color: #7a88cf;">import common functions</span>
<span style="color: #c099ff;">from</span> common <span style="color: #c099ff;">import</span> *
</pre>
</div>
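<p>
To see why a log factorial helper is useful, here is a minimal sketch using only the standard library. It assumes that <code>math.lgamma</code> is an acceptable stand-in for scipy&rsquo;s <code>gammaln</code>; the <code>lfact</code> below is written for illustration and is not the version in <code>common.py</code>, and <code>poisson_logpmf</code> is a hypothetical name.
</p>

```python
import math

def lfact(n):
    """Log factorial, log(n!), computed as lgamma(n + 1)."""
    return math.lgamma(n + 1)

def poisson_logpmf(k, rate):
    """Log of the Poisson probability of k events given mean `rate`."""
    return -rate + k * math.log(rate) - lfact(k)

small = lfact(5)     # equals log(120)
large = lfact(2700)  # 2700! is far too large for a float, but its log is not
```

<p>
Working in logs is what lets the likelihood handle daily trade counts in the thousands.
</p>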

<p>
Each model is defined as a Python class. A class is a template for
objects that bundles attributes (data) with methods
(functions). In <code>EOModel</code> the attributes include the parameters
&alpha;, &delta;, &epsilon;, etc., and the methods simulate the PIN model. Module-level
functions in the same file define the likelihood functions and run
the model estimation (<code>fit()</code>).
</p>

<p>
A class typically defines an <code>__init__()</code> method, which sets up each new
instance of the model. Let&rsquo;s take a look at the class definition.
</p>

<div class="org-src-container">
<pre class="src src-python"><span style="color: #c099ff;">class</span> <span style="color: #ffc777;">EOModel</span>(<span style="color: #c099ff;">object</span>): <span style="color: #7a88cf;"># </span><span style="color: #7a88cf;">because we are defining custom models, we are subclassing the most generic Python object</span>

    <span style="color: #c099ff;">def</span> <span style="color: #82aaff;">__init__</span>(<span style="color: #c099ff;">self</span>,a,d,es,eb,u,n=<span style="color: #ff995e; font-weight: bold;">1</span>,t=<span style="color: #ff995e; font-weight: bold;">252</span>): <span style="color: #7a88cf;"># </span><span style="color: #7a88cf;">here we describe the EOModel parameters</span>
        <span style="color: #9ba5db;">r"""Initializes parameters of an Easley and O'Hara Sequential Trade Model</span>
<span style="color: #9ba5db;">        </span>
<span style="color: #9ba5db;">        a : $</span><span style="color: #ff995e;">\a</span><span style="color: #9ba5db;">lpha$, the unconditional probability of an information event</span>
<span style="color: #9ba5db;">        d : $\delta$, the unconditional probability of good news</span>
<span style="color: #9ba5db;">        es : $\epsilon_s$, the average number of sells on a day with no news</span>
<span style="color: #9ba5db;">        eb : $\epsilon_b$, the average number of buys on a day with no news</span>
<span style="color: #9ba5db;">        u : $\mu$, the average number of (additional) trades on a day with news</span>

<span style="color: #9ba5db;">        n : the number of stocks to simulate, default 1</span>
<span style="color: #9ba5db;">        t : the number of periods to simulate, default 252 (one trading year)</span>
<span style="color: #9ba5db;">        """</span>

        <span style="color: #7a88cf;"># </span><span style="color: #7a88cf;">Assign model parameters</span>
        <span style="color: #c099ff;">self</span>.<span style="color: #ff98a4;">a</span>, <span style="color: #c099ff;">self</span>.<span style="color: #ff98a4;">d</span>, <span style="color: #c099ff;">self</span>.<span style="color: #ff98a4;">es</span>, <span style="color: #c099ff;">self</span>.<span style="color: #ff98a4;">eb</span>, <span style="color: #c099ff;">self</span>.<span style="color: #ff98a4;">u</span>, <span style="color: #c099ff;">self</span>.<span style="color: #ff98a4;">N</span>, <span style="color: #c099ff;">self</span>.<span style="color: #ff98a4;">T</span> = a, d, es, eb, u, n, t
        <span style="color: #c099ff;">self</span>.<span style="color: #ff98a4;">states</span> = <span style="color: #c099ff;">self</span>._draw_states()
        <span style="color: #c099ff;">self</span>.<span style="color: #ff98a4;">buys</span> = np.random.poisson((eb+(<span style="color: #c099ff;">self</span>.states == <span style="color: #ff995e; font-weight: bold;">1</span>)*u))
        <span style="color: #c099ff;">self</span>.<span style="color: #ff98a4;">sells</span> = np.random.poisson((es+(<span style="color: #c099ff;">self</span>.states == -<span style="color: #ff995e; font-weight: bold;">1</span>)*u))
        <span style="color: #c099ff;">self</span>.<span style="color: #ff98a4;">alpha</span> = compute_alpha(a, d, eb, es, u, <span style="color: #c099ff;">self</span>.buys, <span style="color: #c099ff;">self</span>.sells)

</pre>
</div>

<p>
In addition to the standard PIN model parameters, our class includes
<i>n</i>, the number of stocks to simulate, and <i>t</i>, the number of periods
to simulate.
</p>

<p>
We can initialize an <code>EOModel</code> like this:
</p>

<div class="org-src-container">
<pre class="src src-python"><span style="color: #ff98a4;">a</span> = <span style="color: #ff995e; font-weight: bold;">0.41</span>
<span style="color: #ff98a4;">d</span> = <span style="color: #ff995e; font-weight: bold;">0.58</span>
<span style="color: #ff98a4;">es</span> = <span style="color: #ff995e; font-weight: bold;">2719</span>
<span style="color: #ff98a4;">eb</span> = <span style="color: #ff995e; font-weight: bold;">2672</span>
<span style="color: #ff98a4;">u</span> = <span style="color: #ff995e; font-weight: bold;">2700</span>

<span style="color: #ff98a4;">N</span> = <span style="color: #ff995e; font-weight: bold;">1000</span>
<span style="color: #ff98a4;">T</span> = <span style="color: #ff995e; font-weight: bold;">252</span>

<span style="color: #ff98a4;">model</span> = EOModel(a,d,es,eb,u,n=N,t=T)
</pre>
</div>

<p>
Behind the scenes this initializes an instance of a PIN model and
simulates 1000 stock-years of data (252 trading days per
year). This happens because the <code>__init__()</code> function draws the states
and then draws buys and sells from Poisson
distributions. <code>_draw_states()</code> works by drawing independent binomials
based on the probability of an event &alpha; and the probability of good
news &delta;.
</p>

<div class="org-src-container">
<pre class="src src-python">    <span style="color: #c099ff;">def</span> <span style="color: #82aaff;">_draw_states</span>(<span style="color: #c099ff;">self</span>):
        <span style="color: #9ba5db;">r"""Draws the states for N stocks and T periods.</span>

<span style="color: #9ba5db;">        In the Easley and O'Hara sequential trade model at the beginning of each period nature determines whether there is an information event with probability $</span><span style="color: #ff995e;">\a</span><span style="color: #9ba5db;">lpha$ (a). If there is information, nature determines whether the signal is good news with probability $\delta$ (d) or bad news $1-\delta$ (1-d).</span>

<span style="color: #9ba5db;">        A quick way to implement this is to draw all of the event states at once as an `NxT` matrix from a binomial distribution with $p=</span><span style="color: #ff995e;">\a</span><span style="color: #9ba5db;">lpha$, and independently draw all of the news states as an `NxT` matrix from a binomial with $p=\delta$. </span>
<span style="color: #9ba5db;">        </span>
<span style="color: #9ba5db;">        An information event occurs for stock i on day t if `events[i][t]=1`, and zero otherwise. The news is good if `news[i][t]=1` and bad if `news[i][t]=-1`. </span>

<span style="color: #9ba5db;">        The element-wise product of `events` with `news` gives a complete description of the states for the sequential trade model, where the state variable can take the values (-1,0,1) for bad news, no news, and good news respectively.</span>

<span style="color: #9ba5db;">        self : EOModel instance which contains parameter definitions</span>
<span style="color: #9ba5db;">        """</span>
        <span style="color: #ff98a4;">events</span> = np.random.binomial(<span style="color: #ff995e; font-weight: bold;">1</span>, <span style="color: #c099ff;">self</span>.a, (<span style="color: #c099ff;">self</span>.N,<span style="color: #c099ff;">self</span>.T))
        <span style="color: #ff98a4;">news</span> = np.random.binomial(<span style="color: #ff995e; font-weight: bold;">1</span>, <span style="color: #c099ff;">self</span>.d, (<span style="color: #c099ff;">self</span>.N,<span style="color: #c099ff;">self</span>.T))
        <span style="color: #ff98a4;">news</span>[news == <span style="color: #ff995e; font-weight: bold;">0</span>] = -<span style="color: #ff995e; font-weight: bold;">1</span>

        <span style="color: #ff98a4;">states</span> = events*news

        <span style="color: #c099ff;">return</span> states
</pre>
</div>
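<p>
The same idea can be checked for a single stock without numpy. The sketch below is a standard-library illustration (the function name <code>draw_states</code> and the seed are arbitrary choices, not part of the original code): it draws the event and news indicators independently, multiplies them, and compares the state frequencies with the values the model implies, namely &alpha;&delta; for good news, &alpha;(1-&delta;) for bad news, and 1-&alpha; for no news.
</p>

```python
import random

def draw_states(a, d, t, rng):
    """Draw t daily states: +1 good news, -1 bad news, 0 no news."""
    states = []
    for _ in range(t):
        event = 1 if rng.random() < a else 0   # information event with probability a
        news = 1 if rng.random() < d else -1   # good news with probability d, else bad
        states.append(event * news)            # no event forces the state to 0
    return states

rng = random.Random(0)
a, d, t = 0.41, 0.58, 100_000
states = draw_states(a, d, t, rng)

share_good = states.count(1) / t    # should be near a * d
share_bad = states.count(-1) / t    # should be near a * (1 - d)
share_none = states.count(0) / t    # should be near 1 - a
```

<p>
Drawing the news indicator even on days without an event is harmless, because the product with <code>event</code> discards it.
</p>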

<p>
The last step, <code>compute_alpha</code>, is a function that computes CPIEs
(conditional probabilities of an information event) for real or simulated data. The computation of the CPIE depends on the
likelihood function definitions.
</p>

<div class="org-src-container">
<pre class="src src-python"><span style="color: #c099ff;">def</span> <span style="color: #82aaff;">_lf</span>(eb, es, n_buys, n_sells):
    <span style="color: #c099ff;">return</span> -eb+n_buys*log(eb)-lfact(n_buys)-es+n_sells*log(es)-lfact(n_sells)

<span style="color: #c099ff;">def</span> <span style="color: #82aaff;">_ll</span>(a, d, eb, es, u, n_buys, n_sells):
    <span style="color: #c099ff;">return</span> np.array([log(a*(<span style="color: #ff995e; font-weight: bold;">1</span>-d))+_lf(eb,es+u,n_buys,n_sells),
                   log(a*d)+_lf(eb+u,es,n_buys,n_sells),
                   log(<span style="color: #ff995e; font-weight: bold;">1</span>-a)+_lf(eb,es,n_buys,n_sells)])

<span style="color: #c099ff;">def</span> <span style="color: #82aaff;">compute_alpha</span>(a, d, eb, es, u, n_buys, n_sells):
    <span style="color: #9ba5db;">'''Compute the conditional alpha given parameters, buys, and sells.</span>

<span style="color: #9ba5db;">    '''</span>
    <span style="color: #ff98a4;">ll</span> = _ll(a, d, eb, es, u, n_buys, n_sells)
    <span style="color: #ff98a4;">llmax</span> = ll.<span style="color: #c099ff;">max</span>(axis=<span style="color: #ff995e; font-weight: bold;">0</span>)
    <span style="color: #ff98a4;">y</span> = exp(ll-llmax)
    <span style="color: #ff98a4;">alpha</span> = y[:-<span style="color: #ff995e; font-weight: bold;">1</span>].<span style="color: #c099ff;">sum</span>(axis=<span style="color: #ff995e; font-weight: bold;">0</span>)/y.<span style="color: #c099ff;">sum</span>(axis=<span style="color: #ff995e; font-weight: bold;">0</span>)

    <span style="color: #c099ff;">return</span> alpha

<span style="color: #c099ff;">def</span> <span style="color: #82aaff;">loglik</span>(theta, n_buys, n_sells):
    <span style="color: #ff98a4;">a</span>,<span style="color: #ff98a4;">d</span>,<span style="color: #ff98a4;">eb</span>,<span style="color: #ff98a4;">es</span>,<span style="color: #ff98a4;">u</span> = theta
    <span style="color: #ff98a4;">ll</span> = _ll(a, d, eb, es, u, n_buys, n_sells)

    <span style="color: #c099ff;">return</span> <span style="color: #c099ff;">sum</span>(logsumexp(ll,axis=<span style="color: #ff995e; font-weight: bold;">0</span>))
</pre>
</div>

<p>
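<code>_lf()</code> is the Poisson log-likelihood of a day&rsquo;s buys and sells given the arrival rates <code>eb</code> and <code>es</code>. The same function
serves each of the three states (good, bad, and no news); only the rates passed to it change.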
</p>

<p>
<code>_ll()</code> returns the full vector of joint
log-likelihoods for the PIN model, ordered as bad news, good news, and no news.
</p>

<p>
<code>compute_alpha()</code> computes CPIEs using a numerical trick. We compute
the vector of log-likelihoods by calling <code>_ll()</code>, take the
max across the three states, and subtract that max from each
log-likelihood before exponentiating, which keeps the exponentials from all underflowing to zero. The
CPIE is then the ratio of the two news-state terms to the sum of all three.
</p>
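<p>
To see what the max subtraction buys us, here is a standard-library sketch for a single day. The helper names (<code>log_poisson</code>, <code>state_logliks</code>, <code>cpie</code>) are hypothetical, and the trade counts are chosen on purpose to be far from what any state predicts, as can happen for poor trial parameters during optimization. Exponentiating the log-likelihoods directly gives zero for all three states, while the scaled version still returns a valid probability.
</p>

```python
import math

def log_poisson(k, rate):
    """Log Poisson probability of k events with mean `rate`."""
    return -rate + k * math.log(rate) - math.lgamma(k + 1)

def state_logliks(a, d, eb, es, u, n_buys, n_sells):
    """Joint log-likelihoods ordered (bad news, good news, no news)."""
    bad = math.log(a * (1 - d)) + log_poisson(n_buys, eb) + log_poisson(n_sells, es + u)
    good = math.log(a * d) + log_poisson(n_buys, eb + u) + log_poisson(n_sells, es)
    none = math.log(1 - a) + log_poisson(n_buys, eb) + log_poisson(n_sells, es)
    return [bad, good, none]

def cpie(ll):
    """Posterior probability of an information event from state log-likelihoods."""
    llmax = max(ll)
    y = [math.exp(v - llmax) for v in ll]  # the largest term becomes exp(0) = 1
    return (y[0] + y[1]) / sum(y)          # news states over all states

# Parameters from the example above; the counts are deliberately extreme.
ll = state_logliks(0.41, 0.58, 2672, 2719, 2700, n_buys=8000, n_sells=8000)
naive = [math.exp(v) for v in ll]  # every entry underflows to 0.0
stable = cpie(ll)                  # still a well-defined probability
```

<p>
With the naive values the ratio would be 0/0; after scaling, the denominator is always at least 1.
</p>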

<p>
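Finally, <code>loglik()</code> computes the total log-likelihood that will be used in
the optimization: for each day it takes the log of the summed state likelihoods (via <code>logsumexp</code>) and then sums over days.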
</p>
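<p>
The original code relies on scipy&rsquo;s <code>logsumexp</code> for this step. As a rough standard-library sketch of what that computation does (the function names and the log-likelihood values below are made up for illustration):
</p>

```python
import math

def logsumexp(values):
    """log(sum(exp(v))) computed by factoring out the largest value."""
    vmax = max(values)
    return vmax + math.log(sum(math.exp(v - vmax) for v in values))

def total_loglik(daily_state_logliks):
    """Sum over days of the log mixture likelihood across the three states."""
    return sum(logsumexp(day) for day in daily_state_logliks)

# Made-up state log-likelihoods (bad, good, no news) for three days.
days = [
    [-1200.0, -1195.0, -1210.0],
    [-15.0, -900.0, -12.0],
    [-2000.0, -2003.0, -1999.5],
]
total = total_loglik(days)
```

<p>
Each day&rsquo;s contribution lies between the largest state log-likelihood and that value plus log(3), so the total stays finite even when the individual likelihoods are too small to represent.
</p>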

<p>
At this point you are probably wondering why some of these functions are
named with a leading underscore (<code>_</code>) and others are not. In Python
this is a convention marking &ldquo;hidden&rdquo; (private) functions; the language does not enforce it. This is helpful for
users who are exploring the code interactively, as we want them to
see and interact only with the higher-level functions, like
<code>compute_alpha</code> and <code>loglik</code>.
</p>
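<p>
One concrete effect of the underscore convention shows up with star imports such as <code>from common import *</code>. The sketch below builds a throwaway module (the name <code>toy_model</code> and its two functions are invented for this example) and shows that a star import skips the underscore-prefixed name when the module defines no <code>__all__</code> list.
</p>

```python
import sys
import types

# Build a throwaway module with one private and one public function.
toy = types.ModuleType("toy_model")
exec(
    "def _ll():\n"
    "    return 'helper'\n"
    "def loglik():\n"
    "    return 'public entry point'\n",
    toy.__dict__,
)
sys.modules["toy_model"] = toy

# A star import copies only names without a leading underscore
# (when the module defines no __all__ list).
namespace = {}
exec("from toy_model import *", namespace)
imported = sorted(name for name in namespace if not name.startswith("__"))

# The private helper is still reachable through the module itself.
helper_result = toy._ll()
```

<p>
So the underscore hides a function from casual use without actually locking it away.
</p>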

<p>
The actual estimation is handled by the <code>fit()</code> function.
</p>

<p>
The <code>fit()</code> function does a number of things that are seemingly
complex, but necessary to get the numerical optimization to work well.
</p>

<p>
For instance, by default we use up to 10 random <code>starts</code>, and we will retry each
optimization up to <code>maxiter=100</code> times.
</p>

<div class="org-src-container">
<pre class="src src-python"><span style="color: #c099ff;">def</span> <span style="color: #82aaff;">fit</span>(n_buys, n_sells, starts=<span style="color: #ff995e; font-weight: bold;">10</span>, maxiter=<span style="color: #ff995e; font-weight: bold;">100</span>,
        a=<span style="color: #ff995e;">None</span>, d=<span style="color: #ff995e;">None</span>, eb=<span style="color: #ff995e;">None</span>, es=<span style="color: #ff995e;">None</span>, u=<span style="color: #ff995e;">None</span>,
        se=<span style="color: #ff995e;">None</span>, **kwargs):

    <span style="color: #ff98a4;">nll</span> = <span style="color: #c099ff;">lambda</span> *args: -loglik(*args) <span style="color: #7a88cf;"># </span><span style="color: #7a88cf;">define the negative log likelihood that we will minimize</span>
    <span style="color: #ff98a4;">bounds</span> = [(<span style="color: #ff995e; font-weight: bold;">0.00001</span>,<span style="color: #ff995e; font-weight: bold;">0.99999</span>)]*<span style="color: #ff995e; font-weight: bold;">2</span>+[(<span style="color: #ff995e; font-weight: bold;">0.00001</span>,np.inf)]*<span style="color: #ff995e; font-weight: bold;">3</span> <span style="color: #7a88cf;"># </span><span style="color: #7a88cf;">we will do a constrained optimization</span>
    <span style="color: #ff98a4;">ranges</span> = [(<span style="color: #ff995e; font-weight: bold;">0.00001</span>,<span style="color: #ff995e; font-weight: bold;">0.99999</span>)]*<span style="color: #ff995e; font-weight: bold;">2</span> <span style="color: #7a88cf;"># </span><span style="color: #7a88cf;">we will define the min-max range for our random guesses</span>

    <span style="color: #7a88cf;"># </span><span style="color: #7a88cf;">if we do not have a prior on what the estimates are, we compute them here</span>
    <span style="color: #ff98a4;">a0</span>,<span style="color: #ff98a4;">d0</span> = [x <span style="color: #c099ff;">or</span> <span style="color: #ff995e; font-weight: bold;">0.5</span> <span style="color: #c099ff;">for</span> x <span style="color: #c099ff;">in</span> (a,d)] <span style="color: #7a88cf;"># </span><span style="color: #7a88cf;">50% chance of information/news</span>
    <span style="color: #ff98a4;">eb0</span>,<span style="color: #ff98a4;">es0</span> = eb <span style="color: #c099ff;">or</span> np.mean(n_buys), es <span style="color: #c099ff;">or</span> np.mean(n_sells) <span style="color: #7a88cf;"># </span><span style="color: #7a88cf;">expected buys/sells = mean of observed buy/sells</span>
    <span style="color: #ff98a4;">oib</span> = n_buys - n_sells
    <span style="color: #ff98a4;">u0</span> = u <span style="color: #c099ff;">or</span> np.mean(<span style="color: #c099ff;">abs</span>(oib)) <span style="color: #7a88cf;"># </span><span style="color: #7a88cf;">expected order imbalance = mean of absolute order imbalance</span>

    <span style="color: #ff98a4;">res_final</span> = [a0,d0,eb0,es0,u0] <span style="color: #7a88cf;"># </span><span style="color: #7a88cf;">define the vector that will hold all the parameters</span>
    <span style="color: #ff98a4;">stderr</span> = np.zeros_like(res_final) <span style="color: #7a88cf;"># </span><span style="color: #7a88cf;">define the vector that will hold our standard errors</span>
    <span style="color: #ff98a4;">f</span> = nll(res_final,n_buys,n_sells) <span style="color: #7a88cf;"># </span><span style="color: #7a88cf;">negative log likelihood at the initial guess; later fits must beat this value</span>
    <span style="color: #c099ff;">for</span> i <span style="color: #c099ff;">in</span> <span style="color: #c099ff;">range</span>(starts):
        <span style="color: #7a88cf;"># </span><span style="color: #7a88cf;">rc is going to be our return code</span>
        <span style="color: #ff98a4;">rc</span> = -<span style="color: #ff995e; font-weight: bold;">1</span>
        <span style="color: #ff98a4;">j</span> = <span style="color: #ff995e; font-weight: bold;">0</span>
        <span style="color: #c099ff;">while</span> (rc != <span style="color: #ff995e; font-weight: bold;">0</span>) <span style="color: #c099ff;">and</span> (j &lt; maxiter):
            <span style="color: #c099ff;">if</span> (<span style="color: #ff995e;">None</span> <span style="color: #c099ff;">in</span> (res_final)) <span style="color: #c099ff;">or</span> i:
                <span style="color: #7a88cf;"># </span><span style="color: #7a88cf;">guess parameters</span>
                <span style="color: #ff98a4;">a0</span>,<span style="color: #ff98a4;">d0</span> = [np.random.uniform(l,np.nan_to_num(h)) <span style="color: #c099ff;">for</span> (l,h) <span style="color: #c099ff;">in</span> ranges]
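                <span style="color: #ff98a4;">eb0</span>,<span style="color: #ff98a4;">es0</span>,<span style="color: #ff98a4;">u0</span> = np.random.poisson(res_final[<span style="color: #ff995e; font-weight: bold;">2</span>:]) <span style="color: #7a88cf;"># </span><span style="color: #7a88cf;">eb, es, u may be None, so draw around the current best estimates</span>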
            <span style="color: #7a88cf;"># </span><span style="color: #7a88cf;">do actual optimization here</span>
            <span style="color: #ff98a4;">res</span> = op.minimize(nll, [a0,d0,eb0,es0,u0], method=<span style="color: #ff995e;">None</span>,
                              bounds=bounds, args=(n_buys,n_sells))
            <span style="color: #ff98a4;">rc</span> = res[<span style="color: #c3e88d;">'status'</span>]
            <span style="color: #7a88cf;"># </span><span style="color: #7a88cf;">flag solutions that landed exactly on a bound (x in (lo,hi) is an equality test)</span>
            <span style="color: #ff98a4;">check_bounds</span> = <span style="color: #c099ff;">list</span>(<span style="color: #c099ff;">map</span>(<span style="color: #c099ff;">lambda</span> x,y: x <span style="color: #c099ff;">in</span> y, res[<span style="color: #c3e88d;">'x'</span>], bounds))
            <span style="color: #c099ff;">if</span> <span style="color: #c099ff;">any</span>(check_bounds):
                <span style="color: #ff98a4;">rc</span> = <span style="color: #ff995e; font-weight: bold;">3</span>
            <span style="color: #ff98a4;">j</span>+=<span style="color: #ff995e; font-weight: bold;">1</span>
        <span style="color: #c099ff;">if</span> (res[<span style="color: #c3e88d;">'success'</span>]) &amp; (res[<span style="color: #c3e88d;">'fun'</span>] &lt;= f):
            <span style="color: #7a88cf;"># </span><span style="color: #7a88cf;">if everything worked fine and we have a </span>
            <span style="color: #7a88cf;"># </span><span style="color: #7a88cf;">smaller (negative) likelihood then store these parameters</span>
            <span style="color: #ff98a4;">f</span>,<span style="color: #ff98a4;">rc</span> = res[<span style="color: #c3e88d;">'fun'</span>],res[<span style="color: #c3e88d;">'status'</span>]
            <span style="color: #ff98a4;">res_final</span> = res[<span style="color: #c3e88d;">'x'</span>].tolist()
            <span style="color: #7a88cf;"># </span><span style="color: #7a88cf;">and compute standard errors</span>
            <span style="color: #ff98a4;">stderr</span> = <span style="color: #ff995e; font-weight: bold;">1</span>/np.sqrt(inv(res[<span style="color: #c3e88d;">'hess_inv'</span>].todense()).diagonal())

    <span style="color: #7a88cf;"># </span><span style="color: #7a88cf;">output the final parameter estimates</span>
    <span style="color: #ff98a4;">param_names</span> = [<span style="color: #c3e88d;">'a'</span>,<span style="color: #c3e88d;">'d'</span>,<span style="color: #c3e88d;">'eb'</span>,<span style="color: #c3e88d;">'es'</span>,<span style="color: #c3e88d;">'u'</span>]
    <span style="color: #ff98a4;">output</span> = <span style="color: #c099ff;">dict</span>(<span style="color: #c099ff;">zip</span>(param_names+[<span style="color: #c3e88d;">'f'</span>,<span style="color: #c3e88d;">'rc'</span>],
                    res_final+[f,rc]))
    <span style="color: #c099ff;">if</span> se:
        <span style="color: #ff98a4;">output</span> = {<span style="color: #c3e88d;">'params'</span>: <span style="color: #c099ff;">dict</span>(<span style="color: #c099ff;">zip</span>(param_names,res_final)),
                  <span style="color: #c3e88d;">'se'</span>: <span style="color: #c099ff;">dict</span>(<span style="color: #c099ff;">zip</span>(param_names,stderr)),
                  <span style="color: #c3e88d;">'stats'</span>:{<span style="color: #c3e88d;">'f'</span>: f,<span style="color: #c3e88d;">'rc'</span>: rc}
                 }
    <span style="color: #c099ff;">return</span> output
</pre>
</div>
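
<p>
To make the random restarts more concrete, here is a minimal sketch of the
guessing step alone. It assumes a hypothetical helper <code>draw_start()</code>
(not part of <code>eo_model.py</code>), made-up buy and sell counts, and NumPy's
seeded <code>Generator</code> API in place of the legacy <code>np.random</code>
calls used above. The Poisson draws are centered on the same data-based
guesses that <code>fit()</code> computes when no priors are given.
</p>

```python
import numpy as np

def draw_start(n_buys, n_sells, rng):
    """Hypothetical helper: draw one random starting vector [a, d, eb, es, u]."""
    lo, hi = 0.00001, 0.99999               # same range as `ranges` in fit()
    a0, d0 = rng.uniform(lo, hi, size=2)    # the two probabilities, drawn uniformly
    # center the Poisson draws on the data-based guesses used in fit()
    means = [np.mean(n_buys), np.mean(n_sells),
             np.mean(np.abs(n_buys - n_sells))]
    eb0, es0, u0 = rng.poisson(means)       # arrival rates, drawn as counts
    return [a0, d0, eb0, es0, u0]

rng = np.random.default_rng(0)              # seeded so the draws are reproducible
n_buys = np.array([120, 95, 130, 110])      # made-up daily buy counts
n_sells = np.array([100, 105, 90, 115])     # made-up daily sell counts
starts = [draw_start(n_buys, n_sells, rng) for _ in range(10)]
```

<p>
Each element of <code>starts</code> could then serve as the initial vector
passed to the optimizer, which is the role of
<code>[a0,d0,eb0,es0,u0]</code> in <code>fit()</code>.
</p>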

<p>
The last function is <code>cpie_mech()</code>, which is very simple for <code>EOModel</code>:
it returns a dummy variable equal to 1 when observed turnover is above its
sample average and 0 otherwise.
</p>

<div class="org-src-container">
<pre class="src src-python"><span style="color: #c099ff;">def</span> <span style="color: #82aaff;">cpie_mech</span>(turn):
    <span style="color: #ff98a4;">mech</span> = np.zeros_like(turn)
    <span style="color: #ff98a4;">mech</span>[turn &gt; turn.mean()] = <span style="color: #ff995e; font-weight: bold;">1</span>
    <span style="color: #c099ff;">return</span> mech
</pre>
</div>
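
<p>
As a quick illustration, the sketch below repeats the function and applies
it to a small array of made-up turnover values. The numbers are assumptions
chosen for the example, not data from the paper.
</p>

```python
import numpy as np

def cpie_mech(turn):
    # 1 where turnover exceeds its sample mean, 0 elsewhere
    mech = np.zeros_like(turn)
    mech[turn > turn.mean()] = 1
    return mech

turn = np.array([0.5, 1.2, 0.8, 2.0])  # made-up daily turnover values
mech = cpie_mech(turn)                 # the mean is 1.125, so the 2nd and 4th days are flagged
```

<p>
The function relies on <code>turn.mean()</code> and boolean-mask assignment, so
this sketch passes a NumPy array rather than a plain Python list.
</p>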

<p>
The final piece defines what happens when you run
<code>eo_model.py</code> as a stand-alone script. In this case it simulates an
example PIN model and runs regressions on the simulated data to
show how the model identifies information. This was part of an older
version of our paper, but it is useful for building intuition.
</p>

<div class="org-src-container">
<pre class="src src-python"><span style="color: #c099ff;">if</span> <span style="color: #c099ff;">__name__</span> == <span style="color: #c3e88d;">'__main__'</span>:

    <span style="color: #c099ff;">import</span> pandas <span style="color: #c099ff;">as</span> pd
    <span style="color: #c099ff;">from</span> regressions <span style="color: #c099ff;">import</span> *

    <span style="color: #ff98a4;">a</span> = <span style="color: #ff995e; font-weight: bold;">0.41</span>