---
title: Content
toc: true
---

![content logo](img/content_logo.svg){width=75%}\

Here you'll find documents of varying technical degree covering things of interest to me, or which I think will be interesting to those I engage with. Generally you'll find a mix of demonstrations on statistical and machine learning topics, programming, and data processing and visualization. Most focus on application in R as that's what I used to primarily program with, but you'll find plenty of Python demonstrations as well. Be aware that some of the content is a bit dated, but even if the programming aspects are a bit off, the concepts should still be relevant.


## My Book!

<span itemscope itemtype='https://schema.org/Book'>
[<span itemprop="name">Models Demystified</span>](https://m-clark.github.io/book-of-models/)

<span itemprop="description">This book is a comprehensive overview of the statistical and machine learning landscape, along with many other related topics. It is designed to be accessible to those starting out on their data science journey, but also to provide a deeper dive into the concepts for those with more experience. It covers an array of useful models from simple linear regression to deep learning. The book is designed to be a reference for those who want to understand the models and techniques they are using, and to provide a guide for those who want to learn new techniques.</span>
</span>


## Long Form Docs

<span itemscope itemtype ="https://schema.org/TechArticle">
[<span itemprop="name">Mixed Models with R</span>](../mixed-models-with-R/)
This document focuses on <span itemprop="keywords">mixed effects models using R</span>, covering basic <span itemprop="keywords">random effects</span> models (<span itemprop="keywords">random intercepts and slopes</span>) as well as extensions into <span itemprop="keywords">generalized mixed models</span> and discussion of realms beyond.
</span>

<span itemscope itemtype ="https://schema.org/TechArticle">
[<span itemprop="name keywords">Bayesian Basics</span>](../bayesian-basics/)
<span itemprop="description">This serves as a conceptual introduction to <span itemprop="keywords">Bayesian</span> modeling with examples using <span itemprop="keywords">R</span> and <span itemprop="keywords">Stan</span>.
</span>
</span>

<span itemscope itemtype ="https://schema.org/TechArticle">
[<span itemprop="name keywords">Model Estimation by Example</span>](../models-by-example/)
<span itemprop="description">This shows 'by-hand' code for various models and estimation approaches, from <span itemprop="keywords">linear regression</span> to <span itemprop="keywords">Bayesian multilevel mediation models</span>, and demonstrations from <span itemprop="keywords">penalized maximum likelihood</span> to <span itemprop="keywords">stochastic gradient descent</span>.
</span>
</span>

<span itemscope itemtype ="https://schema.org/TechArticle">
[<span itemprop="name keywords">Generalized Additive Models</span>](../generalized-additive-models/)
<span itemprop="description">An introduction to <span itemprop="keywords">generalized additive models</span> with an emphasis on generalization from familiar linear models and using the <span itemprop="keywords">mgcv</span> package in <span itemprop="keywords">R</span>.
</span>
</span>


<span itemscope itemtype ="https://schema.org/TechArticle">
[<span itemprop="name keywords">Introduction to Machine Learning</span>](../introduction-to-machine-learning/)
<span itemprop="description">A gentle introduction to <span itemprop="keywords">machine learning</span> concepts with some application in <span itemprop="keywords">R</span>.  It covers topics such as <span itemprop="keywords">loss functions</span>, <span itemprop="keywords">cross-validation</span>, <span itemprop="keywords">regularization</span>, and <span itemprop="keywords">bias-variance trade-off</span>, techniques such as <span itemprop="keywords">penalized regression</span>, <span itemprop="keywords">random forests</span>, and <span itemprop="keywords">neural nets</span>, and more.
</span>
</span>

<span itemscope itemtype ="https://schema.org/TechArticle">
[<span itemprop="name">Practical Data Science</span>](../data-processing-and-visualization/)
The focus is on common <span itemprop="keywords">data science</span> tools and techniques in R, including data processing, programming, modeling, visualization, and presentation of results. Exercises may be found in the document, and demonstrations of most content in Python are available via [Jupyter notebooks](https://github.com/m-clark/data-processing-and-visualization/tree/master/jupyter_notebooks).
</span>

<span itemscope itemtype ="https://schema.org/TechArticle">
[<span itemprop="name keywords">Structural Equation Modeling</span>](../sem/)
<span itemprop="description">This document (and related workshop) focuses on <span itemprop="keywords">structural equation modeling</span>.  It is conceptually based, and tries to generalize beyond the standard SEM treatment. Topics include: <span itemprop="keywords">graphical</span> models (<span itemprop="keywords">directed</span> and <span itemprop="keywords">undirected</span>, including <span itemprop="keywords">path analysis</span>, <span itemprop="keywords">Bayesian networks</span>, and <span itemprop="keywords">network analysis</span>), <span itemprop="keywords">mediation</span>, <span itemprop="keywords">moderation</span>, <span itemprop="keywords">latent variable</span> models (including <span itemprop="keywords">principal components</span> analysis and '<span itemprop="keywords">factor analysis</span>'), <span itemprop="keywords">measurement</span> models, <span itemprop="keywords">structural equation models</span>, <span itemprop="keywords">mixture models</span>, <span itemprop="keywords">growth curves</span>, <span itemprop="keywords">IRT</span>, <span itemprop="keywords">collaborative filtering</span>/<span itemprop="keywords">recommender systems</span>, <span itemprop="keywords">hidden Markov models</span>, <span itemprop="keywords">multi-group models</span> etc.
</span>
</span>


## Blog Posts

- <span itemscope itemtype ="https://schema.org/TechArticle">[<span itemprop="name keywords">Is Boosting Still All You Need for Tabular Data?</span>](../posts/2026-03-01-dl-for-tabular-foundational/)</span>
- <span itemscope itemtype ="https://schema.org/TechArticle">[<span itemprop="name keywords">Uncertainty Estimation with Conformal Prediction</span>](../posts/2025-06-01-conformal/)</span>
- <span itemscope itemtype ="https://schema.org/TechArticle">[<span itemprop="name keywords">Imbalanced Outcomes</span>](../posts/2025-04-07-class-imbalance/)</span>
- <span itemscope itemtype ="https://schema.org/TechArticle">[<span itemprop="name keywords">Deep Linear Models</span>](../posts/2022-09-deep-linear-models/)</span>
- <span itemscope itemtype ="https://schema.org/TechArticle">[<span itemprop="name keywords">Exploring Time</span>](../posts/2021-05-time-series/)</span>
- <span itemscope itemtype ="https://schema.org/TechArticle">[<span itemprop="name keywords">Programming Odds and Ends</span>](../posts/2022-07-25-programming/)</span>
- <span itemscope itemtype ="https://schema.org/TechArticle">[<span itemprop="name keywords">Deep Learning for Tabular Data II</span>](../posts/2022-04-01-more-dl-for-tabular/)</span>
- <span itemscope itemtype ="https://schema.org/TechArticle">[<span itemprop="name keywords">The Double Descent Phenomenon</span>](../posts/2021-10-30-double-descent/)</span>
- <span itemscope itemtype ="https://schema.org/TechArticle">[<span itemprop="name keywords">Deep Learning for Tabular Data</span>](../posts/2021-07-15-dl-for-tabular/)</span>
- <span itemscope itemtype ="https://schema.org/TechArticle">[<span itemprop="name keywords">Practical Bayesian Analysis (I, II)</span>](../posts/2021-02-28-practical-bayes-part-i/)</span>
- <span itemscope itemtype ="https://schema.org/TechArticle">[<span itemprop="name keywords">Micro-macro Models</span>](../posts/2020-08-31-micro-macro-mlm/)</span>
- <span itemscope itemtype ="https://schema.org/TechArticle">[<span itemprop="name keywords">Predictions with an Offset</span>](../posts/2020-06-15-predict-with-offset/)</span>
- <span itemscope itemtype ="https://schema.org/TechArticle">[<span itemprop="name keywords">Factor Analysis and Related Methods</span>](../posts/2020-04-10-psych-explained/)</span>
- <span itemscope itemtype ="https://schema.org/TechArticle">[<span itemprop="name keywords">Convergence Problems in Mixed Models</span>](../posts/2020-03-16-convergence/)</span>
- <span itemscope itemtype ="https://schema.org/TechArticle">[<span itemprop="name keywords">Categorical Random Effects</span>](../posts/2020-03-01-random-categorical/)</span>
- <span itemscope itemtype ="https://schema.org/TechArticle">[<span itemprop="name keywords">Mixed Models for Big Data</span>](../posts/2019-10-20-big-mixed-models/)</span>
- <span itemscope itemtype ="https://schema.org/TechArticle">[<span itemprop="name keywords">Fractional Regression</span>](../posts/2019-08-20-fractional-regression/)</span>
- <span itemscope itemtype ="https://schema.org/TechArticle">[<span itemprop="name keywords">Group Comparisons in SEM</span>](../posts/2019-08-05-comparing-latent-variables/)</span>
- <span itemscope itemtype ="https://schema.org/TechArticle">[<span itemprop="name keywords">Empirical Bayes</span>](../posts/2019-06-21-empirical-bayes/)</span>
- <span itemscope itemtype ="https://schema.org/TechArticle">[<span itemprop="name keywords">Shrinkage in Mixed Models</span>](../posts/2019-05-14-shrinkage-in-mixed-models/)</span>
- <span itemscope itemtype ="https://schema.org/TechArticle">[<span itemprop="name keywords">Mediation Models</span>](../posts/2019-03-12-mediation-models/)</span>


## Statistical

### Models By Example

<span itemscope itemtype ="https://schema.org/TechArticle">
[<span itemprop="name keywords">Model Estimation by Example</span>](../models-by-example/)
<span itemprop="description">This shows 'by-hand' code for various models and estimation approaches, from <span itemprop="keywords">linear regression</span> to <span itemprop="keywords">Bayesian multilevel mediation models</span>, and demonstrations from <span itemprop="keywords">penalized maximum likelihood</span> to <span itemprop="keywords">stochastic gradient descent</span>.
</span>
</span>

### Modeling in R

<span itemscope itemtype ="https://schema.org/TechArticle">
[<span itemprop="name keywords">Data Modeling in R</span>](../R-models/)
<span itemprop="description">This document demonstrates a wide array of <span itemprop="keywords">statistical</span> and other <span itemprop="keywords">models</span> in <span itemprop="keywords">R</span>.  Generic code is provided for standard <span itemprop="keywords">regression</span>, <span itemprop="keywords">mixed</span>, <span itemprop="keywords">additive</span>, <span itemprop="keywords">survival</span>, and <span itemprop="keywords">latent variable</span> models, <span itemprop="keywords">principal components</span>, <span itemprop="keywords">factor analysis</span>, <span itemprop="keywords">SEM</span>, <span itemprop="keywords">cluster analysis</span>, <span itemprop="keywords">time series</span>, <span itemprop="keywords">spatial models</span>, <span itemprop="keywords">zero-altered models</span>, <span itemprop="keywords">text analysis</span>, <span itemprop="keywords">Bayesian analysis</span>, <span itemprop="keywords">machine learning</span> and more.
</span>

<span itemprop="description">The document is designed for newcomers to R, whether in a statistical sense, or just a programming one.  It also should appeal to those working in other packages who are curious how to do the same sorts of things in R.</span>
</span>


### Bayesian

<span itemscope itemtype ="https://schema.org/TechArticle">
[<span itemprop="name keywords">Bayesian Basics</span>](../bayesian-basics/)
<span itemprop="description">This serves as a conceptual introduction to <span itemprop="keywords">Bayesian</span> modeling with examples using <span itemprop="keywords">R</span> and <span itemprop="keywords">Stan</span>.
</span>
</span>

<span itemscope itemtype ="https://schema.org/TechArticle">
[<span itemprop="name keywords">MCMC algorithms</span>](/docs/ld_mcmc/)
<span itemprop="description">List of MCMC algorithms with brief descriptions.
</span>
</span>

<span itemscope itemtype ="https://schema.org/TechArticle">
[<span itemprop="name">Bayesian Demonstration</span>](https://micl.shinyapps.io/prior2post/)
<span itemprop="description">A simple interactive demonstration for those just starting on their <span itemprop="keywords">Bayesian</span> journey.
</span>
</span>


### Mixed Models

<span itemscope itemtype ="https://schema.org/TechArticle">
[<span itemprop="name">Mixed Models with R</span>](../mixed-models-with-R/)
This workshop focuses on <span itemprop="keywords">mixed effects models using R</span>, covering basic <span itemprop="keywords">random effects</span> models (<span itemprop="keywords">random intercepts and slopes</span>) as well as extensions into <span itemprop="keywords">generalized mixed models</span> and discussion of realms beyond.
</span>

<span itemscope itemtype ="https://schema.org/TechArticle">
[<span itemprop="name keywords">Mixed Models Overview</span>](/docs/mixedModels/mixedModels.html)
<span itemprop="description">An overview that introduces <span itemprop="keywords">mixed models</span> for those with varying technical/statistical backgrounds.
</span>
</span>

<span itemscope itemtype ="https://schema.org/TechArticle">
[<span itemprop="name keywords">Mixed Models Introduction</span>](/docs/mixedModels/anovamixed.html)
<span itemprop="description">A non-technical document to introduce <span itemprop="keywords">mixed models</span> for those who have used ANOVA.
</span>
</span>

<span itemscope itemtype ="https://schema.org/TechArticle">
[<span itemprop="name keywords">Clustered Data Situations</span>](../clustered-data/)
<span itemprop="description">A comparison of standard models, <span itemprop="keywords">cluster robust standard errors</span>, <span itemprop="keywords">fixed effect models</span>,  <span itemprop="keywords">mixed models (random effects models)</span>, <span itemprop="keywords">generalized estimating equations (GEE)</span>, and <span itemprop="keywords">latent growth curve models</span> for dealing with clustered data (e.g. <span itemprop="keywords">longitudinal</span>, <span itemprop="keywords">hierarchical</span> etc.).
</span>
</span>

<span itemscope itemtype ="https://schema.org/TechArticle">
[<span itemprop="name keywords">Mixed Model Estimation</span>](/docs/mixedModels/mixedModelML.html)
<span itemprop="description">Demonstration of <span itemprop="keywords">mixed models</span> via <span itemprop="keywords">maximum likelihood</span> and link to <span itemprop="keywords">additive models</span>.</span>
</span>

<span itemscope itemtype ="https://schema.org/TechArticle">
[<span itemprop="name keywords">Mixed and Growth Curve Models</span>](../mixed-growth-comparison/)
<span itemprop="description">A comparison of the <span itemprop="keywords">mixed model</span> vs. <span itemprop="keywords">latent variable</span> approach for <span itemprop="keywords">longitudinal data</span> (<span itemprop="keywords">growth curve models</span>), with [simulation](/docs/mixedModels/growth_vs_mixed_sim.html) of performance in situations of small sample sizes.</span>
</span>


### Latent Variables/SEM

<span itemscope itemtype ="https://schema.org/TechArticle">
[<span itemprop="name keywords">Structural Equation Modeling</span>](../sem/)
<span itemprop="description">This document (and related workshop) focuses on <span itemprop="keywords">structural equation modeling</span>.  It is conceptually based, and tries to generalize beyond the standard SEM treatment. The initial workshop was given to an audience of varying background and statistical skill, but the document should be useful to anyone interested in the techniques covered. It is completely R-based, with special emphasis on the [<span itemprop="keywords">lavaan</span>](https://lavaan.ugent.be/) package. It will continue to be a work in progress, particularly the sections after the <span itemprop="keywords">SEM</span> chapter.  Topics include: <span itemprop="keywords">graphical</span> models (<span itemprop="keywords">directed</span> and <span itemprop="keywords">undirected</span>, including <span itemprop="keywords">path analysis</span>, <span itemprop="keywords">Bayesian networks</span>, and <span itemprop="keywords">network analysis</span>), <span itemprop="keywords">mediation</span>, <span itemprop="keywords">moderation</span>, <span itemprop="keywords">latent variable</span> models (including <span itemprop="keywords">principal components</span> analysis and '<span itemprop="keywords">factor analysis</span>'), <span itemprop="keywords">measurement</span> models, <span itemprop="keywords">structural equation models</span>, <span itemprop="keywords">mixture models</span>, <span itemprop="keywords">growth curves</span>.  Topics I hope to provide overviews of in the future include other latent variable techniques/extensions such as <span itemprop="keywords">IRT</span>, <span itemprop="keywords">collaborative filtering</span>/<span itemprop="keywords">recommender systems</span>, <span itemprop="keywords">hidden Markov models</span>, <span itemprop="keywords">multi-group models</span> etc.
</span>
</span>

<span itemscope itemtype ="https://schema.org/TechArticle">
[<span itemprop="name keywords">Factor Analysis</span> and Related Methods](/docs/FA_notes.html)
<span itemprop="description">This document gives a brief overview of many <span itemprop="keywords">matrix factorization</span>, <span itemprop="keywords">dimension reduction</span>, and <span itemprop="keywords">latent variable</span> techniques. Here is a list:
</span>
</span>

<div class="" style="font-size:75%">

Principal Components Analysis - Factor Analysis - Probabilistic Components Analysis - Non-negative Matrix Factorization - Latent Dirichlet Allocation - Structural Equation Modeling - Item Response Theory - Independent Components Analysis - Multidimensional Scaling - t-Distributed Stochastic Neighbor Embedding (t-SNE) - Recommender Systems - Hidden Markov Models - Random Effects Models - Bayesian Approaches - Mixture Models - k-means Cluster Analysis - Hierarchical Cluster Analysis - Latent Class Analysis

</div>


<span itemscope itemtype ="https://schema.org/TechArticle">
[<span itemprop="name keywords">Latent Variables</span>, <span itemprop="name keywords">Sum Scores</span>, Single Items](/docs/lv_sim.html)
<span itemprop="description">It is very common to use sum scores of several variables as a single entity to be used in subsequent analysis (e.g. a regression model).  Some may even use a single variable even though multiple indicators are available. Assuming the multiple measures indicate a latent construct, such typical practice would be problematic relative to using estimated <span itemprop="name keywords">factor scores</span>, either constructed as part of a two-stage process or as part of a <span itemprop="name keywords">structural equation model</span>.  This document covers simulations in which comparisons in performance are made between latent variable and sum score or single item approaches.
</span>
</span>

<span itemscope itemtype ="https://schema.org/ScholarlyArticle https://schema.org/TechArticle">
[Lord's Paradox](/docs/lord/index.html)
<span itemprop="description">Summary of <span itemprop="keywords">Pearl</span>'s technical reports on some modeling situations such as <span itemprop="keywords">Lord's Paradox and Simpson's Paradox</span> that lead to surprising results that are initially at odds with our intuition.  Looks particularly at the issue of change scores vs. controlling for baseline.
</span>
</span>


### Other Statistical

<span itemscope itemtype ="https://schema.org/TechArticle">
[<span itemprop="name keywords">Generalized Additive Models</span>](../generalized-additive-models/)
<span itemprop="description">An introduction to <span itemprop="keywords">generalized additive models</span> with an emphasis on generalization from familiar linear models and using the <span itemprop="keywords">mgcv</span> package in <span itemprop="keywords">R</span>.
</span>
</span>

<span itemscope itemtype ="https://schema.org/TechArticle">
[<span itemprop="name keywords">Introduction to Machine Learning</span>](../introduction-to-machine-learning/)
<span itemprop="description">A gentle introduction to <span itemprop="keywords">machine learning</span> concepts with some application in <span itemprop="keywords">R</span>.
</span>
</span>

<span itemscope itemtype ="https://schema.org/TechArticle">
[<span itemprop="name keywords">Reliability</span>](../reliability/)
<span itemprop="description">An unfinished document that ties together some ideas regarding the statistical and conceptual notion of reliability.
</span>
</span>


<span itemscope itemtype ="https://schema.org/TechArticle">
[<span itemprop="name keywords">Fractional Regression</span>](../posts/2019-08-20-fractional-regression/)
<span itemprop="description">A quick primer regarding data between zero and one, including zero and one.
</span>
</span>

<span itemscope itemtype ="https://schema.org/TechArticle">
[<span itemprop="name keywords">Categorical Regression Models</span>](/docs/logregmodels.html)
<span itemprop="description">An overview of regression models for <span itemprop="keywords">binary, multinomial, and ordinal outcomes</span>, with connections among various types of models.
</span>
</span>

<span itemscope itemtype ="https://schema.org/TechArticle">
[<span itemprop="name keywords">Topic Modeling Demo</span>](/docs/topic_models/topic-model-demo.html)
<span itemprop="description">A demonstration of <span itemprop="keywords">Latent Dirichlet Allocation</span> for <span itemprop="keywords">topic modeling</span> in <span itemprop="keywords">R</span>.
</span>
</span>

<span itemscope itemtype ="https://schema.org/TechArticle">
[<span itemprop="name">Comparing Measures of Dependency</span>](/docs/CorrelationComparison.pdf)
<span itemprop="description">A summary of articles that look at various measures of dependency, including <span itemprop="keywords">Pearson's r</span>, <span itemprop="keywords">Spearman's rho</span>, and <span itemprop="keywords">Hoeffding's D</span>, and newer ones such as <span itemprop="keywords">Distance Correlation</span> and <span itemprop="keywords">Maximal Information Coefficient</span>.</span>
</span>


## Programming

Check the old [workshops](workshops.html) section also for programming-related content.


[Practical Data Science](../data-processing-and-visualization/) (more details about this document appear under Long Form Docs above). The intention was to cover five key topics: basic information processing, programming, modeling, visualization, and publication/presentation.

[Exploratory Data Analysis Tools](../exploratory-data-analysis-tools/) An overview of various packages useful for quick exploration of data.

<span itemscope itemtype ="https://schema.org/TechArticle">
[<span itemprop="name keywords">FastR</span>](/docs/fastr.html)
<span itemprop="description">A notebook on how to <span itemprop="keywords">make R faster</span> before or irrespective of the machinery used. Topics include <span itemprop="keywords">avoiding loops</span>, <span itemprop="keywords">vectorization</span>, faster <span itemprop="keywords">I/O</span> etc.
</span>
</span>

<span itemscope itemtype ="https://schema.org/TechArticle">
[<span itemprop="name keywords">Engaging the Web with R</span>](../webR/)
<span itemprop="description">Document regarding the use of R for <span itemprop="keywords">web scraping</span>, extracting data via an <span itemprop="keywords">API</span>, <span itemprop="keywords">interactive</span> web-based <span itemprop="keywords">visualizations</span>, and producing <span itemprop="keywords">web-ready documents</span>.  It serves as an overview of ways one might start to use R for web-based activities as opposed to a hands-on approach.
</span>
</span>


## Workshops

I used to give workshops regularly when I worked in academia. Although they generally won't age well, I have kept the content [here](workshops.html) for anyone who might be interested.


## Miscellaneous

<span itemscope itemtype ="https://schema.org/TechArticle">
[<span itemprop="name">R for Social Science</span>](/docs/RSocialScience.pdf)
<span itemprop="description">This was put together in a couple of days under duress, and is put here in case someone can find it useful (and thus make the time spent on it not completely wasted).
</span>
</span>