\documentclass[notitlepage,12pt]{jedm}
\input{preamble.tex}
<<include=FALSE>>=
library(knitr)
library(lubridate)
library(scales)
library(reshape2)
library(dplyr)
library(ggplot2)
library(xtable)
#library(tikzDevice)
opts_chunk$set(
echo=FALSE, results='asis',cache=TRUE,warning=FALSE,error=FALSE,message=FALSE,autodep = FALSE
)
#options(tikzDefaultEngine = "pdftex")
@
<<setup,include=FALSE>>=
library(lubridate)
library(scales)
library(reshape2)
library(dplyr)
library(ggplot2)
library(xtable)
# Duplicate the data with state set to 'Overall' and append it, so that
# summaries can be shown per state and pooled in the same figure or table.
addOverallState <- function(dat){
levels(dat$state) <- c(levels(dat$state),'Overall')
dat2 <- dat
dat2$state <- 'Overall'
dat2 <- rbind(dat,dat2)
dat2$state <- factor(dat2$state, levels= levels(dat$state))
dat2
}
@
<<data,include=FALSE,cache=TRUE>>=
if('cpPaper.RData'%in%list.files() & file.info('cpPaper.RData')$mtime>file.info('dataMerge.r')$mtime){
load('cpPaper.RData')
} else source('dataMerge.r')
@
\begin{document}
\maketitle
\input{intro.tex}
\section{The RAND Effectiveness Trial}\label{sec:RANDtrial}
The study to measure the effectiveness of CTAI included 7 states, 73
high schools, and 74 middle schools with nearly 18,700 high school
students and 6,800 middle school students participating. Schools were enrolled in a
total of 52 school districts that were distributed among urban,
suburban, and rural areas. Schools were matched on a set of
covariates, and then randomly assigned to the treatment or control
group. Schools in the control group continued with their current
algebra curriculum, and schools in the treatment group used Carnegie
Learning's curriculum, which includes CTAI textbook materials and software. Each school participated for two years, with a different cohort
of students participating the second year (with a small fraction of
students present in the study both years because they repeated
algebra). It should be noted that this study did not include statewide
implementations; the study results cannot be generalized to all
schools within each state. In some states, one large school district
participated, while in other states, a set of smaller school districts
participated. The states included Alabama (AL), Connecticut (CT),
Kentucky (KY), Louisiana (LA), Michigan (MI), New Jersey (NJ), and
Texas (TX). Each state participated in both the middle school and high
school arms of the study, except AL, which participated only in the
middle school arm. The current study focuses on high school students
only.
There are some limitations to the available data for this study.
Log data from some schools, and some students within schools, were
missing either because the log files were not retrievable, or because
of an imperfect ability to link log data to other study data files.
For this reason, this study uses only data from the
\Sexpr{n_distinct(data[['schoolid2']])} treatment schools for which at least
80\%
of students in both study years appear in the log data file.
This sample includes \Sexpr{n_distinct(data[['field_id']])} students, around
\Sexpr{round(n_distinct(data[['field_id']])/n_distinct(stud[['field_id']])*100)}\%
of the treated high-school sample.
Table \ref{tab:nByState} gives the number of students in the sample by
state and year.
The states in the table are ordered by the total number of students
they represent in the sample; they will appear in this order in
all of the forthcoming tables and figures.
Some figures will only show data from a subset of states; since so few
students were in New Jersey, it will be excluded from almost all
state-by-state comparisons (but included in analyses that pool across states).
<<nByState,dependson='data'>>=
sampleSizeTable <- data%>%group_by(state,year)%>%summarize(n=n_distinct(field_id))%>%dcast(year~state)
sampleSizeTable <- sampleSizeTable[,-1]
rownames(sampleSizeTable) <- c('Year 1','Year 2')
print(xtable(sampleSizeTable,caption='Numbers of students in the sample by state and study year',label='tab:nByState'),include.rownames=TRUE)
@
In this sample from \Sexpr{n_distinct(data[['schoolid2']])} schools, \Sexpr{sum(data[['obsUsage']]==FALSE)}
students who participated in the RAND study do not appear in the log data; they may not have used the CTAI
software at all, or may have been excluded from the log data for other
reasons.
Since we don't know which is true, we exclude these students from most
analyses.
It is likely that some usage data were missing, even for students who
appear in the usage dataset.
However, it is impossible to know in which cases these data were
missing or why; for the most part, we ignore this problem, but it
should be kept in mind nonetheless.
\section{Standard and Customized Curricula}\label{sec:curricula}
\begin{figure}
\centering
<<curricula,dependson=c('setup','data'),fig.width=6.4,fig.height=3>>=
bbb <- addOverallState(data)%>%filter(!is.na(Curriculum) & State!='NJ')%>%
group_by(state,Curriculum,overall,Yr)%>%
summarize(n=n())
staten <- addOverallState(data)%>%filter(!is.na(Curriculum) & State!='NJ')%>%group_by(state,Yr)%>%summarize(n=n())
for(i in 1:nrow(bbb)) bbb$n[i] <- bbb$n[i]/staten$n[staten$state==bbb$state[i] & staten$Yr==bbb$Yr[i]]
COLS <- c('#1b9e77','#d95f02','#7570b3')
ggplot(bbb,aes(Yr,n,fill=Curriculum,alpha=overall))+geom_col()+facet_grid(~state)+scale_alpha_manual(values=c(1,0.5))+scale_fill_manual(values=COLS)+scale_y_continuous(labels=percent)+labs(x='',y='% of Problems Worked',alpha='')
@
\caption{Percentage of worked problems coming from various courses
(denoted by color, with Algebra II and Geometry bundled as ``$>$Algebra I''), from
standard and customized variants, denoted by shading.}
\label{fig:curricula}
\end{figure}
Students' automatic progress through the Cognitive Tutor (CT) software
is normally governed by
the sequences of sections and units embedded in the software.
Without external intervention, the curriculum a student works on
determines the sequence, and thus which section he or she will be directed to next after
mastering (or exhausting the problems of) the previous section.
In the CTAI effectiveness trial, the most common curriculum was,
naturally, Algebra I.
This came with three closely related variants, due to new software releases.
Students requiring more remediation were able to work on a less
advanced curriculum, called ``Bridge to Algebra,'' and more advanced
students could work on Algebra II or Geometry.
In the second year of the study, some high schools, primarily in
Texas, Michigan, and Kentucky, requested customized variants of the
curricula.
This was typically due to state standards, testing schedules, or local
scope and sequence guidelines.
These ``customized curricula'' altered the order of some sections and
units, and were usually particular to schools.
Figure \ref{fig:curricula} shows the percentage of worked problems
from each curriculum, from standard and customized varieties, by state
and year.
First, note that the vast majority of worked problems were from the
Algebra I sequence.
A small but notable number of less advanced problems were worked in
Kentucky in year 2, and some more advanced problems were worked in
Michigan and Louisiana.
Secondly, note the rise in ``customized curricula'' in year 2 in
Texas, Kentucky, and Michigan, the three states with the most students
in our dataset.
In particular, Texas shifted almost entirely to customized curricula
from years 1 to 2.
Throughout the school year, teachers could have a class of students
working on multiple curricula either sequentially, where the students
changed curricula in lock step, or simultaneously, where students
worked on different curricula at the same time. As an example, two
teachers located in Kentucky had their students working on Algebra I
throughout most of the year and then reassigned them to Algebra II in
the last month of school. In contrast, a different teacher in Kentucky had students variously
enrolled in three different curricula throughout the entire year
(Bridge-to-Algebra, Algebra I, and a customized Geometry curriculum),
while a year 2 teacher in Michigan enrolled students in
three curricula sequentially throughout the year: Algebra I
until November, followed by a customized curriculum until February, and
ending with a different customized curriculum until June. While there are numerous instances of these
uses of multiple curricula in year 2, many teachers kept their
students enrolled in the standard Algebra I curriculum
throughout the entire year, including all Connecticut teachers.
There were also teachers, mostly in Texas, who used customized
curricula exclusively throughout the
second year.
\section{Student Usage Across States and Years}\label{sec:usage}
<<usageMedians,dependson=c('data','setup'),results='asis'>>=
secByStud <- data%>%group_by(state,Yr,field_id) %>% summarize(numSec=n_distinct(section,na.rm=TRUE),
numUnit=n_distinct(unit,na.rm=TRUE),
time=sum(total_t1,na.rm=TRUE),
mastered=n_distinct(section[status=='graduated'],na.rm=TRUE),
nprob=n_distinct(unit,section,Prob1,na.rm=TRUE))
secByStud$time <- secByStud$time/3600000
secByStud <- within(secByStud,{
numUnit[numUnit==0] <- NA
numSec[numSec==0] <- NA
time[time==0] <- NA
mastered[mastered==0] <- NA
})
secByStud2 <- addOverallState(secByStud)
secByStud2 <-
secByStud2 %>%
group_by(state,Yr) %>%
mutate(outlier.time = time > quantile(time,0.75,na.rm=TRUE) + IQR(time,na.rm=TRUE) * 1.5,
outlier.sec = numSec > quantile(numSec,0.75,na.rm=TRUE) + IQR(numSec,na.rm=TRUE) * 1.5,
outlier.prob = nprob > quantile(nprob,0.75,na.rm=TRUE) + IQR(nprob,na.rm=TRUE) * 1.5,
outlier.unit = numUnit > quantile(numUnit,0.75,na.rm=TRUE) + IQR(numUnit,na.rm=TRUE) * 1.5
) %>%
ungroup
tab <- ungroup(secByStud)%>%group_by(Yr)%>%dplyr::select(time,nprob,numSec,numUnit)%>%summarize_all(median,na.rm=TRUE)
tab$Yr <- NULL
tab <- as.data.frame(tab)
rownames(tab) <- c('Year 1','Year 2')
names(tab) <- c('Hours','Problems','Sections','Units')
xtable(tab,caption='Median numbers of hours, problems, sections, and units worked by each student in the dataset in the two years of the study. Students with no usage data were excluded.',label='tab:medUsage',digits=c(1,2,0,1,0))
@
Table \ref{tab:medUsage} shows the median numbers of hours,
problems, sections, and units worked on by each student in the dataset
in the two years of the study.
Apparently, usage decreased markedly in the second year: the median
of hours worked decreased by
\Sexpr{round(tab['Year 1','Hours']-tab['Year 2','Hours'])}, the median number of problems decreased by
\Sexpr{tab['Year 1','Problems']-tab['Year 2','Problems']}, and
the median number of sections decreased by
\Sexpr{tab['Year 1','Sections']-tab['Year 2','Sections']} from
years 1 to 2.
Yet, as discussed below, the median number of units worked increased
by \Sexpr{tab['Year 2','Units']-tab['Year 1','Units']}.
\begin{figure}
\centering
<<usageTime, fig.height=3,fig.width=6,dependson=c('data','setup','usageMedians')>>=
print(timeStateYear <- ggplot(filter(secByStud2,state!='NJ'),aes(Yr,time))+geom_boxplot(outlier.shape=NA)+
geom_jitter(data = function(x) dplyr::filter_(x, ~ outlier.time), width=0.2)+
facet_grid(~state)+coord_cartesian(ylim=c(0,110))+labs(x='',y='Hours on CT Software per Student'))
#ggsave('timeStatYear.jpg',width=6,height=3)
@
\caption{Boxplots of hours each student spent on Cognitive Tutor
  software by year and state. Students with no timestamp data
($n=$\Sexpr{sum(is.na(secByStud[['time']]))}), with anomalous negative time
($n=$\Sexpr{sum(secByStud[['time']]<0,na.rm=TRUE)}) or with more than 110 hours ($n=$\Sexpr{sum(secByStud[['time']]>110,na.rm=TRUE)})
were excluded.}
\label{fig:timeByStud}
\end{figure}
<<supplementalFigs1,include=FALSE,dependson=c('data','setup','usageMedians')>>=
secStateYear <- ggplot(filter(secByStud2,state!='NJ'),aes(Yr,numSec))+geom_boxplot(outlier.shape=NA)+
geom_jitter(data = function(x) dplyr::filter_(x, ~ outlier.sec), width=0.2)+
facet_grid(~state)+labs(x='',y='# of Sections Worked per Student')+coord_cartesian(ylim=c(0,200))
ggsave('secStateYear.jpg',width=6,height=3)
probStateYear <- ggplot(filter(secByStud2,state!='NJ'),aes(Yr,nprob))+geom_boxplot(outlier.shape=NA)+
geom_jitter(data = function(x) dplyr::filter_(x, ~ outlier.prob), width=0.2)+
facet_grid(~state)+labs(x='',y='# of Problems Worked per Student')+coord_cartesian(ylim=c(0,2000))
ggsave('probStateYear.jpg',width=6,height=3)
@
Figure \ref{fig:timeByStud} shows the number of hours students
spent working on the CT software in more detail, via
state-by-year boxplots.
Analogous figures for the numbers of problems and sections students
worked showed similar patterns.
Usage time varied substantially between students and across states and
years.
Students in Texas,
Connecticut, and New Jersey worked far fewer hours than students in
Kentucky, Louisiana, and Michigan.
Not every state reduced its usage from years 1 to 2---while students
in Texas, Kentucky and New Jersey used the software less in the second
year than in the first, students in Michigan, Louisiana, and
Connecticut increased their usage.
Overall, usage varied a bit more in year 2 than in year
1---the median absolute deviation of time spent was \Sexpr{round(mad(secByStud[['time']][secByStud[['Yr']]=='Yr 1'],na.rm=TRUE,constant=1),1)} hours in the first year, compared to \Sexpr{round(mad(secByStud[['time']][secByStud[['Yr']]=='Yr 2'],na.rm=TRUE,constant=1),1)} in the second year.
The increase in variation seems to be driven both by increasing
between-state variation, and a between-student increase in Louisiana.
One intriguing possibility is that the amount of CT usage may have been better tailored to
teachers and students in the second year than in the first. Perhaps
usage increased for
students who stood to gain more from the software and decreased for
students who stood to gain less.
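The spread statistic quoted above is the raw median absolute deviation (the \texttt{constant = 1} argument turns off R's default rescaling, which multiplies by 1.4826 to make the MAD consistent with the standard deviation under normality). A minimal sketch with illustrative values, not study data:

```r
# Raw median absolute deviation: the median of |x - median(x)|.
# constant = 1 disables the 1.4826 normal-consistency rescaling that
# mad() applies by default.
x <- c(10, 20, 30, 40, 100)   # illustrative hours, not study data
mad(x, constant = 1)          # median of |x - 30| = (20,10,0,10,70) -> 10
mad(x)                        # default scaling: 1.4826 * 10 = 14.826
```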
In contrast to the decreasing numbers of hours, problems, and sections students
worked in year 2, Table \ref{tab:medUsage} shows that the median number of units
students worked increased by
\Sexpr{round(tab['Year 2','Units']-tab['Year 1','Units'])}
in year 2.
This suggests students in year 2 were exposed, on average, to a
slightly wider range of topics.
Figure \ref{fig:unitsByStud} shows boxplots of the numbers of units
worked by state and year.
The geographic variation in units worked mirrors the pattern in Figure
\ref{fig:timeByStud}, with more usage in Kentucky, Michigan, and
Louisiana but less in Texas, Connecticut, and New Jersey.
However, in every state the median year 2 student worked at least as
many different units as the median year 1 student.
Variation in the number of units worked also increased slightly
from years 1 to 2---the interquartile range (IQR) increased in every state
except for Kentucky, where a decrease in IQR was accompanied by an
increase in the number of outliers.
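The outliers plotted as jittered points in these boxplots follow the Tukey upper-fence rule used in the analysis code: a student is flagged when his or her value exceeds the third quartile by more than 1.5 times the interquartile range. A minimal sketch with illustrative values, not study data:

```r
# Tukey's upper-fence rule, as used to flag the jittered outlier points
# in the boxplot figures: a value is an outlier if it exceeds
# Q3 + 1.5 * IQR. The 'hours' vector is illustrative, not study data.
hours <- c(5, 12, 18, 20, 22, 25, 30, 95)
upper_fence <- quantile(hours, 0.75, na.rm = TRUE) +
  1.5 * IQR(hours, na.rm = TRUE)   # 26.25 + 1.5 * 9.75 = 40.875
outlier <- hours > upper_fence
hours[outlier]                     # -> 95
```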
\begin{figure}
\centering
<<unitsWorked,dependson=c('usageMedians','data','setup'),fig.width=6,fig.height=3>>=
secByStud2 <-
secByStud2 %>%
group_by(state,Yr) %>%
mutate(outlier = numUnit > quantile(numUnit,0.75,na.rm=TRUE) + IQR(numUnit,na.rm=TRUE) * 1.5) %>%
ungroup
print(unitsStateYear <- ggplot(filter(secByStud2,state!='NJ'),aes(Yr,numUnit))+geom_boxplot(outlier.shape=NA)+geom_jitter(data = function(x) dplyr::filter_(x, ~ outlier), width=0.2)+facet_grid(~state)+labs(x='',y='# Units Worked Per Student')+coord_cartesian(ylim=c(0,55)))
@
\caption{Boxplots of the number of units of Cognitive Tutor
software each student worked, by year and state. Students
working more than 55 units (
\Sexpr{sum(secByStud[['numUnit']]>55,na.rm=TRUE)} of
\Sexpr{nrow(secByStud)}) and students
with no usage data (\Sexpr{sum(is.na(secByStud[['numUnit']]))})
were excluded.}
\label{fig:unitsByStud}
\end{figure}
All in all, students used CT software less in year 2 than in year
1.
On the other hand, students in the second year tended to see a
slightly wider range of topics, and varied somewhat more in their usage.
\section{Working Units in Order---Or Not}\label{sec:order}
Overall, students used CT less in the second year than in the
first.
How was this difference distributed across CTAI units?
\begin{figure}
\centering
<<whichUnits,fig.height=4,fig.width=6,dependson=c('data','setup')>>=
curricula <- read.csv('~/Box Sync/CT/data/sectionLevelUsageData/RAND_study_curricula.csv',stringsAsFactors=FALSE)
curricula <- subset(curricula,curriculum_name=='algebra i')
curricula$unit <- tolower(curricula$unit)
curricula <- subset(curricula,unit%in%intersect(curricula$unit[curricula$ct=='2007'],curricula$unit[curricula$ct=='2008r1']))
units <- curricula$unit[curricula$ct=='2007']
sectionStats <- read.csv('~/Box Sync/CT/data/sectionLevelUsageData/section_stats_withAbb.csv',stringsAsFactors=FALSE)
UnitName <- sectionStats$unit_name_abb[match(units,sectionStats$unit_id)]
UnitName[units=='inequality-systems-solving'] <- 'Systems of Lin. Ineq.'
UnitName[units=='intro-pythag-theorem'] <- 'Pythagorean Theorem'
UnitName[units=='linear-inequality-graphing'] <- 'Graphs of Lin. Ineq.'
UnitName[units=='linear-systems-solving'] <- 'Systems of Lin. Eq. Solving'
UnitName[units=='probability'] <- 'Probability'
UnitName[units=='unit-conversions'] <- 'Unit Conversions'
nstud <- data%>%filter(!is.na(unit))%>%group_by(Year)%>%summarize(nstud=n_distinct(field_id))
data$Unit <- data$unit
data$Unit[grep('unit-conversions',data$Unit)] <- 'unit-conversions'
unitLevel <- data%>%filter(Unit%in%units)%>%group_by(Unit,Year)%>%summarize(numWorked= n_distinct(field_id,na.rm=TRUE),numCP=sum(status=='changed placement',na.rm=TRUE),meanCP=mean(status=='changed placement',na.rm=TRUE))
unitLevel$perWorked <- unitLevel$numWorked/nstud$nstud[match(unitLevel$Year,nstud$Year)]
unitLevel$Unit <- factor(unitLevel$Unit,levels=units)
levels(unitLevel$Unit) <- UnitName
unitLevel$year <- factor(ifelse(unitLevel$Year=='Year 1',1,2))
print(unitsWorked <- ggplot(unitLevel,aes(x=Unit,y=perWorked,color=year,group=year))+geom_point()+geom_line()+theme(axis.text.x = element_text(angle = 90, hjust = 1,vjust=0.5,size=10),legend.text=element_text(size=10),legend.position=c(.96,.82))+labs(x='',y='% Worked',color='Year')+scale_y_continuous(labels=percent))
@
\caption{The percentages of students with usage data who worked at
least one problem from each unit in the Algebra I curriculum. The
units are arranged in order for the standard curriculum. In the 2008
version of the software, the ``Unit Conversions'' unit was broken up
into two smaller units; for the sake of between-year comparisons, we
re-combined them.}
\label{fig:unitsWorked}
\end{figure}
Figure \ref{fig:unitsWorked} shows the units of algebra along the
horizontal axis, according to their order in the standard CTAI
curriculum.
The vertical axis shows the percentage of students with usage data who
worked each unit.
In year 1, the curve is almost monotonically decreasing, as one would
expect if students adhered to the curriculum.
Students varied in the number of units they worked---with the variation due
to both student ability and the amount of time allocated to CTAI
within a classroom---but they mostly followed the standard curriculum.
Students who worked fewer units stopped earlier in the sequence, and those who worked more units progressed farther.
Hence, earlier units were worked by higher proportions of students
than later ones.
In contrast, in year 2 students were much more likely to depart from the
standard unit order.
For instance, Figure \ref{fig:unitsWorked} suggests that some students
skipped ``Unit Conversions'' to work on ``1st
Quadrant Linear Graphs'' or skipped ``1 step Linear Equations'' to work on ``Independent Variables in Linear Models.''
In both these cases, the subsequent unit was worked on by a greater
proportion of students than the immediately prior unit.
Most strikingly, ``Linear Equations with Variables on Both Sides'' was worked by a
greater proportion of students in year 2 than in year 1, and by a
greater proportion of students than any of the previous six units.
Presumably teachers and administrators wanted students to focus on
that unit, perhaps because they found it to be particularly effective,
because students tend to struggle with its main topic, or because its
topic may figure prominently in an upcoming standardized test.
\begin{figure}
\centering
<<unitsWorkedCust,dependson=c('data','setup','whichUnits'),fig.height=4,fig.width=6>>=
### by customized curriculum (at school level)
cust <- data%>%filter(Year=='Year 2')%>%group_by(schoolid2,state)%>%summarize(cust=mean(overall=='Customized',na.rm=T))%>%arrange(cust)
data$cust <- ifelse(data$schoolid2%in%cust$schoolid2[cust$cust>0.8],'Customized','Standard')
nstudCust <- data%>%filter(!is.na(unit) & Year=='Year 2')%>%group_by(cust)%>%summarize(nstud=n_distinct(field_id))
unitLevelCust <- data%>%filter(Unit%in%units & Year=='Year 2' )%>%group_by(Unit,cust)%>%summarize(numWorked= n_distinct(field_id,na.rm=TRUE),numCP=sum(status=='changed placement',na.rm=TRUE),meanCP=mean(status=='changed placement',na.rm=TRUE))
unitLevelCust$perWorked <- unitLevelCust$numWorked/nstudCust$nstud[match(unitLevelCust$cust,nstudCust$cust)]
unitLevelCust$Unit <- factor(unitLevelCust$Unit,levels=units)
levels(unitLevelCust$Unit) <- UnitName
ggplot(unitLevelCust,aes(x=Unit,y=perWorked,color=cust,group=cust))+geom_point()+geom_line()+theme(axis.text.x = element_text(angle = 90, hjust = 1,vjust=0.5,size=10),legend.text=element_text(size=10),legend.position=c(.90,.82))+labs(x='',y='% Worked',color='Curriculum',title='Year 2')+scale_y_continuous(labels=percent)
@
\caption{The percentages of year-2 students with usage data who worked at
least one problem from each unit in the Algebra I curriculum. The
units are arranged in order for the standard curriculum. Students
are divided between those attending schools using primarily a customized
curriculum and those using primarily the standard Algebra I curriculum.}
\label{fig:unitsWorkedCust}
\end{figure}
Most of the variation in unit order was driven by the rise, in year 2,
of customized curricula.
Figure \ref{fig:unitsWorkedCust} divides year-2 students into those
attending schools using primarily a customized curriculum, and those
attending schools using primarily the standard
curriculum.\footnote{At least
\Sexpr{trunc(100*min(cust[['cust']][cust[['cust']]>0.8]))}\% of
problems worked by students at ``Customized'' schools were from a customized curriculum,
and at most
\Sexpr{ceiling(100*max(cust[['cust']][cust[['cust']]<0.8]))}\% of
problems at ``Standard'' schools were from a customized curriculum.}
Students using the standard curriculum followed the standard
sequence---more or less---while students using customized curricula
did not.
That said, there were some order violations in the standard group:
specifically, more students worked problems from units ``2-step Linear
Equations'' and ``Exponents'' than worked the preceding sections;
this suggests that some teachers used the reassignment tool to
prioritize particular topics.
Of course, teacher reassignment may have occurred in schools with customized
curricula as well---a possibility we will discuss in the next
section.
\begin{figure}
\centering
<<unitsBySchool,dependson=c('data','setup','whichUnits'),fig.height=7,fig.width=6>>=
### by school
nstudSch <- data%>%filter(!is.na(unit) & Year=='Year 2')%>%group_by(schoolid2,state)%>%summarize(nstud=n_distinct(field_id))
unitLevelSch <- with(filter(data,Unit%in%units & Year=='Year 2'),expand.grid(Unit=unique(Unit),schoolid2=unique(schoolid2)))
unitLevelSch$state <- data$state[match(unitLevelSch$schoolid2,data$schoolid2)]
unitLevelSch$cust <- data$cust[match(unitLevelSch$schoolid2,data$schoolid2)]
unitLevelSch$numWorked <- with(filter(data,Year=='Year 2'),vapply(1:nrow(unitLevelSch),function(i)
n_distinct(field_id[schoolid2==unitLevelSch$schoolid2[i] & Unit==unitLevelSch$Unit[i]]),1))
#unitLevelSch <- data%>%filter(Unit%in%units & Year=='Year 2')%>%group_by(Unit,schoolid2,cust,state)%>%summarize(numWorked= n_distinct(field_id,na.rm=TRUE),numCP=sum(status=='changed placement',na.rm=TRUE),meanCP=mean(status=='changed placement',na.rm=TRUE))
unitLevelSch$perWorked <- unitLevelSch$numWorked/nstudSch$nstud[match(unitLevelSch$schoolid2,nstudSch$schoolid2)]
unitLevelSch$Unit <- factor(unitLevelSch$Unit,levels=units)
levels(unitLevelSch$Unit) <- UnitName
ggplot(filter(unitLevelSch,state!='NJ'),aes(x=Unit,y=perWorked,color=schoolid2,group=schoolid2,linetype=cust))+geom_point(size=.5)+geom_line()+theme(axis.text.x = element_text(angle = 90, hjust = 1,vjust=0.5,size=10),legend.text=element_text(size=10))+
#,legend.position=c(.96,.82))+
facet_grid(state~.)+scale_y_continuous(labels=percent)+scale_color_discrete(guide=FALSE)+
labs(x='',y='% Worked',linetype='Curriculum',title='Year 2')
@
\caption{The percentages of year-2 students with usage data in each
school who worked at least one problem from each unit in the Algebra I curriculum. The
units are arranged in order for the standard curriculum. Schools are
classified as either using primarily customized curricula (solid
line) or using primarily the standard Algebra I curriculum (dotted).}
\label{fig:unitsBySchool}
\end{figure}
Figure \ref{fig:unitsBySchool} further decomposes the year-2 results
by school and state, showing a large amount of variation between
states, as well as variation between schools within states.
In Texas, every school used customized curricula, most of which seem to
prioritize some of the same units, for instance, ``Linear Patterns,''
``Independent Variables in Linear Models,'' and
``Linear Equations with Variables on Both Sides.''
On the other hand, there was also variation between schools.
For instance, one school prioritized units ``2 Step Linear Equations''
and ``4-Quadrant Linear Graphs'' while nearly eliminating ``Linear Patterns.''
Between-school variation is evident in the other states, as well.
In four of the five Kentucky schools, nearly every student worked on
the first nine units; in the one Kentucky school that used a
customized curriculum, nearly every student worked on the first 13
units, omitted the 15th (``Lin. Mod. in General Form''), and worked on
the 16th and 17th (``Literal Equations'' and ``Linear Equations with Variables on Both Sides'').
In the remaining school, nearly every student worked on the first
unit, but usage decreased rapidly from there.
In one Michigan school that used the standard curriculum,
no students seem to have worked on the ``Linear Models \& Ratios'' unit.
If unit order and topic scaffolding are important to CT's mastery
learning mechanism, the wide variation in students' realized curricula
would seem to pose a problem.
The fact that the prescribed order was followed less in the second
year of the study, when CTAI was effective, than in the first year,
when it wasn't, suggests that the standard curriculum may play a
smaller role than one might otherwise imagine.
\section{Mastering the Material---Or Not}\label{sec:mastery}
The central idea behind mastery learning is that students progress
through the curriculum as they master skills.
In the context of CT, skills are clustered within sections, which are
in turn clustered within units.
Students progress from the current section to the next section after
mastering all of the current section's skills.
Ideally, students would master all of the skills in all of the
sections they work.
By default, the software operates by automatically moving students
from section to section based on the sequence of topics defined by the
curriculum in which they are currently enrolled. In this
software-controlled sequencing, students ideally spend the time
necessary to learn the material of a section, are judged by the
software to have mastered the material, and then ``graduate'' to the
next section.
However, the software will also ``promote'' a student to the next
section if the student exhausts a section's material without mastering
its skills.
Additionally, teachers are able to modify a student's path within the curriculum.
They can ``reassign'' students from their current sections to other
sections earlier or later in the intended sequence, including sections
they worked on previously.
Finally, if the semester ends, or a student otherwise stops using CT,
while the student is in the middle of working through a
section, that section is designated ``final.''
All in all, each CT section a student encounters ends in one of four possible ways:
mastery, promotion, reassignment, or as the student's final section.
\begin{figure}
\centering
<<overallStatus,dependson=c('data','setup'),fig.height=3,fig.width=6>>=
statusPerSec <- data%>%
filter(!is.na(Curriculum) &!(state=='MI'&Curriculum=='Customized'&year==1))%>%
group_by(field_id,Yr,state,section,Curriculum,status)%>%
  summarize(cp=any(status=='changed placement'))%>%
  group_by(state,Curriculum,Yr,field_id)%>%
  summarise(pcp=mean(cp,na.rm=TRUE),ncp=sum(cp,na.rm=TRUE))
statusOverall <- data%>%filter(!is.na(status))%>%
group_by(field_id,Yr,state,unit,section,Curriculum,overall)%>%summarize(status=max(status))
statusStateYr <- addOverallState(statusOverall) %>% group_by(state,Yr)%>%
summarize(pgrad=mean(status=='graduated'),pfoi=mean(status=='final_or_incomplete'),
pcp=mean(status=='changed placement'),pprom=mean(status=='promoted'))%>%
melt(measure.vars=c('pgrad','pfoi','pcp','pprom'))
levels(statusStateYr$variable) <- list(Final='pfoi',Reassigned='pcp',Promoted='pprom',Mastered='pgrad')
ggplot(filter(statusStateYr,state!='NJ'),aes(Yr,value,fill=variable))+geom_col()+facet_grid(~state)+labs(y='% of Sections Worked',x='',fill='Exit Status')+scale_y_continuous(labels=percent)
@
\caption{The distributions of outcomes of worked sections, by state and across the
entire sample, in the two study years.}
\label{fig:overallStatus}
\end{figure}
\begin{table}
\centering
\begin{tabular}{rllllll}%|llllll}
& \multicolumn{6}{c}{Year 1}\\%&\multicolumn{6}{c}{Year 2}\\
<<statusTab,dependson='overallStatus',results='asis'>>=
statusStateYr2 <- addOverallState(statusOverall)
statusStateYr2a <- statusStateYr2%>%group_by(state,Yr)%>%
summarize(pgrad=sum(status=='graduated'),pfoi=sum(status=='final_or_incomplete'),
pcp=sum(status=='changed placement'),pprom=sum(status=='promoted'))%>%
melt(measure.vars=c('pgrad','pfoi','pcp','pprom'))
statusStateYr2b <- statusStateYr2%>%group_by(state,Yr)%>%
summarize(pgrad=mean(status=='graduated'),pfoi=mean(status=='final_or_incomplete'),
pcp=mean(status=='changed placement'),pprom=mean(status=='promoted'))%>%
melt(measure.vars=c('pgrad','pfoi','pcp','pprom'))
levels(statusStateYr2a$variable) <- list(Final='pfoi',Reassigned='pcp',Promoted='pprom',Mastered='pgrad')
levels(statusStateYr2b$variable) <- list(Final='pfoi',Reassigned='pcp',Promoted='pprom',Mastered='pgrad')
tab1 <- dcast(subset(statusStateYr2a,Yr=='Yr 1'&state!='NJ'),variable~state)
tab2 <- dcast(subset(statusStateYr2a,Yr=='Yr 2'&state!='NJ'),variable~state)
#tab <- cbind(tab1,tab2[-1])
tab1b <- dcast(subset(statusStateYr2b,Yr=='Yr 1'&state!='NJ'),variable~state)
tab2b <- dcast(subset(statusStateYr2b,Yr=='Yr 2'&state!='NJ'),variable~state)
tab1$variable <- as.character(tab1$variable)
tab2$variable <- as.character(tab2$variable)
cat('&')
cat(names(tab1)[-1],sep='&')
cat('\\\\')
for(i in 1:nrow(tab1)){
cat(tab1[i,1],round(unlist(tab1[i,-1])),sep='&')
cat('\\\\')
}
cat('Total',round(colSums(tab1[,-1])),sep='&')
cat('\\\\')
cat('&\\multicolumn{6}{c}{Year 2}\\\\ \n')
for(i in 1:nrow(tab2)){
cat(tab2[i,1],round(unlist(tab2[i,-1])),sep='&')
cat('\\\\')
}
cat('Total',round(colSums(tab2[,-1])),sep='&')
cat('\\\\')
rownames(tab1) <- rownames(tab2) <- tab1$variable
tab1 <- as.matrix(tab1[,-1])
tab2 <- as.matrix(tab2[,-1])
rownames(tab1b) <- rownames(tab2b) <- tab1b$variable
tab1b <- as.matrix(tab1b[,-1])
tab2b <- as.matrix(tab2b[,-1])
@
\end{tabular}
\caption{Numbers of worked sections that ended in each of the
four possible outcomes, across states and study years.}
\label{tab:overallStatus}
\end{table}
Figure \ref{fig:overallStatus} and Table \ref{tab:overallStatus} show
the proportions of worked sections in each state and study year that
ended with mastery, promotion, or reassignment, or as the
student's final section.
In the first year, about
\Sexpr{round(mean(tab1b['Mastered',c('TX','KY','MI','LA')])*100)}\% of
worked sections were mastered, except in Connecticut.
Other than in Texas, about
\Sexpr{paste(round(range(tab1b['Promoted',c('CT','KY','MI','LA')])*100),collapse='--')}\%
of sections ended in promotion.
About \Sexpr{round(tab1b['Reassigned','TX']*100)}\% of sections in Texas
and \Sexpr{round(tab1b['Reassigned','CT']*100)}\%
in Connecticut ended in reassignment; reassignment was even rarer in the other states.
With the exception of Texas, the distribution of section outcomes was
similar in both years.
In Texas, however, the percentage of sections ending in reassignment
increased by a factor of about
\Sexpr{round(tab2b['Reassigned','TX']/tab1b['Reassigned','TX'])},
to about \Sexpr{round(tab2b['Reassigned','TX']*100)}\%.
The proportion of Texas sections labeled ``Final'' increased as
well---the expected result of decreasing the overall number of worked
sections and holding fixed the likelihood of ending
usage while in the middle of a section.
Across states, sections ended in reassignment at a rate of about
\Sexpr{round(tab1b['Reassigned','Overall']*100)}\% in year 1 and
\Sexpr{round(tab2b['Reassigned','Overall']*100)}\% in year 2.
\subsection{Section Mastery and Curriculum}
\begin{figure}
\centering
<<statusCur,dependson='overallStatus',fig.height=3,fig.width=6>>=
statusOverall$Curr2 <- with(statusOverall,
ifelse(Curriculum=='Algebra I',
ifelse(overall=='Standard','Algebra I','Algebra I (Cust.)'),
as.character(Curriculum)))
statusCurr <- statusOverall%>%filter(!is.na(Curriculum))%>% group_by(Curr2,Yr)%>%
summarize(pgrad=mean(status=='graduated'),pfoi=mean(status=='final_or_incomplete'),
pcp=mean(status=='changed placement'),pprom=mean(status=='promoted'))%>%
melt(measure.vars=c('pgrad','pfoi','pcp','pprom'))
levels(statusCurr$variable) <- list(Final='pfoi',Reassigned='pcp',Promoted='pprom',Graduated='pgrad')
statusCurr$Curr2 <-
factor(statusCurr$Curr2,levels=c('Bridge-to-Algebra','Algebra I','Algebra I (Cust.)','>Algebra I'))
levels(statusCurr$Curr2)[1] <- 'Bridge-to-Alg.'
print(ggplot(statusCurr,aes(Yr,value,fill=variable))+geom_col()+facet_grid(~Curr2)+
labs(y='% of Sections Worked',x='',fill='Exit Status')+scale_y_continuous(labels=percent))
@
\caption{The distributions of outcomes of worked sections, by
curriculum, in the two study years. (There were no Bridge-to-Algebra
sections in customized curricula in our dataset.)}
\label{fig:statusCur}
\end{figure}
A well-designed curriculum can, in theory, play an important role in
students' attainment of mastery.
Students who work on appropriate problems that build on their current
set of skills should be more likely to master new skills than students
working on problems above their level.
What role did variations in the CT curriculum play in mastery during the effectiveness trial?
Figure \ref{fig:statusCur} shows the proportions of worked sections
that were mastered, or that ended in promotion, in reassignment, or as the
student's final section, in standard and customized versions of each CT curriculum.
Mastery proportions do, indeed, depend on curriculum.
Specifically, students mastered sections from more advanced curricula
less frequently.
Sections from the most basic curriculum, Bridge to Algebra, were
mastered
\Sexpr{round(
mean(statusOverall$status[statusOverall$Curriculum=='Bridge-to-Algebra']=='graduated',na.rm=T)*100)}\%
of the time;
those from Algebra I were mastered
\Sexpr{round(
mean(statusOverall$status[statusOverall$Curriculum=='Algebra I']=='graduated',na.rm=T)*100)}\%
of the time, and those from more advanced curricula were mastered
at a rate of
\Sexpr{round(
mean(statusOverall$status[statusOverall$Curriculum=='>Algebra I']=='graduated',na.rm=T)*100)}\%.
This is unsurprising, since more advanced curricula may be expected
to be more challenging.
However, it may suggest that some students studying advanced topics
would fare better in less advanced curricula.
Algebra I sections from customized curricula tended to end in
reassignment more often than sections from the standard Algebra I
curriculum (\Sexpr{round(100*subset(statusCurr,Curr2=='Algebra I (Cust.)' &
variable=='Reassigned')[['value']])}\% vs.
\Sexpr{round(100*subset(statusCurr,Curr2=='Algebra I' &
variable=='Reassigned' & Yr=='Yr 2')[['value']],1)}\%, in year 2).
This may indicate an overall skepticism towards the Carnegie Learning standards
among certain schools and teachers, manifested both in the adoption of
alternative curricula and in reassignment.
\section{Digging Deeper into Section Reassignment}\label{sec:cp}
The proportion of worked sections in our dataset ending in
reassignment was small.
Nevertheless, since reassignment represents the only mechanism by
which individual teachers can affect their students' progress through
the Cognitive Tutor, exploring patterns of reassignment can provide
insight into how CT was used.
<<cpDat,dependson=c('data','setup'),include=FALSE>>=
secLev <- data%>%filter(is.finite(status) & is.finite(timestamp) &is.finite(date))%>%
group_by(field_id,unit,section,Year,Yr,classid2,schoolid2)%>%
summarize(startDate=min(date),endDate=max(date),startTime=min(timestamp),endTime=max(timestamp),state=state[1],
status=max(status),Curriculum=Curriculum[1],overall=overall[1],version=version[1])%>%
arrange(endDate)
### cp over time
secLev <- within(secLev,
  endMonth <- factor(month(secLev$endDate,TRUE,TRUE),
                     levels=c('Aug','Sep','Oct','Nov','Dec','Jan',
                              'Feb','Mar','Apr','May','Jun','Jul')))
@
\subsection{How Do Reassignment Patterns Vary?}
\begin{figure}
\centering
<<vcs,fig.height=3,fig.width=6.5,dependson=c('data','setup')>>=
#load('vcMods.RData')
library(lme4)
vcDat <- data%>%
  filter(unit%in%unique(data$unit[data$curriculum=='Algebra I']))%>%
  group_by(field_id,unit,section,classid2,state)%>%
  summarize(status=max(status,na.rm=TRUE),startDate=min(date,na.rm=TRUE))
vcDat <- merge(vcDat,stud,all.x=TRUE,all.y=FALSE)
vcDat$month <- months(vcDat$startDate,TRUE)
vcDat$cp <- vcDat$status=='changed placement'
vcDat$mast <- vcDat$status=='graduated'
vcModYr <- list(
  yr1=glmer(cp~(1|field_id)+(1|classid2)+(1|schoolid2)+(1|state)+(1|unit),
            family=binomial,data=subset(vcDat,year==1)),
  yr2=glmer(cp~(1|field_id)+(1|classid2)+(1|schoolid2)+(1|state)+(1|unit),
            family=binomial,data=subset(vcDat,year==2)))
vcMods <- list()
for(st in c('TX','KY','MI'))
for(yr in c(1,2))
vcMods[[paste0(st,'_',yr)]] <- glmer(cp~(1|field_id)+(1|classid2)+(1|schoolid2)+(1|unit),family=binomial,data=subset(vcDat,state==st & year==yr))
vcFun <- function(nm){
mod <- vcMods[[nm]]
out <- unlist(summary(mod)$varcor)
out <- data.frame(sig2=out,comp=names(out),stringsAsFactors=FALSE)
out <- rbind(out,data.frame(sig2=pi^2/3,comp='resid'))
out$state <- strsplit(nm,'_')[[1]][1]
out$year <- strsplit(nm,'_')[[1]][2]
out$sig2 <- out$sig2/sum(out$sig2)
out
}
vcDat <- do.call('rbind',lapply(names(vcMods),vcFun))
yr1 <- data.frame(sig2=c(unlist(summary(vcModYr[[1]])$varcor),pi^2/3),
comp=c(names(summary(vcModYr[[1]])$varcor),'resid'),
state='Overall',
year='1')
yr1$sig2 <- yr1$sig2/sum(yr1$sig2)
yr2 <- data.frame(sig2=c(unlist(summary(vcModYr[[2]])$varcor),pi^2/3),
comp=c(names(summary(vcModYr[[2]])$varcor),'resid'),
state='Overall',
year='2')
yr2$sig2 <- yr2$sig2/sum(yr2$sig2)
vcDat <- rbind(vcDat,yr1,yr2)
vcDat$comp <- factor(vcDat$comp)
levels(vcDat$comp)=list(State='state',School='schoolid2',Class='classid2',Student='field_id',Unit='unit',Residual='resid')
vcDat$state <- factor(vcDat$state,levels=c('TX','KY','MI','Overall'))
vcs <- list()
for(s in unique(vcDat$state)) for(y in 1:2){
ddd <- round(subset(vcDat,state==s&year==y,select=c(sig2))*100)
rownames(ddd) <- gsub('\\d','',rownames(ddd))
rownames(ddd)[nrow(ddd)] <- 'resid'
vcs[[paste0(s,'_',y)]] <- ddd
}
ggplot(vcDat,aes(year,sig2,fill=comp))+geom_col()+facet_grid(~state)+
  scale_fill_manual(values=rev(c('white','grey','#e41a1c','#377eb8','#4daf4a','#984ea3')))+
  labs(x='',y='% Variance Explained',fill='',
       title='Model: Multilevel Logistic Unconditional')+
  scale_y_continuous(labels=percent)
@
\caption{Results from a set of eight multilevel logistic regressions
predicting section reassignment. For each year, in the entire sample (``Overall'')
and in the three states with the largest numbers of reassignments (Texas,
Kentucky, and Michigan), we regressed a binary variable indicating
whether a section ended in reassignment on random intercepts for
school, class, student, and unit, and in the overall case, for state
as well, and recorded their variances. The residual variance was set
to the variance of the standard logistic distribution,
$\pi^2/3$. These bar charts give the proportion of the total variance
attributable to each random effect.}
\label{fig:vc}
\end{figure}
Teachers alone control reassignment.
Nevertheless, the factors influencing student reassignment vary at a
number of levels.
For instance, state and district standards may prod teachers into
reassigning students to particular units.
Some principals may encourage teachers to adhere to the official
curriculum and avoid reassignment.
Some students may be more prone to reassignment than others.
Certain units in the CTAI curriculum may be harder than others,
causing students to tarry and teachers to reassign.
Finally, a host of other factors, at these levels and others, may spur reassignment.
To better understand the source of the variation in
reassignment---what drives some, but not other, sections worked by
students to end in
reassignment---we fit a set of multilevel models.
We fit separate models to data from each of the three states with the
highest numbers of reassignments, Texas, Kentucky, and Michigan, and
to the sample as a whole, in each of the two study years, yielding a
total of eight models.
Each model was a logistic regression: a binary indicator for section
reassignment was regressed on a random intercept for unit, as well as
nested random intercepts for student, classroom, and school.
Models fit to data from all six states included an additional random
intercept for state.
Logistic regression can be represented in terms of an underlying
latent variable $Z^*$: student $i$ working section $sec$ is reassigned
when $Z_{sec,i}^*>0$.
The model for $Z^*$ is:
\begin{equation*}
Z^*_{sec,i}=\alpha_0+\beta_{u[sec]}+\gamma_i+\delta_{c[i]}+\epsilon_{s[i]}+e_{sec,i}
\end{equation*}
where $\alpha_0$ is an overall intercept,
and $\beta_{u[sec]}$, $\gamma_i$, $\delta_{c[i]}$, and $\epsilon_{s[i]}$
are random intercepts for the unit in which $sec$ appears, for student
$i$, for $i$'s classroom, and for $i$'s school, respectively.
Again, the model fit to all six states includes an additional random
intercept for state.
The random intercepts are modeled as independent and normally
distributed, each with its own variance.
The regression error $e_{sec,i}$ is given the standard logistic
distribution, with ``residual'' variance $\pi^2/3$.
It is convenient to represent variance in reassignment probabilities
in terms of the variance of $Z^*$.
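Concretely, under the model above, the share of the total latent
variance attributable to, for example, the school intercepts is
\begin{equation*}
\frac{\sigma^2_{\epsilon}}{\sigma^2_{\beta}+\sigma^2_{\gamma}+\sigma^2_{\delta}+\sigma^2_{\epsilon}+\pi^2/3}
\end{equation*}
where $\sigma^2_{\beta}$, $\sigma^2_{\gamma}$, $\sigma^2_{\delta}$, and
$\sigma^2_{\epsilon}$ are the variances of the unit, student, classroom,
and school intercepts, respectively, and $\pi^2/3$ is the residual
variance; the shares of the other components are defined analogously.
(In the overall models, the state variance enters the denominator as well.)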
Figure \ref{fig:vc} gives the variance components estimated from these
logistic regressions: variances of the random intercept terms, as a
percentage of the total variance of $Z^*$.
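As a sketch, these shares can be recovered from any one of the fitted
models using \texttt{lme4}'s \texttt{VarCorr} extractor; here
\texttt{mod} stands in for one of the eight \texttt{glmer} fits, and
the chunk is not evaluated:
<<vcSketch,eval=FALSE>>=
vc <- as.data.frame(VarCorr(mod))    # one row per random intercept
sig2 <- c(vc$vcov, pi^2/3)           # append the logistic residual variance
names(sig2) <- c(vc$grp, 'Residual')
round(100 * sig2 / sum(sig2))        # percent of total latent variance
@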
Overall, in both years of the study, the largest determinant of
reassignment was school, accounting for
\Sexpr{vcs[['Overall_1']]['schoolid','sig2']}\% of the variation in year 1,
and \Sexpr{vcs[['Overall_2']]['schoolid',1]}\% in year 2.
After school, state was the most important, accounting for
\Sexpr{vcs[['Overall_1']]['state',1]}\% and
\Sexpr{vcs[['Overall_2']]['state',1]}\% in the two years, and unit,
accounting for \Sexpr{vcs[['Overall_1']]['unit',1]}\% and \Sexpr{vcs[['Overall_2']]['unit',1]}\%.
Surprisingly, classroom and student-level factors only accounted for
\Sexpr{vcs[['Overall_1']]['classid',1]}\% and
\Sexpr{vcs[['Overall_1']]['field_id',1]}\% in year 1, respectively,
and \Sexpr{vcs[['Overall_2']]['classid',1]}\% and
\Sexpr{vcs[['Overall_2']]['field_id',1]}\% in year 2.
The pattern was similar in Texas---where school accounted for over half
the variation in reassignment in both years---and in Michigan to a
lesser extent.\footnote{Percentages in state-specific models, in which
there is no between-state variance, cannot be
directly compared to those from the overall model.}
In Kentucky, unit played the largest role
(\Sexpr{vcs[['KY_1']]['unit',1]}\%) in year 1, and classroom played
the largest role in year 2 (\Sexpr{vcs[['KY_2']]['classid',1]}\%).
Across states and years, student level factors never accounted for
more than \Sexpr{max(sapply(vcs,function(x) x['field_id',1]))}\% of
the variation in reassignment.
Other than in Kentucky in year 2, classroom never accounted for more
than \Sexpr{sort(sapply(vcs,function(x) x['classid',1]),dec=T)[2]}\% of
the variation.
\textbf{Summary.} Although teachers control reassignment, their decisions appear to be largely
driven by broader policies operating at the state or school level.
\subsection{When Are Students Reassigned?}
The timing of reassignments can also provide a window into what drives
teachers' decisions to reassign students.
Figure \ref{fig:byMonth} shows the proportion of worked sections in
each month that end in reassignment.
In both years, reassignments were much more common in the second half
of the school year than in the first.
This may be the result of teachers learning how to use the software as
the year progresses, or of teachers responding to the pressure of upcoming
standardized tests by accelerating students' progress and reassigning
them to relevant sections.
As we have seen, reassignment was more common in year 2 than in year 1.
In fact, reassignment increased fairly steadily over the entire length
of the study.
Through December of the first year, reassignment was rare. From
January through May of year 1, between one and two percent of sections
ended in reassignment.
Year 2 began where year 1 left off, with one to two percent of
sections reassigned.
Finally, from February through May of the second year, the rate of
reassignment increased again.
\begin{figure}
\centering
<<byMonth,dependson='cpDat',fig.height=3,fig.width=6>>=
secLevMonth <- secLev%>%group_by(endMonth,Year)%>%
summarize(nsec=n(),CPper=mean(status=='changed placement',na.rm=TRUE))
secLev$cp <- as.numeric(secLev$status=='changed placement')
ggplot(filter(secLevMonth,nsec>100),aes(endMonth,CPper,group=Year,color=Year,size=nsec))+geom_point()+geom_line(size=1)+
scale_y_continuous(breaks=seq(0,0.05,.01),labels=percent)+
coord_cartesian(ylim=c(0,0.05))+
labs(x='Month',y='% of Sections Ending in Reassignment',size='# Worked\nSections',color='')