Create Lessons
Each lesson is assumed to be in its own folder, typically under the top-level lessons folder. By default, the lesson names are consecutive integers, which indicate the order of the lessons:
- the first lesson is in lessons/1/
- the second lesson is in lessons/2/
- etc.
In general, however, the lesson order is derived from a lesson_order array (possibly heterogeneous, e.g., mixing numbers and strings) defined in the global lessons/settings.json configuration file. (You can read more about configuration at Changing Tutorial Settings.) The i-th element of lesson_order is the name of the i-th lesson. To get the default order, we set lesson_order to:
{
"lesson_order" : [
1,
2,
...
18
],
...
}
lesson_order allows you to select only certain lessons to be viewable. It also allows you to change the order or insert new lessons into the sequence.
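As a rough illustration of how lesson_order resolves to lesson folders, here is a minimal Python sketch. It assumes settings.json sits directly in the lessons folder and that each entry of lesson_order doubles as its folder name; the function name lesson_folders is hypothetical, not part of the tutorial's code.

```python
import json
from pathlib import Path


def lesson_folders(lessons_dir="lessons"):
    """Return lesson folder paths in the order given by lesson_order.

    Hypothetical helper: assumes settings.json lives directly in lessons_dir
    and that every lesson_order entry (int or string) names a sub-folder.
    """
    root = Path(lessons_dir)
    with open(root / "settings.json") as f:
        settings = json.load(f)
    # str() lets integer names (1, 2, ...) and string names share one code path.
    return [root / str(name) for name in settings["lesson_order"]]
```

For example, with lesson_order set to [2, 1], the sketch returns the paths lessons/2 and lessons/1, in that order.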
There is a standard file format. Each lesson needs three files:
- instructions.html: what students should expect or try to do in this lesson
- observations: a tab-separated listing of type-level observations, including counts of that type and which features fire
- theta: a tab-separated listing of features and their optimal values
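A quick completeness check for a lesson folder could look like the Python sketch below; missing_lesson_files is a hypothetical helper, not something the tutorial provides.

```python
from pathlib import Path

# The three files every lesson folder needs.
REQUIRED_FILES = ("instructions.html", "observations", "theta")


def missing_lesson_files(lesson_dir):
    """Return the required files that are absent from lesson_dir (hypothetical helper)."""
    folder = Path(lesson_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]
```

A complete lesson, such as lessons/1, should yield an empty list.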
The system injects the instruction HTML directly into the page. Therefore, instructions.html should be an HTML fragment rather than a complete page; typically the top-level element is a <div>. Here is the first bit from lessons/1/instructions.html:
<div id="welcome" class="showable_instructions">
<p>Welcome! This interactive visualization will help you understand
the popular technique of log-linear modeling.</p>
</div>
<div id="p1" class="showable_instructions">
<p><b>Try it out:</b> The sliders below control the parameters ("weights") of a log-linear model.
When you increase the <code>circle</code> weight, which filled shapes get
bigger? Which ones get smaller?</p>
</div>
It is not necessary to set id or class attributes on HTML tags. However, you may wish to include them to further customize the tutorial: because the instructions are injected directly into the page, any embedded Javascript or CSS will apply.
The observations file has five tab-separated columns:
- a count,
- a (row,column) position (0-indexed),
- a conditioning context, possibly empty, in which it appears,
- a comma-separated list of features, and
- visualization instructions.
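The following Python sketch parses one line of an observations file into these five columns. Observation and parse_observation are hypothetical names for illustration; the sketch assumes every line has exactly five tab-separated fields and handles data lines only, so skip any header row before calling it.

```python
from typing import NamedTuple


class Observation(NamedTuple):
    count: int
    position: tuple     # (row, column), 0-indexed
    context: str        # "" means no conditioning context (global normalization)
    features: list      # names of the features that fire
    visualization: str


def parse_observation(line):
    """Parse one tab-separated data line of an observations file (hypothetical helper)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 tab-separated columns, got {len(fields)}")
    count, position, context, features, visualization = fields
    row, col = (int(x) for x in position.split(","))
    return Observation(
        count=int(count),
        position=(row, col),
        context=context,
        # Feature names are comma-separated and may contain spaces or '&'.
        features=features.split(",") if features else [],
        visualization=visualization,
    )
```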
Each conditioning context X defines a separate distribution P(. | X). An empty conditioning context corresponds to a globally normalized model: all observations are governed by P(. | [no context]) = P(.).
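To make the normalization concrete: in a log-linear model, each observation type y in context X gets the score exp(sum of the weights of its firing features), and P(y | X) is that score divided by the sum of the scores of all types in the same context. The Python sketch below assumes the outcomes competing in a context are exactly the rows listed for it (identified by any label, e.g., their position); conditional_probs and its (context, outcome, features) row shape are hypothetical.

```python
import math
from collections import defaultdict


def conditional_probs(rows, theta):
    """Compute P(outcome | context) under a log-linear model (hypothetical helper).

    rows:  (context, outcome, features) triples, one per observation type.
    theta: dict mapping feature name -> weight.
    Returns {context: {outcome: probability}}.
    """
    scores = defaultdict(dict)
    for context, outcome, features in rows:
        # Unnormalized score: exp of the summed weights of the firing features.
        scores[context][outcome] = math.exp(sum(theta[f] for f in features))
    probs = {}
    for context, by_outcome in scores.items():
        z = sum(by_outcome.values())  # normalizer Z(X), computed separately per context
        probs[context] = {y: s / z for y, s in by_outcome.items()}
    return probs
```

With an empty context, every row shares the single key "", which is exactly the globally normalized case.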
This file provides a "conditional multinomial" view: each different type of object is listed once per conditioning context, with an associated count. Positions are given as (row,column) pairs (0-indexed). Here are the first two lines from lessons/1/observations, which describes a globally normalized model:
count position context features visualization
30 0,0 circle,solid shape=circle,fill=solid
15 0,1 circle shape=circle,fill=striped
The first line says that 30 instances of a solid circle (the visualization column) should appear in the first row and column (position = 0,0); both the circle and solid features fire. In contrast, the second line says that 15 instances of a striped circle should appear to the right of the solid circles (first row, second column; position = 0,1); only the circle feature fires.
Here the context column is empty; however, there still need to be four tabs per line.
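Because an empty context still occupies its own column, a program that writes observations lines has to emit the empty field rather than skip it. A minimal Python sketch (format_observation is a hypothetical helper):

```python
def format_observation(count, position, context, features, visualization):
    """Serialize one observations line (hypothetical helper).

    The context field is always written, even when empty, so every line
    has five columns and therefore four tabs.
    """
    row, col = position
    fields = [str(count), f"{row},{col}", context, ",".join(features), visualization]
    return "\t".join(fields)
```

For the first lesson-1 line, this produces 30, 0,0, an empty field, circle,solid and shape=circle,fill=solid joined by four tabs.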
For contrast, here is a sample observations fragment for a conditionally normalized model (from lessons/16/observations) mimicking a bigram "shape model":
count position context features visualization
29 0,0 solid triangle triangle,solid,triangle & solid,same event,same shape,same fill shape=triangle,fill=solid
33 0,1 solid triangle triangle,striped,triangle & striped,same shape shape=triangle,fill=striped
...
2 2,1 hollow triangle square,striped,square & striped shape=square,fill=striped
0 2,2 hollow triangle square,hollow,square & hollow,same fill shape=square,fill=hollow
5 0,0 solid circle triangle,solid,triangle & solid,same fill shape=triangle,fill=solid
...
The fragment here defines three separate conditional distributions: P(. | solid triangle) (first two lines), P(. | hollow triangle) (next two), and P(. | solid circle) (last line). In the solid triangle context, we observe 29 solid triangles and 33 striped triangles; in the hollow triangle context, we observe 2 striped squares and 0 hollow squares; and in the solid circle context, we observe 5 solid triangles. (The actual lesson has more contexts, and more outcomes per context.)
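Grouping rows by context and dividing each count by that context's total gives the empirical conditional distributions, i.e., the relative frequencies within each context. Below is a Python sketch over (count, context, outcome) triples; empirical_conditionals is a hypothetical name, and applied to this fragment it only sees the rows shown, not the full lesson.

```python
from collections import defaultdict


def empirical_conditionals(rows):
    """Turn (count, context, outcome) rows into empirical P(outcome | context).

    Hypothetical helper for illustration.
    """
    counts = defaultdict(dict)
    for count, context, outcome in rows:
        counts[context][outcome] = counts[context].get(outcome, 0) + count
    result = {}
    for context, by_outcome in counts.items():
        n = sum(by_outcome.values())  # total observations N in this context
        # Relative frequency of each outcome; all zeros if the context has N = 0.
        result[context] = {y: (c / n if n else 0.0) for y, c in by_outcome.items()}
    return result
```

On the five rows shown, P(solid triangle | solid triangle) comes out to 29/62 (about 0.47), and P(solid triangle | solid circle) to 1.0.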
This example also shows that feature names may contain any (ASCII) character except tabs and commas; for instance, triangle & solid and same event contain spaces and an ampersand.
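A validator for this naming rule might look like the Python sketch below. The non-empty check is an added assumption (an empty name would come from two adjacent commas), and is_valid_feature_name is a hypothetical helper.

```python
def is_valid_feature_name(name):
    """True if name is non-empty ASCII with no tab or comma (hypothetical helper)."""
    # Tabs separate columns and commas separate features, so neither may appear.
    return bool(name) and name.isascii() and "\t" not in name and "," not in name
```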
The theta file describes which features to define. Like observations, this file is tab-separated. Note that every feature named in observations needs to be listed in theta!
The three columns of this file are:
- context,
- feature, the name of the feature, and
- value, the true or optimal value (weight) of this feature.
For example, here's the theta file for lesson 1:
context feature value
circle 1.1003734519487993
solid 0.6932745934710376
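The sketch below reads a theta file into a feature-to-weight dictionary and lists features used in observations but missing from theta. It is Python, assumes the three tab-separated columns shown here with an optional header row, and rejects a non-empty context column (the back-end expects it to be empty, as explained below); load_theta and missing_features are hypothetical names.

```python
def load_theta(lines):
    """Parse theta lines (context, feature, value) into {feature: value}.

    Hypothetical helper: skips blank lines and an optional header row.
    """
    theta = {}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        fields = line.rstrip("\r\n").split("\t")
        if fields == ["context", "feature", "value"]:
            continue  # skip a header row, if present
        context, feature, value = fields
        if context:
            raise ValueError(f"context column should be empty, got {context!r}")
        theta[feature] = float(value)
    return theta


def missing_features(observation_features, theta):
    """Features named in observations but not listed in theta, sorted."""
    return sorted(set(observation_features) - set(theta))
```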
The file format is anachronistic, and I haven't had the time to update the Javascript back-end.
- The context column should always be empty. This empty context is how back-off is implemented.
- The feature column provides the name of the feature.
- The value column gives the "true" weights for each feature. Because we initially generated a number of the lessons from a maxent model with known weights, I originally included this column as a "shortcut". However, as I said in issue 22, we should be able to find the weights automatically from the counts. That said, this column is still used to generate new examples when you change the value in the N = textbox (the number of observations in a context).
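One way such new examples could be generated is to draw N observations from the model's P(. | X) using the true weights. The Python sketch below does this with random.choices; it illustrates that assumption only and does not describe what the Javascript back-end actually does. sample_counts is a hypothetical name.

```python
import math
import random


def sample_counts(outcomes, theta, n, seed=None):
    """Draw n observations from a log-linear P(. | X) and return per-outcome counts.

    Hypothetical helper.
    outcomes: {outcome: list of firing features} for a single context X.
    theta:    {feature: weight}, e.g. the "true" weights from the theta file.
    """
    rng = random.Random(seed)
    names = list(outcomes)
    # Unnormalized weights exp(sum of firing feature weights);
    # random.choices normalizes them internally.
    weights = [math.exp(sum(theta[f] for f in outcomes[y])) for y in names]
    counts = dict.fromkeys(names, 0)
    for y in rng.choices(names, weights=weights, k=n):
        counts[y] += 1
    return counts
```

Each context would be sampled separately with its own N, matching the one-distribution-per-context view of the observations file.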