
Create Lessons

jeisner edited this page Oct 10, 2014 · 15 revisions

Naming and Storing Lessons

Each lesson is assumed to be in its own folder, typically under the top-level lessons folder. By default, the lesson names are consecutive integers (so the folders are lessons/1, lessons/2, and so on), and these integers indicate the order of the lessons.

In general, however, the lesson order is derived from a lesson_order array defined in the global lessons/settings.json configuration file. The array may be heterogeneous, meaning its entries need not all be of the same type. (You can read more about configuration at Changing Tutorial Settings.) The i-th element of lesson_order is the name of the i-th lesson. To get the default order, we set lesson_order to:

{
    "lesson_order" : [
	1,
	2,
	...
	18
    ],
    ...
}

lesson_order lets you make only certain lessons viewable. It also lets you reorder the lessons or insert new lessons into the sequence.
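As a minimal sketch, the lessons could be resolved in order as shown below. It assumes each lesson_order entry is the name of a folder directly under lessons/, as with lessons/1. The helpers load_settings and lesson_folders are hypothetical and are not part of the tutorial's code. The name "intro" in the usage line is an illustrative string name.

```python
import json
from pathlib import Path

def load_settings(path="lessons/settings.json"):
    """Read the global settings file (hypothetical helper)."""
    with open(path) as f:
        return json.load(f)

def lesson_folders(settings, lessons_dir="lessons"):
    """Return lesson folders in the order given by lesson_order.

    Entries may be heterogeneous: str() maps 1 -> "1" and leaves string
    names unchanged. Lessons not listed are not returned (not viewable).
    """
    return [Path(lessons_dir) / str(name) for name in settings["lesson_order"]]
```

For example, `lesson_folders({"lesson_order": [2, 1, "intro"]})` returns `[Path("lessons/2"), Path("lessons/1"), Path("lessons/intro")]`.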

Defining Lesson Material

Lessons use a standard file format. Each lesson folder needs three files:

  1. instructions.html: what students should expect or try to do in this lesson
  2. observations: a tab-separated listing of type-level observations, including the count of each type and which features fire on it
  3. theta: a tab-separated listing of features and their optimal values (weights)
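A quick way to check a new lesson folder is to list which of the three required files are missing. This is a minimal sketch. The names REQUIRED_LESSON_FILES and missing_lesson_files are hypothetical.

```python
from pathlib import Path

# The three files that every lesson folder needs.
REQUIRED_LESSON_FILES = ("instructions.html", "observations", "theta")

def missing_lesson_files(folder):
    """Return the names of required files that are absent from a lesson folder."""
    folder = Path(folder)
    return [name for name in REQUIRED_LESSON_FILES if not (folder / name).is_file()]
```

For a complete lesson folder such as lessons/1, this should return an empty list.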

instructions.html

The system injects the instruction HTML directly into the page. Therefore, instructions.html should be a partial HTML page (an HTML fragment); the top-level node is typically a <div>. Here is the beginning of lessons/1/instructions.html:

<div id="welcome" class="showable_instructions">
  <p>Welcome!  This interactive visualization will help you understand
  the popular technique of log-linear modeling.</p>
</div>

<div id="p1" class="showable_instructions">
  <p><b>Try it out:</b> The sliders below control the parameters ("weights") of a log-linear model.
    When you increase the <code>circle</code> weight, which filled shapes get
    bigger?  Which ones get smaller?</p>
</div>

It is not necessary to set id or class attributes on HTML tags. However, you may wish to include them to further customize the tutorial: because the instructions are injected directly into the page, any embedded JavaScript or CSS will apply.

observations

The observations file has five tab-separated columns:

  1. a count,
  2. a (row,column) position (0-indexed),
  3. a conditioning context, possibly empty, in which it appears,
  4. a comma-separated list of features, and
  5. visualization instructions.

Each conditioning context X defines a separate distribution P(. | X). An empty conditioning context corresponds to a globally normalized model: all observations are governed by P(. | [no context]) = P(.).
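For reference, each context X defines a distribution with the standard log-linear form below. The notation is introduced here for illustration and is not part of the file format:

- θ_k is the weight of feature k, taken from theta.
- f_k(X, y) is 1 if feature k fires on outcome y in context X, and 0 otherwise.
- 𝒴(X) is the set of outcomes in context X.

Taking 𝒴(X) to be exactly the rows listed under X, including rows with count 0, is an assumption.

```latex
P(y \mid X) = \frac{\exp\left(\sum_k \theta_k \, f_k(X, y)\right)}{Z(X)},
\qquad
Z(X) = \sum_{y' \in \mathcal{Y}(X)} \exp\left(\sum_k \theta_k \, f_k(X, y')\right)
```

When every row has the empty context, Z sums over all rows in the file. This is what makes the model globally normalized.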

This file provides a "conditional multinomial" view: each different type of object is listed once per conditioning context, with an associated count. Positions are given as (row,column) pairs, 0-indexed. Here are the first two lines of lessons/1/observations, which describes a globally normalized model:

count	position	context	features	visualization
30	0,0		circle,solid	shape=circle,fill=solid
15	0,1		circle	shape=circle,fill=striped

The first line says that 30 instances of a solid circle (as specified in the visualization column) should appear in the first row and first column (position = 0,0); both the circle and solid features fire. In contrast, the second line says that 15 instances of a striped circle should appear to the right of the solid circles (first row, second column; position = 0,1); only the circle feature fires.

Here the context column is empty; however, there still need to be four tabs per line.
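Here is a minimal Python sketch of reading observations lines under the five-column layout described above. The Observation class and the function names are hypothetical.

Two further assumptions are made:

- The visualization column is parsed as comma-separated key=value pairs, based on the examples.
- A leading "count..." header line is skipped if present. The header shown above may only be documentation.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    count: int
    position: tuple      # (row, column), 0-indexed
    context: str         # "" for a globally normalized model
    features: list       # names of the features that fire
    visualization: dict  # e.g. {"shape": "circle", "fill": "solid"}

def parse_observation(line):
    """Parse one line of an observations file (five tab-separated columns)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 tab-separated columns, got {len(fields)}")
    count, position, context, features, visualization = fields
    row, col = (int(x) for x in position.split(","))
    return Observation(
        count=int(count),
        position=(row, col),
        context=context,
        # Split on commas only, so names like "triangle & solid" keep their spaces.
        features=[f for f in features.split(",") if f],
        # Assumption: the visualization column is comma-separated key=value pairs.
        visualization=dict(kv.split("=", 1) for kv in visualization.split(",") if kv),
    )

def read_observations(lines):
    """Parse all non-blank lines, skipping a "count..." header line if present."""
    return [parse_observation(line) for line in lines
            if line.strip() and not line.startswith("count\t")]
```

A line with an empty context still splits into five fields because of the two adjacent tabs. A line with too few tabs raises ValueError.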

For contrast, here is a sample observations fragment for a conditionally normalized model (from lessons/16/observations) that mimics a bigram "shape model":

count	position	context	features	visualization
29	0,0	solid triangle	triangle,solid,triangle & solid,same event,same shape,same fill	shape=triangle,fill=solid
33	0,1	solid triangle	triangle,striped,triangle & striped,same shape	shape=triangle,fill=striped
...
2	2,1	hollow triangle	square,striped,square & striped	shape=square,fill=striped
0	2,2	hollow triangle	square,hollow,square & hollow,same fill	shape=square,fill=hollow
5	0,0	solid circle	triangle,solid,triangle & solid,same fill	shape=triangle,fill=solid
...

The fragment here defines three separate conditional distributions: P(. | solid triangle) (first two lines), P(. | hollow triangle) (next two), and P(. | solid circle) (last line). In the solid triangle context, we observe 29 solid triangles and 33 striped triangles; in the hollow triangle context, we observe 2 striped squares and 0 hollow squares; and in the solid circle context, we observe 5 solid triangles. (The actual lesson has more contexts, and more outcomes per context, than this fragment shows.)
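To make the per-context view concrete, here is a small sketch. It groups (count, context, outcome) rows by context and normalizes the counts within each context. The result is the observed distribution in each context. The function name is hypothetical. The rows in the usage example are just the five fragment lines, so the proportions describe the fragment, not the full lesson.

```python
from collections import defaultdict

def empirical_conditionals(rows):
    """Group (count, context, outcome) rows by context and normalize.

    Returns {context: {outcome: count / total count in that context}}.
    The empty context "" stands for the single global distribution.
    """
    counts = defaultdict(dict)
    for count, context, outcome in rows:
        counts[context][outcome] = counts[context].get(outcome, 0) + count
    dists = {}
    for context, by_outcome in counts.items():
        total = sum(by_outcome.values())
        # Outcomes with count 0 (like the hollow squares) still get an entry.
        dists[context] = {y: (c / total if total else 0.0) for y, c in by_outcome.items()}
    return dists
```

On the fragment, the solid triangle context gives 29/62 ≈ 0.468 for solid triangles and 33/62 ≈ 0.532 for striped triangles.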

This example also shows that feature names (such as triangle & solid or same event) may contain any ASCII character except tabs and commas.

theta

The theta file describes which features to define. Like observations, this file is tab-separated. Note that every feature named in observations needs to be listed in theta!

The three columns of this file are:

  1. context (currently always empty; see below),
  2. feature, the name of the feature, and
  3. value, the true or optimal weight of this feature.

For example, here's the theta file for lesson 1:

context	feature	value
	circle	1.1003734519487993
	solid	0.6932745934710376
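Below is a minimal sketch of reading a theta file and computing the log-linear distribution for one context from those weights. It is written in Python for illustration and is not the tutorial's JavaScript back-end. The function names are hypothetical, as is the input layout: a dict mapping each outcome to the features that fire on it.

```python
import math

def parse_theta(lines):
    """Map feature name -> weight from the lines of a theta file.

    Skips blank lines and a "context<TAB>feature<TAB>value" header if present,
    and rejects a non-empty context, which the current format does not use.
    """
    theta = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip() or line.startswith("context\t"):
            continue
        context, feature, value = line.split("\t")
        if context:
            raise ValueError(f"unexpected non-empty context: {context!r}")
        theta[feature] = float(value)
    return theta

def conditional_probs(theta, outcomes):
    """Log-linear P(y | context) for the outcomes of ONE context.

    outcomes maps each outcome name to the list of features that fire on it.
    A feature missing from theta raises KeyError, mirroring the rule that
    every feature named in observations must be listed in theta.
    """
    scores = {y: sum(theta[f] for f in feats) for y, feats in outcomes.items()}
    top = max(scores.values())  # subtract the max score so exp() cannot overflow
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}
```

Take the two lesson-1 outcomes shown earlier. The ratio P(solid circle) / P(striped circle) equals exp(θ_solid) ≈ 2.0003, because the circle weight cancels. This is close to the observed 30:15. The lesson's other outcomes change the absolute probabilities but not this ratio.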

The file format is anachronistic (a holdover from an earlier design), and I haven't had time to update the JavaScript back-end.

  • The context column should always be empty. This empty context is how back-off is implemented.
  • The feature column provides the name of the feature.
  • The value column gives the "true" weight of each feature. Because we initially generated a number of the lessons from a maxent model with known weights, I originally included this column as a "shortcut". However, as I said in issue 22, we should be able to find the weights automatically from the counts. That said, this column is still used to generate new examples when you change the value in the "N =" textbox, which sets the number of observations in a context.
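The sketch below shows both uses of the weights for a single context:

- Drawing N new observations from the model defined by the value column.
- Computing the log-likelihood gradient (observed minus expected feature counts). A gradient-based fit would drive this to zero to recover the weights from the counts alone, along the lines of issue 22.

This is a standard log-linear calculation written in Python, not the tutorial's JavaScript back-end. The function names and the outcome-to-features dict layout are hypothetical.

```python
import math
import random

def model_probs(theta, outcomes):
    """Log-linear P(y | context); outcomes maps name -> features that fire."""
    names = list(outcomes)
    scores = [sum(theta[f] for f in outcomes[y]) for y in names]
    top = max(scores)  # subtract the max score so exp() cannot overflow
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return {y: e / z for y, e in zip(names, exps)}

def sample_counts(theta, outcomes, n, seed=None):
    """Draw n observations from the model and return {outcome: count}."""
    probs = model_probs(theta, outcomes)
    rng = random.Random(seed)
    counts = dict.fromkeys(probs, 0)
    for y in rng.choices(list(probs), weights=list(probs.values()), k=n):
        counts[y] += 1
    return counts

def log_likelihood_gradient(theta, outcomes, counts):
    """d/d theta_k of sum_y counts[y] * log P(y | context):
    observed count of feature k minus its expected count under the model."""
    probs = model_probs(theta, outcomes)
    total = sum(counts.get(y, 0) for y in outcomes)
    grad = dict.fromkeys(theta, 0.0)
    for y, feats in outcomes.items():
        for f in feats:
            grad[f] += counts.get(y, 0) - total * probs[y]
    return grad
```

With several contexts, as in lesson 16, the total gradient is the sum of the per-context gradients. On the two-outcome lesson-1 subset with counts 30 and 15, the solid component of the gradient vanishes exactly when θ_solid = ln 2.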
