79 changes: 79 additions & 0 deletions src/examples/learn_quant/README.md
@@ -0,0 +1,79 @@
# Introduction
This module provides the code used in the publication `Quantifiers of Greater Monotonicity are Easier to Learn`, presented at SALT 35, which is sponsored by the Linguistic Society of America.

This code provides an example of using the `ultk` package to generate abstract data models of ordered referents in a universe, to define a grammar for generating unique quantifier expressions, and to enumerate quantifier expressions and evaluate their meanings with respect to a universe of referents.
The example also includes code for training neural models to correctly verify a given quantifier expression, as well as functions that compute a quantifier's degree of monotonicity, as described in the published manuscript.

For an introduction to the data structures and research question, please refer to the publication and to the [tutorial](src/examples/learn_quant/notebooks/tutorial.ipynb).

It is highly recommended that the user review the docs of the [`hydra` package](www.hydra.cc).

# Usage

## Generation
From the `src/examples` directory:
`python -m learn_quant.scripts.generate_expressions`: generates `generated_expressions.yml` files that catalog the `QuantifierModel`s licensed by a `Grammar` and a `QuantifierUniverse`, as specified by the config at `conf/expressions.yaml`.

Using `hydra`, you may refer to the recipe files at `conf/recipe/`:
`python -m learn_quant.scripts.generate_expressions recipe=3_3_3_xi.yaml`
This generates unique expressions evaluated over the universe and grammar depth specified in the selected recipe config.

You may also override specific parameters:
`python -m learn_quant.scripts.generate_expressions recipe=3_3_3_xi.yaml ++universe.m_size=4`
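
If you just want to inspect the configuration that a recipe and its overrides compose to, `hydra`'s compose API can be used from Python. The snippet below is a minimal sketch, not part of the module; the `config_path` and recipe name are assumptions based on this example's layout and should be adjusted to where you run it from.

```
# Minimal sketch: print the config that generate_expressions would receive.
# Assumes it is run from src/examples; paths and the recipe name are illustrative.
from hydra import compose, initialize
from omegaconf import OmegaConf

with initialize(config_path="learn_quant/conf", version_base=None):
    cfg = compose(
        config_name="expressions",
        overrides=["recipe=3_3_3_xi", "++universe.m_size=4"],
    )
    print(OmegaConf.to_yaml(cfg))
```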

## Learning

### Sampling
At large universe sizes and generation depths, the number of generated expressions can be too large to complete learning experiments with the available compute resources.

After generating a list of expressions, you may sample them using `notebooks/randomize_expression_index.ipynb`. This generates a `.csv` file that draws the desired number of expressions and maps them to their positions in the original `generated_expressions.yml` file.
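
The notebook's exact implementation is not reproduced here, but the idea can be sketched as follows: draw the desired number of expression indices at random and record them alongside their original positions. The expression count, sample size, and output filename below are illustrative assumptions.

```
# Illustrative sketch of sampling expression indices (the notebook may differ in detail).
import csv
import random

n_expressions = 20000  # total number of generated expressions (example value)
n_sample = 2000        # number of expressions to keep for learning experiments

sampled = sorted(random.sample(range(n_expressions), n_sample))
with open("expressions_sample_2k.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["sample_order", "original_index"])
    for order, index in enumerate(sampled):
        writer.writerow([order, index])
```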

### Training with `slurm`
On your `slurm` configured node:

Uncomment the following lines in `conf/learn.yaml`:
```
# - override hydra/launcher: swarm
# - override hydra/sweeper: sweep
```

Run:
`HYDRA_FULL_ERROR=1 python -m learn_quant.scripts.learn_quantifiers --multirun training.lightning=true training.strategy=multirun training.device=cpu model=mvlstm grammar.indices=false`.

This command will read the config at `conf/learn.yaml`, prepare training data based on the chosen quantifier expressions, and run one training job per expression **in parallel** using the `hydra` `submitit` plugin. To specify specific `slurm` parameters, you may modify `conf/hydra/launcher/swarm.yaml`.

### Without Slurm
Run:
`HYDRA_FULL_ERROR=1 python -m learn_quant.scripts.learn_quantifiers training.lightning=true training.strategy=multirun training.device=cpu model=mvlstm grammar.indices=false`.

This command will read the config at `conf/learn.yaml`, prepare training data based on the chosen quantifier expressions, and sequentially run one training job per expression on your local machine.

### Tracking

If you would like to track experiment runs in MLFlow, run an `mlflow` server at the host and port specified by `tracking.mlflow.host` and `tracking.mlflow.port`; `learn_quant.scripts.learn_quantifiers` will then log metrics to that server.

You may turn off tracking with MLFlow by setting the config value `tracking.mlflow.active` to `false`.
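
The tracking setup lives inside the training script, but the general pattern is standard MLFlow usage, sketched below with the URI built from the `tracking.mlflow` values in `conf/learn.yaml`. Treat this as illustrative rather than the script's exact code.

```
# Illustrative sketch: log a run to the MLFlow server configured in conf/learn.yaml.
import mlflow

host, port = "g3116", 5000  # tracking.mlflow.host and tracking.mlflow.port
mlflow.set_tracking_uri(f"http://{host}:{port}")
mlflow.set_experiment("transformers_improved_2")  # experiment_name in conf/learn.yaml

with mlflow.start_run(run_name="example_expression"):
    mlflow.log_param("model", "mvlstm")
    mlflow.log_metric("val_loss", 0.42, step=0)
```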

## Calculation of monotonicity
The `measures.py` script calculates monotonicity for specified quantifier expressions at given universe sizes. It reads the config at `conf/learn.yaml`: expressions are loaded from the output folder corresponding to the parameter values under the `expressions` key, and `measures.expressions` selects which expressions are measured. If universe parameters are defined under the `measures.monotonicity.universe` key, they determine the size of the universe over which the monotonicity value is calculated for each expression.

Run `python -m learn_quant.measures` to generate a `.csv` file of the specified monotonicity measurements.
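
The measure itself follows Section 5 of the cited paper: roughly, the degree of monotonicity of a quantifier is one minus the conditional entropy of its truth values given a closure-based predictor, normalized by the entropy of the truth values. The sketch below shows only that entropy arithmetic on two 0/1 arrays; constructing the closure predictor over actual quantifier models is handled in `measures.py` and is not reproduced here.

```
# Illustrative entropy arithmetic behind a degree-of-monotonicity measure.
# q[i]    = truth value of the quantifier on model i
# pred[i] = closure-based predictor for model i (assumed to be computed elsewhere)
import numpy as np

def entropy(p: np.ndarray) -> float:
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def degree_of_monotonicity(q, pred) -> float:
    """1 - H(q | pred) / H(q) for two equal-length 0/1 arrays."""
    q = np.asarray(q, dtype=int)
    pred = np.asarray(pred, dtype=int)
    n = len(q)
    h_q = entropy(np.bincount(q, minlength=2) / n)
    if h_q == 0.0:  # a trivially true/false quantifier counts as fully monotone
        return 1.0
    h_cond = 0.0
    for value in (0, 1):
        mask = pred == value
        if mask.any():
            h_cond += mask.mean() * entropy(np.bincount(q[mask], minlength=2) / mask.sum())
    return 1.0 - h_cond / h_q
```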

# Content Descriptions

- `scripts`: a set of scripts for generating `QuantifierModels` and measuring various properties of individual models and sets of models.
- `generate_expressions.py` - This script references the configuration file at `conf/expressions.yaml` to generate a `Universe` of the specified dimensions and to generate all expressions from a defined `Grammar`. Outputs are saved in the `outputs` folder. The script keeps the _shortest_ expression (ULTK `GrammaticalExpression`s) for each possible `Meaning` (set of `Referent`s) verified by licit permutations of composed functions defined in `grammar.yml`. In particular, ULTK provides methods for enumerating all grammatical expressions up to a given depth, with user-provided keys for uniqueness and for comparison in the case of a clash. By setting the former to get the `Meaning` from an expression and the latter to compare expressions by length, the enumeration method returns a mapping from meanings to the shortest expressions which express them (see the sketch after this list).
- `learn_quantifiers.py` - This script references the configuration file at `conf/learn.yaml`. It loads expressions that are saved to the `output` folder after running the `generate_expressions.py` script. It transforms the data into a format that allows a neural network to be trained on the relationship between quantifier models and the truth values verified by a particular expression. The script then iterates through loaded expressions and uses PyTorch Lightning to train a neural model to verify randomly sampled models of particular sizes (determined by `M` and `X` parameters). Logs of parameters, metrics, and other artifacts are saved to an `mlruns` folder in directories specified by the configuration of the running `mlflow` server.
- `grammar.yml`: defines the "language of thought" grammar (a ULTK `Grammar` is created from this file in one line in `grammar.py`) for this domain, using the functions in [van de Pol 2023](https://pubmed.ncbi.nlm.nih.gov/36563568/).
- `measures.py`: functions to measure degrees of monotonicity of quantifier expressions according to Section 5 of [Steinert-Threlkeld, 2021](https://doi.org/10.3390/e23101335)
- `outputs`: outputs from the generation routines for creating `QuantifierModel`s and `QuantifierUniverse`s
- `quantifier.py`: Subclasses `ultk`'s `Referent` and `Universe` classes that add additional properties and functionality for quantifier learning with `ultk`
- `sampling.py` - Functions for sampling quantifier models as training data
- `set_primitives.py` - Optional module-defined functions for primitives of the basic grammar. Not used unless specified by the `grammar.typed_rules` key
- `training.py`: Base `torch` classes and helper functions. Referenced only when `training.lightning=false`. Not maintained.
- `training_lightning.py`: Primary training classes and functions. Uses `lightning`.
- `util.py`: utility functions, I/O, etc.
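
For the "shortest expression per meaning" enumeration described for `generate_expressions.py` above, the call pattern mirrors the enumeration method used in other `ultk` examples. The snippet below is a sketch: it assumes a `grammar` and `universe` have already been constructed (see `grammar.py` and `quantifier.py`), and the depth value is illustrative.

```
# Sketch of the meaning-to-shortest-expression enumeration described above.
# Assumes `grammar` and `universe` are already built; depth is illustrative.
expressions_by_meaning = grammar.get_unique_expressions(
    3,                                                  # enumeration depth
    unique_key=lambda expr: expr.evaluate(universe),    # uniqueness key: the expression's Meaning
    compare_func=lambda e1, e2: len(e1) < len(e2),      # on a clash, keep the shorter expression
)
```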

# TODO:
- Fully implement `hydra`'s `Structured Config` (example begun with `conf/expressions.py`)
- Show example of adding custom primitives with custom-implemented classes (`quantifiers_grammar_xprimitives`)
38 changes: 38 additions & 0 deletions src/examples/learn_quant/conf/expressions.py
@@ -0,0 +1,38 @@
import dataclasses
from dataclasses import dataclass, field
from omegaconf import DictConfig
import hydra
from hydra.core.config_store import ConfigStore


@dataclasses.dataclass
class ModeConfig:
name: str


@dataclasses.dataclass
class UniverseConfig:
m_size: int
x_size: int
weight: float
inclusive_universes: bool


@dataclasses.dataclass
class GrammarConfig:
depth: int


# Define a configuration schema
@dataclasses.dataclass
class Config:
mode: ModeConfig
universe: UniverseConfig
grammar: GrammarConfig


cs = ConfigStore.instance()
# Registering the Config class with the name 'config'.
cs.store(name="conf", node=Config)
cs.store(group="universe", name="base_config", node=UniverseConfig)
cs.store(group="grammar", name="base_config", node=GrammarConfig)
15 changes: 15 additions & 0 deletions src/examples/learn_quant/conf/expressions.yaml
@@ -0,0 +1,15 @@
# This file is used to generate expressions given a desired recipe for a grammar and universe.
# Resulting expressions are saved in nested folders organized by the size of models generated at `./outputs`.

defaults:
- _self_
- recipe: ???

output: "learn_quant/outputs/"

mode:
- generate

save: true
time_trial_log: M${universe.m_size}_X${universe.x_size}_D${grammar.depth}_idx-${grammar.indices}.csv

4 changes: 4 additions & 0 deletions src/examples/learn_quant/conf/grammar/base.yaml
@@ -0,0 +1,4 @@
path: "learn_quant/grammar.yml"
weight: 2.0
depth: 3
indices: true
7 changes: 7 additions & 0 deletions src/examples/learn_quant/conf/grammar/typed_primitives.yaml
@@ -0,0 +1,7 @@
defaults:
- _self_
- typed_rules: set_primitives
path: "learn_quant/grammar_xprimitives.yml"
weight: 2.0
depth: 3
indices: true
@@ -0,0 +1 @@
module_path: learn_quant.set_primitives
11 changes: 11 additions & 0 deletions src/examples/learn_quant/conf/hydra/launcher/swarm.yaml
@@ -0,0 +1,11 @@
submitit_folder: ${hydra.sweep.dir}/.submitit/%j
timeout_min: 1440
_target_: hydra_plugins.hydra_submitit_launcher.submitit_launcher.SlurmLauncher
partition: gpu-l40
account: clmbr # Your account (if required by your cluster)
time: 2880 # Time in minutes (48 hours)
cpus_per_task: 1
mem_gb: 8
additional_parameters: {"gpus": "0", "time": "1-00"}
max_num_timeout: 10 # number of times to re-queue job after timeout
array_parallelism: 120 # number of jobs to launch in parallel
108 changes: 108 additions & 0 deletions src/examples/learn_quant/conf/learn.yaml
@@ -0,0 +1,108 @@
# This configuration is used to train PyTorch models on generated quantifier expressions.
# It can be modified to either run a loop in a single process over multiple expressions, or to swarm learning jobs using `slurm`.


defaults:
- _self_
- model: null
# To use the slurm launcher, you need to set the following options in the defaults list:
# - override hydra/launcher: swarm # For use with launching multiple jobs via slurm
# - override hydra/sweeper: sweep

experiment_name: transformers_improved_2 # Name of the experiment to be created in MLFlow
notes: |
This run is to evaluate the neural learning quantifiers and logging in MLFlow.

tracking:
mlflow:
active: true
host: g3116 # This could be an IP address or a hostname (job name in slurm)
port: 5000
vars:
MLFLOW_SYSTEM_METRICS_ENABLED: "true"
MLFLOW_HTTP_REQUEST_MAX_RETRIES: "8"
MLFLOW_HTTP_REQUEST_BACKOFF_FACTOR: "60"

# Options to define where the expressions should be created and/or loaded, how they should be represented, and how they should be generated.
# The expressions are generated from a grammar, which is defined in the grammar.yml file.
# The grammar is used to generate the expressions, and the expressions are then used to create the dataset used by the training script.
expressions:
n_limit: 2000
output_dir: learn_quant/outputs/
grammar:
depth: 5
path: learn_quant/grammar.yml
indices: false # If set to true, the index primitives will be used in the grammar. Specific integer indices can also be set.
index_weight: 2.0
universe:
x_size: 4
m_size: 4
representation: one_hot
downsampling: true
generation_args:
batch_size: 1000
n_limit: 5000 # Minimum number of sample rows in dataset for a *single* class. Full dataset length is 2 * n_limit.
M_size: 12
X_size: 16
entropy_threshold: 0.01
inclusive: False
batch_size: 64
split: 0.8
target: "M${expressions.universe.m_size}/X${expressions.universe.x_size}/d${expressions.grammar.depth}"
index_file: "learn_quant/expressions_sample_2k.csv" # If set, examples will be trained in order according to the index file

training:
# Given an expressions file, the "resume" key will ensure that the training will continue from the designated expression in the file.
#resume:
# term_expression: and(and(not(subset_eq(A, B)), equals(cardinality(A), cardinality(B))), subset_eq(index(cardinality(A), union(A, B)), union(difference(A, B), difference(B, A))))
strategy: multirun
k_splits: 5
n_runs: 1
lightning: true
device: cpu
epochs: 50
conditions: false
early_stopping:
threshold: 0.05
monitor: val_loss
min_delta: 0.001
patience: 20
mode: min
check_on_train_epoch_end: false

optimizer:
_partial_: true
_target_: torch.optim.Adam
lr: 1e-3

criterion:
_target_: torch.nn.BCEWithLogitsLoss


# This section defines how the measures will be calculated.
# This is an example of how to use the measures module to calculate the monotonicity of the expressions.
# This will search for an expressions file that fits the given arguments and then calculate the monotonicity of the expressions.
# HYDRA_FULL_ERROR=1 python -m learn_quant.measures ++expressions.grammar.depth=3 ++expressions.grammar.index_weight=5.0 ++expressions.grammar.indices="[0,3]"
measures:
expressions:
- all
# - or(subset_eq(A, B), subset_eq(B, A))
monotonicity:
debug: false
direction:
- all
create_universe: false # If true, a universe is created for the purpose of evaluating monotonicity
universe:
x_size: 6
m_size: 6
# If you want to filter out certain representations in the universe, you can use the 'universe_filter' key.
# This will filter out models with the given indices.
#universe_filter:
# - 3
# - 4

# The hydra sweeper is used in tandem with the hydra slurm launcher to launch individual jobs for each expression.
hydra:
sweeper:
params:
+expressions.index: range(0, ${expressions.n_limit})
29 changes: 29 additions & 0 deletions src/examples/learn_quant/conf/recipe/3_3_3_xi.yaml
@@ -0,0 +1,29 @@
# @package _global_
defaults:
- /grammar/base@grammar
- /universe/base@universe
- _self_

name: base

grammar:
depth: 3
indices: false
weight: 2.0

universe:
m_size: 3
x_size: 3
inclusive_universes: ${universe.inclusive_universes}

measures:
expressions:
- subset_eq(A, B)
monotonicity:
debug: false
universe_filter:
- 3
- 4
direction:
- all

20 changes: 20 additions & 0 deletions src/examples/learn_quant/conf/recipe/4_4_3_i23.yaml
@@ -0,0 +1,20 @@
# @package _global_
defaults:
- /grammar/base@grammar
- /universe/base@universe
- _self_

name: base

grammar:
depth: 3
indices:
- 0
- 3
weight: 5.0

universe:
m_size: 4
x_size: 4


29 changes: 29 additions & 0 deletions src/examples/learn_quant/conf/recipe/4_4_3_xi.yaml
@@ -0,0 +1,29 @@
# @package _global_
defaults:
- /grammar/base@grammar
- /universe/base@universe
- _self_

name: base

grammar:
depth: 3
indices: false
weight: 2.0

universe:
m_size: 4
x_size: 4
inclusive_universes: ${universe.inclusive_universes}

measures:
expressions:
- subset_eq(A, B)
monotonicity:
debug: false
universe_filter:
- 3
- 4
direction:
- all

19 changes: 19 additions & 0 deletions src/examples/learn_quant/conf/recipe/4_4_5_xi.yaml
@@ -0,0 +1,19 @@
# @package _global_
defaults:
- /grammar/base@grammar
- /universe/base@universe
- _self_

name: base

grammar:
depth: 5
indices: false
weight: 2.0

universe:
m_size: 4
x_size: 4
inclusive_universes: ${universe.inclusive_universes}


19 changes: 19 additions & 0 deletions src/examples/learn_quant/conf/recipe/base.yaml
@@ -0,0 +1,19 @@
# @package _global_
defaults:
- _self_
- /grammar/base@grammar
- /universe/base@universe

name: base

grammar:
depth: ${grammar.depth}
indices: ${grammar.indices}
path: ${grammar.path}
weight: ${grammar.weight}

universe:
m_size: ${universe.m_size}
x_size: ${universe.x_size}
inclusive_universes: ${universe.inclusive_universes}
