Balanced Sentence Experiments

Previous experiments

Experiment: Search for antipodal pair representation of adjectives in gpt2

Based on the LessWrong article "The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable", we want to try finding projections containing pairs of adjectives or pairs of names with antipodal characteristics.

Experiment Exploratory Analysis:

Direct logit attribution
Layer Attribution
Head Attribution

Patching experiments

Residual stream patching
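As a rough illustration of what residual stream patching measures, here is a minimal sketch on a toy two-layer, two-position "model". Everything in it (`layer0`, `layer1`, `run`, the inputs) is hypothetical and stands in for GPT-2, which would normally be instrumented through a hooking library. The idea is to cache the residual stream from a clean run, overwrite one (layer, position) slot in a corrupted run, and record how much of the clean metric is restored.

```python
from typing import Callable, List, Optional, Tuple

Stream = List[float]  # one scalar of residual stream per token position


def layer0(resid: Stream) -> Stream:
    # Toy "attention": move information from position 0 into position 1.
    return [0.0, resid[0]]


def layer1(resid: Stream) -> Stream:
    # Toy "MLP": amplify whatever already sits at position 1.
    return [0.0, 2.0 * resid[1]]


LAYERS: List[Callable[[Stream], Stream]] = [layer0, layer1]


def run(x: Stream,
        patch: Optional[Tuple[int, int, float]] = None) -> Tuple[float, List[Stream]]:
    """Run the toy model.

    patch=(layer, pos, value) overwrites the stream entering `layer`
    at position `pos`. Returns (metric, cache of streams entering each layer).
    """
    resid = list(x)
    cache: List[Stream] = []
    for i, layer in enumerate(LAYERS):
        if patch is not None and patch[0] == i:
            resid[patch[1]] = patch[2]
        cache.append(list(resid))
        # Residual connection: each layer adds its output to the stream.
        resid = [r + u for r, u in zip(resid, layer(resid))]
    return resid[1], cache  # metric: final value at the last position


clean_input, corrupted_input = [1.0, 0.0], [-1.0, 0.0]
clean_metric, clean_cache = run(clean_input)
corrupt_metric, _ = run(corrupted_input)

# Patch each (layer, position) from the clean run into the corrupted run.
restored = {}
for layer_idx in range(len(LAYERS)):
    for pos in range(len(clean_input)):
        patched, _ = run(corrupted_input,
                         patch=(layer_idx, pos, clean_cache[layer_idx][pos]))
        # 1.0 means clean behavior is fully restored, 0.0 means no effect.
        restored[(layer_idx, pos)] = (
            (patched - corrupt_metric) / (clean_metric - corrupt_metric)
        )

print(restored)
```

In this toy, patching position 0 only helps before the information has been moved, and patching position 1 only helps afterwards. The real experiment looks for the same kind of pattern across layers and token positions.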

More experiments

Examine the embedding space using SVD
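A minimal sketch of the SVD step, assuming NumPy and a small made-up embedding matrix in place of the GPT-2 token embeddings. The token names and values are placeholders. It projects every token onto the top right-singular direction and reads off the two tokens at opposite extremes as a candidate antipodal pair.

```python
import numpy as np

# Toy embedding matrix: rows are tokens, columns are embedding dimensions.
# Hypothetical values; in practice this would be the model's embedding matrix.
tokens = ["tok_a", "tok_b", "tok_c", "tok_d"]
E = np.array([
    [2.0, 0.1, 0.0],
    [-2.0, 0.1, 0.0],
    [0.0, 0.5, 0.2],
    [0.1, -0.4, 0.1],
])

# Thin SVD: E = U @ diag(S) @ Vh, singular values in descending order.
U, S, Vh = np.linalg.svd(E, full_matrices=False)

# Project every token onto the top right-singular direction.
proj = E @ Vh[0]

# The tokens at the two ends of this direction are antipodal candidates.
# The sign of a singular vector is arbitrary, so the pair is kept unordered.
lo, hi = int(np.argmin(proj)), int(np.argmax(proj))
antipodal_pair = {tokens[lo], tokens[hi]}
print(antipodal_pair)
```

The same projection can be repeated for `Vh[1]`, `Vh[2]`, and so on, to inspect further directions.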

Each of the following experiments should be run for both polarities of the conjunction, expecting induction behavior in one and extrapolation in the other.

Experiment: 1

Dataset: Outer semantic relation (positive and)

Intervention: Change the polarity of the conjunction, and check the logit difference for both positive and negative conjunctions.
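The logit-difference comparison could look like the sketch below. The helper `logit_diff`, the candidate tokens, and the logit values are all hypothetical. In the real experiment the logits would come from GPT-2 at the final position of each prompt.

```python
from typing import Dict


def logit_diff(logits: Dict[str, float], expected: str, competing: str) -> float:
    """Logit of the expected completion minus the logit of the competing one."""
    return logits[expected] - logits[competing]


# Made-up final-position logits for the same prompt under each conjunction.
logits_positive = {" good": 3.0, " bad": 1.0}
logits_negative = {" good": 0.5, " bad": 2.5}

diff_pos = logit_diff(logits_positive, " good", " bad")
diff_neg = logit_diff(logits_negative, " good", " bad")

# How far flipping the conjunction moves the model between the two completions.
shift = diff_pos - diff_neg
print(diff_pos, diff_neg, shift)
```

A large positive `shift` would indicate that the conjunction's polarity strongly steers which adjective the model prefers.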

Experiment: 2

Dataset: Outer semantic relation (gender sensitive)

Intervention: Change the polarity of the conjunction

Experiment: 3

Dataset: Inner semantic relation of both noun and adjective

Intervention: Change the polarity of the conjunction

Experiment: 4

Dataset: Inner semantic relation of both noun and adjective (gender sensitive)

Intervention: Change the polarity of the conjunction

Experiment: 5

Dataset: Inner semantic relation of adjectives

Intervention: Change the polarity of the conjunction

Experiment: 6

Dataset: Outer semantic relation

Intervention: Interchange the nouns
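One way to generate the interchanged prompts is sketched below. The `Prompt` fields and the sentence template are assumptions for illustration only; the actual dataset format is not specified here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Prompt:
    noun_a: str
    adj_a: str
    conjunction: str
    noun_b: str

    def text(self) -> str:
        # Hypothetical template, ending where the model should predict
        # the second adjective.
        return (f"The {self.noun_a} is {self.adj_a} "
                f"{self.conjunction} the {self.noun_b} is")


def interchange_nouns(p: Prompt) -> Prompt:
    """Return a copy of the prompt with the two nouns swapped."""
    return replace(p, noun_a=p.noun_b, noun_b=p.noun_a)


original = Prompt(noun_a="king", adj_a="strong", conjunction="and", noun_b="queen")
swapped = interchange_nouns(original)
print(original.text())
print(swapped.text())
```

Keeping the prompt as structured fields rather than a raw string makes it easy to apply the other interventions, such as changing the conjunction, with the same `replace` call.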

Experiment: 7

Dataset: Outer semantic relation (gender sensitive)

Intervention: Interchange the nouns

Experiment: 8

Dataset: Inner semantic relation of both nouns and adjectives

Intervention: Replace one of the nouns with a random noun

Experiment: 9

Dataset: Inner semantic relation of both noun and adjective (gender sensitive)

Intervention: Replace one of the nouns with a random noun
