Task description here. Essentially, this repo reimplements ICM WITHOUT the logical consistency fix and runs it on a subset of the TruthfulQA dataset, feeding the ICM-labeled examples as few-shot demonstrations to a base model.
Across four ICM runs, accuracy ranged from 56% to 60% on 100 examples. All intermediate results are saved in results/icm_history.json.
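For context, here is a minimal sketch of how ICM-labeled examples might be assembled into a few-shot prompt for the base model. The helper name, field layout, and prompt template are illustrative assumptions, not the repo's exact code:

```python
def build_few_shot_prompt(demos, question, claim):
    """Assemble a few-shot truthfulness prompt from ICM-labeled examples.

    `demos` is assumed to be a list of (question, claim, label) triples taken
    from an ICM run; the actual prompt template in this repo may differ.
    """
    blocks = []
    for demo_question, demo_claim, demo_label in demos:
        blocks.append(
            f"Question: {demo_question}\n"
            f"Claim: {demo_claim}\n"
            f"I think this claim is {demo_label}"
        )
    # The final block is left open so the base model completes the label.
    blocks.append(f"Question: {question}\nClaim: {claim}\nI think this claim is")
    return "\n\n".join(blocks)
```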
The results may not exactly match the paper's numbers due to:
- Different models used
- In-context learning
- Randomness in LLM generation and throughout the ICM pipeline
- This ICM version doesn't include the logical consistency fix (a sketch of the resulting simplified search loop is shown below)
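For reference, the simplified search this version performs can be sketched as a simulated-annealing-style loop that flips labels to maximize mutual predictability only, with no consistency-fix step. The function and parameter names (`score_fn`, `n_iters`, the label strings, etc.) are illustrative assumptions, not the repo's actual interface:

```python
import math
import random

def icm_search(examples, score_fn, n_iters=1000, t0=1.0, cooling=0.99, seed=0):
    """Simplified ICM-style label search WITHOUT the logical consistency fix.

    `score_fn(labels)` is assumed to return the mutual-predictability score:
    the sum of log-probabilities the model assigns to each label when all
    other (example, label) pairs are shown as in-context demonstrations.
    """
    rng = random.Random(seed)
    labels = [rng.choice(["True", "False"]) for _ in examples]  # random initialization
    score = score_fn(labels)
    temperature = t0
    history = [{"step": -1, "score": score}]

    for step in range(n_iters):
        i = rng.randrange(len(examples))                            # pick one example
        proposal = list(labels)
        proposal[i] = "False" if proposal[i] == "True" else "True"  # flip its label
        new_score = score_fn(proposal)
        # Accept improvements outright; accept worse proposals with a
        # temperature-dependent probability so early steps can explore.
        if new_score >= score or rng.random() < math.exp((new_score - score) / temperature):
            labels, score = proposal, new_score
        temperature *= cooling
        history.append({"step": step, "score": score})

    return labels, history
```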
Other notes:
- Zero-shot with the chat model (Llama-3.1-405B-instruct) initially gave lower accuracy (50%) than the base model (~65%), because 25-30% of its responses were empty and had to be skipped when parsing labels. Accuracy improved after adding a retry mechanism for empty responses (a sketch of the retry logic is shown below).
- Requires Python >=3.10, <=3.13 (tested with Python 3.12, as specified in .python-version).
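The retry logic is roughly of this shape (a minimal sketch; `client.complete` stands in for whatever Hyperbolic API call the repo actually makes, so the interface is assumed here):

```python
import time

def complete_with_retry(client, prompt, max_retries=3, backoff_seconds=2.0):
    """Retry wrapper for empty chat-model completions.

    `client.complete(prompt)` is a placeholder for the actual chat-completion
    call; it is assumed to return the generated text as a string.
    """
    for attempt in range(max_retries + 1):
        text = client.complete(prompt)
        if text and text.strip():                        # non-empty response: done
            return text
        if attempt < max_retries:
            time.sleep(backoff_seconds * (attempt + 1))  # back off before retrying
    return ""  # still empty after retries: caller skips this example when parsing labels
```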
- Get uv if you haven't already; it is used to manage packages and run scripts.
- Fork/clone the repo.
- Run the setup code in the terminal:

```
chmod +x setup.sh
sh setup.sh
```

You should now be in the virtual environment named "praxis-sprint-icm". If not, manually activate it with

```
source .venv/bin/activate
```

- Go to .env and fill in the secrets. You need a Hyperbolic API key (an illustrative example is shown below).
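A .env might look like the following; the variable name here is an illustrative guess, so keep whatever placeholder names the provided .env already uses:

```
# Illustrative only - use the variable names already present in .env
HYPERBOLIC_API_KEY=your-hyperbolic-api-key
```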
To run the full pipeline, from data loading and ICM prediction through evaluation and figure generation, run

```
uv run src/main.py
```

You can also run each evaluation scenario separately:

```
uv run src/main.py --<scenario>
```

where <scenario> can be one of zero_shot_base, zero_shot_chat, few_shot_golden, or few_shot_icm; afterwards, generate the figure with the full-pipeline command above. Note that running few_shot_icm will also run ICM prediction if it has not been done before, which can take around 30-40 minutes for the test set.
Run with -h to see all options, or check src/main.py for details.
```
uv run src/main.py -h
```

CRITICAL: Change the seed and other configurations in src/init.py if you want different ICM results. The current seed was set after the reported results were generated, so re-running with it may not reproduce them exactly.
