Modeling Bottom-Up Information Quality During Language Processing

Installing dependencies

Clone the repository

git clone https://github.com/DiLi-Lab/Bottom-Up-Information.git
cd Bottom-Up-Information/

Create an environment, e.g., with Anaconda:

conda create --name bottomup python=3.10
conda activate bottomup

Install the necessary packages with pip:

pip install -r requirements.txt

Alternatively, you can directly create the conda environment and install dependencies using the provided environment.yml file:

conda env create -f environment.yml -n bottomup
conda activate bottomup

Accessing the Data

Human Reading Data with MoTR

We use the mouse-tracking for reading (MoTR) paradigm to collect human reading data (see this paper for details). The post-processed reading measures used in the experiments are in the data/Human folder. The code for post-processing the raw MoTR data is in the post_processing folder.

The full set of reading data can be downloaded from here, including the raw MoTR data, the mouse association data (analogous to fixations in eye-tracking), and the reading measures data.

The link to the experiment in Chinese is here. The link to the experiment in English is here.

Half-occluded Image Data for MI estimations with LMs

We create noisy input from half-occluded images to estimate the mutual information (MI) between degraded bottom-up visual information and linguistic representations. We provide the post-processed MI / IG data in the data/LLM folder.
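As a rough illustration of the kind of quantity involved, the sketch below computes an entropy reduction over candidate characters before and after a degraded image is observed. It is not the repository's implementation: it assumes that IG stands for information gain, and the function names and toy distributions are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(prior, posterior):
    """Entropy reduction, in bits, from prior to posterior over the same candidates."""
    return entropy(prior) - entropy(posterior)


# Hypothetical example: four equally likely characters before seeing the image;
# after seeing the half-occluded image, two candidates remain equally likely.
prior = [0.25, 0.25, 0.25, 0.25]
posterior = [0.5, 0.5, 0.0, 0.0]
ig_bits = information_gain(prior, posterior)
print(ig_bits)  # 1.0
```

Averaging this per-image entropy reduction over many images gives an estimate of the mutual information between the image and the character, under the assumption that the prior and posterior distributions are well calibrated.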

The images can be downloaded from here, including the full images, the half-occluded images, and some statistics of the images.

The code for generating the half-occluded images is adapted from here. Note that we fixed some bugs and modified the code to generate half-occluded images. The modified code is available upon request, as it is not the main focus of this repository.

Running the Experiments

The scripts for training the models are in the src folder. In addition, we provide the scripts we used to check the half-masking effects, to ensure that the masking is even across the upper and lower halves of the images. The names of these scripts start with "check" (e.g., check_half_character_zh.html).
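One simple way to think about such a check is to compare how much ink survives when the upper half versus the lower half of a glyph is occluded. The following is a minimal sketch under stated assumptions, not one of the repository's check scripts: the image is a plain list of pixel rows, and occlude_half, ink, and the toy glyph are hypothetical.

```python
def occlude_half(image, half):
    """Return a copy of a 2D image (list of rows) with one half set to 0.

    For an odd number of rows, the middle row is counted as part of the lower half.
    """
    mid = len(image) // 2
    if half == "upper":
        return [[0] * len(row) if i < mid else list(row) for i, row in enumerate(image)]
    if half == "lower":
        return [[0] * len(row) if i >= mid else list(row) for i, row in enumerate(image)]
    raise ValueError("half must be 'upper' or 'lower'")


def ink(image):
    """Count the non-zero (ink) pixels in the image."""
    return sum(1 for row in image for px in row if px)


# Hypothetical 4x4 binary glyph (1 = ink, 0 = background).
glyph = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
]
total = ink(glyph)
remaining_upper_masked = ink(occlude_half(glyph, "upper")) / total
remaining_lower_masked = ink(occlude_half(glyph, "lower")) / total
print(remaining_upper_masked, remaining_lower_masked)
```

If the two fractions differ substantially when averaged over a whole character set, one masking condition removes more visual information than the other, which is the kind of imbalance such a check is meant to surface.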

The folder src/transocr contains the code for training the transocr models (see this paper). The model architecture is adapted from this repository. Although we provide the modified code, please follow the instructions in the original repository to run it.

The bash scripts for running the experiments are in the scripts folder. You need to modify the paths and settings in the bash scripts before running them.

The scripts for calling the models and calculating the MI / IG on the test data sets are in the notebooks folder. The .ipynb files whose names start with "cal" (e.g., cal_entropy_en_words.ipynb) are simple baseline models. The .ipynb files whose names start with "finetune" calculate the MI / IG with LMs.

Data Analysis and Visualization

The R scripts for data analysis and visualization are in the analysis folder.

In the analysis folder, the folders precomputed and stats contain the pre-computed results and statistics used in the analysis. The folder visualization contains the generated figures used in the paper.

Contact

Feel free to create an issue or email Cui Ding (cui.ding@uzh.ch) with any questions or suggestions.
