*Animations: active learning with DMLE for model parameter updates (selecting 150 samples), active learning with IMLE for model parameter updates (selecting 150 samples), and passive learning using all 700 samples.*
This is the official repository for the paper Dependency-aware Maximum Likelihood Estimation for Active Learning published at TMLR 2025. It contains the code and resources for the DMLE (Dependency-aware Maximum Likelihood Estimation) approach, as introduced in our paper. DMLE addresses mismatches in the sample independence assumption in active learning by explicitly modeling natural dependencies between samples during model parameter estimation, while remaining fully compatible with standard active learning workflows.
Traditional active learning methods typically assume that samples are independent when estimating model parameters. This assumption is often violated in practice, especially under cyclic or sequential active learning, which can lead to inaccurate parameter estimates.
DMLE introduces a principled approach to:
- Account for natural dependencies between samples during likelihood estimation.
- Maintain compatibility with any active learning strategy (uncertainty-based, diversity-based, etc.).
DMLE focuses solely on improving parameter estimation; it does not modify the sample selection strategy, making it modular and easy to integrate with existing pipelines.
- Explicitly models sample dependencies in active learning.
- Compatible with a wide range of active learning strategies.
- Lightweight and easy to integrate into existing workflows.
- Works on both synthetic and real-world datasets, supporting multiple data modalities.
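To make the distinction concrete, here is a minimal conceptual sketch in NumPy. It contrasts the standard independent log-likelihood objective (IMLE) with a generic dependency-aware reweighting; the `weights` correction is a hypothetical placeholder, not the estimator derived in the paper. The actual DMLE functions are provided in `utils.py`.

```python
import numpy as np

def imle_objective(log_likelihoods):
    # Conventional MLE under the i.i.d. assumption (IMLE):
    # every labeled sample contributes equally to the objective.
    return np.sum(log_likelihoods)

def dependency_aware_objective(log_likelihoods, weights):
    # Dependency-aware estimation: each sample's contribution is
    # adjusted for the way it entered the labeled set. `weights` is a
    # hypothetical placeholder for the correction DMLE derives; the
    # actual functions live in utils.py.
    return np.sum(weights * log_likelihoods)

# Illustrative values only.
ll = np.array([-0.3, -1.2, -0.7])   # per-sample log-likelihoods
w = np.array([1.0, 0.8, 0.5])       # placeholder dependency correction
print(imle_objective(ll))
print(dependency_aware_objective(ll, w))
```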
Repository structure:

```
DMLEforAL/
│
├── assets/                      # Images and GIFs used in README and experiments
├── b_py_combined.bash           # Combined Bash script for preprocessing or setup
├── execute_combined_cpu.bash    # Script to execute experiments on CPU
├── data.py                      # Dataset loading and preprocessing
├── main.py                      # Main script to run experiments
├── model.py                     # Model definitions
├── plot_results_submission.py   # Script to plot the results
├── print_results_submission.py  # Script to print the results
├── utils.py                     # Utility functions (metrics, helper routines)
├── README.md                    # This file
└── requirements.txt             # Python dependencies
```
Clone this repository and install the dependencies:

```bash
git clone https://github.com/yourusername/dmle.git
cd dmle
pip install -r requirements.txt
```

The main scripts are:

- `main.py`: Run this script to reproduce the results presented in the paper.
- `plot_results_submission.py`: Generate accuracy plots from experiment results.
- `print_results_submission.py`: Print the accuracy values at a specific active learning cycle.
You can run experiments directly with:
```bash
python main.py $init_size $num_queries $num_cycles $temperature $seed $selection $strategy $dataset $obj
```

where the arguments are:

- `init_size`: Initial labeled set size.
- `num_queries`: Number of samples queried per cycle.
- `num_cycles`: Number of active learning cycles.
- `temperature`: Temperature parameter for sampling.
- `seed`: Random seed for reproducibility.
- `selection`: Sampling method.
- `strategy`: Active learning strategy (e.g., entropy, least confident, margin, BALD, coreset).
- `dataset`: Dataset name (e.g., SVHN, EMNIST, Tiny-Imagenet).
- `obj`: Objective function to optimize (e.g., DMLE, IMLE).
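An illustrative invocation might look like the following; the positional values (including the selection and strategy names) are placeholders, so check `main.py` for the exact values it accepts:

```bash
python main.py 100 10 50 1.0 0 softmax entropy SVHN DMLE
```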
- SVHN: Download the data matrix and place it in the same folder: SVHN data.
- EMNIST: Preprocessing takes time on the first run. Use the `load_data` function the first time to create numpy files; on subsequent runs, load the prepared data from these numpy files (see the caching sketch after this list). The saving/loading code is commented accordingly at each step.
- Tiny-Imagenet: Download the data and place it in the same folder: Tiny-Imagenet data. Preprocessing takes time on the first run; use the `load_data` function the first time to create numpy files, and load the prepared data from them on subsequent runs. The saving/loading code is commented accordingly at each step.
- Other datasets: Automatically downloaded via the Keras/TensorFlow dataset repositories; no additional downloads are needed.
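The caching pattern described for EMNIST and Tiny-Imagenet is roughly the following; a minimal sketch assuming the preprocessed splits are plain NumPy arrays, with an illustrative cache file name (the actual names and structure live in `data.py`):

```python
import os
import numpy as np

CACHE = "emnist_cache.npz"  # illustrative file name, not the one used in data.py

def load_data_cached(preprocess_fn):
    if not os.path.exists(CACHE):
        # First run: preprocess from the raw files (slow) and cache the arrays.
        x_train, y_train, x_test, y_test = preprocess_fn()
        np.savez(CACHE, x_train=x_train, y_train=y_train,
                 x_test=x_test, y_test=y_test)
    # Subsequent runs: load the prepared arrays directly (fast).
    data = np.load(CACHE)
    return data["x_train"], data["y_train"], data["x_test"], data["y_test"]
```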
- To change datasets, modify `data.py`.
- To change the model architecture, modify `model.py` (a sketch follows this list).
- To apply the DMLE parameter estimation fix, use the functions provided in `utils.py`.
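Since the built-in datasets come through Keras/TensorFlow, a replacement architecture in `model.py` would look roughly like this; a minimal sketch with an illustrative builder function (the name and interface are assumptions, not the repo's actual API):

```python
import tensorflow as tf

def build_model(input_shape, num_classes):
    # Illustrative Keras classifier; adapt it to whatever interface
    # model.py actually expects.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```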
DMLE has been evaluated on:
- Synthetic datasets, to demonstrate the impact of sample dependencies in cyclic active learning.
- Real-world classification tasks, showing improved model performance with fewer labeled samples.
- Comparisons with traditional maximum likelihood estimation (IMLE) and the statistical bias mitigation approach proposed by Farquhar et al. (2021).
Results demonstrate that DMLE enhances parameter estimation and accelerates learning in active learning settings where sample selection introduces dependencies among data points.
If you use this code in your research, please cite our paper:
```bibtex
@article{
kalkanli2025dependencyaware,
title={Dependency-aware Maximum Likelihood Estimation for Active Learning},
author={Beyza Kalkanli and Tales Imbiriba and Stratis Ioannidis and Deniz Erdogmus and Jennifer Dy},
journal={Transactions on Machine Learning Research},
issn={2835-8856},
year={2025},
url={https://openreview.net/forum?id=qDVDSXXGK1},
note={}
}
```