Merged
18 changes: 15 additions & 3 deletions .github/workflows/publish_to_pypi.yml
@@ -90,7 +90,19 @@ jobs:
environment:
name: dockerhub

permissions:
packages: write
contents: read
attestations: write
id-token: write

steps:
- name: Remove unnecessary files and check disk space
run: |
sudo rm -rf /usr/share/dotnet
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
df . -h

- name: Docker meta
id: meta
uses: docker/metadata-action@v5
@@ -111,12 +123,12 @@ jobs:

- name: Build and push
uses: docker/build-push-action@v6
id: push
with:
context: .
file: ./Dockerfile
platforms: linux/amd64,linux/arm64
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
no-cache: true

- name: Generate artifact attestation
uses: actions/attest-build-provenance@v3
6 changes: 3 additions & 3 deletions CHANGELOG.md
@@ -1,16 +1,16 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [v1.0.0] - 2025-11-18
## [v1.0.0] - 2025-12-16

### Added
- Command line interface to transcribe audio with HuggingFace ASR models and export them as TextGrid
- Option to do forced alignment with the ASR model's vocabulary and add them as time intervals to TextGrid
- Gradio web app as an interactive wrapper around the command line structure
- Unit tests and overall package structure
- Unit tests and overall package structure
- Added Docker image building
2 changes: 1 addition & 1 deletion Dockerfile
@@ -17,7 +17,7 @@ WORKDIR /app
RUN --mount=type=cache,target=/root/.cache/uv \
--mount=type=bind,source=uv.lock,target=uv.lock \
--mount=type=bind,source=pyproject.toml,target=pyproject.toml \
uv sync --locked --no-install-project --no-editable
uv sync --locked --no-install-project --no-editable --no-dev

# Install our code
COPY . /app
67 changes: 43 additions & 24 deletions README.md
@@ -5,23 +5,24 @@ Automatically transcribe audio into the International Phonetic Alphabet (IPA) an
The AutoIPA project is a collaboration between Virginia Partridge of the UMass Center for Data Science and Artificial
Intelligence and Joe Pater of UMass Linguistics. Its goal is to make automated IPA transcription more useful
to linguists (and others!).
Our first step was to fine-tune a Wav2Vec 2.0 model on the Buckeye corpus, which you can try out here.
Our next steps will be to extend our work to other varieties of English and other languages.
Please reach out to us if you have any questions or comments about our work or have related work to share!
More details are on our [project website](https://websites.umass.edu/comphon/autoipa-automated-ipa-transcription/).

If you use our software, please cite our AMP paper:
Partridge, Virginia, Joe Pater, Parth Bhangla, Ali Nirheche and Brandon Prickett. 2025/to appear. [AI-assisted analysis of phonological variation in English](https://docs.google.com/presentation/d/1IJrfokvX5T_fKkiFXmcYEgRI2ZRwgFU4zU1tNC-iYl0/edit?usp=sharing). Special session on Deep Phonology, AMP 2025, UC Berkeley. To appear in the Proceedings of AMP 2025.
"""

> Partridge, Virginia, Joe Pater, Parth Bhangla, Ali Nirheche and Brandon Prickett. 2025/to appear. [AI-assisted analysis of phonological variation in English](https://docs.google.com/presentation/d/1IJrfokvX5T_fKkiFXmcYEgRI2ZRwgFU4zU1tNC-iYl0/edit?usp=sharing). Special session on Deep Phonology, AMP 2025, UC Berkeley. To appear in the Proceedings of AMP 2025.

## Basic Usage
This is project is structured in multiple subpackages based on their different external dependencies:
This project is structured in multiple subpackages based on their different external dependencies:
- **autoipaalign.core**: Core library and command-line interface for IPA transcription and forced alignments. Always installed.
- **autoipaalign.compare**: Tools for comparing alignments across different ASR systems, such as whisper and the Montreal Forced Aligner. Install with `pip install autoipaalign[compare]`.
- **autoipaalign.web**: Gradio web interface for interactive transcription. Install with `pip install autoipaalign[compare]`.
- **autoipaalign.compare**: Tools for comparing alignments across different ASR systems, such as whisper and the Montreal Forced Aligner. Install with `pip install autoipaalign[compare]`. You should also install the Montreal Forced Aligner; see the instructions under [External Dependencies](#external-dependencies).
- **autoipaalign.web**: Gradio web interface for interactive transcription. Install with `pip install autoipaalign[web]`.
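
To check from Python whether an optional extra's dependencies are present, a minimal sketch along the following lines may help. It assumes that the `web` extra pulls in the `gradio` package, which is inferred from the description above rather than confirmed by the package metadata, and `is_importable` is a hypothetical helper, not part of `autoipaalign`.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate `module_name` on the current path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # For dotted names, find_spec imports the parent package first,
        # so a missing parent raises ModuleNotFoundError (an ImportError).
        return False


if __name__ == "__main__":
    # "gradio" as the dependency behind the `web` extra is an assumption.
    for name in ("autoipaalign", "gradio"):
        status = "available" if is_importable(name) else "not installed"
        print(f"{name}: {status}")
```

If `gradio` is reported as not installed, reinstalling with `pip install autoipaalign[web]` should add it.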

### Basic Installation
TODO: Pip install instructions coming soon.
You can install the `autoipaalign` package with `pip install autoipaalign`.

We recommend first creating and working in a [Conda Virtual Environment](https://realpython.com/python-virtual-environments-a-primer/#the-conda-package-and-environment-manager) for better integration with PyTorch and the Montreal Forced Aligner.


### Command-Line Interface
The `autoipaalign` command lets you transcribe audio and get TextGrid output files with or without forced alignment.
@@ -52,12 +53,27 @@ autoipaalign transcribe --audio-paths audio.wav --output-target output/ --asr.mo

### Web Interface
```bash
python -m autoipaalign_web.app
python -m autoipaalign.web.app
```
Then open your browser to the URL shown in the terminal.

## Advanced Usage

### External Dependencies

- **Montreal Forced Aligner** (optional, for MFA-based comparisons) should be installed when working with the optional `compare` package.
```bash
# Install via conda
conda install -c conda-forge montreal-forced-aligner
```

### Comparison Tools
Compare alignments from different ASR systems (coming soon).


## Development Environment


### Installing the Development Workspace
This project is structured using [uv workspaces](https://docs.astral.sh/uv/concepts/projects/workspaces/) based on [this template](https://github.com/konstin/uv-workspace-example-cable/tree/main).

@@ -66,27 +82,13 @@ This project is structured using [uv workspaces](https://docs.astral.sh/uv/conce
curl -LsSf https://astral.sh/uv/install.sh | sh
```

2. Clone the repository and install to set up development and testing dependencies::
2. Clone the repository and install to set up development and testing dependencies:
```bash
git clone <repository-url>
cd autoipaalign
uv sync --all-extras
```

### External Dependencies

- **Montreal Forced Aligner** (optional, for MFA-based comparisons) should be installed when working with the optional `compare` package.
TODO: update installation instructions for working wiht
```bash
# Install via conda
conda install -c conda-forge montreal-forced-aligner
```

### Comparison Tools

Compare alignments from different ASR systems (documentation coming soon).


### Running Tests

To run unit tests, you can run `uv run pytest` from the root of the repository or inside any of the package subfolders (e.g. `packages/autoipaalign-core`).
@@ -100,3 +102,20 @@ Run these checks as follows:
uv run ruff check .
uv run ruff format .
```

### Building Docker image for the web application
To make it easier to deploy and run the web application on HuggingFace Spaces, the application can be packaged as a [Docker](https://docs.docker.com) image.
We've provided a Dockerfile to build an image for the web app.

You can build an image named `autoipaalign` by running:
```bash
docker build -t autoipaalign .
```

Run a Docker container from this image on port 7860:
```bash
docker run -p 7860:7860 autoipaalign
```
You can then access the running web application at `http://localhost:7860`.
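
The container may take a moment to start serving. As a rough sketch, a script could wait for the port to accept connections before opening the browser. The helper name `wait_for_port` and its default timeout are illustrative assumptions; only the port number 7860 comes from the command above.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 7860, timeout: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.2)
```

Calling `wait_for_port()` after `docker run` returns `True` once something is listening on port 7860. It only checks that the port is open, not that the Gradio app has finished loading its model.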

A Docker image is built and pushed to the UMass CDSAI Docker Hub at https://hub.docker.com/repository/docker/umasscds/autoipaalign/general each time a new version of the autoipaalign package is released.
7 changes: 4 additions & 3 deletions src/autoipaalign/web/app.py
@@ -16,8 +16,9 @@
TITLE = "AutoIPA: Automated IPA transcription"

INTRO_BLOCK = f"""# {TITLE}
Experiment with producing phonetic transcriptions of uploaded or recorded audio using Wav2Vec2.0-based automatic
speech recognition (ASR) models!
Experiment with producing
[International Phonetic Alphabet (IPA)](https://en.wikipedia.org/wiki/International_Phonetic_Alphabet) transcriptions
of uploaded or recorded audio using Wav2Vec2.0-based automatic speech recognition (ASR) models!

The AutoIPA project is a collaboration between Virginia Partridge of the UMass Center for Data Science and Artificial
Intelligence and Joe Pater of UMass Linguistics. Its goal is to make automated IPA transcription more useful
@@ -250,7 +251,7 @@ def launch_demo():
interactive=True,
)

phone_aligned = gr.Checkbox(label="Add forced-alignments for predictions in their own TextGrid")
phone_aligned = gr.Checkbox(label="Add forced-alignments for predictions in their own TextGrid interval tier")

model_state = gr.State(value=initial_model)

3 changes: 2 additions & 1 deletion tests/core_smoke_test.py
@@ -57,7 +57,8 @@ def test_cli_main_callable():

# Check for expected error message in either stdout or stderr
expected_text = "The following arguments are required: {transcribe,transcribe-intervals}"
if expected_text not in stderr_output:
alternate_expected_text = "Expected one of {transcribe, transcribe-intervals}."
if not (expected_text in stderr_output or alternate_expected_text in stderr_output):
raise AssertionError(f"Expected error message not found. Output: {stderr_output}")
finally:
# Ensure stdout/stderr are always restored
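
The "accept either message" check in this hunk can be sketched on its own as follows. The two message strings are copied from the diff; the names `contains_any`, `EXPECTED_MESSAGES` and `check_cli_error` are hypothetical and do not appear in the test file. Keeping the messages in a list would let a third wording be added without growing the boolean expression.

```python
def contains_any(output: str, candidates: list[str]) -> bool:
    """Return True if at least one candidate message occurs in `output`."""
    return any(text in output for text in candidates)


# Both wordings are taken from the diff; which CLI parser version emits
# which one is not stated there.
EXPECTED_MESSAGES = [
    "The following arguments are required: {transcribe,transcribe-intervals}",
    "Expected one of {transcribe, transcribe-intervals}.",
]


def check_cli_error(stderr_output: str) -> None:
    """Raise AssertionError unless a known usage error appears in the output."""
    if not contains_any(stderr_output, EXPECTED_MESSAGES):
        raise AssertionError(f"Expected error message not found. Output: {stderr_output}")
```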