diff --git a/.github/workflows/publish_to_pypi.yml b/.github/workflows/publish_to_pypi.yml
index 818c61b..5641017 100644
--- a/.github/workflows/publish_to_pypi.yml
+++ b/.github/workflows/publish_to_pypi.yml
@@ -90,7 +90,19 @@ jobs:
     environment:
       name: dockerhub
+    permissions:
+      packages: write
+      contents: read
+      attestations: write
+      id-token: write
+
     steps:
+      - name: Remove unnecessary files and check disk space
+        run: |
+          sudo rm -rf /usr/share/dotnet
+          sudo rm -rf "$AGENT_TOOLSDIRECTORY"
+          df . -h
+
       - name: Docker meta
         id: meta
         uses: docker/metadata-action@v5
@@ -111,12 +123,12 @@ jobs:
       - name: Build and push
         uses: docker/build-push-action@v6
+        id: push
         with:
-          context: .
-          file: ./Dockerfile
+          platforms: linux/amd64,linux/arm64
           push: true
           tags: ${{ steps.meta.outputs.tags }}
-          labels: ${{ steps.meta.outputs.labels }}
+          no-cache: true

       - name: Generate artifact attestation
         uses: actions/attest-build-provenance@v3
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 07c005d..fae5399 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,4 @@
 # Changelog
-
 All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
@@ -7,10 +6,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]

-## [v1.0.0] - 2025-11-18
+## [v1.0.0] - 2025-12-16

 ### Added
 - Command line interface to transcribe audio with HuggingFace ASR models and export them as TextGrid
 - Option to do forced alignment with the ASR model's vocabulary and add them as time intervals to TextGrid
 - Gradio web app as an interactive wrapper around the command line structure
-- Unit tests and overall package structure
\ No newline at end of file
+- Unit tests and overall package structure
+- Docker image building
\ No newline at end of file
diff --git a/Dockerfile b/Dockerfile
index 45bc562..09fea9a 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -17,7 +17,7 @@ WORKDIR /app
 RUN --mount=type=cache,target=/root/.cache/uv \
     --mount=type=bind,source=uv.lock,target=uv.lock \
     --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
-    uv sync --locked --no-install-project --no-editable
+    uv sync --locked --no-install-project --no-editable --no-dev

 # Install our code
 COPY . /app
diff --git a/README.md b/README.md
index 06eda89..3838eaa 100644
--- a/README.md
+++ b/README.md
@@ -5,23 +5,24 @@ Automatically transcribe audio into the International Phonetic Alphabet (IPA) an

 The AutoIPA project is a collaboration between Virginia Partridge of the UMass Center for Data Science and Artificial Intelligence and Joe Pater of UMass Linguistics. Its goal is to make automated IPA transcription more useful to linguists (and others!).
-Our first step was to fine-tune a Wav2Vec 2.0 model on the Buckeye corpus, which you can try out here.
-Our next steps will be to extend our work to other varieties of English and other languages.

 Please reach out to us if you have any questions or comments about our work or have related work to share!
 More details are on our [project website](https://websites.umass.edu/comphon/autoipa-automated-ipa-transcription/).

 If you use our software, please cite our AMP paper:
-Partridge, Virginia, Joe Pater, Parth Bhangla, Ali Nirheche and Brandon Prickett. 2025/to appear. [AI-assisted analysis of phonological variation in English](https://docs.google.com/presentation/d/1IJrfokvX5T_fKkiFXmcYEgRI2ZRwgFU4zU1tNC-iYl0/edit?usp=sharing). Special session on Deep Phonology, AMP 2025, UC Berkeley. To appear in the Proceedings of AMP 2025.
-"""
+
+> Partridge, Virginia, Joe Pater, Parth Bhangla, Ali Nirheche and Brandon Prickett. 2025/to appear. [AI-assisted analysis of phonological variation in English](https://docs.google.com/presentation/d/1IJrfokvX5T_fKkiFXmcYEgRI2ZRwgFU4zU1tNC-iYl0/edit?usp=sharing). Special session on Deep Phonology, AMP 2025, UC Berkeley. To appear in the Proceedings of AMP 2025.

 ## Basic Usage

-This is project is structured in multiple subpackages based on their different external dependencies:
+This project is structured in multiple subpackages based on their different external dependencies:
 - **autoipaalign.core**: Core library and command-line interface for IPA transcription and forced alignments. Always installed.
-- **autoipaalign.compare**: Tools for comparing alignments across different ASR systems, such as whisper and the Montreal Forced Aligner. Install with `pip install autoipaalign[compare]`.
-- **autoipaalign.web**: Gradio web interface for interactive transcription. Install with `pip install autoipaalign[compare]`.
+- **autoipaalign.compare**: Tools for comparing alignments across different ASR systems, such as Whisper and the Montreal Forced Aligner. Install with `pip install autoipaalign[compare]`. You should also install the Montreal Forced Aligner; see the instructions under [External Dependencies](#external-dependencies).
+- **autoipaalign.web**: Gradio web interface for interactive transcription. Install with `pip install autoipaalign[web]`.
 ### Basic Installation

-TODO: Pip install instructions coming soon.
+You can install the `autoipaalign` package with `pip install autoipaalign`.
+
+We recommend first creating and working in a [conda virtual environment](https://realpython.com/python-virtual-environments-a-primer/#the-conda-package-and-environment-manager) for better integration with PyTorch and the Montreal Forced Aligner.
+
 ### Command-Line Interface

 The `autoipaalign` command lets you transcribe audio and get TextGrid output files with or without forced alignment.
@@ -52,12 +53,27 @@ autoipaalign transcribe --audio-paths audio.wav --output-target output/ --asr.mo
 ### Web Interface

 ```bash
-python -m autoipaalign_web.app
+python -m autoipaalign.web.app
 ```

 Then open your browser to the URL shown in the terminal.

 ## Advanced Usage

+### External Dependencies
+
+- **Montreal Forced Aligner** (optional, for MFA-based comparisons) should be installed when working with the optional `compare` package.
+  ```bash
+  # Install via conda
+  conda install -c conda-forge montreal-forced-aligner
+  ```
+
+### Comparison Tools
+
+Compare alignments from different ASR systems (coming soon).
+
+## Development Environment
+
 ### Installing the Development Workspace

 This project is structured using [uv workspaces](https://docs.astral.sh/uv/concepts/projects/workspaces/) based on [this template](https://github.com/konstin/uv-workspace-example-cable/tree/main).
@@ -66,27 +82,13 @@ This project is structured using [uv workspaces](https://docs.astral.sh/uv/conce
   curl -LsSf https://astral.sh/uv/install.sh | sh
   ```

-2. Clone the repository and install to set up development and testing dependencies::
+2. Clone the repository and install to set up development and testing dependencies:
   ```bash
   git clone
   cd autoipaalign
   uv sync --all-extras
   ```

-### External Dependencies
-
-- **Montreal Forced Aligner** (optional, for MFA-based comparisons) should be installed when working with the optional `compare` package.
-TODO: update installation instructions for working wiht
-  ```bash
-  # Install via conda
-  conda install -c conda-forge montreal-forced-aligner
-  ```
-
-### Comparison Tools
-
-Compare alignments from different ASR systems (documentation coming soon).
-
-
 ### Running Tests

 To run unit tests, you can run `uv run pytest` from the root of the repository or inside any of the package subfolders (e.g. `packages/autoipaalign-core`).
@@ -100,3 +102,20 @@ Run these checks as follows:
 uv run ruff check .
 uv run ruff format .
 ```
+
+### Building a Docker image for the web application
+
+To make it easier to deploy and run the web application on HuggingFace Spaces, the application can be packaged as a [Docker](https://docs.docker.com) image.
+We've provided a Dockerfile to build an image for the web app.
+
+You can build an image named `autoipaalign` by running:
+```bash
+docker build -t autoipaalign .
+```
+
+Run a Docker container from this image on port 7860:
+```bash
+docker run -p 7860:7860 autoipaalign
+```
+You can then access the running web application at `http://localhost:7860`.
+
+A Docker image is built and pushed to the UMass CDSAI Docker Hub at https://hub.docker.com/repository/docker/umasscds/autoipaalign/general each time a new version of the autoipaalign package is released.
diff --git a/src/autoipaalign/web/app.py b/src/autoipaalign/web/app.py
index d917a0e..0f66b50 100644
--- a/src/autoipaalign/web/app.py
+++ b/src/autoipaalign/web/app.py
@@ -16,8 +16,9 @@ TITLE = "AutoIPA: Automated IPA transcription"

 INTRO_BLOCK = f"""# {TITLE}

-Experiment with producing phonetic transcriptions of uploaded or recorded audio using Wav2Vec2.0-based automatic
-speech recognition (ASR) models!
+Experiment with producing
+[International Phonetic Alphabet (IPA)](https://en.wikipedia.org/wiki/International_Phonetic_Alphabet) transcriptions
+of uploaded or recorded audio using Wav2Vec2.0-based automatic speech recognition (ASR) models!
 The AutoIPA project is a collaboration between Virginia Partridge of the UMass Center for Data Science and
 Artificial Intelligence and Joe Pater of UMass Linguistics. Its goal is to make automated IPA transcription more useful
@@ -250,7 +251,7 @@ def launch_demo():
                 interactive=True,
             )

-            phone_aligned = gr.Checkbox(label="Add forced-alignments for predictions in their own TextGrid")
+            phone_aligned = gr.Checkbox(label="Add forced alignments for predictions in their own TextGrid interval tier")

             model_state = gr.State(value=initial_model)
diff --git a/tests/core_smoke_test.py b/tests/core_smoke_test.py
index 5d15ab4..2f595d5 100644
--- a/tests/core_smoke_test.py
+++ b/tests/core_smoke_test.py
@@ -57,7 +57,8 @@ def test_cli_main_callable():
         # Check for expected error message in either stdout or stderr
         expected_text = "The following arguments are required: {transcribe,transcribe-intervals}"
-        if expected_text not in stderr_output:
+        alternate_expected_text = "Expected one of {transcribe, transcribe-intervals}."
+        if not (expected_text in stderr_output or alternate_expected_text in stderr_output):
             raise AssertionError(f"Expected error message not found. Output: {stderr_output}")
     finally:
         # Ensure stdout/stderr are always restored
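Note on the smoke-test change above: the test now matches either of two "missing subcommand" wordings because the exact message depends on the CLI framework and version producing it. A minimal self-contained sketch of the pattern, using a toy `argparse` parser as a stand-in for the real `autoipaalign` CLI (the subcommand names are taken from the test's expected message; everything else here is hypothetical):

```python
import argparse
import contextlib
import io

# Toy parser standing in for the real autoipaalign CLI.
parser = argparse.ArgumentParser(prog="autoipaalign")
subparsers = parser.add_subparsers(required=True)
subparsers.add_parser("transcribe")
subparsers.add_parser("transcribe-intervals")

# Invoke with no subcommand and capture the error text, as the smoke test does.
stderr = io.StringIO()
with contextlib.redirect_stderr(stderr), contextlib.suppress(SystemExit):
    parser.parse_args([])  # argparse reports the missing subcommand and exits

output = stderr.getvalue()
# Match loosely on the subcommand names rather than one exact wording,
# since different framework versions phrase the error differently.
assert "transcribe-intervals" in output
```

Pinning the assertion to the subcommand names keeps the test stable across message-format changes while still verifying the CLI rejects an empty invocation.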