5 changes: 5 additions & 0 deletions CLAUDE.md
@@ -16,6 +16,11 @@ The Python API (`idtap`) is a sophisticated client library for interacting with
- **Integration tests**: `python python/api_testing/api_test.py` (requires live server auth)
- Test structure: Complete coverage of data models, client functionality, and authentication

**⚠️ IMPORTANT FOR CLAUDE: Before running the full test suite (`pytest idtap/tests/`), ALWAYS warn Jon first!**
- Some tests may require browser authorization for OAuth authentication
- Running tests without warning can waste time waiting for authorization that Jon doesn't realize is needed
- Best practice: Ask "Ready to run the full test suite? (May require browser authorization)" before executing

### Build/Package/Publish - AUTOMATED via GitHub Actions
**⚠️ IMPORTANT: Manual publishing is now automated. See "Automated Version Management" section below.**

5 changes: 4 additions & 1 deletion Pipfile
@@ -14,7 +14,10 @@ build = "*"
twine = "*"
requests-toolbelt = "*"
pyhumps = "*"
idtap = "*"
idtap = "==0.1.34"
numpy = "*"
pillow = "*"
matplotlib = "*"

[dev-packages]
responses = "*"
1,146 changes: 875 additions & 271 deletions Pipfile.lock

Large diffs are not rendered by default.

9 changes: 8 additions & 1 deletion docs/api/index.rst
@@ -22,6 +22,7 @@ Musical transcription data models:

transcription-models
audio-models
spectrogram

Utilities
---------
@@ -54,4 +55,10 @@ Audio Management

* :class:`idtap.AudioMetadata` - Audio file metadata
* :class:`idtap.AudioUploadResult` - Upload response
* :class:`idtap.Musician` - Performer information
* :class:`idtap.Musician` - Performer information

Spectrogram Analysis
~~~~~~~~~~~~~~~~~~~~

* :class:`idtap.SpectrogramData` - CQT spectrogram data and visualization
* :data:`idtap.SUPPORTED_COLORMAPS` - Available colormap names
90 changes: 90 additions & 0 deletions docs/api/spectrogram.rst
@@ -0,0 +1,90 @@
Spectrogram Analysis
====================

Spectrogram data access and visualization for audio analysis.

.. currentmodule:: idtap

SpectrogramData
---------------

The :class:`SpectrogramData` class provides comprehensive access to Constant-Q Transform (CQT)
spectrograms for computational musicology and audio analysis.

.. autoclass:: SpectrogramData
   :members:
   :undoc-members:
   :show-inheritance:

Key Features
~~~~~~~~~~~~

* **Constant-Q Transform (CQT)** - Log-spaced frequency bins for musical analysis
* **Intensity Transformation** - Power-law contrast enhancement (``power`` from 1.0 to 5.0)
* **Colormap Support** - 35+ matplotlib colormaps
* **Frequency/Time Cropping** - Extract specific frequency ranges or time segments
* **Matplotlib Integration** - Plot on existing axes for overlays with pitch contours
* **Image Export** - Save as PNG, JPEG, WebP, etc.
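
The power-law contrast enhancement listed above can be illustrated with a small standalone sketch. It is not taken from the `idtap` source: it assumes the transform normalizes each uint8 value to the 0-1 range, raises it to `power`, and rescales to 0-255. The helper names `power_law_lut` and `apply_power` are hypothetical, and only the standard library is used so the snippet runs on its own.

```python
def power_law_lut(power: float) -> bytes:
    """Build a 256-entry lookup table mapping v -> 255 * (v / 255) ** power.

    Assumption: intensities are uint8 and `power` lies in the documented
    1.0-5.0 range. The real library may normalize differently.
    """
    if not 1.0 <= power <= 5.0:
        raise ValueError("power is expected to be between 1.0 and 5.0")
    return bytes(round(255 * (v / 255) ** power) for v in range(256))


def apply_power(pixels: bytes, power: float) -> bytes:
    """Apply the power-law transform to raw uint8 pixel data via the table."""
    return pixels.translate(power_law_lut(power))


# Higher powers push mid-range values toward black, so only strong
# partials stay bright; 0 and 255 are left unchanged.
sample = bytes([0, 64, 128, 255])
enhanced = apply_power(sample, 2.0)
```

Because the input has only 256 possible values, a lookup table computes the transform once per value instead of once per pixel.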

Quick Examples
~~~~~~~~~~~~~~

Load and display a spectrogram::

    from idtap import SwaraClient, SpectrogramData

    client = SwaraClient()
    spec = SpectrogramData.from_audio_id("audio_id_here", client)

    # Save basic visualization
    spec.save("output.png", power=2.0, cmap='viridis')

Create matplotlib overlay with pitch contour::

    import matplotlib.pyplot as plt

    # Load spectrogram for an existing piece
    # (assumes `client` from the previous example and a loaded `piece`)
    spec = SpectrogramData.from_piece(piece, client)

    # Create figure
    fig, ax = plt.subplots(figsize=(12, 6))

    # Plot spectrogram as underlay with transparency
    im = spec.plot_on_axis(ax, power=2.0, cmap='viridis', alpha=0.7, zorder=0)

    # Overlay pitch contour
    times = [traj.start_time for traj in piece.trajectories]
    pitches = [traj.pitch_contour[0] for traj in piece.trajectories]
    ax.plot(times, pitches, 'r-', linewidth=2, zorder=1)

    # Configure axes
    ax.set_xlabel('Time (s)')
    ax.set_ylabel('Frequency (Hz)')
    plt.colorbar(im, ax=ax, label='Intensity')

    plt.savefig('overlay.png', dpi=150, bbox_inches='tight')

Crop to specific region::

    # Extract 200-800 Hz range, first 10 seconds
    cropped = spec.crop_frequency(200, 800).crop_time(0, 10)
    cropped.save("cropped.png", power=2.5, cmap='magma')

Supported Colormaps
~~~~~~~~~~~~~~~~~~~

.. autodata:: SUPPORTED_COLORMAPS
   :annotation:

Available colormaps include: viridis, plasma, magma, inferno, hot, cool, gray, and many more.
See the matplotlib colormap documentation for visual examples.

Technical Details
~~~~~~~~~~~~~~~~~

* **Algorithm**: Essentia NSGConstantQ (Non-Stationary Gabor Constant-Q Transform)
* **Default Frequency Range**: 75-2400 Hz
* **Default Bins Per Octave**: 72 (high resolution for microtonal analysis)
* **Data Format**: uint8 grayscale (0-255), gzip-compressed
* **Time Resolution**: ~0.0116 seconds per frame (typical)
* **Frequency Scale**: Logarithmic (perceptually uniform for music)
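
The defaults above imply some simple arithmetic, worked through in the sketch below. It assumes bin `k` is centered at `f_min * 2 ** (k / bins_per_octave)`, the usual constant-Q spacing. The exact bin count and edge handling in Essentia's output are not stated here, and the helpers `bin_frequency` and `frequency_bin` are hypothetical names, not part of `idtap`. The 512-sample hop at 44.1 kHz is also an assumption; it happens to match the quoted ~0.0116 s frame spacing.

```python
import math

F_MIN, F_MAX = 75.0, 2400.0   # default frequency range in Hz
BINS_PER_OCTAVE = 72          # default resolution

# 2400 / 75 = 32 = 2 ** 5, so the default range spans exactly five octaves.
octaves = math.log2(F_MAX / F_MIN)
n_bins = round(octaves * BINS_PER_OCTAVE)

# Each bin covers 1200 / 72 cents, i.e. one sixth of a semitone.
cents_per_bin = 1200 / BINS_PER_OCTAVE


def bin_frequency(k: int) -> float:
    """Center frequency in Hz of bin k under the assumed log spacing."""
    return F_MIN * 2 ** (k / BINS_PER_OCTAVE)


def frequency_bin(hz: float) -> int:
    """Nearest bin index for a frequency in Hz (inverse of bin_frequency)."""
    return round(BINS_PER_OCTAVE * math.log2(hz / F_MIN))


# Bin range corresponding to a 200-800 Hz crop.
low_bin, high_bin = frequency_bin(200.0), frequency_bin(800.0)

# Assumed hop size and sample rate that would give the quoted frame spacing.
seconds_per_frame = 512 / 44100
```

At 72 bins per octave, neighboring bins are about 16.7 cents apart, which is why the default is described as suitable for microtonal analysis.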
1 change: 1 addition & 0 deletions docs/index.rst
@@ -45,6 +45,7 @@ Features
* **OAuth Authentication** - Secure Google OAuth integration with token storage
* **Rich Data Models** - Comprehensive classes for musical transcription data
* **Audio Management** - Upload, download, and manage audio files
* **Spectrogram Analysis** - CQT spectrogram visualization with matplotlib integration
* **Export Capabilities** - Export transcriptions to JSON and Excel formats
* **Permissions System** - Manage public/private visibility and sharing

4 changes: 4 additions & 0 deletions idtap/__init__.py
@@ -21,6 +21,7 @@
from .classes.trajectory import Trajectory

from .enums import Instrument
from .spectrogram import SpectrogramData, SUPPORTED_COLORMAPS
from .audio_models import (
    AudioMetadata,
    AudioUploadResult,
@@ -74,6 +75,9 @@
    "Trajectory",
    "Instrument",
    "login_google",
    # Spectrogram
    "SpectrogramData",
    "SUPPORTED_COLORMAPS",
    # Audio upload classes
    "AudioMetadata",
    "AudioUploadResult",
24 changes: 24 additions & 0 deletions idtap/client.py
@@ -684,6 +684,30 @@ def download_and_save_transcription_audio(self, piece: Union[Dict[str, Any], Pie
        # Save file and return path
        return self.save_audio_file(audio_data, filename, filepath)

    def download_spectrogram_data(self, audio_id: str) -> bytes:
        """Download gzip-compressed spectrogram data.

        Args:
            audio_id: The audio recording ID

        Returns:
            Gzipped binary data containing uint8 spectrogram array
        """
        endpoint = f"spec_data/{audio_id}/spec_data.gz"
        return self._get(endpoint)

    def download_spectrogram_metadata(self, audio_id: str) -> Dict[str, Any]:
        """Download spectrogram shape metadata.

        Args:
            audio_id: The audio recording ID

        Returns:
            Dictionary with 'shape' key: [freq_bins, time_frames]
        """
        endpoint = f"spec_data/{audio_id}/spec_shape.json"
        return self._get(endpoint)
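
Review note: a caller would presumably combine the two downloads roughly as sketched below. This is an illustration under stated assumptions, not the `SpectrogramData` implementation. It takes the docstrings at face value (raw gzip bytes plus a dict with a `'shape'` key) and assumes the decompressed buffer is row-major with one row per frequency bin. `decode_spectrogram` is a hypothetical helper, and the payload is synthetic so the snippet needs no server.

```python
import gzip
from typing import Any, Dict, List


def decode_spectrogram(gz_bytes: bytes, shape_meta: Dict[str, Any]) -> List[bytes]:
    """Turn gzipped uint8 data plus shape metadata into one row per frequency bin.

    Assumption: the decompressed buffer is row-major with shape
    [freq_bins, time_frames]; the server's actual layout is not
    confirmed by this diff.
    """
    freq_bins, time_frames = shape_meta["shape"]
    raw = gzip.decompress(gz_bytes)
    if len(raw) != freq_bins * time_frames:
        raise ValueError(
            f"expected {freq_bins * time_frames} bytes, got {len(raw)}"
        )
    # Slice the flat buffer into rows of `time_frames` bytes each.
    return [raw[r * time_frames:(r + 1) * time_frames] for r in range(freq_bins)]


# Synthetic stand-in for the two server responses (2 bins x 3 frames).
fake_payload = gzip.compress(bytes(range(6)))
rows = decode_spectrogram(fake_payload, {"shape": [2, 3]})
```

Checking the byte count against the metadata catches a mismatched or truncated download before any reshaping happens.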

    def save_transcription(self, piece: Piece, fill_duration: bool = True) -> Any:
        """Save a transcription piece to the server.
