
euclidkit

A comprehensive Python package for Euclid archival data analysis, designed for use within the ESA Datalabs environment.

Overview

euclidkit facilitates advanced data exploration and visualization of the Euclid Q1/(I)DR1 archival releases. Its main features are:

  • Data Access: Query and crossmatch sources with the Euclid MER catalogue
  • Spectroscopic Analysis: Access, download, and combine NISP spectra of archival sources
  • Unified Workflow: Streamlined tools for researchers working with Euclid spectroscopic data

The package is designed for efficient archive querying and Euclid spectrum compilation workflows.

Installation

Requirements

  • Python 3.11+
  • Access to the ESA Datalabs environment (for direct access to the Euclid data volumes)
  • COSMOS credentials for Euclid archive access
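The Python 3.11+ requirement can be checked before installing. This is a minimal standard-library sketch; `meets_python_requirement` is a hypothetical helper, not part of euclidkit:

```python
import sys

# Minimum interpreter version listed in the requirements above.
MIN_PYTHON = (3, 11)


def meets_python_requirement(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the given interpreter version satisfies the minimum."""
    # Compare only (major, minor); tuples compare element by element.
    return tuple(version_info[:2]) >= minimum


if meets_python_requirement():
    print("Python version OK")
else:
    print(f"euclidkit needs Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+")
```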

Basic Installation

pip install euclidkit

Development Installation

git clone https://github.com/rudolffu/euclidkit.git
cd euclidkit
pip install -e .

Quick Start

Setup Credentials

Store credentials in a private file under your home directory and restrict permissions:

mkdir -p ~/.euclidkit
touch ~/.euclidkit/.cred.txt
chmod 600 ~/.euclidkit/.cred.txt

Edit ~/.euclidkit/.cred.txt manually with your preferred editor (do not put credentials in shell history).

The file must contain exactly two lines:

  1. COSMOS username
  2. COSMOS password
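A minimal sketch of reading this file while enforcing the permissions set above. `read_cosmos_credentials` is a hypothetical helper, not euclidkit's own loader, and it assumes the password has no leading or trailing whitespace:

```python
import stat
from pathlib import Path


def read_cosmos_credentials(path="~/.euclidkit/.cred.txt"):
    """Read a two-line credential file: username on line 1, password on line 2.

    Hypothetical helper for illustration; euclidkit's own loader may differ.
    """
    cred_path = Path(path).expanduser()

    # Refuse to read the file if group or other users have any access (expected mode: 600).
    mode = stat.S_IMODE(cred_path.stat().st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(f"{cred_path} has mode {oct(mode)}; run 'chmod 600 {cred_path}'")

    # Drop blank lines and strip surrounding whitespace such as a trailing newline.
    lines = [line.strip() for line in cred_path.read_text().splitlines() if line.strip()]
    if len(lines) != 2:
        raise ValueError(f"expected 2 non-empty lines in {cred_path}, found {len(lines)}")

    username, password = lines
    return username, password
```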

Configuration

Generate a user config file from the basic template:

euclidkit init-config --output ~/.euclidkit/euclidkit_config.yaml --template basic

Then edit ~/.euclidkit/euclidkit_config.yaml and set the path to your credentials file in the data section:

data:
  credentials_file: /home/<user>/.euclidkit/.cred.txt

Basic Usage

# Note: the Python import path is currently still `euclidkit`.
from euclidkit.core.data_access import EuclidArchive

# Initialize archive connection
archive = EuclidArchive(environment='PDR')
archive.login()

# Crossmatch your sources with Euclid MER catalogue
results = archive.crossmatch_sources(
    user_table="my_sources.csv",
    radius=1.0,  # arcseconds
    output_file="crossmatch_results.fits"
)

# Query for available spectra
spectra_table = archive.query_spectra_sources(
    crossmatch_table=results,
    output_file="spectra_sources.fits"
)

# Combine spectra into a single FITS file
combined_file = archive.combine_spectra_to_fits(
    spectra_table=spectra_table,
    output_file="my_combined_spectra.fits"
)
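To make the meaning of `radius=1.0` concrete, here is a brute-force positional crossmatch in pure Python. Each user source is matched to the nearest catalogue source within the radius (in arcseconds), using the haversine formula for angular separation. This is only an illustration: euclidkit runs the crossmatch against the MER catalogue on the archive side, and the function names and toy positions below are hypothetical.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle distance between two (RA, Dec) positions given in degrees, in arcseconds."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula: numerically stable for the small separations typical of crossmatching.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def nearest_match(source, catalogue, radius_arcsec=1.0):
    """Return (catalogue_index, separation_arcsec) of the closest entry within radius, or None.

    source is an (ra, dec) tuple; catalogue is a list of (ra, dec) tuples, all in degrees.
    Brute force, O(N) per source: fine for a demo, not for millions of rows.
    """
    best = None
    for idx, (ra, dec) in enumerate(catalogue):
        sep = angular_separation_arcsec(source[0], source[1], ra, dec)
        if sep <= radius_arcsec and (best is None or sep < best[1]):
            best = (idx, sep)
    return best


catalogue = [(150.0000, 2.0000), (150.0010, 2.0000)]  # toy positions, not real Euclid sources
print(nearest_match((150.0002, 2.0), catalogue, radius_arcsec=1.0))
```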

Command Line Interface

Crossmatching Sources

# Crossmatch user table with Euclid MER catalogue
euclidkit crossmatch \
    --input my_sources.csv \
    --output crossmatch_results.fits \
    --radius 1.0 \
    --verbose

# Submit the entire table as a single async job (no batching). The output file
# will contain TAP job metadata instead of immediate crossmatch results.
euclidkit crossmatch \
    --input my_sources.csv \
    --output crossmatch_results.fits \
    --full-async

# When using the IDR environment, the command defaults to the WIDE field and
# writes results to wide_<filename>. Use --idr-field DEEP to query the deep stack:
euclidkit crossmatch \
    --input my_sources.csv \
    --output crossmatch_results.fits \
    --environment IDR \
    --idr-field DEEP
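As described above, IDR results go to wide_<filename> by default. A sketch of that naming rule follows; `idr_output_path` is hypothetical, and applying the same lowercase-prefix pattern to DEEP is an assumption, since only the WIDE case is documented here:

```python
from pathlib import Path


def idr_output_path(output_file, idr_field="WIDE"):
    """Prefix the output file name with the lowercased IDR field, keeping its directory.

    WIDE -> wide_<filename> matches the documented default; using the same rule
    for DEEP is an assumption made for illustration only.
    """
    field = idr_field.upper()
    if field not in {"WIDE", "DEEP"}:
        raise ValueError(f"unknown IDR field: {idr_field!r}")
    path = Path(output_file)
    return str(path.with_name(f"{field.lower()}_{path.name}"))


print(idr_output_path("crossmatch_results.fits"))  # wide_crossmatch_results.fits
```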

Uploading Tables

# Upload a FITS table to your Euclid TAP workspace
euclidkit upload-table \
    --input my_sources.fits \
    --table-name my_workspace_table \
    --description "Sources awaiting deep crossmatch" \
    --overwrite

# Upload CSV data as-is (format inferred automatically)
euclidkit upload-table \
    --input trimmed_sources.csv \
    --table-name trimmed_sources

Querying Spectra

# Query spectra from crossmatch results
euclidkit query-spectra \
    --crossmatch crossmatch_results.fits \
    --output spectra_sources.fits \
    --verbose

# Query spectra by object IDs and auto-combine
euclidkit query-spectra \
    --object-ids 123456,789012,345678 \
    --output spectra_sources.fits \
    --combine-output my_spectra.fits \
    --max-spectra 100 \
    --verbose
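The --object-ids option takes a comma-separated list, and --max-spectra caps how many spectra are processed. A sketch of how such input could be parsed and capped; `parse_object_ids` and `cap_spectra` are hypothetical names, and deduplicating IDs and keeping the first N rows are assumptions rather than documented euclidkit behaviour:

```python
def parse_object_ids(arg):
    """Split a comma-separated --object-ids string into unique integer IDs, preserving order."""
    ids = []
    seen = set()
    for token in arg.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate trailing commas such as "123,456,"
        object_id = int(token)  # raises ValueError for non-numeric input
        if object_id not in seen:
            seen.add(object_id)
            ids.append(object_id)
    return ids


def cap_spectra(rows, max_spectra=None):
    """Keep at most max_spectra rows (None means no limit)."""
    return rows if max_spectra is None else rows[:max_spectra]


ids = parse_object_ids("123456,789012,345678")
print(ids)                               # [123456, 789012, 345678]
print(cap_spectra(ids, max_spectra=2))   # [123456, 789012]
```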

Building Cutana Input

# Build Cutana CSV from a source table with object_id or ra/dec columns
euclidkit query-cutana \
    --sources my_sources.fits \
    --output cutana_input.csv \
    --instrument VIS \
    --cutout-size arcsec \
    --cutout-size-value 15

# NISP example with explicit filters
euclidkit query-cutana \
    --sources my_sources.fits \
    --output cutana_input_nisp.csv \
    --instrument NISP \
    --nisp-filters NIR_Y,NIR_H \
    --environment IDR \
    --idr-field DEEP \
    --cutout-size arcsec \
    --cutout-size-value 15
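query-cutana accepts a source table with either object_id or ra/dec columns. A sketch of deciding which lookup to use from the column names alone; the helper name, the case-insensitive matching, and the preference for object_id when both are present are assumptions, not euclidkit's documented behaviour:

```python
def choose_lookup_mode(columns):
    """Return ('object_id', [col]) or ('position', [ra_col, dec_col]) from a table's column names.

    Matching is case-insensitive; object_id wins when both kinds of column are
    present (an assumption made for this sketch).
    """
    lower = {name.lower(): name for name in columns}
    if "object_id" in lower:
        return "object_id", [lower["object_id"]]
    if "ra" in lower and "dec" in lower:
        return "position", [lower["ra"], lower["dec"]]
    raise ValueError("source table needs an object_id column or both ra and dec columns")


print(choose_lookup_mode(["OBJECT_ID", "mag"]))  # ('object_id', ['OBJECT_ID'])
print(choose_lookup_mode(["RA", "Dec", "z"]))    # ('position', ['RA', 'Dec'])
```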

Compiling Spectra

# Compile individual spectra into chunked FITS files
euclidkit compile-spectra \
    --spectra-table spectra_sources.fits \
    --output-dir ./output \
    --prefix compiled_spectra \
    --max-extensions 1000 \
    --verbose

Note: for canonical compilation from local Datalabs FITS volumes, --workers 2 is often not faster due to shared-storage I/O contention. Prefer --workers 1 unless benchmarking on your setup shows a clear gain.
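With --max-extensions 1000, a large spectra table is split across several FITS files. A sketch of planning that split; the <prefix>_NNN.fits naming is hypothetical and may not match euclidkit's file names:

```python
import math


def plan_chunks(n_spectra, max_extensions=1000, prefix="compiled_spectra"):
    """Return (file_name, start, stop) tuples covering rows [0, n_spectra) in order.

    Each file receives at most max_extensions spectra; stop is exclusive.
    """
    if max_extensions < 1:
        raise ValueError("max_extensions must be at least 1")
    n_files = math.ceil(n_spectra / max_extensions)
    plan = []
    for i in range(n_files):
        start = i * max_extensions
        stop = min(start + max_extensions, n_spectra)
        plan.append((f"{prefix}_{i:03d}.fits", start, stop))
    return plan


for name, start, stop in plan_chunks(2500, max_extensions=1000):
    print(name, start, stop)
```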

Key Features

Data Archive Integration

  • Multiple Environments: Support for PDR, IDR, OTF, and REG archive environments
  • Efficient Queries: Batch processing with TAP table uploads for large datasets
  • Crossmatching: Position-based matching with configurable search radius
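Batch processing splits a large user table into smaller uploads so each TAP job stays manageable. A sketch of the batching step alone, with no network calls; the 5000-row batch size is an arbitrary illustrative value, not a documented euclidkit default. `itertools.islice` is used because `itertools.batched` only exists from Python 3.12:

```python
from itertools import islice


def batched_rows(rows, batch_size=5000):
    """Yield successive lists of at most batch_size rows from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Each batch would then be uploaded as a TAP table and crossmatched separately.
sizes = [len(b) for b in batched_rows(range(12_000), batch_size=5000)]
print(sizes)  # [5000, 5000, 2000]
```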

Spectroscopic Tools

  • Spectrum Access: Direct access to Euclid data volumes on ESA Datalabs
  • FITS Compilation: Combine individual spectra into multi-extension FITS files
  • Metadata Preservation: Maintain source IDs, coordinates, and provenance information

Analysis Pipeline

  • Quality Control: Spectrum validation and quality assessment
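As an illustration of what spectrum validation can involve, here is a sketch of a few basic checks on a wavelength/flux pair. The checks and the 50% threshold are assumptions chosen for the example, not the criteria euclidkit applies:

```python
import math


def basic_spectrum_checks(wavelength, flux, max_bad_fraction=0.5):
    """Return a list of problems found in a spectrum (an empty list means it passed).

    Illustrative checks: matching lengths, strictly increasing wavelengths,
    and the fraction of non-finite flux values.
    """
    problems = []
    if len(wavelength) != len(flux):
        problems.append("wavelength and flux lengths differ")
        return problems
    if len(flux) == 0:
        problems.append("empty spectrum")
        return problems
    if any(b <= a for a, b in zip(wavelength, wavelength[1:])):
        problems.append("wavelength grid is not strictly increasing")
    bad = sum(1 for f in flux if not math.isfinite(f))
    if bad / len(flux) > max_bad_fraction:
        problems.append(f"{bad}/{len(flux)} flux values are NaN or infinite")
    return problems


print(basic_spectrum_checks([1.0, 1.1, 1.2], [3.0, float("nan"), 2.5]))  # []
```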

Data Environment

ESA Datalabs Integration

This package is optimized for the ESA Datalabs environment with direct access to:

  • Euclid Q1 Data: /data/euclid_q1/ (35 TB volume)
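Outside Datalabs the /data/euclid_q1/ volume will not exist, so scripts that read spectra directly from disk may want to check for it first. A minimal sketch; the helper name is hypothetical and it only tests that the path is a readable directory:

```python
import os


def datalabs_volume_available(path="/data/euclid_q1/"):
    """Return True if the given data volume is a readable, listable directory on this machine."""
    return os.path.isdir(path) and os.access(path, os.R_OK | os.X_OK)


if datalabs_volume_available():
    print("Euclid Q1 volume found")
else:
    print("Euclid Q1 volume not found (are you running on ESA Datalabs?)")
```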

API Reference

Core Classes

EuclidArchive

Main interface to the Euclid science archive.

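from pathlib import Path
from euclidkit.core.data_access import EuclidArchive

archive = EuclidArchive(environment='PDR')
# Expand "~" explicitly rather than relying on the library to do it
archive.login(credentials_file=str(Path('~/.euclidkit/.cred.txt').expanduser()))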

# Crossmatch sources
results = archive.crossmatch_sources(
    user_table="sources.csv",
    radius=1.0,
    output_file="results.fits"
)

# Query spectra
spectra = archive.query_spectra_sources(
    crossmatch_table=results,
    output_file="spectra.fits"
)

# Get individual spectrum
spectrum_hdu = archive.get_individual_spectrum(
    datalabs_path="/data/euclid_q1/path",
    file_name="spectrum_file.fits",
    hdu_index=42
)

# Combine spectra
combined = archive.combine_spectra_to_fits(
    spectra_table=spectra,
    output_file="combined.fits",
    max_spectra=1000
)

SpectrumCompiler

Advanced spectrum compilation with chunking support.

from euclidkit.core.spectra import SpectrumCompiler

# spectra_table: the table returned by archive.query_spectra_sources()
compiler = SpectrumCompiler(max_extensions=1000)

# Compile into chunked files
output_files = compiler.compile_spectra(
    spectra_table=spectra_table,
    output_dir="./output",
    output_prefix="compiled_spectra"
)

# Create single FITS file
single_file = compiler.compile_single_fits(
    spectra_table=spectra_table,
    output_file="all_spectra.fits"
)

# Generate metadata table
metadata = compiler.create_metadata_table(
    spectra_table=spectra_table,
    output_files=output_files,
    output_dir="./output"
)
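create_metadata_table links each spectrum to the compiled file that holds it. A sketch of such an index written as CSV, assuming one spectrum per extension, extensions numbered from 1 (HDU 0 being the primary header), and a hypothetical file naming scheme; the column names are illustrative, not euclidkit's output schema:

```python
import csv
import io


def build_spectrum_index(object_ids, max_extensions=1000, prefix="compiled_spectra"):
    """Map each object_id to (file_name, hdu_index), filling files in input order."""
    rows = []
    for position, object_id in enumerate(object_ids):
        chunk, offset = divmod(position, max_extensions)
        rows.append({
            "object_id": object_id,
            "file_name": f"{prefix}_{chunk:03d}.fits",
            "hdu_index": offset + 1,  # HDU 0 is assumed to be the primary header
        })
    return rows


def index_to_csv(rows):
    """Serialise the index to CSV text, ready to be written alongside the FITS files."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["object_id", "file_name", "hdu_index"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(index_to_csv(build_spectrum_index([101, 102, 103], max_extensions=2)))
```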

Workflow Examples

Complete Spectroscopic Analysis Pipeline

from euclidkit.core.data_access import EuclidArchive
from euclidkit.core.spectra import SpectrumCompiler
import pandas as pd

# 1. Initialize archive
archive = EuclidArchive(environment='PDR')
archive.login()

# 2. Load your QSO candidates
qso_candidates = pd.read_csv('qso_candidates.csv')

# 3. Crossmatch with Euclid MER catalogue
crossmatches = archive.crossmatch_sources(
    user_table=qso_candidates,
    radius=2.0,  # 2 arcsecond radius
    output_file='qso_crossmatches.fits'
)

# 4. Find available spectra
spectra_sources = archive.query_spectra_sources(
    crossmatch_table=crossmatches,
    output_file='qso_spectra_sources.fits'
)

print(f"Found {len(spectra_sources)} spectra for {len(crossmatches)} crossmatches")

# 5. Create combined FITS file (for small samples)
if len(spectra_sources) <= 1000:
    combined_spectra = archive.combine_spectra_to_fits(
        spectra_table=spectra_sources,
        output_file='qso_combined_spectra.fits'
    )
    print(f"Combined spectra saved to: {combined_spectra}")

else:
    # 6. Otherwise, use chunked compilation for large samples
    compiler = SpectrumCompiler(max_extensions=2000)
    output_files = compiler.compile_spectra(
        spectra_table=spectra_sources,
        output_dir='./spectra_chunks',
        output_prefix='qso_spectra'
    )
    print(f"Created {len(output_files)} chunked files")

archive.logout()

Diagnostics

Check your installation and environment:

# Check all components
euclidkit diagnostics

# Check specific components
euclidkit diagnostics --check-deps --check-data
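A sketch of the kind of check that --check-deps could perform, using importlib to test whether packages are importable without actually importing them. The package list is an assumption based on the libraries named in this README (Astropy, astroquery, pandas):

```python
from importlib.util import find_spec


def check_dependencies(packages=("astropy", "astroquery", "pandas")):
    """Return {package_name: bool} indicating whether each top-level package can be found."""
    return {name: find_spec(name) is not None for name in packages}


for name, ok in check_dependencies().items():
    print(f"{name:12s} {'OK' if ok else 'MISSING'}")
```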

Archive Environments

  • PDR: Public Data Release
  • IDR: Internal Data Release (only accessible to Euclid Consortium members)

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Documentation

For detailed documentation and examples, visit:

Support

Author

Yuming Fu (@rudolffu)

License

This project is licensed under the GNU General Public License - see the LICENSE file for details.

Acknowledgments

  • ESA Euclid Mission and Euclid Consortium
  • ESA Datalabs and Euclid Data Space infrastructure team
  • Astropy and astroquery communities

Changelog

Latest Changes

  • Spectroscopic Pipeline: Complete pipeline for accessing and combining Euclid spectra
  • CLI Integration: Added --combine-output option to query-spectra command
  • TAP Upload: Improved query performance using TAP table uploads
  • FITS Compilation: Efficient multi-extension FITS file creation
  • Error Handling: Robust handling of long filenames and missing data

See CHANGELOG.md for detailed version history.
