
The PHANGS-ALMA Pipeline

Preface


This is the PHANGS post-processing and science-ready data product pipeline. This pipeline processes data from calibrated visibilities to science-ready spectral cubes and maps. The procedures and background for key parts of the pipeline are discussed in the Astrophysical Journal Supplement Series paper "PHANGS-ALMA Data Processing and Pipeline". Please consult that paper for more background and details.

What this pipeline is for

This pipeline is designed to process data from radio interferometer observations (from, e.g., ALMA or the VLA). It is applied to calibrated visibilities, such as those generated by the CASA software, and delivers science-ready spectral cubes and moment maps, along with associated uncertainty maps. In this regard, the PHANGS-ALMA pipeline offers a flexible alternative to the scriptForImaging script distributed by ALMA. A detailed list of the derived data products can be found in Section 7 of the paper mentioned above. The pipeline can also process Total Power data from ALMA.

Pipeline and configuration files

This repository contains the scripts that comprise the PHANGS-ALMA pipeline. Configuration files for a large set of PHANGS projects, including the live version of the files for the PHANGS-ALMA CO survey, exist in a separate repository. We include here, as examples, a frozen set of key files that can be used to reduce the PHANGS-ALMA data. If you need access to those other repositories, please request it as needed.

Contact

For issues, the preferred method is to open an issue on the GitHub issues page.

Installation

We recommend installing the pipeline in a separate Conda environment.

The pipeline works in Python>=3.12, CASA>=6.7.3, and is pip installable:

pip install git+https://github.com/phangsTeam/phangs_imaging_scripts.git

Or, if using a local installation:

cd /path/to/phangs_imaging_scripts
pip install -e .

If you are using a monolithic CASA installation, you can run this within the CASA shell. You may need to add pip to your PATH to get CASA to install Astropy; for that, you can see details here. Note that if you are running within monolithic CASA and want to make use of the single-dish imaging capabilities, you will need to run a pipeline build of CASA.

By default, the PHANGS-ALMA pipeline does not install CASA-related packages, so a pure-Python installation cannot image data. If you are running inside monolithic CASA, these packages already exist. If you want CASA capabilities in pure Python, install the optional casa dependencies:

pip install "phangsPipeline[casa] @ git+https://github.com/phangsTeam/phangs_imaging_scripts.git"

For local installations:

cd /path/to/phangs_imaging_scripts
pip install -e '.[casa]'

To check that the installation worked, import the pipeline in Python:

import phangsPipeline as ppl

You will also need to download analysisUtils. Make sure to grab the latest version, and add the location of these scripts to your Python path (e.g., via PYTHONPATH or sys.path).
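For example, one way to make analysisUtils importable is to append its location to sys.path at startup (the path below is a placeholder for wherever you unpacked the scripts):

```python
import sys

# Placeholder path: point this at your local analysisUtils checkout.
AU_PATH = "/path/to/analysis_scripts"

# Make the scripts importable before CASA or the pipeline tries to use them.
if AU_PATH not in sys.path:
    sys.path.append(AU_PATH)
```

You could also add the same path to the PYTHONPATH environment variable in your shell profile, which has the same effect for every Python session.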

On the first run, you may get an error about downloading CASA data. In this case, ensure that the directory it lists exists and rerun. You can change this data path by editing ~/.casa/config.py.
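As an illustration, a minimal ~/.casa/config.py might set the data path like this (the directory shown is a placeholder; datapath is the CASA 6 configuration variable):

```python
# ~/.casa/config.py (illustrative example)
# Point CASA at a directory where its runtime data tables live.
datapath = ["/home/username/casadata"]
```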

We maintain an older release of the pipeline here. This is somewhat more agnostic to CASA versions, but is unlikely to work with the latest CASA releases going forward.

Running the pipeline

There are two ways that this pipeline might be useful. First, it provides an end-to-end path from calibrated ALMA data (or VLA data), of the sort produced by the scriptForPI script distributed by ALMA, to spectral cubes and maps. That end-to-end approach is described in "Workflow for most users." Second, the phangsPipeline directory contains a number of modules, for use inside and outside CASA, that have general utility: they are written without requiring any broader awareness of the pipeline infrastructure. These are the files named casaSOMENAME.py and scSOMEOTHERNAME.py and, to a lesser extent, utilsYETANOTHERNAME.py.

Workflow for most users

If you just want to use the pipeline then you will need to do three things:

  1. Run scriptForPI.py to apply the observatory-provided calibration to your data (this step is outside the pipeline's remit). The pipeline picks up from there; it does not replace the ALMA observatory calibration and flagging pipeline.
  2. Make configuration files ("key files") that describe your project. Usually you can copy and modify an existing project to get a good start. We provide PHANGS-ALMA as an example.
  3. Run the pipeline scripts

The Easiest Way

This release includes the full PHANGS-ALMA set of keys and the scripts we use to run the pipeline for PHANGS-ALMA. These are heavily documented: copy them to make your own script and configuration, and follow the documentation in those scripts to get started. To be specific:

  • The PHANGS-ALMA keys to reduce the data end-to-end from the archive are in: phangs-alma_keys/
  • The script to run the pipeline is: run_pipeline_phangs-alma.py

These can run the actual PHANGS-ALMA reduction, though in practice we used slightly more complex versions of a few programs to manage the workflow. Copying and modifying these is your best bet, especially following the patterns in the key files.

A few details on procedure

The full procedure is described in our ApJ Supplements paper, and the programs themselves are all in this repository, so we do not provide exhaustive documentation here. Many individual routines are documented, and we intend to improve the documentation in the future. Broadly, the pipeline runs in four stages:

  1. Staging: Stage and process the uv-data. This step includes continuum subtraction, line extraction, and spectral regridding.
  2. Imaging: Image and deconvolve the uv-data. This runs in several steps: dirty imaging, clean-mask alignment, multi-scale deconvolution, re-masking, and single-scale deconvolution.
  3. Post-processing: Process the deconvolved data into science-ready data cubes. This stage includes merging with the Total Power data and mosaicking.
  4. Derived products: Convolution, noise estimation, masking, and calculation of science-ready data products.

The simplest way to run these is to edit run_pipeline_phangs-alma.py to point at your key files, and run.
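As a rough sketch, the driver script's control flow amounts to toggling these four stages on or off and running them in order. The flag and function names below are illustrative stand-ins, not the pipeline's actual API; see run_pipeline_phangs-alma.py for the real calls:

```python
# Illustrative only: stand-in names, not the real phangsPipeline API.
def run_pipeline(do_staging=True, do_imaging=True,
                 do_postprocess=True, do_derived=True):
    """Run the four broad stages in order, honoring the on/off flags."""
    completed = []
    if do_staging:
        # Continuum subtraction, line extraction, spectral regridding.
        completed.append("staging")
    if do_imaging:
        # Dirty imaging, clean-mask alignment, deconvolution, re-masking.
        completed.append("imaging")
    if do_postprocess:
        # Merge with Total Power data and mosaic into science-ready cubes.
        completed.append("postprocess")
    if do_derived:
        # Convolution, noise estimation, masking, moment maps.
        completed.append("derived")
    return completed

stages = run_pipeline()
```

Because each stage reads the products of the previous one from disk, the real driver can likewise be rerun with earlier stages switched off once their outputs exist.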

Chunked imaging

For large cubes, it may be beneficial to split the cube into chunks and farm each chunk out to a different machine (within some HPC environment). For this, the ImagingChunkedHandler exists. There are some example scripts on how to use this in the scripts directory.

Contents of the pipeline in more detail

Architecture: The pipeline is organized and run by a series of "handler" objects. These handlers organize the lists of targets, array configurations, spectral products, and derived moments, and execute loops over them.

The routines to process individual data sets are in individual modules, grouped by theme (e.g., casaImagingRoutines or scNoiseRoutines). These routines do not know about the larger infrastructure of arrays, targets, etc. They generally take an input file, output file, and various keyword arguments.
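To illustrate that calling convention, here is a hypothetical stand-in, not an actual pipeline function: each low-level routine accepts an input file, an output file, and keyword options, with no knowledge of targets or array configurations.

```python
import os
import shutil

def process_file(infile, outfile, overwrite=False):
    """Hypothetical stand-in for a low-level routine: it sees only the
    files it is given, never the project-level target list."""
    if os.path.exists(outfile) and not overwrite:
        # Mirror the common pattern of skipping existing products.
        return outfile
    # A real routine would convolve, mask, or regrid; here we just copy.
    shutil.copy(infile, outfile)
    return outfile
```

The handlers then supply the file names and options for each target and configuration, looping this kind of routine over the whole project.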

A project is defined by a series of text key files in a "key_directory". These define the measurement set inputs, configurations, spectral line products, moments, and derived products.

User Control: For the most part, the user's job is to define the key files and to run some scripts.