Skip to content

0.2.138 Release#393

Open
dancoates wants to merge 68 commits intomainfrom
draft-feb-26
Open

0.2.138 Release#393
dancoates wants to merge 68 commits intomainfrom
draft-feb-26

Conversation

@dancoates
Copy link

@dancoates dancoates commented Feb 1, 2026

Please do not forget to do terraform apply before deployment.

chrisvittal and others added 25 commits December 9, 2025 16:20
Primarily for dependabot alerts about urllib3

## Security Assessment
- This change potentially impacts the Hail Batch instance as deployed by
Broad Institute in GCP

### Impact Rating
- This change has a low security impact

### Impact Description
Dependency update to resolve security issue.

### Appsec Review

- [x] Required: The impact has been assessed and approved by appsec
…hail-is#15205)

## Change Description

Fixes hail-is/hail-security#67

Currently, worker VMs are created with external IP addresses. However,
there is no operational reason why they actually need those, as we don't
actually need/want our workers to be accessible from the public
internet. This change updates our worker VM config (sent to GCP for
worker creation) to exclude giving an external IP.

## Security Assessment
- This change potentially impacts the Hail Batch instance as deployed by
Broad Institute in GCP

### Impact Rating
- This change has a medium security impact

### Impact Description

While removing public IPs from our workers makes things generally more
secure, this change still relates to changing our worker VM
configuration and communication.

### Appsec Review

- [ ] Required: The impact has been assessed and approved by appsec
What?

- Refactors hail/build.mill to use mill's multi-file builds.
- Organises dependencies via their maven coordinate to avoid typing full
coordinate each time.
- Uses scala 3 syntax

Why?

- modules like `shadeazure` and `ir-gen` are independent of the root
module's cross value
- listing all mvn coordinates is useful for a subsequent change where I
define boms for various pyspark distributions.
- Scala 3 syntax is a lot nicer. I don't think the overheads are
particularly strenuous.

This change cannot impact the Hail Batch instance as deployed by Broad
Institute in GCP
## Change Description

Stop mirroring back an invalid "next" page specified at login.

## Security Assessment

- This change potentially impacts the Hail Batch instance as deployed by
Broad Institute in GCP

### Impact Rating
- This change has a low security impact


### Impact Description

Prevents a potential reflected XSS

### Appsec Review

- [x] Required: The impact has been assessed and approved by appsec
…5210)

A user reported a 500 when trying to log in on zulip ([#Hail Batch
support > unable to log in -- 500
error](https://hail.zulipchat.com/#narrow/channel/223457-Hail-Batch-support/topic/unable.20to.20log.20in.20--.20500.20error/with/563247607)).

We run into this internal server error because we were failing
[this](https://github.com/hail-is/hail/blob/5d54486fcac06f6b7b2e8af380f812d1552343fc/auth/auth/auth.py#L438)
assertion, as were not handling the 'inactive' status. This change adds
the 'inactive' status to the list of errors, and also makes sure to set
an http error status when rendering the `account-errors.html` template.

## Security Assessment
- This change potentially impacts the Hail Batch instance as deployed by
Broad Institute in GCP

### Impact Rating
- This change has low security impact

### Impact Description
Change the response codes to one portion of the login flow, and make it
so that the server doesn't fail when handling login of an inactive user.

### Appsec Review

- [x] Required: The impact has been assessed and approved by appsec
Replace generated `build-info.properties` with a generated Scala package
object.
Renamed terms to be less shouty.
Retired useless `BuildConfiguration` enum.

This change cannot impact the Hail Batch instance as deployed by Broad
Institute in GCP
## Change Description

Updates cloudnat to have a more realistic number of ports, allow the
number to float, and add routers and nats for all regions where worker
VMs might be created

## Security Assessment

Delete all except the correct answer:
- This change potentially impacts the Hail Batch instance as deployed by
Broad Institute in GCP

### Impact Rating

- This change has a medium security impact

### Impact Description

Networking change, but in analogy to what already exists, and in the
service of adding a secure layer between our worker vms and the outside
world


### Appsec Review

- [x] Required: The impact has been assessed and approved by appsec
Updates the dev deploy build script to link us out to the
`batch.hail.is` UI pages rather than `ci.hail.is` pages.

This change cannot impact the Hail Batch instance as deployed by Broad
Institute in GCP
Later versions of scalac will issue the following warning:
```
object AnyRefMap in package mutable is deprecated (since 2.13.16): Use scala.collection.mutable.HashMap instead for better performance.
```
Seems fair enough. Spark have also followed suit: 
https://gitmemories.com/apache/spark/issues/48128

This change has no impact on the Broad-managed hail batch deployment in
GCP.
## Change Description

Sets the max connections per VM higher, and makes it a power of 2, as
required by the API

## Security Assessment

- This change potentially impacts the Hail Batch instance as deployed by
Broad Institute in GCP

### Impact Rating

- This change has a low security impact

### Impact Description

Small config change to un-break the terraform file and give more
connection space to VMs

### Appsec Review

- [x] Required: The impact has been assessed and approved by appsec
## Change Description

Fixes a bunch of the easy-to-fix 2.13 deprecation warnings.

## Security Assessment
- This change cannot impact the Hail Batch instance as deployed by Broad
Institute in GCP
…15231)

This change cannot impact the Hail Batch instance as deployed by Broad
Institute in GCP
## Change Description

Add documentation for refreshing the trivy-scanner gsa key

## Security Assessment

- This change cannot impact the Hail Batch instance as deployed by Broad
Institute in GCP
TUnion was never implemented and so is safe to delete.  
This change does not impact the Broad-managed hail batch deployment in
GCP.
## Change Description

Python dependency bump

## Security Assessment
This change potentially impacts the Hail Batch instance as deployed by
Broad Institute in GCP

### Impact Rating

- This change has a medium security impact

### Impact Description

Dependency updates

### Appsec Review

- [x] Required: The impact has been assessed and approved by appsec
There have been open FIXMEs to remove this EType in favor of custom
value readers and writers. This change does that.

## Security Assessment
- This change cannot impact the Hail Batch instance as deployed by Broad
Institute in GCP
Use the regular backend `persist`/`unpersist` for BlockMatrix persist
rather than a call to the nonexistent `persist_blockmatrix` that was
added in hail-is#12864.

Fixes hail-is#15229 

## Security Assessment
- This change cannot impact the Hail Batch instance as deployed by Broad
Institute in GCP
While this was convenient for 2.12 (build-in method for 2.13), it was
triggering a pattern compilation for every fileListEntry. Lifting the
pattern out removes the need for this utility.

This change does not impact the blah blah blah
LiftMeOut takes a non-strict argument.
This change does not affect the broad-managed batch service in gcp.
Non-functional change only.
Quality of life change for me while I scan through IR generation
and try to determine which child IRs are strict or non-strict and
update their corresponding block-args in the pretty printer.

This change does not affect the broad-managed batch service in gcp.
…15237)

This message can contain '...' or "..." in different locales.

## Security Assessment

[Edited by Chris L:

- This change might impact the Hail Batch instance as deployed by Broad
Institute in GCP

This change prevents an accidental DOS on our worker VMs whereby the
docker pull mechanism gets stuck in an infinite loop of retrying futile
requests.
Adding for completeness.

This operation is used internally by the compiler and isn't exposed to
python. We were never getting an assertion from the prettyprinter as
this operation is generated and immediately executed in
LowerAndExecuteShuffles.

No affect batch.
Adds blockargs for AggFilter
This change does not affect the broad-managed batch service in GCP.
Missing AggGroupBy blockargs.
This change cannot affect the broad-managed batch service in gcp.
patrick-schultz and others added 4 commits February 3, 2026 10:12
…is#15238)

## Change Description

Use the wartremover scala compiler plugin to enforce using the
`override` keyword when implementing/overriding a method from a parent
class. This is generally considered a best practice, and e.g. helps with
refactoring when deleting a method or changing its signature, as any
implementing/overriding methods in child classes will now produce
compiler errors.

## Security Assessment
- This change cannot impact the Hail Batch instance as deployed by Broad
Institute in GCP
## Change Description

Updates worker VMs to only use ipv4 network stacks.

This:
- Makes more sense given that they are now behind an ipv4 only cloudNAT
- Will hopefully prevent a race condition causing some docker accesses
to fail

Brief description and justification of what this PR is doing.

## Security Assessment

- This change potentially impacts the Hail Batch instance as deployed by
Broad Institute in GCP

### Impact Rating

- This change has a low security impact

### Impact Description

Unlikely to have any security impact. At the margins it's an improvement
because it reduces the number of interfaces that the VM opens, and
forces the expected "ipv4 interface behind a cloud NAT" networking
model.


### Appsec Review

- [x] Required: The impact has been assessed and approved by appsec
## Change Description

Deletes the vestigial `EmitRegion` class. Once upon a time, it was used
to package together a MethodBuilder, and the Region argument to the
method (which was usually the first argument). But now it confuses more
than it helps, like in the common case where we also pass a CodeBuilder,
which itself has a reference to a MethodBuilder.

## Security Assessment
- This change cannot impact the Hail Batch instance as deployed by Broad
Institute in GCP
…5257)

This change does not affect the broad-managed batch service in gcp.
ehigham and others added 14 commits February 13, 2026 18:20
This is a fun change...

Starts the process of modularising the scala code by extracting "utils"
into its own module.
Some utility classes have been moved into the `is.hail.collection`
package, mirroring scala's stdlib. Most of the noise in this PR comes
from changes to imports.

There are no functional changes herein, only moves and deletes of unused
code.
For now, the tests remain in the root hail module. I need to think how
to move the utils tests into the new module without breaking CI.

Dependency declaration and duplication to be handled in a subsequent
change.

This change has no impact on the Broad-managed hail batch deployment in
GCP
## Change Description

Another minor bug fix following the dockerhub proxy change

## Security Assessment

- This change cannot impact the Hail Batch instance as deployed by Broad
Institute in GCP

(just a test change)
## Change Description

Reconciles the `gcp-broad/main.tf` terraform file with the current
system state, and applies similar fixes to the more generic `gcp/`
edition of the file. Also a little tidy-up now the ukbb is removed.

## Security Assessment

- This change potentially impacts the Hail Batch instance as deployed by
Broad Institute in GCP

### Impact Rating

- This change has a low security impact

### Impact Description

No actual impact, we are reconciling with current state not applying any
changes

### Appsec Review

- [ ] Required: The impact has been assessed and approved by appsec
## Change Description

Adds an auto-refresh to batch and job status pages. 

### Screenshot Examples:

Enabled:
<img width="569" height="174" alt="image"
src="https://github.com/user-attachments/assets/ecbc829e-6bf1-42b4-8492-df712d7ec9a1"
/>

Disabled:
<img width="568" height="167" alt="image"
src="https://github.com/user-attachments/assets/a6d1f70a-09df-4367-9d6e-7be906f5cb3a"
/>

Completed already (no auto refresh):
<img width="572" height="150" alt="image"
src="https://github.com/user-attachments/assets/24f30a27-80fe-4430-a74c-af449fbb8733"
/>

## Security Assessment

Delete all except the correct answer:
- This change potentially impacts the Hail Batch instance as deployed by
Broad Institute in GCP

### Impact Rating

- This change has a low security impact

### Impact Description

Entirely browser-side javascript loaded from a static js file.
Functionality is just to refresh a page automatically every 60 seconds

### Appsec Review

- [x] Required: The impact has been assessed and approved by appsec
This change does not affect the broad-managed batch service in gcp.
Avoid using names in the compiled function cache so that
alpha-equivalent IRs share a cache entry.
Compile requires the ir argument is `Compilable`. This is already
enforced by the `compileLowerer`.

This change has no impact on the broad-managed batch service in gcp.
Interpreting `LiftMeOut` seems like a bad idea as these get lifted to
`ReleationalLet`s. Require that they've been lifted and evaluated before
the IR can be interpreted.
This change does not affect the broad-managed batch service in gcp.
IRandomness was used by the previous random implementation.
It's use was limited to tests and distributedSort and can be
replaced with the ThreefryRandomEngine in all cases.

This change does not affect the broad-managed batch service 
in gcp.
## Change Description

Updates pip dependencies

## Security Assessment

- This change potentially impacts the Hail Batch instance as deployed by
Broad Institute in GCP

### Impact Rating

- This change has a low security impact

### Impact Description

Good housekeeping to stay ahead of any potential vulnerabilities

### Appsec Review

- [x] Required: The impact has been assessed and approved by appsec
…is#15293)

## Change Description

Address an error in check_pip_requirements. Our requirements got
invalided because virtualenv 20.37 got yanked after we pinned to it.

## Security Assessment

- This change potentially impacts the Hail Batch instance as deployed by
Broad Institute in GCP

### Impact Rating

- This change has a low security impact

### Impact Description

Fixes dependency mismatch

### Appsec Review

- [ ] Required: The impact has been assessed and approved by appsec
## Change Description

Fixes hail-is#15145. We were accidentally using the python built-in `id`
function to build logs instead of using the batch/job id that was
clearly intended. In job_private where we were at least using the right
`id`, the shadowing of a python built-in was unnecessary since the id
tuple is trivial to recreate, so do that instead there as well.

## Security Assessment

Delete all except the correct answer:
- This change potentially impacts the Hail Batch instance as deployed by
Broad Institute in GCP

### Impact Rating

- This change has a low security impact

### Impact Description

Fixes some bad logic while creating a log message (see original bug
report)

### Appsec Review

- [x] Required: The impact has been assessed and approved by appsec
## Change Description

Adds a little helper message when users have their accounts marked as
inactive

Example screenshot (with hail-team@broadinstitute.org wired through as
the support email):
<img width="747" height="252" alt="image"
src="https://github.com/user-attachments/assets/aed40027-fb54-470d-9640-de4e592b5f46"
/>

Sample prerendered email content:
<img width="351" height="299" alt="image"
src="https://github.com/user-attachments/assets/f8745535-55e5-4334-975e-a04a4320301e"
/>


## Security Assessment

- This change potentially impacts the Hail Batch instance as deployed by
Broad Institute in GCP

### Impact Rating
- This change has a medium security impact

### Impact Description

- Passes some config through to the inactive users page so that we can
give them a more helpful message about why they were inactivated
- Renders mailto links which include url-encoded usernames and emails to
prepopulate the email message. I believe this is not going to be
exploitable because (1) we have restrictions at input-time on those
fields and (2) the user with an exploit email is only going to be able
to render the email message at themself. But, @hail-is/appsec please
verify this!

### Appsec Review

- [x] Required: The impact has been assessed and approved by appsec
@jmarshall
Copy link

In order to attempt a dev deploy, we applied the terraform today as of d5af360.

TomConner and others added 11 commits February 23, 2026 14:44
## Change Description

Codacy flags `assert` even in test code. This config file should make
Codacy exclude `test` directories from its scans.

## Security Assessment

- This change cannot impact the Hail Batch instance as deployed by Broad
Institute in GCP

---------

Co-authored-by: Chris Llanwarne <cjllanwarne@users.noreply.github.com>
## Change Description

Removes the broken azure FS checks to unblock CI.

Note: Doesn't go the next logical step further and remove the azure FS
code itself. That can be discussed as a follow up. This is just removing
the tests that are broken, specifically to unblock CI

## Security Assessment

- This change cannot impact the Hail Batch instance as deployed by Broad
Institute in GCP

### Impact Rating

- This change has no security impact

### Impact Description

Removes tests for a backend FS which is not enabled in GCP.

### Appsec Review

- [x] Required: The impact has been assessed and approved by appsec
## Change Description

Fixes hail-is#14967 . Removes a redundant LEFT JOIN in the job queries which
was (1) never used and (2) causing job rows to be duplicated based on
entries in the attempts table

### Screenshot Examples

#### Before:

Duration search:
<img width="1526" height="385" alt="image"
src="https://github.com/user-attachments/assets/b556ab63-9c18-447e-88b4-72464d52d489"
/>

Free text search:
<img width="1522" height="401" alt="image"
src="https://github.com/user-attachments/assets/6740cdae-20fd-4ede-9ec7-88742bedd58d"
/>

#### After:
Duration search:
<img width="1535" height="438" alt="image"
src="https://github.com/user-attachments/assets/36a9fcb7-171b-465c-9756-63c83bb011ba"
/>

Free text search:
<img width="1522" height="415" alt="image"
src="https://github.com/user-attachments/assets/113497d7-002b-4278-914c-450e98abb335"
/>


## Security Assessment

- This change potentially impacts the Hail Batch instance as deployed by
Broad Institute in GCP

### Impact Rating

- This change has a low security impact

### Impact Description

Removes a LEFT JOIN which was unused in the actual query logic but which
caused duplication in result rows.

### Appsec Review

- [x] Required: The impact has been assessed and approved by appsec
## Change Description

Fixes another instance of our deploy batch failing because of a
rewritten docker image

## Security Assessment

- This change potentially impacts the Hail Batch instance as deployed by
Broad Institute in GCP

### Impact Rating

- This change has a low security impact

### Impact Description

Low - to - none. Just updates a test expectation.

### Appsec Review

- [ ] Required: The impact has been assessed and approved by appsec
## Change Description

- Replaces links to the old ci batch details page with links to the
newer batch/batches page.
- Redirects check details from PRs to the PR details page instead of the
build details page.
- The PR details still has all build jobs in the same layout as the ci
batch details page did, plus more information, plus links to the build
batch if needed, plus restart links

## Security Assessment

- This change potentially impacts the Hail Batch instance as deployed by
Broad Institute in GCP

### Impact Rating

- This change has a low security impact

### Impact Description

Removes one page and updates links to other, already existing, pages.

### Appsec Review

- [x] Required: The impact has been assessed and approved by appsec
## Change Description

Fixes hail-is#15230 

Adds options to filter jobs by exit code

## Security Assessment

This change potentially impacts the Hail Batch instance as deployed by
Broad Institute in GCP

### Impact Rating

- This change has a low security impact

### Impact Description

Updated to SQL queries in analogy to existing query functions. Handles
validated user input coerced into a strict integer requirement before
use

### Appsec Review

- [x] Required: The impact has been assessed and approved by appsec
## Change Description

Addresses some issues with the recent CI PR redirecting by:

- Making ci.hail.is/batches/batchid links redirect to the PR page, IFF
the CI batch being linked to is part of an active PR (otherwise
continues to the historical batch page)
- Making the PR page work even after PRs are closed. It does this by
making API / GH calls directly to build state rather than relying on
in-memory state. So we can still access PR info and batch history even
after PRs merge.

## Security Assessment

- This change potentially impacts the Hail Batch instance as deployed by
Broad Institute in GCP

### Impact Rating

- This change has a low security impact

### Impact Description

- Impacts the CI service which is developer only. 
- Adds some helper redirections - only for developers - and displaying
any content at the redirect target still relies on their original
permissions
- Adds some logic to allow PR pages to display content from
already-closed PRs as well as still-open PRs

### Appsec Review

- [ ] Required: The impact has been assessed and approved by appsec
…15273)

## Change Description

Fixes hail-is#15011 

Google vends a tool for parallelized file downloads in their Python
client library:
https://docs.cloud.google.com/python/docs/reference/storage/latest/google.cloud.storage.transfer_manager.html
Using this instead of our copier code to download from GCS buckets to
local storage lets us download larger files which our copier code cannot
handle without frequent timeouts & failures.

I compared the performance when a job downloads an input file using the
copier vs using the Google library:

| File size | Copier code | transfer_manager |
|--------|--------|--------|
| 500B | 1s | 0.8s |
| 5M | 0.6s | 0.8s |
| 1G | 50s | 2.7s |
| 5G | 100s | 13.8s |
| 10G | 219s | 27s |
| 21G | failed | 54s |
| 54G | failed | 135s |

Performance is similar for smaller files with variance mostly due to
network bandwidth. At 1G, the copier starts to hit transient errors on
some chunks & retries. Beyond 20G, jobs using the copier failed in the
input stage after an indeterminate length of time.

## Security Assessment

Delete all except the correct answer:
- This change potentially impacts the Hail Batch instance as deployed by
Broad Institute in GCP

### Impact Rating

Delete all except the correct answer:
- This change has a medium security impact

### Impact Description

Replace this content with a description of the impact of the change:

This change instantiates a new Google client to download files from GCS
buckets. It uses the same local credential files as the existing custom
clients we've created.

### Appsec Review

- [x] Required: The impact has been assessed and approved by appsec

---------

Co-authored-by: Chris Llanwarne <cjllanwarne@users.noreply.github.com>
…ail-is#15273)" (hail-is#15310)

## Change Description

Rollback localizer changes in hail-is#15273 while we diagnose and verify a fix
for smaller files / smaller VMs (potentially hail-is#15309?)

## Security Assessment

- This change potentially impacts the Hail Batch instance as deployed by
Broad Institute in GCP

### Impact Rating

- This change has a low security impact

### Impact Description

Rollback to status quo as of yesterday afternoon

### Appsec Review

- [ ] Required: The impact has been assessed and approved by appsec
…#15308)

The correct switch is `-v` for VOLUME. `--files` must have slipped
through
the review, futher proof no-one reads the comments!

This change does not impact the broad-managed batch service in gcp.
…il-is#14757)

## Change Description

Adds tools for analyzing benchmark variability and detecting minimal
slowdowns when running benchmarks on Hail Batch. The analysis is based
on methods from Laaber et al.'s paper on software microbenchmarking in
the cloud.

Key additions:
- New Jupyter notebook for exploring benchmark variability 
- Statistical analysis tools for computing confidence intervals and
detecting performance changes
- Functions for determining minimal detectable slowdowns using different
testing strategies
- Improved benchmark data import/export capabilities
- Enhanced visualization tools for benchmark results

## Security Assessment

This change has no security impact

### Impact Description

Analysis and visualization code that operates on benchmark data only,
with no access to sensitive information or systems.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.