Upstream generic APIs needed to finish BubbleTeaCI migration without ad hoc wrappers #405

@shinaoka

Summary

To finish the BubbleTeaCI migration on top of tensor4all-rs without app-layer hacks, we need a small set of generic upstream capabilities in the tensor/network/grid stack.

This issue is intentionally not asking to upstream BubbleTeaCI-specific abstractions such as TTFunction, BasicContractOrder, or domain-specific physics code.

The goal is to add the missing general-purpose primitives / ergonomic wrappers so that a downstream BubbleTeaCI-like library can be implemented:

  • without dense-materialization fallbacks for core operations,
  • without manual site-index rewriting in app code,
  • without hardcoding one particular quantics site layout in downstream lowering logic,
  • without reimplementing grid metadata operations downstream.

Related:

Migration target

The relevant BubbleTeaCI-side use cases are the "basic" ones:

  1. Grid metadata and layout-aware operations
  2. TTFunction-like functions backed by TreeTN
  3. Addition / contraction / truncation
  4. Quantics transforms (shift, affine, fourier, related linear operators)
  5. Quantics TCI as the constructor path for gridded functions

I am not including AD-related work here.

Design boundary

The right split, as I see it, is:

Should live upstream in tensor4all-rs / quanticsgrids-rs

  • Generic grid metadata/layout operations
  • Grid-aware quantics operator construction
  • General partial site contraction primitives
  • Vector/tensor-valued quantics TCI
  • Site-space alignment helpers for TreeTN

Should stay downstream in BubbleTeaCI / Tensor4all.jl

  • TTFunction wrapper type
  • BasicContractOrder user-facing semantics
  • Variable-name DSLs and physics-facing APIs
  • times_dV policy / application semantics

Current downstream hacks this issue is trying to eliminate

These are examples of the kinds of workarounds currently needed downstream:

  1. Dense fallback for multi-variable transforms

    • multi-dimensional shift / affine must currently materialize to dense values and resample there instead of staying in TT/TreeTN form.
  2. Manual operator-space rewrites

    • downstream code has to manually rewrite operator true_index mappings before applying a quantics operator to a state.
  3. Manual packed tensor construction for component-valued functions

    • downstream code currently has to pack grid + component axes into a dense tensor, then factorize it into a TreeTN.
  4. Downstream-only contraction lowering

    • downstream code has to convert a semantic contraction description ("multiply these variables, contract those variables, keep these external legs, order result like this") into raw site positions itself.
  5. Downstream grid wrappers

    • downstream code currently reimplements grid helpers such as projection / refinement / structural equality.

The rest of this issue is a concrete list of what should be added or improved upstream so those hacks disappear.

Required work items

1. Grid/layout API: make DiscretizedGrid usable as a stable layout descriptor

This is foundational. Downstream code should not need to reverse-engineer site layout from raw rs arrays and implicit unfolding assumptions.

Requested capabilities

  1. Add structural PartialEq for quanticsgrids::DiscretizedGrid

  2. Expose stable variable-to-site layout queries

    • Need a public way to ask:
      • which site positions belong to variable v
      • what the local dimensions of those sites are
      • how this depends on unfolding scheme
  3. Add the core metadata-preserving grid operations now reimplemented downstream

    • project_grid
    • reduce_grid
    • refine_grid
    • rename_variables
    • a "grid-after-contraction" helper
    • a "grid-after-fourier-transform / site-block reversal" helper if that transformation is expected to stay layout-aware
  4. Make layout semantics explicit in public API

    • In particular, downstream code needs to distinguish at least:
      • grouped / unfused-per-variable layouts
      • interleaved / fused-per-bit layouts
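To make item 2 concrete, here is a minimal sketch of the kind of layout query being requested. All names here (`UnfoldingScheme`, `sites_of_variable`) are illustrative, not the actual quanticsgrids-rs API; the point is only that the variable-to-site mapping depends on the unfolding scheme and should be queryable rather than reverse-engineered:

```rust
#[derive(Clone, Copy, PartialEq)]
enum UnfoldingScheme {
    Grouped,     // all bits of variable 0, then all bits of variable 1, ...
    Interleaved, // bit 0 of every variable, then bit 1 of every variable, ...
}

/// Return the site positions that carry the quantics bits of variable `v`
/// on a grid with `nvars` variables of `r` bits each.
fn sites_of_variable(scheme: UnfoldingScheme, nvars: usize, r: usize, v: usize) -> Vec<usize> {
    match scheme {
        UnfoldingScheme::Grouped => (v * r..(v + 1) * r).collect(),
        UnfoldingScheme::Interleaved => (0..r).map(|bit| bit * nvars + v).collect(),
    }
}

fn main() {
    // 3 variables, 4 bits each: the same variable occupies very different
    // site positions depending on the unfolding scheme.
    assert_eq!(sites_of_variable(UnfoldingScheme::Grouped, 3, 4, 1), vec![4, 5, 6, 7]);
    assert_eq!(sites_of_variable(UnfoldingScheme::Interleaved, 3, 4, 1), vec![1, 4, 7, 10]);
    println!("layout queries ok");
}
```

Today, downstream code has to re-derive exactly this mapping from implicit conventions; exposing it as a stable query is the request.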

Why this matters

Without this, downstream contraction/transform code must hardcode layout assumptions. That is brittle and pushes a backend concern into the app layer.

Likely home

This probably belongs primarily in quanticsgrids-rs, but it should be tracked here because it directly blocks tensor4all-rs-based migration work.

2. Quantics transforms: add grid-aware, grouped-aware multi-variable operators

tensor4all-quanticstransform already has useful low-level pieces, but the current API shape is still too close to one specific site encoding.

Examples:

  • shift_operator(r, ...)
  • shift_operator_multivar(r, ..., nvariables, target_var)
  • affine_operator(r, params, bc)

These are good primitives, but they are not yet enough for a clean BubbleTeaCI migration.

Missing capability

Downstream needs to apply transforms to TreeTN states whose site layout comes from a DiscretizedGrid, including grouped layouts, without first densifying the function.

Requested capabilities

  1. Add grid-aware operator constructors
    • Possible API direction:
pub fn shift_operator_on_grid(
    grid: &DiscretizedGrid,
    offsets: &[i64],
    bc: &[BoundaryCondition],
    vars: &[usize],
) -> Result<QuanticsOperator>;

pub fn affine_operator_on_grid(
    grid: &DiscretizedGrid,
    params: &AffineParams,
    bc: &[BoundaryCondition],
    input_vars: &[usize],
    output_vars: &[usize],
) -> Result<QuanticsOperator>;

I am not attached to these exact signatures. The important part is: operator construction should be expressed in terms of the actual grid/layout, not just raw (r, nvariables) assumptions.

  2. Support grouped layouts as a first-class case

    • Downstream should not have to fall back to dense for grouped shift / affine.
  3. Support "operate on subset of variables, leave all other sites untouched"

    • This is needed for BubbleTeaCI-style transforms where only continuous-variable sites change and component/index legs must remain untouched.
  4. Keep using the existing LinearOperator mapping model

    • LinearOperator::set_input_space_from_state and set_output_space_from_state already exist and are useful.
    • The new transform constructors should compose cleanly with that API.
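The index-level semantics such a grid-aware shift operator must reproduce can be pinned down with a small reference sketch. This is not the tensor4all-quanticstransform API, just the dense coordinate-space behavior (shift one variable modulo the grid length under periodic boundary conditions, touch nothing else) that the operator version should match:

```rust
/// Apply f_shifted(x_0, ..., x_{n-1}) = f(..., (x_v + s) mod 2^r, ...)
/// at the level of integer grid coordinates: only variable `v` changes,
/// all other variables' quantics sites are untouched.
fn shift_coords(coords: &[u64], r: u32, v: usize, s: i64) -> Vec<u64> {
    let len = 1i64 << r; // grid length 2^r along each variable
    coords
        .iter()
        .enumerate()
        .map(|(i, &x)| {
            if i == v {
                ((x as i64 + s).rem_euclid(len)) as u64 // periodic wrap-around
            } else {
                x
            }
        })
        .collect()
}

fn main() {
    // 2^3 = 8 points per variable; shift variable 1 by -3 (periodic).
    assert_eq!(shift_coords(&[5, 1, 7], 3, 1, -3), vec![5, 6, 7]);
    println!("shift semantics ok");
}
```

A grid-aware constructor would encode this same map as a quantics MPO over exactly the sites that `DiscretizedGrid` assigns to variable `v`, whatever the unfolding scheme.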

Why this matters

This is the main blocker behind current dense fallback in downstream transform code.

3. Quantics TCI: support vector/tensor-valued outputs directly

Current quanticscrossinterpolate(...) is fundamentally scalar-valued:

pub fn quanticscrossinterpolate<V, F>(
    grid: &DiscretizedGrid,
    f: F,
    initial_pivots: Option<Vec<Vec<i64>>>,
    options: QtciOptions,
) -> Result<(QuanticsTensorCI2<V>, Vec<usize>, Vec<f64>)>
where
    F: Fn(&[f64]) -> V + 'static

This is good for scalar functions, but BubbleTeaCI functions are often naturally vector- or tensor-valued.

Missing capability

Downstream currently has to:

  1. sample all component values densely,
  2. pack them into a combined dense tensor,
  3. factorize that tensor back into a TreeTN.

That defeats the point of having a TT-native constructor path.

Requested capabilities

At least one of the following upstream approaches is needed:

  1. Tensor-valued / shaped QTCI

    • Accept a callback that returns multiple components and a declared output shape.
  2. Component-wise QTCI helper

    • Build one TT per component and then combine them into a single TreeTN / chain layout with explicit component-leg placement metadata.
  3. Direct TreeTN constructor path for shaped outputs

    • Allow QTCI to output a TreeTN whose extra site(s) represent output components.

Possible API direction

pub fn quanticscrossinterpolate_tensor<V, F>(
    grid: &DiscretizedGrid,
    output_shape: &[usize],
    f: F,
    initial_pivots: Option<Vec<Vec<i64>>>,
    options: QtciOptions,
) -> Result<(TreeTN<TensorDynLen, usize>, TensorOutputLayout, Vec<usize>, Vec<f64>)>
where
    F: Fn(&[f64]) -> Vec<V> + 'static;

Again, the exact signature is less important than the capability:

  • vector/tensor-valued function input,
  • output component layout described explicitly,
  • no forced dense detour downstream.
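The component-wise helper (option 2 above) can be illustrated without any real QTCI machinery: wrap one vector-valued callback into independent scalar callbacks, each of which could be fed to the existing scalar `quanticscrossinterpolate`. Everything here is an illustrative sketch; a production version would also cache evaluations so the full vector is not recomputed per component:

```rust
use std::rc::Rc;

/// Split a vector-valued callback into one scalar callback per component.
fn per_component_callbacks(
    f: Rc<dyn Fn(&[f64]) -> Vec<f64>>,
    ncomponents: usize,
) -> Vec<Box<dyn Fn(&[f64]) -> f64>> {
    (0..ncomponents)
        .map(|c| {
            let f = Rc::clone(&f);
            // Each scalar callback evaluates the full vector and selects
            // component `c`.
            Box::new(move |x: &[f64]| f(x)[c]) as Box<dyn Fn(&[f64]) -> f64>
        })
        .collect()
}

fn main() {
    // Example: f(x) = (x0 + x1, x0 * x1).
    let f: Rc<dyn Fn(&[f64]) -> Vec<f64>> = Rc::new(|x| vec![x[0] + x[1], x[0] * x[1]]);
    let comps = per_component_callbacks(f, 2);
    assert_eq!(comps[0](&[2.0, 3.0]), 5.0);
    assert_eq!(comps[1](&[2.0, 3.0]), 6.0);
    println!("component-wise wrapping ok");
}
```

The missing upstream piece is the second half: combining the per-component TTs into a single TreeTN with an explicit component leg, which is exactly what downstream currently does via a dense detour.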

Why this matters

A TT-backed TTFunction wrapper downstream is only clean if the backend can construct component-valued functions directly.

4. TreeTN partial contraction API: express sitewise multiply/contract/external/result-order directly

This is the largest structural gap.

BubbleTeaCI contraction is not just "contract two full networks". It needs a generic primitive of the form:

  • these site groups are identified / multiplied,
  • these site groups are contracted away,
  • these site groups stay external,
  • the output site order is prescribed.

This is a backend concern once expressed in terms of site positions / site IDs. The semantic layer (BasicContractOrder, variable-name DSLs) can stay downstream.

Requested capability

Add a generic sitewise partial-contraction primitive to tensor4all-treetn.

Possible API direction

pub struct SiteContractionSpec<V, I> {
    pub multiply_pairs: Vec<(I, I)>,
    pub contract_pairs: Vec<(I, I)>,
    pub external_a: Vec<I>,
    pub external_b: Vec<I>,
    pub result_order: Vec<SiteSource<I>>,
    pub center: V,
}

pub fn contract_sitewise<T, V>(
    a: &TreeTN<T, V>,
    b: &TreeTN<T, V>,
    spec: &SiteContractionSpec<V, <T::Index as IndexLike>::Id>,
    options: ContractionOptions,
) -> Result<TreeTN<T, V>>;

The exact type design can differ, but the backend needs a primitive with this expressive power.

Important constraints

  1. This should work for chain topologies first, but if possible the abstraction should not hardcode MPS-only assumptions into the public API.
  2. Result site ordering must be explicit.
  3. The API should be site/index based, not variable-name based.
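For reference, the dense semantics the primitive must reproduce can be written out directly (illustrative only, no TreeTN involved). Given `a[x][y][w]` and `b[y][w][z]`: `y` is a multiply pair (identified elementwise and kept), `w` is a contract pair (summed away), `x` and `z` are external, and the result order is prescribed as `(y, x, z)`:

```rust
fn contract_sitewise_dense(
    a: &[Vec<Vec<f64>>], // a[x][y][w]
    b: &[Vec<Vec<f64>>], // b[y][w][z]
) -> Vec<Vec<Vec<f64>>> {
    let (nx, ny, nw) = (a.len(), a[0].len(), a[0][0].len());
    let nz = b[0][0].len();
    // Output order (y, x, z) is prescribed by the spec, not inferred.
    let mut out = vec![vec![vec![0.0; nz]; nx]; ny];
    for y in 0..ny {
        for x in 0..nx {
            for z in 0..nz {
                for w in 0..nw {
                    // y: multiplied and kept; w: contracted; x, z: external.
                    out[y][x][z] += a[x][y][w] * b[y][w][z];
                }
            }
        }
    }
    out
}

fn main() {
    // 1x1x2 example: a[0][0] = [1, 2], b[0] = [[3], [4]].
    let a = vec![vec![vec![1.0, 2.0]]];
    let b = vec![vec![vec![3.0], vec![4.0]]];
    let out = contract_sitewise_dense(&a, &b);
    assert_eq!(out[0][0][0], 1.0 * 3.0 + 2.0 * 4.0);
    println!("sitewise semantics ok");
}
```

A backend `contract_sitewise` would compute this same result without ever materializing the dense tensors, working site by site on the two TreeTNs.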

Why this matters

Without this primitive, downstream has to own a large amount of contraction lowering logic that really belongs close to TreeTN contraction and site-space manipulation.

5. TreeTN site-space alignment helpers: remove downstream manual reindexing before add / transforms

There are already useful low-level APIs:

  • replaceind
  • replaceinds
  • share_equivalent_site_index_network
  • LinearOperator::{set_input_space_from_state, set_output_space_from_state}

These are good building blocks, but downstream still needs higher-level ergonomic wrappers.

Requested capabilities

  1. Add a helper to align one network's site indices to another network with equivalent site space
    • Possible direction:
pub fn reindex_site_space_like<T, V>(
    source: &TreeTN<T, V>,
    template: &TreeTN<T, V>,
) -> Result<TreeTN<T, V>>;
  2. Add an aligned-add helper
    • Possible direction:
pub fn add_aligned(&self, other: &Self) -> Result<Self>;

This would:

  • verify equivalent site-index network,
  • reindex other if needed,
  • then call the existing add.
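The three steps above can be sketched on a drastically simplified stand-in for a network (a list of site ids, local dims, and a payload). This is a toy illustration of the intended semantics, not the TreeTN implementation; the real version would go through share_equivalent_site_index_network / replaceinds rather than positional matching:

```rust
#[derive(Clone, Debug, PartialEq)]
struct ToyNet {
    site_ids: Vec<u64>,    // site index identities
    site_dims: Vec<usize>, // local dimensions, position by position
    data: Vec<f64>,        // stand-in for the tensor payload
}

impl ToyNet {
    fn add_aligned(&self, other: &Self) -> Result<Self, String> {
        // 1. Verify the site spaces are equivalent (same local dims).
        if self.site_dims != other.site_dims {
            return Err("site spaces are not equivalent".into());
        }
        // 2. Reindex `other` onto `self`'s site ids (positional in this toy).
        let mut aligned = other.clone();
        aligned.site_ids = self.site_ids.clone();
        // 3. Delegate to the plain add now that the ids agree.
        let data = self.data.iter().zip(&aligned.data).map(|(a, b)| a + b).collect();
        Ok(ToyNet {
            site_ids: self.site_ids.clone(),
            site_dims: self.site_dims.clone(),
            data,
        })
    }
}

fn main() {
    let a = ToyNet { site_ids: vec![10, 11], site_dims: vec![2, 2], data: vec![1.0, 2.0] };
    let b = ToyNet { site_ids: vec![20, 21], site_dims: vec![2, 2], data: vec![3.0, 4.0] };
    let c = a.add_aligned(&b).unwrap();
    assert_eq!(c.site_ids, vec![10, 11]);
    assert_eq!(c.data, vec![4.0, 6.0]);
    println!("aligned add ok");
}
```

The point of the helper is exactly this flow: the caller states "same logical layout" once and never touches raw site ids.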

Why this matters

BubbleTeaCI-style addition wants "same logical layout" semantics, not "the caller must manually rewrite site IDs before addition".

6. Keep existing operator-space alignment APIs, but document them as the recommended path

This is not a new blocker, but worth calling out.

LinearOperator::set_input_space_from_state and set_output_space_from_state already exist and are exactly the sort of generic ergonomic helpers downstream needs.

Requested improvement

  1. Keep these APIs stable.
  2. Document them prominently in transform/operator docs as the intended way to align operator external site indices with a target state.
  3. Where new grid-aware transform constructors are added, make sure examples use these methods.

Why this matters

This is one place where the backend already has the right abstraction. New work should build on it rather than downstream reimplementing the same logic.

Concrete acceptance criteria

I would consider this umbrella issue complete when the following can be implemented downstream without ad hoc backend workarounds:

  1. Vector/tensor-valued TCI constructor path

    • Build a TreeTN-backed function directly from a vector- or matrix-valued callback on a DiscretizedGrid.
  2. Grouped-layout multi-variable transforms

    • Apply shift / affine-style operators to grouped-layout grids without dense-materialization fallback.
  3. Partial contraction with explicit output ordering

    • Support contractions of the form
      • multiply some variables,
      • contract others,
      • preserve others,
      • and prescribe output variable/index order
    • all expressed downstream in semantic terms but lowered onto a generic backend sitewise contraction primitive.
  4. Addition on equivalent logical site layouts

    • Downstream should not have to manually reindex site spaces before adding two logically equivalent functions.
  5. Grid metadata operations available upstream

    • downstream should not have to own its own project_grid / reduce_grid / refine_grid / rename_variables implementation.

Explicit non-goals

This issue is not asking for:

  • upstream TTFunction
  • upstream BasicContractOrder
  • upstream BubbleTeaCI-specific variable-name DSLs
  • upstream physics/domain logic
  • AD-related features

Suggested implementation decomposition

If someone wants to attack this incrementally, this seems like a reasonable order:

  1. quanticsgrids-rs: grid equality + layout queries + grid metadata ops
  2. tensor4all-quanticstransform: grouped/grid-aware multi-variable transform constructors
  3. tensor4all-quanticstci: vector/tensor-valued interpolation path
  4. tensor4all-treetn: partial site contraction primitive
  5. tensor4all-treetn: site-space alignment / aligned-add ergonomics

Why I am opening this as one umbrella issue

These items are tightly coupled from a downstream migration perspective:

  • if transforms remain layout-specific, downstream needs dense fallback,
  • if TCI remains scalar-only, downstream needs dense packing/factorization,
  • if partial contraction remains app-owned, downstream has to hardcode site-lowering logic,
  • if grid metadata ops remain downstream, app code keeps duplicating backend semantics.

So while the actual implementation may land as several PRs and even a few dependency-level changes, the migration blocker is one coherent backend gap.
