feat: async recheck support #1062

Open
technicallyty wants to merge 2 commits into main from
technicallyty/STACK-2402-krakatoa-recheck

@technicallyty
Contributor

@technicallyty technicallyty commented Mar 10, 2026

Description

we recently updated comet to no longer lock on recheck, pushing concurrency responsibility to the application.

changes:

  • route CheckTx through the app-side mempool insert worker instead of BaseApp.runTx
  • fix Cosmos rechecked tx tracking to replace entries by signer/nonce identity rather than pointer identity. the previous code was not allowing cosmos fee replacement txs
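
A minimal sketch of the second change, assuming a single-signer tx: entries are indexed by signer and nonce, so a fee bump (a new pointer with the same signer and nonce) overwrites the earlier entry instead of being appended next to it. The `tx`, `store`, and `txKey` names are hypothetical and are not the PR's actual types.

```go
package main

import "fmt"

// tx is a hypothetical stand-in for a decoded Cosmos transaction.
type tx struct {
	Signer string
	Nonce  uint64
	Fee    uint64
}

// txKey builds the semantic identity of a tx: signer plus nonce.
func txKey(t *tx) string {
	return fmt.Sprintf("%s/%d", t.Signer, t.Nonce)
}

// store keeps txs in insertion order and indexes them by signer/nonce.
type store struct {
	txs  []*tx
	keys map[string]int // signer/nonce -> position in txs
}

func newStore() *store {
	return &store{keys: make(map[string]int)}
}

// Add appends t, or overwrites the existing entry that has the same signer
// and nonce. It reports whether an existing entry was replaced.
func (s *store) Add(t *tx) bool {
	k := txKey(t)
	if i, ok := s.keys[k]; ok {
		s.txs[i] = t
		return true
	}
	s.keys[k] = len(s.txs)
	s.txs = append(s.txs, t)
	return false
}

func main() {
	s := newStore()
	s.Add(&tx{Signer: "cosmos1abc", Nonce: 4, Fee: 10})
	// A fee bump is a different pointer but the same signer/nonce.
	replaced := s.Add(&tx{Signer: "cosmos1abc", Nonce: 4, Fee: 20})
	fmt.Println(replaced, len(s.txs), s.txs[0].Fee)
}
```

A map keyed by the tx pointer would treat the fee bump as a new entry, which matches the behavior the description reports as broken.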

Closes: STACK-2455


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • tackled an existing issue or discussed with a team member
  • left instructions on how to review the changes
  • targeted the main branch

@linear

linear bot commented Mar 10, 2026

@technicallyty technicallyty changed the title from "recheck support" to "feat: async recheck support" Mar 10, 2026
Comment on lines +17 to +20
// TODO: do we even do recheck anymore?
if request.Type == abci.CheckTxType_Recheck {
    return &abci.ResponseCheckTx{Code: abci.CodeTypeOK}, nil
}
Contributor Author

we can expect this to never be called with this mempool, yes? cc @mattac21

Contributor

yes

@aljo242
Contributor

aljo242 commented Mar 10, 2026

@greptile whats the scoop man

@technicallyty
Contributor Author

@greptile

@vladjdk
Member

vladjdk commented Mar 10, 2026

@greptile hello

@greptile-apps

greptile-apps bot commented Mar 10, 2026

Greptile Summary

This PR adapts the EVM mempool's CheckTx handler to a new CometBFT version that no longer holds the global lock during recheck, delegating concurrency management to the application. The two main changes are:

  • check_tx.go: NewCheckTxHandler now routes all new CheckTx requests through the same async insert worker (evmQueue/cosmosQueue) used by the rest of the app, bypassing BaseApp.runTx entirely. Recheck requests are short-circuited and always return OK. The debug parameter is now threaded through from app.Trace().
  • tx_store.go: CosmosTxStore gains a secondary keys map[string]int (keyed by signer-address/nonce) so that Cosmos tx replacement is matched by semantic identity (same signer + nonce) rather than Go pointer equality. This fixes a bug where rechecked cosmos fee-replacement txs were not being recognized as replacements.

Issues found:

  • The success ResponseCheckTx always returns GasWanted: 0 and GasUsed: 0, which can disable CometBFT's mempool-level gas cap if the chain does not operate in exclusive mode.
  • The blocking receive on errC in the handler (<-errC) has no context propagation or timeout, unlike the Insert method which uses a select/ctx.Done() pattern — a stalled insert worker would hang CheckTx indefinitely.
  • A // TODO: do we even do recheck anymore? comment is shipped in production code with no resolution.
  • The new keys map in CosmosTxStore has no corresponding cleanup path; if eviction is ever added, the map will need a RemoveTx counterpart.
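
The select/ctx.Done() pattern named in the second issue can be sketched as follows. This is an illustration under assumptions, not the PR's handler: `awaitInsert` and `waitFor` are hypothetical helpers, and the timeout value is arbitrary.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// awaitInsert waits for the insert worker's result, but gives up when ctx is
// cancelled or its deadline passes.
func awaitInsert(ctx context.Context, errC <-chan error) error {
	select {
	case err := <-errC:
		return err
	case <-ctx.Done():
		return ctx.Err()
	}
}

// waitFor bounds the wait on errC with a timeout-scoped context.
func waitFor(errC <-chan error, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return awaitInsert(ctx, errC)
}

func main() {
	// A channel that never receives a value stands in for a stalled worker.
	stalled := make(chan error)
	err := waitFor(stalled, 50*time.Millisecond)
	fmt.Println(errors.Is(err, context.DeadlineExceeded))
}
```

A bare `<-errC` has no second case to fall back on, so a stalled worker would block the caller for as long as the worker stays stalled.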

Confidence Score: 2/5

  • This PR has a meaningful behavioral regression risk: GasWanted always returning 0 from CheckTx can silently disable CometBFT's mempool gas cap, and the blocking errC receive lacks context/timeout handling.
  • The core async-routing logic is sound and well-tested. However, two issues in check_tx.go reduce confidence: (1) the success response omits GasWanted/GasUsed, which breaks CometBFT's mempool-level gas accounting unless the chain is always in exclusive mode — this assumption is not enforced or documented; (2) the blocking <-errC receive has no timeout or context cancellation, unlike every other insertion site in the codebase. An unresolved TODO in production code further signals incomplete design work.
  • mempool/check_tx.go requires the most attention — specifically the GasWanted omission and the missing context propagation on the errC block.

Important Files Changed

| Filename | Overview |
| --- | --- |
| mempool/check_tx.go | Core behavioral change: CheckTx now routes through the async insert worker instead of runTx. Recheck is a no-op (OK returned unconditionally). Three issues: unresolved TODO, no context propagation when blocking on errC, and GasWanted/GasUsed always returns 0 which can break CometBFT mempool gas accounting. |
| mempool/tx_store.go | Adds a secondary keys map (signer/nonce → slice index) to allow Cosmos tx replacement by identity rather than pointer. Logic is correct for the add/replace path. The keys map has no corresponding cleanup in a RemoveTx path (none exists), which could lead to unbounded growth, but is not an immediate bug. |
| evmd/mempool.go | Single-line change: passes app.Trace() as the debug flag to NewCheckTxHandler, correctly propagating the application's trace setting to error responses. |
| mempool/check_tx_test.go | New test file with comprehensive coverage: EVM CheckTx (insert, duplicate, fee replacement, underpriced replacement, nonce gap, queue full, malformed bytes) and Cosmos CheckTx (insert, single/multi-signer fee replacement). Also verifies recheck is a no-op. Good coverage of the new handler behavior. |
| mempool/mempool_test.go | Refactors test setup helpers: extracts setupMempool with configurable insertQueueSize and initial nonces, adds setupMempoolWithAccountNonces and setupMempoolWithInsertQueueSize variants, registers bank types for Cosmos tx encoding, and adds fee-parameterized tx builders. No logic issues. |
| go.mod | Updates cometbft replace directive to a newer pre-release commit (d939d1e7c932) that removes the recheck lock. Also bumps golang.org/x/sync to v0.20.0, google.golang.org/grpc to v1.79.2, and sasha-s/go-deadlock to v0.3.7. |

Sequence Diagram

sequenceDiagram
    participant C as CometBFT
    participant H as CheckTxHandler
    participant Q as InsertQueue (async worker)
    participant EVM as LegacyPool (EVM)
    participant COS as RecheckMempool (Cosmos)

    C->>H: RequestCheckTx (New)
    H->>H: Decode tx bytes
    alt Decode error
        H-->>C: ResponseCheckTx (error code)
    else Decoded OK
        H->>Q: insert(ctx, tx)
        note over Q: EVM tx → evmQueue.Push()<br/>Cosmos tx → cosmosQueue.Push()
        Q-->>H: errC (channel)
        H->>H: block on <-errC (no timeout)
        Q->>EVM: txPool.Add() [EVM path]
        Q->>COS: insertCosmosTx() [Cosmos path]
        alt Insert error
            Q->>H: errC ← error
            H-->>C: ResponseCheckTx (error code)
        else Insert OK
            Q->>H: errC ← nil
            H-->>C: ResponseCheckTx (OK, GasWanted=0)
        end
    end

    C->>H: RequestCheckTx (Recheck)
    H-->>C: ResponseCheckTx (OK, no-op)

Last reviewed commit: 4e043cf

Data: result.Data,
Events: types.MarkEventsToIndex(result.Events, nil),
}, nil
return &abci.ResponseCheckTx{Code: abci.CodeTypeOK}, nil

GasWanted / GasUsed always return 0 — breaks CometBFT mempool gas accounting

The previous handler forwarded the gas values computed by runTx:

return &abci.ResponseCheckTx{
    GasWanted: int64(gInfo.GasWanted),
    GasUsed:   int64(gInfo.GasUsed),
    ...
}

The new handler always returns GasWanted: 0 and GasUsed: 0. CometBFT uses GasWanted from ResponseCheckTx to enforce the block-level mempool gas cap (max_gas in config). When every transaction reports wanting 0 gas, CometBFT's gas cap is never triggered, which effectively disables that safeguard for the entire mempool.

This is safe only if the mempool always operates in exclusive mode (CometBFT's own clist-mempool is disabled). If it is possible for the app to run with OperateExclusively = false, clients and validators will see incorrect gas estimates from CheckTx and the CometBFT-side gas cap will not function.
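
To make the concern concrete, here is a simplified sketch of a gas-capped selection loop. It is loosely modeled on the idea of summing reported GasWanted against a cap and is not CometBFT's actual code; `reapUnderGasCap` is a hypothetical name. It only matters when CometBFT's own mempool is in use.

```go
package main

import "fmt"

// reapUnderGasCap returns how many leading txs fit under maxGas, given the
// GasWanted each one reported. A negative maxGas means "no cap".
func reapUnderGasCap(gasWanted []int64, maxGas int64) int {
	var total int64
	for i, g := range gasWanted {
		if maxGas > -1 && total+g > maxGas {
			return i
		}
		total += g
	}
	return len(gasWanted)
}

func main() {
	// Declared gas limits: the cap stops selection after the first tx.
	fmt.Println(reapUnderGasCap([]int64{60, 60, 60}, 100))
	// Every tx reports 0: the running total never reaches the cap.
	fmt.Println(reapUnderGasCap([]int64{0, 0, 0}, 100))
}
```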

At a minimum, the function should document that GasWanted is intentionally omitted and under what assumptions this is valid. If the gas values are still meaningful, consider computing them from the decoded tx's declared gas limit before the async insert:

// For EVM transactions the gas limit is embedded in the transaction itself.
// Return it so that CometBFT's mempool gas cap still fires correctly.
return &abci.ResponseCheckTx{Code: abci.CodeTypeOK, GasWanted: int64(gasLimit)}, nil

Contributor Author

@greptile gas values aren't populated in regular checktx anyways. this is not a regression

Looking at the diff, the old code explicitly called runTx and used the returned gInfo.GasWanted and gInfo.GasUsed values in the response:

gInfo, result, anteEvents, err := runTx(request.Tx, nil)
// ...
return &abci.ResponseCheckTx{
    GasWanted: int64(gInfo.GasWanted),
    GasUsed:   int64(gInfo.GasUsed),
    ...
}

So the old handler was populating gas values from runTx. Are you saying that in practice runTx was returning 0 for those fields, or that there's a different reason they weren't meaningful?

If the gas values were genuinely unused or always zero before, then you're right this isn't a regression. But if runTx was returning valid gas values, then this change does remove that information from CheckTx responses.

Contributor

is this true that the values were always zero? I thought you do get gas wanted and used values back from checktx?

No, that's not typically true in standard Cosmos SDK implementations. Looking at the old code:

gInfo, result, anteEvents, err := runTx(request.Tx, nil)
// ...
return &abci.ResponseCheckTx{
    GasWanted: int64(gInfo.GasWanted),
    GasUsed:   int64(gInfo.GasUsed),
    ...
}

In a typical Cosmos SDK app during CheckTx:

  • GasWanted is set to the transaction's declared gas limit
  • GasUsed is the actual gas consumed by the ante handlers (signature verification, fee deduction, etc.)

These values are normally non-zero and are used by:

  1. CometBFT's mempool gas cap (max_gas in config)
  2. Clients to get accurate gas estimates from CheckTx queries
  3. Fee estimation and transaction simulation

However, there could be something specific to your EVM implementation where:

  • The ante handlers don't consume gas the same way
  • Gas tracking is handled differently for EVM transactions
  • The values genuinely weren't being set correctly before

If runTx was genuinely returning zero values in practice, that would indicate a pre-existing issue with your gas accounting. But based on standard Cosmos SDK behavior, the old code should have been returning valid gas values.

Contributor Author

@greptile why would those values be populated if checkTx is just a mempool admission api and not a tx execution check?

In standard Cosmos SDK, CheckTx does execute the ante handlers (signature verification, fee deduction, sequence checks, etc.), which:

  1. Consume gas - AnteHandlers use gas for crypto operations, state reads, etc. That's the GasUsed value
  2. Validate the transaction - They check if the tx is properly formatted, has sufficient fees, valid signatures, etc.
  3. Return the declared gas limit - That's GasWanted, read from the transaction itself

So CheckTx is more than just "should we admit this to the mempool?" - it's "can this transaction pass ante handlers?" The gas values from that execution are used for:

  • CometBFT's mempool gas cap - prevents the mempool from accepting too many high-gas transactions
  • Client gas estimation - wallets query CheckTx to estimate gas before broadcasting
  • Fee market calculations - some chains use CheckTx gas usage for dynamic fee calculations

In your new implementation:

You're bypassing runTx entirely and just routing to the async insert worker. If the insert worker doesn't run ante handlers synchronously, then you're right - there's no gas consumed to report.

However, GasWanted (the declared gas limit) is still available from the decoded transaction and is still meaningful for CometBFT's mempool gas cap. You could extract it from the decoded tx before the async insert if you want to preserve that safeguard.

Contributor Author

@greptile cometbft reduces the fields in the response anyway: https://github.com/cometbft/cometbft/blob/1bb8b386fc366bc4655dede0535e16d1ad669c7d/rpc/core/mempool.go#L58-L64. gas has never been returned here

You're right, my apologies. I can see that:

  1. OperateExclusively = true is hardcoded in your codebase (evmd/mempool.go:19)
  2. You're using SetInsertTxHandler and SetReapTxsHandler, which completely bypasses CometBFT's clist-mempool
  3. The RPC endpoint does strip gas fields before returning to clients

When operating exclusively, CometBFT's internal mempool is disabled and never sees or uses the GasWanted values from ResponseCheckTx. All gas accounting is handled by your app-side ExperimentalEVMMempool.

So you're correct - this change doesn't break anything that was previously working. The gas values were never meaningfully used in CheckTx for your setup.

Contributor

@mattac21 mattac21 left a comment

could you also rebase this off of main? the tests are pretty broken on feat/krakatoa, but if you go to main you should be able to get all of the system, unit, and integration tests passing


ctx, cancel := context.WithTimeout(context.Background(), timeout)
defer cancel()
errC, err := mempool.insert(ctx, tx)
Contributor

the mempool.Insert function already waits on the errC or ctx to be done like you are doing here. could we use that instead of using the private mempool.insert?

Contributor Author

Comment on lines 18 to +19
index map[sdk.Tx]int
keys map[string]int
Contributor

keys is now serving the same purpose as index right? we can remove index now I think

s.mu.Lock()
defer s.mu.Unlock()

if key, ok := cosmosTxKey(tx); ok {
Contributor

hm thinking through this, im not actually sure this replacement is safe to do without rechecking all txs for this account. when we replace a tx we may want to just remove all txs > replaced nonce from the tx store for this account and then let them just be included in the next block.

I think this isn't safe because we would need to recheck all txs after the replaced one on top of the state of this new tx, which we are not doing. for example, if we recheck txs 4, 5, and 6 of an account and include them in the tx store, then someone can replace tx 4 with a completely different tx that may have invalidated 5 and 6, but we are not rechecking those against the new tx 4's context, which may then cause the proposal to be invalid.

I think this is an issue for evm txs in the legacypool too that we need to address.
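
One way to read the suggestion above, as a hedged sketch only: on replacement, drop every tx for that account with a higher nonce, since those were validated on top of the tx being replaced. The `tx` and `accountTxs` types are hypothetical and do not reflect the PR's tx store.

```go
package main

import (
	"fmt"
	"sort"
)

// tx is a hypothetical stand-in for a rechecked transaction.
type tx struct {
	Signer string
	Nonce  uint64
	Fee    uint64
}

// accountTxs holds one account's rechecked txs, keyed by nonce.
type accountTxs map[uint64]*tx

// replace swaps in t at its nonce and drops every tx with a higher nonce.
// It returns the dropped nonces in ascending order.
func (a accountTxs) replace(t *tx) []uint64 {
	var dropped []uint64
	for n := range a {
		if n > t.Nonce {
			dropped = append(dropped, n)
			delete(a, n) // deleting during range is safe in Go
		}
	}
	sort.Slice(dropped, func(i, j int) bool { return dropped[i] < dropped[j] })
	a[t.Nonce] = t
	return dropped
}

func main() {
	a := accountTxs{
		4: &tx{Signer: "acct", Nonce: 4, Fee: 10},
		5: &tx{Signer: "acct", Nonce: 5, Fee: 10},
		6: &tx{Signer: "acct", Nonce: 6, Fee: 10},
	}
	dropped := a.replace(&tx{Signer: "acct", Nonce: 4, Fee: 20})
	fmt.Println(dropped, len(a))
}
```

In the 4, 5, 6 example, replacing tx 4 would leave only the new tx 4 in the store, and 5 and 6 would have to be validated again before being proposed.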

Contributor Author

working on a separate PR to address this issue in both evm and cosmos tx stores

select {
case err := <-errC:
if err != nil {
return sdkerrors.ResponseCheckTxWithEvents(err, gInfo.GasWanted, gInfo.GasUsed, anteEvents, false), nil
Contributor

were the anteEvents also always nil here? we are missing out on events now as well with this? should we modify the response of mempool.Insert to return some of this info?

Contributor Author

@technicallyty technicallyty Mar 11, 2026

i checked on v0.53.x and main, calling broadcast tx sync never returned anything other than code, tx hash, and a log if it failed.

you can see comet stripping down the response here: https://github.com/cometbft/cometbft/blob/1bb8b386fc366bc4655dede0535e16d1ad669c7d/rpc/core/mempool.go#L58-L64
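
The stripping described above can be pictured with a small sketch. Both structs are hypothetical stand-ins, and the kept field set (code, log, tx hash) is taken from the comment above rather than from the linked CometBFT source.

```go
package main

import "fmt"

// checkTxResponse is a hypothetical, trimmed stand-in for the app's CheckTx
// response.
type checkTxResponse struct {
	Code      uint32
	Log       string
	GasWanted int64
	GasUsed   int64
	Events    []string
}

// broadcastResult is a hypothetical stand-in for what the RPC caller sees.
type broadcastResult struct {
	Code uint32
	Log  string
	Hash string
}

// toBroadcastResult copies only the fields the caller receives; gas values
// and events set by the app are dropped at this boundary.
func toBroadcastResult(res checkTxResponse, txHash string) broadcastResult {
	return broadcastResult{Code: res.Code, Log: res.Log, Hash: txHash}
}

func main() {
	res := checkTxResponse{Code: 0, GasWanted: 21000, GasUsed: 15000, Events: []string{"tx"}}
	fmt.Printf("%+v\n", toBroadcastResult(res, "ABCD"))
}
```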

Contributor

got it, tested on gaia as well earlier and this was the case too

@technicallyty technicallyty changed the base branch from feat/krakatoa to main March 11, 2026 17:17
@technicallyty technicallyty changed the base branch from main to feat/krakatoa March 11, 2026 17:17
@technicallyty technicallyty force-pushed the technicallyty/STACK-2402-krakatoa-recheck branch from f7e5c47 to c54ef7b Compare March 11, 2026 17:55
@technicallyty technicallyty changed the base branch from feat/krakatoa to main March 11, 2026 17:55
4 participants