Skip to content

fix: data race between Simulate and Commit in IAVL tree #725

@rootulp

Description

@rootulp

Summary

There is a data race between BaseApp.Simulate() and BaseApp.Commit() when they access the IAVL tree concurrently. This race is not covered by the checkStateMu mutex added in #715, which only protects the CreateQueryContext() path.

Race trace

Reproduced by running go test -v -race -run TestV2SubmitMethods -count=3 github.com/celestiaorg/celestia-app/v8/pkg/user/v2 after bumping to cosmos-sdk v0.52.2.

Read sideSimulate()getContextForTx()GetConsensusParams() reads the IAVL tree:

Read at 0x00c005c0a420 by goroutine 1190:
  github.com/cosmos/iavl.(*MutableTree).Get()
      mutable_tree.go:174
  cosmossdk.io/store/iavl.(*Store).Get()
      iavl/store.go:209
  cosmossdk.io/store/cachekv.(*Store).Get()
      cachekv/store.go:61
  cosmossdk.io/store/gaskv.(*Store).Get()
      gaskv/store.go:37
  github.com/cosmos/cosmos-sdk/runtime.coreKVStore.Get()
      runtime/store.go:65
  cosmossdk.io/collections.Map[...].Get()
      map.go:81
  cosmossdk.io/collections.Item[...].Get()
      item.go:31
  github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).GetConsensusParams()
      baseapp/baseapp.go:552
  github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).getContextForTx()
      baseapp/baseapp.go:694
  github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).runTx()
      baseapp/baseapp.go:892
  github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).Simulate()
      baseapp/test_helpers.go:28

Write sideCommit() writes to the same IAVL tree:

Previous write at 0x00c005c0a420 by goroutine 505:
  github.com/cosmos/iavl.(*MutableTree).SaveVersion()
      mutable_tree.go:778
  cosmossdk.io/store/iavl.(*Store).Commit()
      iavl/store.go:135
  cosmossdk.io/store/rootmulti.commitStores()
      rootmulti/store.go:1193
  cosmossdk.io/store/rootmulti.(*Store).Commit()
      rootmulti/store.go:489
  github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).Commit()
      baseapp/abci.go:968

Root cause

PR #715 added checkStateMu to protect the CreateQueryContext() path from racing with Commit(). However, the Simulate() path goes through a different code path:

Simulate() → runTx(execModeSimulate) → getContextForTx() → GetConsensusParams()

getContextForTx() creates a context from checkState and then reads the IAVL store (via GetConsensusParams()) without holding checkStateMu. Meanwhile, Commit() holds checkStateMu while calling cms.Commit() which writes to the same IAVL tree via SaveVersion().

Suggested fix

Extend checkStateMu to also protect the Simulate() path, or at minimum protect the getContextForTx() / runTx() calls when they read from the IAVL store during simulation. For example, acquiring a read lock on checkStateMu in Simulate() would prevent it from running concurrently with Commit().

Upstream reference

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions