Add error property to "degraded" events#6188

Merged
mcmire merged 12 commits into main from add-error-to-rpc-endpoint-degraded
Aug 12, 2025

Conversation

@mcmire mcmire commented Jul 24, 2025

Explanation

We would like to start tracking the HTTP status code in metrics when the NetworkController:rpcEndpointDegraded event is published because the maximum number of retries for a request has been exceeded. The HTTP status is stored in the error that Cockatiel captures, so we just need to include it in the event payload.
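
As a rough illustration of how a subscriber might read the status from the new property, here is a minimal TypeScript sketch. The payload shape (`endpointUrl` plus an optional `error`), the `HttpStatusError` class, and the `httpStatus` property name are assumptions made for this example; none of them are taken from this PR's diff.

```typescript
// Hypothetical stand-in for an error that carries an HTTP status.
// The real error class and property name may differ.
class HttpStatusError extends Error {
  readonly httpStatus: number;

  constructor(httpStatus: number, message = `HTTP ${httpStatus}`) {
    super(message);
    this.httpStatus = httpStatus;
  }
}

// Hypothetical payload for a "degraded" event. `error` is optional because
// a degraded event may also represent a merely slow request.
type DegradedEventPayload = {
  endpointUrl: string;
  error?: unknown;
};

// Read a numeric `httpStatus` off an unknown error, if one is present.
function getHttpStatus(error: unknown): number | undefined {
  if (typeof error === 'object' && error !== null && 'httpStatus' in error) {
    const { httpStatus } = error as { httpStatus: unknown };
    return typeof httpStatus === 'number' ? httpStatus : undefined;
  }
  return undefined;
}

// Build metrics properties, adding `http_status` only when it is known.
function toMetricsProperties(payload: DegradedEventPayload): {
  http_status?: number;
} {
  const properties: { http_status?: number } = {};
  const httpStatus = getHttpStatus(payload.error);
  if (httpStatus !== undefined) {
    properties.http_status = httpStatus;
  }
  return properties;
}
```

Under these assumptions, a payload whose `error` is `new HttpStatusError(500)` yields `{ http_status: 500 }`, while a payload without an `error` yields an empty object.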

References

Closes #6190.

Checklist

mcmire commented Jul 24, 2025

@metamaskbot publish-previews

@github-actions

Preview builds have been published. See these instructions for more information about preview builds.

Expand for full list of packages and versions.
{
  "@metamask-previews/account-tree-controller": "0.6.0-preview-889b5df9",
  "@metamask-previews/accounts-controller": "32.0.0-preview-889b5df9",
  "@metamask-previews/address-book-controller": "6.1.1-preview-889b5df9",
  "@metamask-previews/announcement-controller": "7.0.3-preview-889b5df9",
  "@metamask-previews/app-metadata-controller": "1.0.0-preview-889b5df9",
  "@metamask-previews/approval-controller": "7.1.3-preview-889b5df9",
  "@metamask-previews/assets-controllers": "73.0.0-preview-889b5df9",
  "@metamask-previews/base-controller": "8.0.1-preview-889b5df9",
  "@metamask-previews/bridge-controller": "37.0.0-preview-889b5df9",
  "@metamask-previews/bridge-status-controller": "37.0.0-preview-889b5df9",
  "@metamask-previews/build-utils": "3.0.3-preview-889b5df9",
  "@metamask-previews/chain-agnostic-permission": "1.0.0-preview-889b5df9",
  "@metamask-previews/composable-controller": "11.0.0-preview-889b5df9",
  "@metamask-previews/controller-utils": "11.11.0-preview-889b5df9",
  "@metamask-previews/delegation-controller": "0.6.0-preview-889b5df9",
  "@metamask-previews/earn-controller": "4.0.0-preview-889b5df9",
  "@metamask-previews/eip1193-permission-middleware": "1.0.0-preview-889b5df9",
  "@metamask-previews/ens-controller": "17.0.1-preview-889b5df9",
  "@metamask-previews/error-reporting-service": "2.0.0-preview-889b5df9",
  "@metamask-previews/eth-json-rpc-provider": "4.1.8-preview-889b5df9",
  "@metamask-previews/foundryup": "1.0.0-preview-889b5df9",
  "@metamask-previews/gas-fee-controller": "24.0.0-preview-889b5df9",
  "@metamask-previews/json-rpc-engine": "10.0.3-preview-889b5df9",
  "@metamask-previews/json-rpc-middleware-stream": "8.0.7-preview-889b5df9",
  "@metamask-previews/keyring-controller": "22.1.0-preview-889b5df9",
  "@metamask-previews/logging-controller": "6.0.4-preview-889b5df9",
  "@metamask-previews/message-manager": "12.0.2-preview-889b5df9",
  "@metamask-previews/messenger": "0.0.0-preview-889b5df9",
  "@metamask-previews/multichain-account-service": "0.2.1-preview-889b5df9",
  "@metamask-previews/multichain-api-middleware": "1.0.0-preview-889b5df9",
  "@metamask-previews/multichain-network-controller": "0.11.0-preview-889b5df9",
  "@metamask-previews/multichain-transactions-controller": "4.0.0-preview-889b5df9",
  "@metamask-previews/name-controller": "8.0.3-preview-889b5df9",
  "@metamask-previews/network-controller": "24.0.1-preview-889b5df9",
  "@metamask-previews/notification-services-controller": "15.0.0-preview-889b5df9",
  "@metamask-previews/permission-controller": "11.0.6-preview-889b5df9",
  "@metamask-previews/permission-log-controller": "4.0.0-preview-889b5df9",
  "@metamask-previews/phishing-controller": "13.1.0-preview-889b5df9",
  "@metamask-previews/polling-controller": "14.0.0-preview-889b5df9",
  "@metamask-previews/preferences-controller": "18.4.1-preview-889b5df9",
  "@metamask-previews/profile-sync-controller": "22.0.0-preview-889b5df9",
  "@metamask-previews/rate-limit-controller": "6.0.3-preview-889b5df9",
  "@metamask-previews/remote-feature-flag-controller": "1.6.0-preview-889b5df9",
  "@metamask-previews/sample-controllers": "1.0.0-preview-889b5df9",
  "@metamask-previews/seedless-onboarding-controller": "2.4.0-preview-889b5df9",
  "@metamask-previews/selected-network-controller": "23.0.0-preview-889b5df9",
  "@metamask-previews/signature-controller": "32.0.0-preview-889b5df9",
  "@metamask-previews/token-search-discovery-controller": "3.3.0-preview-889b5df9",
  "@metamask-previews/transaction-controller": "59.0.0-preview-889b5df9",
  "@metamask-previews/user-operation-controller": "38.0.0-preview-889b5df9"
}

@mcmire mcmire changed the title from "WIP - Add error to NetworkController:rpcEndpointDegraded" to "Add error property to "degraded" events" Jul 24, 2025
- [Data] extends [void]
- ? (data: AdditionalData) => void
- : (data: Data & AdditionalData) => void
+ ? (data: Data extends void ? AdditionalData : Data & AdditionalData) => void

@mcmire mcmire Aug 11, 2025

This was an interesting one to figure out.

The purpose of this utility type is to extend the type for a Cockatiel event with additional payload data. We need to do this because NetworkController:rpcEndpointDegraded and NetworkController:rpcEndpointAvailable are based on existing Cockatiel events but with additional properties we add.

However, this type did not properly account for when EventListener was CockatielEvent<FailureReason<unknown> | void> (which is the same thing as saying (data: void | { error: Error } | { value: unknown }) => void).

If we say the following

AddToCockatielEventData<
  (data: void | { error: Error } | { value: unknown }) => void,
  { foo: 'bar' }
>

we want this to resolve to:

(data: { foo: 'bar' } | ({ error: Error } & { foo: 'bar' }) | ({ value: unknown } & { foo: 'bar' })) => void

(Essentially, we want the void to be treated as {}.)

But instead this was resolving as:

(data: (void | { error: Error } | { value: unknown }) & { foo: 'bar' }) => void

which distributes to:

(data: (void & { foo: 'bar' }) | ({ error: Error } & { foo: 'bar' }) | ({ value: unknown } & { foo: 'bar' })) => void

This is incorrect, because void & { foo: 'bar' } doesn't make sense.

What we want is for TypeScript to distribute void | { error: Error } | { value: unknown } over the condition. So first, despite the comment here, we don't want [Data] extends [void], because that prevents TypeScript from applying the distribution.

But the second thing is that even if we say Data extends void then TypeScript still won't apply the distribution, because the condition is actually in the wrong place. What we want is not a type union of three function types, but a single type with a type union for the arguments.

Even if the condition were in the place we wanted it, TypeScript seems to distribute the type union only in a particular circumstance. Looking a bit deeper at the TypeScript 2.8 release notes where this concept was introduced, it says: "Distributive conditional types are automatically distributed over union types during instantiation." In this case, we also need to place the condition so it becomes a part of the type that we want to "return" from this utility type and is not used to simply determine the return type. This way TypeScript evaluates the condition when that return type is used and not beforehand.
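
The behavior described above can be sketched in a self-contained way. This is an illustration under assumptions, not the code from the diff: `FailureReason` is a local stand-in rather than an import from Cockatiel, the outer `infer` wrapper is an assumed reconstruction around the changed line, and `endpointUrl` is a hypothetical extra property.

```typescript
// Local stand-in for Cockatiel's failure reason type.
type FailureReason<R> = { error: Error } | { value: R };

// Listener whose payload may be `void`, an error, or a value.
type CockatielListener = (data: FailureReason<unknown> | void) => void;

// The condition sits inside the parameter type, so it is evaluated per
// union member: `void` becomes AdditionalData on its own, and every other
// member is intersected with AdditionalData.
type AddToCockatielEventData<EventListener, AdditionalData> =
  EventListener extends (data: infer Data) => void
    ? (data: Data extends void ? AdditionalData : Data & AdditionalData) => void
    : never;

// Hypothetical additional property, for illustration only.
type ExtendedListener = AddToCockatielEventData<
  CockatielListener,
  { endpointUrl: string }
>;
type ExtendedPayload = Parameters<ExtendedListener>[0];

// All three payload shapes are accepted, including the one that
// corresponds to the original `void` member.
const slowRequest: ExtendedPayload = { endpointUrl: 'https://rpc.example' };
const failedWithError: ExtendedPayload = {
  endpointUrl: 'https://rpc.example',
  error: new Error('boom'),
};
const failedWithValue: ExtendedPayload = {
  endpointUrl: 'https://rpc.example',
  value: 503,
};

// Narrow the union with `in` checks to tell the cases apart.
function summarize(payload: ExtendedPayload): string {
  if ('error' in payload) {
    return `failed: ${payload.error.message}`;
  }
  if ('value' in payload) {
    return 'failed with a value';
  }
  return 'no failure reason';
}
```

With the earlier `[Data] extends [void]` form, the first assignment (`slowRequest`) would be expected to fail type checking, because the parameter type would still contain `void & { endpointUrl: string }` instead of plain `{ endpointUrl: string }`.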

Contributor

The change makes sense.

@mcmire mcmire marked this pull request as ready for review August 11, 2025 21:41
@mcmire mcmire requested review from a team as code owners August 11, 2025 21:41
@cryptodev-2s cryptodev-2s left a comment

LGTM! I left one minor suggestion regarding the changelog.


## [Unreleased]

### Added
Contributor

Should this be under "Changed"? It feels a bit strange to have an entry under the "Added" section that starts with "Update".

@mcmire mcmire Aug 12, 2025

The Keep a Changelog spec says that "Added" is for new features and "Changed" is for changes in existing functionality. I've always found this to be a bit lacking, though.

The way I've been treating "Changed" is that it is for changes in existing behavior or functionality that do not come with changes to the API itself. For instance, maybe a method was updated so that the logic is different but returns data in the same shape. "Added", on the other hand, is for changes that extend the API in some way, so this could be something larger like a new class, function, method, etc. but also something smaller like a new argument or a new property on a type.

I guess I could have said something like:

- `createServicePolicy`: Add `error` property to the payload for `onDegraded` ([#6188](https://github.com/MetaMask/core/pull/6188))
  - This can be used to access the error produced by the last request when the maximum number of retries is exceeded
  - This property will not be present if the degraded event merely represents a slow request

Would that sound like it fits better? Or what is your view on Added vs. Changed?

Contributor

Yes, looks good.

@mcmire mcmire merged commit 5e541a2 into main Aug 12, 2025
223 checks passed
@mcmire mcmire deleted the add-error-to-rpc-endpoint-degraded branch August 12, 2025 14:31
github-merge-queue bot pushed a commit to MetaMask/metamask-extension that referenced this pull request Aug 25, 2025

## **Description**

Currently, we only track when an Infura RPC endpoint becomes degraded or
unavailable. Now, we would like to have similar insights about custom
RPC endpoints so that we can take more informed decisions to improve
reliability for other chains. We'd also like to improve the tracking for
Infura endpoints so that we can understand failures better.

This commit updates the handlers for the
`NetworkController:rpcEndpointDegraded` and
`NetworkController:rpcEndpointUnavailable` messenger events so that they
create a Segment event regardless of the type of endpoint. The event now
includes the HTTP status code.

While making these changes, it was noticed that the sampling rate for
these Segment events was incorrect. It should have been 1%, not 10%. This
has also been corrected. This ensures that we don't store more data in
Segment and our downstream services than necessary.
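
The commit message does not say how the sampling is implemented. Purely as an illustration of what a 1% rate means in practice, a random gate could look like the sketch below; `shouldSample` and `RPC_EVENT_SAMPLE_RATE` are hypothetical names, not identifiers from the extension.

```typescript
// Hypothetical constant: 1% of events are kept, as stated above.
const RPC_EVENT_SAMPLE_RATE = 0.01;

// Return true for roughly `sampleRate` of calls. The random source is
// injectable so the behavior can be checked deterministically.
function shouldSample(
  sampleRate: number,
  random: () => number = Math.random,
): boolean {
  if (sampleRate < 0 || sampleRate > 1) {
    throw new RangeError('sampleRate must be between 0 and 1');
  }
  return random() < sampleRate;
}
```

A handler would then create the Segment event only when `shouldSample(RPC_EVENT_SAMPLE_RATE)` returns true. At 10% the same gate would let through ten times as many events, which is the excess the correction avoids.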

[![Open in GitHub
Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/MetaMask/metamask-extension/pull/34605?quickstart=1)

## **Changelog**

CHANGELOG entry: null

## **Related issues**

Progresses MetaMask/core#6188.

## **Manual testing steps**

1. Check out this branch, run `yarn`, run `yarn webpack --watch` (or
`yarn start`).
2. Install
[FoxyProxy](https://chromewebstore.google.com/detail/foxyproxy/gcknhkkoolaabfmlnjonogaaifnjlfnp).
Pin the extension.
3. Open the options for the extension (e.g. right-click on the extension
icon and go to Options) and add a new proxy that points to
`127.0.0.1:8080`. Don't enable it yet, though.
4. Install
[`mitmproxy`](https://docs.mitmproxy.org/stable/overview/installation/).
5. Create a Python script somewhere on your computer with the following
contents:
https://gist.github.com/mcmire/1d43ce690d3a974217126cd584f79b7d. This
script will cause all requests to Linea, ZKSync, and Flare to respond
with 500.
6. Run `mitmproxy -s <path to your script>` in an open terminal session.
This will run the proxy server.
7. Go back to FoxyProxy and enable the proxy you created earlier (click
on the icon and then just click on the new proxy).
8. Open the background / service worker.
9. Open MetaMask in full-screen mode and enable Linea.
10. Check the background / service worker window. After a few minutes or
so, you should see a line that says `Creating Segment event "RPC Service
Degraded" with
{"chain_id_caip":"eip155:59144","rpc_endpoint_url":"linea-mainnet.infura.io","http_status":500}`.
After a few more minutes, you should see `Creating Segment event "RPC
Service Unavailable" with
{"chain_id_caip":"eip155:59144","rpc_endpoint_url":"linea-mainnet.infura.io","http_status":500}`.
11. Go back to MetaMask and add ZKSync as a network. Disable Linea and
enable this one instead.
12. Check the background / service worker window. After a few minutes or
so, you should see similar "degraded" and "unavailable" lines as above,
but with the data
`{"chain_id_caip":"eip155:324","rpc_endpoint_url":"mainnet.era.zksync.io","http_status":500}`.
13. Click on FoxyProxy and disable the proxy temporarily (click on
"Disable").
14. Go back to MetaMask and add Flare as a network
(https://chainid.network/chain/14/). Use
`https://flare-api.flare.network/ext/C/rpc` as the RPC endpoint.
15. Click on FoxyProxy and re-enable the proxy.
16. Check the background / service worker window. After a few minutes or
so, you should see similar "degraded" and "unavailable" lines as above,
but with the data
`{"chain_id_caip":"eip155:14","rpc_endpoint_url":"flare-api.flare.network","http_status":500}`.
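
The property shapes in the log lines above can be reproduced with a short sketch. It assumes the chain ID arrives as a `0x`-prefixed hex string and the endpoint as a full URL; the function names and the input shape are hypothetical and are not taken from the extension's code.

```typescript
type SegmentProperties = {
  chain_id_caip: string;
  rpc_endpoint_url: string;
  http_status?: number;
};

// Convert a hex chain ID such as "0xe708" to the "eip155:<decimal>" form.
function toCaipChainId(hexChainId: string): string {
  return `eip155:${parseInt(hexChainId, 16)}`;
}

// Extract the host portion of a URL with a simple pattern. This is a
// simplification: it does not strip ports or credentials.
function toHost(endpointUrl: string): string {
  const match = /^[a-z][a-z0-9+.-]*:\/\/([^/?#]+)/i.exec(endpointUrl);
  return match?.[1] ?? endpointUrl;
}

// Assemble the properties, adding `http_status` only when it is known.
function buildSegmentProperties(input: {
  chainId: string;
  endpointUrl: string;
  httpStatus?: number;
}): SegmentProperties {
  const properties: SegmentProperties = {
    chain_id_caip: toCaipChainId(input.chainId),
    rpc_endpoint_url: toHost(input.endpointUrl),
  };
  if (input.httpStatus !== undefined) {
    properties.http_status = input.httpStatus;
  }
  return properties;
}
```

For the Flare example in step 14, passing `chainId: '0xe'`, `endpointUrl: 'https://flare-api.flare.network/ext/C/rpc'`, and `httpStatus: 500` produces an object that serializes to the same JSON as shown in step 16.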

## **Screenshots/Recordings**

(N/A)

### **Before**

<!-- [screenshots/recordings] -->

### **After**

<!-- [screenshots/recordings] -->

## **Pre-merge author checklist**

- [x] I've followed [MetaMask Contributor
Docs](https://github.com/MetaMask/contributor-docs) and [MetaMask
Extension Coding
Standards](https://github.com/MetaMask/metamask-extension/blob/main/.github/guidelines/CODING_GUIDELINES.md).
- [x] I've completed the PR template to the best of my ability
- [x] I’ve included tests if applicable
- [x] I’ve documented my code using [JSDoc](https://jsdoc.app/) format
if applicable
- [ ] I’ve applied the right labels on the PR (see [labeling
guidelines](https://github.com/MetaMask/metamask-extension/blob/main/.github/guidelines/LABELING_GUIDELINES.md)).
Not required for external contributors.

## **Pre-merge reviewer checklist**

- [ ] I've manually tested the PR (e.g. pull and build branch, run the
app, test code being changed).
- [ ] I confirm that this PR addresses all acceptance criteria described
in the ticket it closes and includes the necessary testing evidence such
as recordings and or screenshots.

---------

Co-authored-by: MetaMask Bot <metamaskbot@users.noreply.github.com>
Co-authored-by: Mark Stacey <mark.stacey@consensys.net>
Co-authored-by: Gauthier Petetin <gauthierpetetin@hotmail.com>
Co-authored-by: MetaMask Bot <37885440+metamaskbot@users.noreply.github.com>
Co-authored-by: seaona <54408225+seaona@users.noreply.github.com>
Co-authored-by: Dario Anongba Varela <dario.anongba@gmail.com>
Co-authored-by: Unik0rnMaggie <128788650+Unik0rnMaggie@users.noreply.github.com>
Co-authored-by: sleepytanya <104780023+sleepytanya@users.noreply.github.com>
Co-authored-by: Guillaume Roux <guillaumeroux123@gmail.com>
Co-authored-by: Maarten Zuidhoorn <maarten@zuidhoorn.com>
Co-authored-by: OGPoyraz <omergoktugpoyraz@gmail.com>
Co-authored-by: Howard Braham <howrad@gmail.com>
Co-authored-by: Jongsun Suh <jongsun.suh@icloud.com>
Co-authored-by: aphex <52055541+wenfix@users.noreply.github.com>
Co-authored-by: Hassan Malik <41640681+hmalik88@users.noreply.github.com>
Co-authored-by: David Murdoch <187813+davidmurdoch@users.noreply.github.com>
Co-authored-by: Lwin <147362763+lwin-kyaw@users.noreply.github.com>
Co-authored-by: jvbriones <1674192+jvbriones@users.noreply.github.com>
Co-authored-by: Lionell Briones <llenoil@gmail.com>
Co-authored-by: hunty <hunter.goodreau@consensys.net>
Co-authored-by: Harika <153644847+hjetpoluru@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

Add error property to NetworkController:rpcEndpointDegraded

2 participants