Testing
```sh
yarn test            # unit and jsdom tests
yarn test:puppeteer  # puppeteer (docker required)
yarn test:ios        # webdriverio (browserstack account required)
```

- Vitest
- JSDOM
- React Testing Library
- Puppeteer
- Browserless
- Docker
- WebdriverIO
- GitHub Actions
If a bug is platform-specific, put the platform in brackets at the beginning of the title. If the bug is on all platforms, the prefix can be omitted.
| Prefix | Meaning |
|---|---|
| [Mobile] | iOS / Mobile Safari / Android |
| [iOS] | iOS / Mobile Safari |
| [iOS Capacitor] | iOS Capacitor build, but not Mobile Safari |
| [Android] | Android |
| [Chrome] | Desktop Chrome |
| (no prefix) | Issue present on all platforms |
When reporting a bug, use these three standard headings: Steps to Reproduce, Current Behavior, and Expected Behavior. Describing something as "wrong", "not working", "broken", etc., is not sufficient. Broken behavior can only be understood in terms of the difference between current and expected behavior.

These headings should be populated as follows:

- Steps to Reproduce: Describe the exact steps needed for someone else to trigger the unexpected behavior.
- Current Behavior: The current (wrong) behavior that is observed when the steps are followed. Typically this refers to the `main` branch. (When describing a regression in a PR, this can refer to the PR branch and should be accompanied by a commit hash for clarity.) This should only describe the result of following the steps. Any conditions required to observe the behavior should go in Steps to Reproduce.
- Expected Behavior: The expected (intended) behavior that should occur when the steps are followed. Typically this refers to the behavior that has not yet been implemented. (When describing a regression on a PR branch, this can refer to the existing, correct behavior on `main`.)

Be specific.
e.g.

- NO: Should work correctly.
- NO: Thought should be expanded.
- YES: `b` should be expanded.

Often the best approach is to state the expected specific behavior followed by the expected general behavior:

- `b` should be expanded.
- Subthoughts with no siblings should be expanded.
Here's a real example from #2733:
```
- x
  - b
  - a
  - =sort
    - Alphabetical
      - Desc
```
Steps to Reproduce:

- Set the cursor on `x`.
- Activate New Subthought Above (Meta + Shift + Enter).
- Move cursor up/down.

Current Behavior:

- Cursor up moves the cursor from the empty thought to `a`.
- Cursor down: Nothing happens.

Expected Behavior:

- Cursor up should move the cursor from the empty thought to `x`.
- Cursor down should move the cursor from the empty thought to `b`.
The project has multiple levels of automated testing, from single function unit tests up to realistic end-to-end (E2E) tests that run tests against an actual device or browser.
Use the lowest level that is sufficient for your test case. If your test case does not require a DOM, use a unit test. If it requires a DOM but is not browser or device-specific, use an RTL test. Higher level tests may provide a more realistic testing environment, but they are slower and, in the case of webdriverio on browserstack, cost per minute of usage.
You can find the test files spread throughout the project in __test__ directories.
⚡️⚡️⚡️ 1–20ms
Basic unit tests are great for testing pure functions directly.
Related tests: actions, selectors, util
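For example, a pure-function unit test looks something like the following sketch (the function under test here is a placeholder for illustration, not an actual em util):

```ts
import { describe, expect, it } from 'vitest'

// Placeholder pure function standing in for a util under test.
// Real tests import the actual function from the project, e.g. from src/util.
const ellipsize = (s: string, n: number): string => (s.length > n ? s.slice(0, n) + '…' : s)

describe('ellipsize', () => {
  it('truncates long strings', () => {
    expect(ellipsize('thoughtspace', 7)).toBe('thought…')
  })

  it('leaves short strings unchanged', () => {
    expect(ellipsize('em', 7)).toBe('em')
  })
})
```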
⚡️⚡️⚡️ 1–20ms
The shortcut tests require dispatching Redux actions but do not need a DOM. You can use the helpers `createTestStore` and `executeShortcut` to operate directly on a Redux store, then make assertions about `store.getState()`. This allows shortcuts to be tested independently of the user device.
Related tests: shortcuts
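A minimal sketch of this pattern is shown below. The import paths and exact helper signatures are assumptions for illustration; check the existing shortcut tests for the current API.

```ts
import { expect, it } from 'vitest'
// NOTE: import paths and signatures are assumed for illustration; see the existing shortcut tests.
import createTestStore from '../../test-helpers/createTestStore'
import executeShortcut from '../../test-helpers/executeShortcut'
import newThoughtShortcut from '../newThought'

it('creates a new empty thought and sets the cursor on it', () => {
  // create an in-memory Redux store with the app's initial state
  const store = createTestStore()

  // execute the shortcut directly against the store, with no DOM involved
  executeShortcut(newThoughtShortcut, { store })

  // assert on the resulting Redux state
  expect(store.getState().cursor).not.toBeNull()
})
```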
⚡️⚡️ 1–1000ms
Anything that tests a rendered component requires a DOM. If there are no browser or device quirks, you can get away with testing against an emulated DOM (jsdom) which is cheaper and faster than a real browser.
- React Testing Library (RTL)
Related tests: components
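A hedged sketch of what an RTL test looks like in jsdom (the component here is a stand-in for illustration; real tests render em components wrapped in the required providers):

```tsx
import React from 'react'
import { expect, it } from 'vitest'
import { render, screen } from '@testing-library/react'

// Stand-in component for illustration only; real tests render actual em components.
const Superscript = ({ count }: { count: number }) => <sup aria-label='superscript'>{count}</sup>

it('renders the superscript', () => {
  // render into jsdom rather than a real browser
  render(<Superscript count={3} />)

  // query by accessible attributes rather than implementation details
  expect(screen.getByLabelText('superscript').textContent).toBe('3')
})
```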
⚡️ 1–2s
```sh
yarn test:puppeteer
```

E2E, or End-to-End, tests involve running a real browser or device and controlling it with an automation driver. You can perform common user actions like touch, click, and type. These tests are the slowest and most expensive to run.
- puppeteer (Chrome) - Requires docker
- webdriverio (Mobile devices) - Requires a browserstack account
To run WebdriverIO tests (under construction), add `BROWSERSTACK_USERNAME=your_username` and `BROWSERSTACK_ACCESS_KEY=your_access_key` to `.env.test.local` in the project root and run `yarn test:e2e:ios`.
Related tests: e2e
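As a rough illustration, a Puppeteer test drives the browser through the page object and asserts on what the user can actually see. The selector and setup below are assumptions; the real e2e tests use the project's own helpers.

```ts
import { expect, it } from 'vitest'
import { Page } from 'puppeteer'

// `page` is assumed to be provided by the e2e test setup (e.g. created in a beforeEach hook).
declare const page: Page

it('creates a thought by typing', async () => {
  // interact the way a real user would: key presses, clicks, touches
  await page.keyboard.press('Enter')
  await page.keyboard.type('Hello')

  // select by a stable attribute and assert on what the user sees
  // (the data-testid here is a placeholder, not an actual em test id)
  const editable = await page.waitForSelector('[data-testid="editable"]')
  const text = await editable?.evaluate(el => el.textContent)
  expect(text).toBe('Hello')
})
```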
⚡️ 1–2s
Snapshot tests are a specific type of puppeteer test used to prevent visual regressions. They automate taking a screenshot on your PR branch and then comparing it to a reference screenshot in main. If the screenshot differs by a certain number of pixels, then it is considered a regression and the test will fail. In the case of a failed snapshot test, a visual diff will be generated that allows you to see why it failed.
Do not use snapshot tests for testing behavior (such as the result of a user action). Instead, select DOM elements by aria label or data-testid. Use snapshot tests for covering visual regressions such as positioning, layout, svg rendering, and general appearance of components.
In the following example, the superscript position broke so the snapshot test failed. The expected snapshot is on the left; the current snapshot is on the right.

When running the tests locally, a link to the visual diff will be output in your shell. When running the tests in GitHub Actions, the visual diff can be downloaded from the artifact link added to the test output under "Upload snapshot diff artifact".
If you are absolutely sure that the change is desired, and your PR was supposed to change the visual appearance of em, then run the snapshot test with -u to update the reference snapshot.
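Assuming extra CLI flags are forwarded to the underlying test runner, that looks something like:

```sh
yarn test:puppeteer -u
```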
`testFlags` are used to alter the runtime behavior of the app during tests. This is generally forbidden, as the automated test environment should be as close as possible to production so that it tests the same behavior the end user sees. But runtime alteration is warranted for some conditions that are difficult or impossible to create through normal user behavior (e.g. network latency) or that can enhance test readability (e.g. visualizations).
You can enable drop target visualization boxes by running `em.testFlags.simulateDrop = true` in the JS console or setting `testFlags.simulateDrop` to `true` in https://github.com/cybersemics/em/blob/ad173daa1d01c12003e33973f863072fdc852023/src/e2e/testFlags.ts#L18-L19.
Various test cases that may need to be tested manually.
- Enter edit mode (#1208)
- Preserve editing: true (#1209)
- Preserve editing: false (#1210)
- No uncle loop (#908)
- Tap hidden root thought (#1029)
- Tap hidden uncle (#1128-1)
- Tap empty Content (#1128-2)
- Scroll (#1054)
- Swipe over cursor (#1029-1)
- Swipe over hidden thought (#1147)
- Preserve editing on switch app (#940)
- Preserve editing clicking on child edge (#946)
- Auto-Capitalization on Enter (#999)
Test enter and leave on each of the following actions:
- New Thought
- New Subthought
- Move Thought Up/Down
- Indent/Outdent
- SubcategorizeOne/All
- Toggle Pin Children
- Basic Navigation
  - `- x - y - z - r - o - m - o - n`
- Word Wrap
  - `- a - This is a long thought that after enough typing will break into multiple lines. - forcebreakkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk - c`
- Toggle Table View
  - `- a - =view - Table - b - b1 - c - c1`
- Table View - Column 2 Descendants
  - `- a - =view - Table - c - c1 - c2 - c3`
- Table View - Vertical Alignment
  - `- a - =view - Table - b - b1 - b2 - b3 - c - c1 - c2 - c3`
  - `- a - =view - Table - b - This is a long thought that after enough typing will break into multiple lines. - c - c1`
  - `- a - =view - Table - This is a long thought that after enough typing will break into multiple lines. - b1 - b2 - c - c1`
  - `- a - =view - Table - This is a long thought that after enough typing will break into multiple lines. - b1 - b2 - c - c1`
- Expand/collapse large number of thoughts at once
  - `- one - =pinChildren - true - a - =view - Table - c - c1 - c2 - c3 - c4 - This is a long thought that after enough typing will break into multiple lines. - b1 - b2 - oof - woof - x - =pinChildren - true - y - y1 - z`
- Nested Tables
  - `- a - =view - Table - b - =view - Table - b1 - x - b2 - y`
It looks like we must use fake timers if we want the store state to be updated based on database operations (e.g., if we use `initialize()` to reload the state). I think this is because the `thoughtspace` operations are asynchronous and don't call the store operations prior to the test ending. (I'm not sure why we didn't get other errors that made this clear.)
https://github.com/cybersemics/em/pull/2741
```ts
// Use fake timers here to ensure that the store operations run after loading into the db
vi.useFakeTimers()
await initialize()
await vi.runAllTimersAsync()
```

In the event of a flaky GitHub Actions workflow, it can be useful to manually trigger multiple runs to flush out failures. The following shell function can be used to automate this process:
```sh
ghworkflow() {
  # get repo url
  repo_default=$(git remote get-url origin)
  workflow_default="puppeteer.yml"
  branch_default=$(git rev-parse --abbrev-ref HEAD)

  # prompt user for the repo
  read -p "Repository: ($repo_default) " input_repo
  repo=${input_repo:-$repo_default}

  # prompt the user for the workflow
  read -p "Workflow: ($workflow_default) " input_workflow
  workflow=${input_workflow:-$workflow_default}

  # prompt the user for the branch
  read -p "Branch: ($branch_default) " input_branch
  branch=${input_branch:-$branch_default}

  # prompt the user for the number of runs
  read -p "Number of runs: (10) " input_runs
  runs=${input_runs:-10}

  # To trigger the workflow on a PR from a fork, we need to push it to a repo we control.
  git push origin "$branch"

  for i in $(seq 1 $runs); do
    echo "Triggering workflow run #$i..."
    gh workflow run "$workflow" \
      --repo "$repo" \
      --ref "$branch" \
      --field rerun_id="run_$i"

    # avoid flooding GitHub API
    sleep 1
  done
}
```

Aside: `workflow_dispatch` must be enabled to allow manual workflow triggers.
This is already set on all the em workflows, so you shouldn't need to worry about it.
```yaml
on:
  workflow_dispatch:
    inputs:
      rerun_id:
        description: 'Optional ID for tracking repeated runs'
        required: false
```

`git bisect` performs a binary search over a range of commits between a known good state (no bug) and a known bad state (bug) to efficiently find the first commit that introduced a regression. Identifying the exact commit will provide a vital clue about the cause of the bug and will inform the solution.
Finding the beginning of the search range is somewhat arbitrary. If you know that a regression was introduced very recently, sometimes you can just go back a few weeks. Otherwise you should go back far enough to ensure that you find the good commit (before the regression was introduced). I recommend 1–2 years. It’ll quickly pare down, since the search space is cut in half each time (i.e. log2 of n, where n is the number of commits). Any longer than a couple of years and the codebase will have changed so much that it will be slow and difficult to install old versions of everything and recreate the environment. If the regression is that old, it probably needs to be approached as a novel bug anyway; the code has changed so much that a simple `git revert` would be impossible.
Once you identify the good commit (hopefully on the first attempt), run `git bisect good` and git will take over from there, automatically checking out the next commit until it has narrowed down the source of the problem.
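A typical session starts something like this (the hash is a placeholder for your known-good commit):

```sh
# mark the current commit as bad and a known-good commit from the past as good
git bisect start
git bisect bad
git bisect good 1a2b3c4
```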
Your only job at each step is:
- `yarn install`
- Restart dev server if halted.
- Test for the regression.
- Run `git bisect bad` if the regression is still present and `git bisect good` if it is gone.
Record the commit hash it gives you at the very end and you’ve found the source of the regression! Often I take one more step of testing the bad commit again and the commit right before it (should be good) just to be extra sure. If any good/bad determination was mistaken along the way then it will throw off the whole process and the final result will not be accurate. But if you are precise and methodical, you can search through hundreds of commits in a matter of minutes to find the offending commit.
- Avoid coupling Puppeteer tests to Redux state or other implementation details. e.g.

  > The use of `em.testHelpers.getState` is tightly coupling the test to various parts of the Redux state (implementation details), which we really want to avoid. It's important that integration tests behave like a normal user and do not have access to what is "under the hood." The few times we add a backdoor in existing tests are as last resorts, when there is no other way to test something. Now that we have dedicated test engineers, we need to maintain high standards and work hard to promote separation of concerns and maintainability.

  https://github.com/cybersemics/em/pull/3172#discussion_r2274819907
- No arbitrary `sleep`; instead wait for a specific condition (see the sketch below).
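For instance, rather than an arbitrary delay, wait for the condition the test actually depends on (the selector below is a placeholder, and `page` is assumed to come from the e2e test setup):

```ts
import { Page } from 'puppeteer'

const waitForAlert = async (page: Page) => {
  // AVOID: an arbitrary delay is both slow and flaky
  // await new Promise(resolve => setTimeout(resolve, 1000))

  // PREFER: wait for the specific condition the test depends on
  await page.waitForSelector('[data-testid="alert"]', { visible: true })
}
```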