from typing import TypedDict


class QATest(TypedDict):
    test_name: str
    user_prompt: str

# "{url}" is a literal placeholder (presumably filled in by the code that runs
# these tests), so this is a plain string rather than an f-string with "{{url}}".
QA_PREAMBLE = """\
Your job is to do QA testing on the {url} website.
Please follow the instructions below and make sure every line that starts with "CHECK" is working as expected.
If a CHECK is not working, abort and send a message to the user saying what went wrong. There is no need to send a message if it is working as expected.
After you are done, send a message listing all the CHECKs you ran and their results. Your structured output should be a JSON object with 'success' (boolean) and 'message' (string). The message should state whether each CHECK you ran passed or failed (and the reason if it failed).
"""
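
# Sketch (hypothetical helper, not part of the original harness): checks that
# the agent's structured output has the shape QA_PREAMBLE asks for, i.e. a JSON
# object with a boolean 'success' and a string 'message'. How the real harness
# consumes this output is an assumption.
import json


def parse_qa_result(raw: str) -> tuple[bool, str]:
    """Parse a QA result and return (success, message); raise ValueError if malformed."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("QA result must be a JSON object")
    success = data.get("success")
    message = data.get("message")
    # Require a real bool so that values like 1 or "true" are rejected.
    if not isinstance(success, bool):
        raise ValueError("'success' must be a boolean")
    if not isinstance(message, str):
        raise ValueError("'message' must be a string")
    return success, message
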
DEVIN_QA_LOGIN_INSTRUCTIONS = """\
Instructions:
Go to {url} and let the page load. The Devin page should then open.
In the bottom left there is a login button. Log in with Google using the email and password from your secrets.
If it asks you to enter a 2-step verification code, use the authenticator code from your secrets.
CHECK: You are logged in.
"""
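
# Sketch (hypothetical; the runner that consumes these prompts is not shown in
# this file): QA_PREAMBLE and DEVIN_QA_LOGIN_INSTRUCTIONS both contain a literal
# "{url}" placeholder, so each prompt has to be filled in with the site under
# test before it is sent to the agent. str.replace is used here instead of
# str.format so that a prompt containing other literal braces (for example a
# JSON snippet) cannot raise KeyError or ValueError.
def render_prompt(user_prompt: str, url: str) -> str:
    """Replace every "{url}" placeholder in a QA prompt with the target URL."""
    return user_prompt.replace("{url}", url)


# Usage (illustrative): render_prompt(test["user_prompt"], "https://app.devin.ai")
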


def create_qa_test(test_name: str, user_prompt: str) -> QATest:
    """Build a QATest whose prompt is QA_PREAMBLE followed by the test's own instructions."""
    user_prompt = QA_PREAMBLE + "\n\n" + user_prompt
    return {
        "test_name": test_name,
        "user_prompt": user_prompt,
    }

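
# Sketch (hypothetical sanity check, not part of the original harness):
# create_qa_test returns plain QATest dicts, and QA_PREAMBLE tells the agent to
# report on every line that starts with "CHECK", so a well-formed test list
# needs unique test names and at least one CHECK line per prompt. The helpers
# accept any iterable of mappings shaped like QATest, so they could be called
# as validate_qa_tests(QA_TESTS) once that list is defined.
from collections import Counter
from collections.abc import Iterable, Mapping
from typing import Any


def count_checks(prompt: str) -> int:
    """Count lines that start with "CHECK", ignoring leading whitespace."""
    return sum(1 for line in prompt.splitlines() if line.lstrip().startswith("CHECK"))


def validate_qa_tests(tests: Iterable[Mapping[str, Any]]) -> list[str]:
    """Return a list of problems; an empty list means no problems were found."""
    tests = list(tests)  # materialise so one-shot iterables can be scanned twice
    problems: list[str] = []
    for name, count in Counter(t["test_name"] for t in tests).items():
        if count > 1:
            problems.append(f"duplicate test_name: {name}")
    for t in tests:
        if count_checks(t["user_prompt"]) == 0:
            problems.append(f"{t['test_name']}: no CHECK lines")
    return problems
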
QA_TESTS: list[QATest] = [
    create_qa_test(
        test_name="check-public-devin-session",
        user_prompt="""Instructions:
Open the link: https://app.devin.ai/sessions/26747fec98444a6f8c298948eeee8a38
CHECK: The page loaded.
CHECK: No authentication was required to open the session.
CHECK: The first message in the conversation is from Silas.
CHECK: The first message is about building a Grafana dashboard.
""",
    ),
    create_qa_test(
        test_name="devin-secrets-add",
        user_prompt=f"""{DEVIN_QA_LOGIN_INSTRUCTIONS}
Then move your mouse to the bottom left corner of the page, which will open the sidebar.
Then click on Library.
Then click on the Secrets page.
First, create a new secret using the "Create Secret" button.
Create a plain text secret with the key "TEST_NAME_1234" and the value "TEST_VALUE", replacing the 1234 with a random number that you generate.
CHECK: The secret was created and is visible in the list.
Now delete the secret you just created.
CHECK: The secret was deleted and is no longer visible in the list.
Next, try to bulk upload secrets. Create a test file with 3 secrets in it:
```
FILE_TEST_1="hi=abcd"
export secret2=postgresql://user:pass@localhost:5432/db
# export FILE_TEST_3="hey=123"
```
Delete the secrets FILE_TEST_1 and secret2 if they already exist in the UI.
Then upload the file using the "import secrets" button. Mark the secrets as sensitive.
If it shows an error about duplicate secrets, that's fine: ignore it and continue, because these secrets already exist.
CHECK: 2 secrets are shown and the third one is not (since it starts with a #).
Add the secrets.
CHECK: The 2 new secrets are now in the list.
Click on the FILE_TEST_1 secret.
CHECK: The secret value is not visible, since it was marked as sensitive.
Now delete the 2 new secrets.
CHECK: The 2 new secrets are deleted.
""",
    ),
    create_qa_test(
        test_name="check-devin-weather",
        user_prompt=f"""{DEVIN_QA_LOGIN_INSTRUCTIONS}
Then start a new session with the prompt "Whats the weather right now?"
You need to press Ctrl+Enter to start a session.
CHECK: Your new session should have started successfully and taken you to the session page.
Then wait for 90 seconds.
CHECK: Devin should have sent you a message asking for your location.
If Devin does that, send a message saying you are based in San Francisco.
Then wait for 90 seconds.
CHECK: Devin should have sent a new message with the current weather in San Francisco.""",
    ),
    create_qa_test(
        test_name="devin-pr-sleep",
        user_prompt=f"""{DEVIN_QA_LOGIN_INSTRUCTIONS}
Then start a new session with the prompt "In the usacognition-snapshot/devin-webapp-e732ee6155d2d543dbb285bfd60af2ec826db738 repo explain what the useGlobalStateProvider hook does and add documentation to it. Make a PR for it. Do not change any other files. Use the branchname devin/explain-$RANDOM"
You need to press Ctrl+Enter to start a session.
CHECK: Your new session should have started (or be in the process of starting) successfully and taken you to the session page.
Wait 20 seconds.
Send a message saying "What are you doing?" (once the session has started, pressing Enter is enough to send the message).
Wait 20 seconds.
CHECK: You should have received a new message from Devin describing what it is doing.
Send a new message saying "SLEEP".
Wait 15 seconds.
CHECK: The status of the session should be something like "Devin is sleeping".
Send a new message saying "WAKE UP".
Wait 90 seconds.
CHECK: The status of the session should no longer be sleeping.
CHECK: Devin should have replied to your message with a new message.
If you see Devin asking you to confirm a plan, always confirm the plan.
Then wait until Devin sends you a message with a link to a GitHub PR.
CHECK: Devin should have sent you a message with a link to a GitHub PR.
Send a new message saying "SLEEP".
Wait 15 seconds.
CHECK: The status of the session should be sleeping.
Now write a comment on the PR created above using the GitHub CLI.
Run the following shell command, authenticating with the actual token from your secrets and replacing 766 with the actual PR number from Devin's previous message:
```
gh pr comment -R usacognition-snapshot/devin-webapp-e732ee6155d2d543dbb285bfd60af2ec826db738 766 -b "Reduce the verbosity of the documentation you just added"
```
Wait 60 seconds and then call <view_browser> to see the state of the browser again (don't use navigate_browser).
CHECK: Devin should no longer be in the sleeping state.
CHECK: There should be text saying "Received feedback from GitHub comments" or something similar, indicating that Devin received feedback from the GitHub comments.
""",
    ),
    create_qa_test(
        test_name="slack-devin-session",
        user_prompt="""Instructions:
Open https://qadevin.slack.com/sign_in_with_password and log in using the email and password from your .env file.
If it asks you to enter a 2-step verification code, use the authenticator code from your .env file.
CHECK: You are logged in to Slack successfully.
Go to the #testing channel.
Then send a message in the channel:
```
@Devin tell me how to access your machine
```
Wait 45 seconds.
CHECK: Devin should have replied to your message in the thread with a new message explaining how to access his machine. The reply should mention VS Code.
Reply in the thread with `SLEEP`.
Wait 5 seconds.
CHECK: Devin should have replied to your message in the thread saying that he is going to sleep.
""",
    ),
]
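

# Sketch (an assumption about the secrets importer, not its actual
# implementation): the parsing rules implied by the bulk-import CHECKs in the
# devin-secrets-add test above. Blank lines and lines starting with "#" are
# skipped, a leading "export " is dropped, each line is split on the FIRST "=",
# and one pair of surrounding double quotes is removed. Applied to that test's
# three-line file, this yields exactly two secrets. random_secret_key is a
# hypothetical helper for generating the randomised TEST_NAME_<number> key.
import random


def parse_secrets_file(text: str) -> dict[str, str]:
    """Parse .env-style text into {key: value} using the rules described above."""
    secrets: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # commented-out secrets must not be imported
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, sep, value = line.partition("=")  # first "=" only, so "hi=abcd" stays intact
        if not sep:
            continue  # hypothetical choice: ignore lines without "="
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]  # drop one pair of surrounding double quotes
        secrets[key.strip()] = value
    return secrets


def random_secret_key(prefix: str = "TEST_NAME_") -> str:
    """Return the prefix followed by a random 4-digit number."""
    return f"{prefix}{random.randint(1000, 9999)}"
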
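

# Sketch (hypothetical helper, not part of the original harness): the
# PR-comment step of the devin-pr-sleep test above, expressed as an argument
# list for subprocess.run instead of a shell string, so the comment body needs
# no shell quoting. It assumes the GitHub CLI `gh` is installed and already
# authenticated (gh can read a token from the GH_TOKEN environment variable,
# for example). The PR number must come from Devin's message; 766 in that
# prompt is only an example.
import subprocess


def gh_pr_comment_argv(repo: str, pr_number: int, body: str) -> list[str]:
    """Build the argv for `gh pr comment <number> --repo <repo> --body <body>`."""
    return ["gh", "pr", "comment", str(pr_number), "--repo", repo, "--body", body]


def post_pr_comment(repo: str, pr_number: int, body: str) -> None:
    """Run gh; check=True raises CalledProcessError if gh exits non-zero."""
    subprocess.run(gh_pr_comment_argv(repo, pr_number, body), check=True)
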