
Conversation

@joshua-seals (Collaborator)

No description provided.

frostyfan109 and others added 24 commits May 23, 2024 16:06
…to add-edu-dev-student-and-prof-contexts

Add eduhelx-dev-student and eduhelx-dev-profesor brands
BUG removing the wordy line
fix bug with expiry default constraint
change anon to full to fix serialization bug
Make 0.5cpu the standard for all apps
Makefile Outdated

ifeq "${DEBUG}" "true"
-LOG_LEVEL := "debug"
+LOG_LEVEL := DEBUG
Contributor

This shouldn't be changed. This variable is only used in gunicorn's run command, which accepts the following values for --log-level, with the default being 'info':

  • 'debug'
  • 'info'
  • 'warning'
  • 'error'
  • 'critical'
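To make the concern concrete, here is a hypothetical sketch (the function name is illustrative, not from the PR) of mapping the Makefile's DEBUG flag onto a value gunicorn's --log-level will actually accept:

```python
# gunicorn only accepts these lowercase names for --log-level,
# defaulting to "info"; an uppercase DEBUG would not be valid.
VALID_LOG_LEVELS = {"debug", "info", "warning", "error", "critical"}

def resolve_log_level(debug_flag: str) -> str:
    """Hypothetical helper: turn the Makefile DEBUG flag into a
    gunicorn-compatible log level."""
    level = "debug" if debug_flag == "true" else "info"
    assert level in VALID_LOG_LEVELS
    return level
```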


# Dividing resources by 2 for reservations
# This helps in cloud scenarios to reduce costs.
if self.cpus >= 0.5:
Contributor

Should this be less than or equal to, rather than what you have? Currently, if cpus is set to anything like 2 or 6, it will be set to 0.5, which seems wrong. I think you just want to set the requests equal to half of the limits for any CPU value over 2, right?
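The two readings being contrasted here can be sketched as follows (function names are hypothetical; only the first reflects the diff as written):

```python
def requests_clamped(cpus: float) -> float:
    # the diff as written: any app with >= 0.5 CPUs gets its
    # request clamped down to a flat 0.5
    return 0.5 if cpus >= 0.5 else cpus

def requests_halved(cpus: float) -> float:
    # the alternative reading: request half of the limit
    return cpus / 2
```

With cpus set to 6, the first returns 0.5 and the second returns 3.0, which is the discrepancy being pointed out.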

Collaborator Author

No, this was a change based on PJ's request. Originally it was half, like you are saying.

Contributor

Maybe the way this should work is that the 'reservations' value from each docker compose file is used for the requests in a pod and then the limits value would be the maximum value that a user can select when creating the helx app. This would mean that someone needs to make sure that all the 'reservations' values in each docker compose is set right (just enough to get each helx app pod going).
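The suggested mapping could look something like this (a minimal sketch; the key names follow docker compose's deploy.resources schema, and the function itself is hypothetical):

```python
def compose_to_pod_resources(deploy_resources: dict) -> dict:
    """Hypothetical translation of a compose file's
    deploy.resources block into pod resources."""
    return {
        # reservations from the compose file become the pod's requests
        "requests": deploy_resources.get("reservations", {}),
        # limits carry over as the pod's limits / the maximum a user
        # could select when launching the helx app
        "limits": deploy_resources.get("limits", {}),
    }
```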

Contributor

0.5 CPU requests might be too much, but that will depend on each app. Better than what it was, though. For the helx apps running in the cluster last night, that would be around 50 CPUs reserved for the apps.

Contributor

I think the requests value for CPUs needs to be set to the average amount of CPUs the container needs to do its job under normal load (not idle), and the limits should be set to the maximum value we think students will ever need for the application to do its work in a reasonable amount of time. We need to make sure that containers aren't slow and sluggish when a lot of users are running pods. And if we find that the total CPUs requested, with requests set to that normal-load average, is higher than the number of CPUs provisioned in ASHE, that's an indication that ASHE is underprovisioned for what we need.

To summarize, here's what I think we should do:

  1. enable pod-reaper again and make it reap pods after 24 hours of idle time
  2. tell people to use reasonable values for the requests in helx-apps depending on the image, and set the limits to whatever we think students should never need to use
  3. tell students to use LIMIT if their queries are taking too long or if the container crashes
  4. disable pgadmin backups until we fix the sessions bug, which is causing the backups to fail
  5. if, after doing 1-4, requests are set to the minimum that still lets applications function under normal load and ASHE is still having problems, then we should look into increasing the CPU provisions in ASHE

Contributor

Figuring out the "idle time" in #1 is what the problem has been in the past, unless someone has figured that trick out? In some of the labs I've done recently there is a timer that is shown in the main web app, which can be increased or decreased while someone is viewing the page. When the timer runs out their lab environment is destroyed. Maybe something like that would work better than idle time for the helx apps.

Collaborator Author (@joshua-seals, Mar 11, 2025)

pod-reaper isn't the best solution or the easiest. I like the low-hanging fruit here. As far as 0.5 CPU for all: my hope right now is that something like this can get us to a ballpark that we can work out in the future as we fine-tune. I'd rather have too much than too little, going from being super greedy like we are now with requests.

Contributor

Ah, I misunderstood how pod-reaper works. We don't have to use that.

I still think the other 4 points I laid out are better than setting the requests to no more than 0.5 for every tycho-launched pod, but I'm okay with having this change go through if we commit to revisiting it soon after we've made the changes I suggest. Long-term, I think we should probably set the max requests to 1 CPU instead of 0.5.

Are y'all good with that plan?

Collaborator Author

I'd like to be data driven, based on whether users experience throttling or not. I am of similar mind as @pj-linebaugh in thinking we will actually give less cpu once we've used this for a while and find no issues. I have no issues with your 2-4 plan though at all. That all seems reasonable.

Contributor

Ultimately, I think the reservations values in the docker compose files for each helx app should be used for their requests values, and the limits values in each docker compose file can be used for the limits in the pods and/or the maximum value that users can choose when launching the app. Maybe even don't give them the option, except for adding a GPU. I guess we can set standard values for a helx app's requests and limits if they aren't specified in the docker compose file.

Each helx app is different, and their resources need to be set accordingly. The same helx app can also need different resources allocated to it depending on the project: one project might need pgadmin to work with a small dataset, and another might need a pgadmin that works with a huge dataset.

name = 'core'

def ready(self):
    import core.signals
Contributor

Is this unfinished?

Contributor

No, it's importing signals.

Contributor

core.signals is an empty file

Contributor

An empty file?

Collaborator Author

@frostyfan109 What is this empty file about?

Contributor

Signals were originally going to be used for a feature, I think. We can delete it, but it's basically just boilerplate

Collaborator Author

Thanks

fix setting
import uuid
import yaml
import copy
from typing import TypedDict, Optional


Are you using these imports?

@joshua-seals joshua-seals merged commit 590ed00 into master Mar 12, 2025
9 of 10 checks passed


8 participants