Performance issues with large volumes of discussion data #235

@kaustavb12

Description

Significant performance issues were noticed after migrating long-running courses with large volumes of discussion data to the MySQL backend.

For a course with around 8000 threads and around 31000 comments, the get threads list API takes around 2 minutes to respond and often just times out.

Open edX Release - Sumac
forum release - 0.2.0

Potential Cause

Apart from some DB queries that could be further optimized, the performance issue seems to arise mostly from the to_dict() methods defined on the models in the MySQL backend (for example, Comment.to_dict() and ForumUser.to_dict()), which in turn recursively fetch data from other models.
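To illustrate why per-object serialization scales poorly, here is a minimal sketch using Python's built-in sqlite3 module. The `users` and `comments` tables and the `user_to_dict` / `comment_to_dict` functions are hypothetical stand-ins, not the forum's actual models or its to_dict() implementations. The point is only that a serializer which looks up related rows for every object issues one extra query per object (the classic N+1 query pattern), so the query count grows with the number of comments.

```python
import sqlite3


def make_db(n_comments):
    """Build an in-memory DB with hypothetical users/comments tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.execute(
        "CREATE TABLE comments (id INTEGER PRIMARY KEY, author_id INTEGER, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO users (id, username) VALUES (?, ?)",
        [(i, f"user{i}") for i in range(10)],
    )
    conn.executemany(
        "INSERT INTO comments (id, author_id, body) VALUES (?, ?, ?)",
        [(i, i % 10, f"comment {i}") for i in range(n_comments)],
    )
    return conn


class CountingConnection:
    """Wraps a connection and counts every SQL statement run through it."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def execute(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params)


def user_to_dict(db, user_id):
    # Hypothetical nested serializer: one query per call.
    row = db.execute(
        "SELECT id, username FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return {"id": row[0], "username": row[1]}


def comment_to_dict(db, comment_row):
    # Each comment re-fetches its author, so N comments cost N extra queries.
    cid, author_id, body = comment_row
    return {"id": cid, "body": body, "author": user_to_dict(db, author_id)}


def serialize_naive(db):
    rows = db.execute(
        "SELECT id, author_id, body FROM comments ORDER BY id"
    ).fetchall()
    return [comment_to_dict(db, r) for r in rows]


if __name__ == "__main__":
    db = CountingConnection(make_db(1000))
    result = serialize_naive(db)
    print(len(result), db.queries)  # 1000 1001
```

Real to_dict() chains that also resolve votes, subscriptions, or user stats multiply this cost further, since each level adds its own per-object queries.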

I created a proof of concept (POC) to optimize the get threads list API; in a sandbox with similar volumes of sample data, the response time came down to around 3 seconds. In the POC, I did the following:

  1. Optimized some queries (Ref)
  2. Avoided using the to_dict() methods (Ref 1, Ref 2)
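The idea behind avoiding per-object to_dict() calls can be sketched as follows: fetch everything a page needs with a fixed number of queries (one for the comments, one `IN (...)` lookup for all distinct authors), then assemble the dictionaries in Python from an id-keyed map. This is a hedged illustration using hypothetical sqlite3 tables (`users`, `comments`) and a hypothetical `serialize_batched` function, not the change actually made in the POC. In Django, the analogous tools are `select_related()`, `prefetch_related()`, and `values()` querysets.

```python
import sqlite3


def make_db(n_comments):
    """Build an in-memory DB with hypothetical users/comments tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.execute(
        "CREATE TABLE comments (id INTEGER PRIMARY KEY, author_id INTEGER, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO users (id, username) VALUES (?, ?)",
        [(i, f"user{i}") for i in range(10)],
    )
    conn.executemany(
        "INSERT INTO comments (id, author_id, body) VALUES (?, ?, ?)",
        [(i, i % 10, f"comment {i}") for i in range(n_comments)],
    )
    return conn


class CountingConnection:
    """Wraps a connection and counts every SQL statement run through it."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def execute(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params)


def serialize_batched(db):
    # Query 1: all comments for the page.
    rows = db.execute(
        "SELECT id, author_id, body FROM comments ORDER BY id"
    ).fetchall()
    author_ids = sorted({author_id for _, author_id, _ in rows})
    authors = {}
    if author_ids:
        placeholders = ",".join("?" * len(author_ids))
        # Query 2: every distinct author in a single round trip.
        for uid, username in db.execute(
            f"SELECT id, username FROM users WHERE id IN ({placeholders})",
            author_ids,
        ):
            authors[uid] = {"id": uid, "username": username}
    # Assemble the response in memory; no further queries are issued.
    return [
        {"id": cid, "body": body, "author": authors[aid]}
        for cid, aid, body in rows
    ]


if __name__ == "__main__":
    db = CountingConnection(make_db(1000))
    result = serialize_batched(db)
    print(len(result), db.queries)  # 1000 2
```

The query count stays constant regardless of page size. For very large id sets the `IN (...)` list may need to be chunked, since databases cap the number of bound parameters per statement.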
