Skip to content

Develop MySQL query to Improve performance for reporting #33

@kylehuynh205

Description

@kylehuynh205

About

  • Brief information about ark-services project
  • The database's design as Key and Value in Mysql has performance issue when the database is growing. Basically, the issue is in order to pull the report of Ark ID with metatdata, we have to do mysql JOINs, but they take so long to get the results.

How to reproduce

SELECT arks.* 
      FROM `dsu_ark`
      AS arks 
      JOIN ( 
        SELECT bound.id, 
        COALESCE(redirected._value, 0) 
        AS _value 
        FROM ( 
          SELECT DISTINCT REGEXP_SUBSTR(_key, '^([^\\s]+)') AS id 
          FROM `dsu_ark`
          WHERE _key LIKE '61220/utsc%' AND _key NOT REGEXP '(\\s:\/c|\\sREDIRECT|\\sPID|\\sLOCAL_ID|\\sCOLLECTION)$' AND (_key LIKE '%%' OR _value LIKE '%%')
        ) AS bound 
        LEFT JOIN ( 
          SELECT REGEXP_SUBSTR(_key, '^([^\\s]+)') AS id, _value 
          FROM `dsu_ark`
          WHERE _key LIKE '61220/utsc%' AND _key REGEXP '\\sREDIRECT$'
        ) AS redirected ON bound.id = redirected.id 
        ORDER BY _value ASC 
        LIMIT 10 
        OFFSET 0 
      ) AS subquery 
      ON arks._key LIKE CONCAT(subquery.id, '%')
      AND arks._key NOT LIKE '%:\/c'
      ORDER BY arks._key ASC;
 SELECT arks.* 
      FROM `dsu_ark`
      AS arks 
      JOIN ( 
        SELECT * FROM (
          SELECT DISTINCT REGEXP_SUBSTR(_key, '^([^\\s]+)') AS id
		      FROM `dsu_ark`
          WHERE _key LIKE '61220/utsc%' AND _key NOT REGEXP '(\\s:\/c|\\sREDIRECT|\\sPID|\\sLOCAL_ID|\\sCOLLECTION)$' 
          INTERSECT
          SELECT DISTINCT REGEXP_SUBSTR(_key, '^([^\\s]+)') AS id
          FROM `dsu_ark`
          WHERE _key LIKE '61220/utsc%' AND _key NOT REGEXP '\\s:\/c' AND (_key LIKE '%%' OR _value LIKE '%%')
        ) AS target
        ORDER BY id ASC 
        LIMIT 10
        OFFSET 0
      ) AS subquery 
      ON arks._key LIKE CONCAT(subquery.id, '%') 
      AND arks._key NOT LIKE '%:\/c' 
      ORDER BY arks._key ASC;

  • These query is querying the ARKs IDs with metadata which is bound to it. As LIMIT 10 which is querying 10 at a time, but it's take at least 15 secs to load which is quite slow for just 10 items.

Goals

  • Develop MySQL query which can pull Arks IDs and their metadata more sufficiently in a acceptable loading time
    • as 10, 25, 50 at a time (from the UI)
    • All items (exporting to CSV feature)

Stretch Goals

  • If the Goal above can't be achieved, we may look into restructure the database and look for alternative database platform to use
  • Scope the migration and development, and testing performance in the new database platform, for example:

Related tickets

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    Status

    No status

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions