
add unique_id to subjects table to track subject across ML workloads #22

@camallen


GZ uses a unique_id string in the subject metadata to uniquely identify the subject in the research domain context, e.g.

def unique_id
  unique_id = payload.dig('subject', 'metadata', '#name')
  return unique_id if unique_id

  # staging has older data with different subject metadata - fall back to handling this special env case
  payload.dig('subject', 'metadata', '!SDSS_ID') if Rails.env.staging? || Rails.env.test?
end

This is the data that flows into the catalogues and ML systems to uniquely identify the subjects, not the subject_id in our systems. As such, we'll need to add this attribute to the subjects table with a unique index and backfill it when importing the subject data into the system.

One solution is to add the metadata import to the subject backfiller job, Import::SubjectLocations.new(subject).run. Alternatively, this metadata comes through in the caesar reductions payload, so we can use that flow of data to extract the information as it arrives.
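As a rough sketch of the second option, the extraction could be pulled out into a small helper that works on the reduction payload as a plain Hash. The method name `extract_unique_id`, the `allow_legacy_fallback` flag (standing in for the `Rails.env.staging? || Rails.env.test?` check) and the sample identifier values are all hypothetical, used only for illustration.

```ruby
# Hypothetical helper: read the research-domain identifier from a
# caesar-style reduction payload represented as a plain Hash.
def extract_unique_id(payload, allow_legacy_fallback: false)
  metadata = payload.dig('subject', 'metadata') || {}

  unique_id = metadata['#name']
  return unique_id if unique_id

  # Older staging/test data stores the identifier under '!SDSS_ID'.
  # The flag is an assumption replacing the Rails.env check in the issue.
  metadata['!SDSS_ID'] if allow_legacy_fallback
end

# Made-up payloads, for illustration only.
current_payload = { 'subject' => { 'metadata' => { '#name' => 'example-subject-1' } } }
legacy_payload  = { 'subject' => { 'metadata' => { '!SDSS_ID' => 'legacy-subject-1' } } }

extract_unique_id(current_payload)
extract_unique_id(legacy_payload, allow_legacy_fallback: true)
```

Keeping the environment check outside the helper makes the fallback easy to exercise in tests without stubbing Rails.env.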

We can then use this field to uniquely identify the subject linkage when importing or upserting ML results (vector representations, predictions, etc.).
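To make the linkage concrete, here is a minimal in-memory sketch of the idea, not the actual persistence code. `SubjectIndex` is a hypothetical stand-in for the subjects table: `register` mimics the backfill under a unique index on unique_id, and `upsert_predictions` mimics attaching ML results keyed by unique_id rather than subject_id. All names and sample values are assumptions.

```ruby
# Hypothetical in-memory stand-in for the subjects table, keyed by unique_id.
class SubjectIndex
  def initialize
    @by_unique_id = {}
  end

  # Backfill step: record a subject's unique_id, rejecting a unique_id that
  # already belongs to a different subject (as a unique index would).
  def register(subject_id, unique_id)
    existing = @by_unique_id[unique_id]
    if existing && existing[:id] != subject_id
      raise ArgumentError, "unique_id #{unique_id} already belongs to subject #{existing[:id]}"
    end

    @by_unique_id[unique_id] ||= { id: subject_id, predictions: {} }
  end

  # Upsert step: insert or overwrite ML results for the subject that owns
  # this unique_id; unknown identifiers raise KeyError.
  def upsert_predictions(unique_id, predictions)
    subject = @by_unique_id.fetch(unique_id)
    subject[:predictions].merge!(predictions)
    subject
  end
end

index = SubjectIndex.new
index.register(1, 'example-subject-1')
index.upsert_predictions('example-subject-1', { 'smooth' => 0.1 })
row = index.upsert_predictions('example-subject-1', { 'smooth' => 0.9 })
```

In the real system the same shape would presumably be a database upsert against the uniquely indexed column, which is why the unique index matters: without it, an ML result could match more than one subject.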
