add unique_id to subjects table to track subject across ML workloads #22

GZ uses a `unique_id` string in the subject metadata to uniquely identify each subject in the research domain context, e.g. https://github.com/zooniverse/kade/blob/bcb057c27504bff49de34f894709df3582706500/app/services/import/reduction.rb#L68-L74

This is the value that flows into the catalogues and ML systems to uniquely identify subjects, rather than the `subject_id` in our systems. As such, we'll need to add this attribute to the subjects table with a unique index and backfill it when importing subject data into the system.

One solution is to add the metadata import to the subject backfiller job, https://github.com/zooniverse/kade/blob/bcb057c27504bff49de34f894709df3582706500/app/sidekiq/subject_backfiller_job.rb#L8. Alternatively, since this metadata also comes through in the Caesar reductions payload, we can extract the information from that flow of data as it arrives.
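As a rough illustration of the backfill option, the sketch below copies `unique_id` out of each subject's metadata and reports subjects that lack one or collide with another subject, since a unique index would reject both cases. It is plain Ruby so it runs without Rails; `Subject`, `backfill_unique_ids` and the sample rows are hypothetical, and a real job would read and write the subjects table instead.

```ruby
# Hypothetical stand-in for a row in the subjects table.
Subject = Struct.new(:id, :metadata, :unique_id)

# Metadata key carrying the research-domain identifier (the key used by the
# existing reduction import code).
UNIQUE_ID_KEY = '#name'

# Copies metadata['#name'] onto each subject and returns the ids of subjects
# that could not be backfilled, grouped by reason.
def backfill_unique_ids(subjects)
  seen = {}
  problems = { missing: [], duplicate: [] }

  subjects.each do |subject|
    value = subject.metadata[UNIQUE_ID_KEY]

    if value.nil? || value.to_s.strip.empty?
      problems[:missing] << subject.id
    elsif seen.key?(value)
      # A unique index would reject this row, so report it instead.
      problems[:duplicate] << subject.id
    else
      seen[value] = subject.id
      subject.unique_id = value
    end
  end

  problems
end

# Made-up sample data for illustration only.
subjects = [
  Subject.new(1, { '#name' => 'J000001' }),
  Subject.new(2, { '#name' => 'J000002' }),
  Subject.new(3, {}),                        # no identifier in metadata
  Subject.new(4, { '#name' => 'J000001' })   # collides with subject 1
]

problems = backfill_unique_ids(subjects)
```

Collecting the problem rows instead of raising lets a backfill run to completion and leaves the missing or duplicated identifiers to be reviewed afterwards.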

We can then use this field to uniquely identify the subject linkage when importing / upserting ML results (vector representations, predictions, etc.).
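To illustrate that linkage, the sketch below resolves each ML result to a subject via `unique_id` and upserts it, so re-importing the same result overwrites the earlier value instead of duplicating it. The in-memory hashes, the `upsert_predictions` helper and the sample values are all hypothetical stand-ins for the subjects and results tables.

```ruby
# Hypothetical in-memory lookup: unique_id => subject id, standing in for a
# query against a unique index on subjects.unique_id.
SUBJECT_IDS_BY_UNIQUE_ID = {
  'J000001' => 1,
  'J000002' => 2
}.freeze

# Inserts or overwrites one prediction per subject and returns the unique_ids
# that did not match any known subject.
def upsert_predictions(store, results)
  unmatched = []

  results.each do |result|
    subject_id = SUBJECT_IDS_BY_UNIQUE_ID[result[:unique_id]]

    if subject_id.nil?
      unmatched << result[:unique_id]
      next
    end

    # Keyed by subject id, so a repeated import replaces the earlier value.
    store[subject_id] = result[:prediction]
  end

  unmatched
end

# Made-up sample data for illustration only.
store = {}
upsert_predictions(store, [{ unique_id: 'J000001', prediction: 0.25 }])
unmatched = upsert_predictions(store, [
  { unique_id: 'J000001', prediction: 0.75 },  # overwrites the first import
  { unique_id: 'J999999', prediction: 0.5 }    # unknown subject, reported
])
```

Returning the unmatched identifiers, rather than failing on the first one, is one way to keep a bulk import going when a result refers to a subject that has not been imported yet.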

	def unique_id
	  unique_id = payload.dig('subject', 'metadata', '#name')
	  return unique_id if unique_id

	  # staging has older data with different subject metadata - fall back to handling this special env case
	  payload.dig('subject', 'metadata', '!SDSS_ID') if Rails.env.staging? || Rails.env.test?
	end
