This repository was archived by the owner on Jan 31, 2022. It is now read-only.

[Label Bot Continuous Training] Needs Training Needs to take into account whether there is a model currently being trained #178

@jlewi


Our synchronous training pipeline is currently spawning multiple instances of training rather than the expected 1 model per hour.

The problem appears to be that the code that decides whether to train a model only checks whether a trained model exists.
It does not take into account whether a model is currently being trained.

func GetLatestTrained(projectID string, location string, modelName string) (*automlpb.Model, error) {

My conjecture is that the following happens:

  • We launch a Tekton job to train the model
  • The notebook loads the data into AutoML, which is a blocking operation
  • The notebook initiates an AutoML training job but doesn't block until training is complete
    • This is intentional since we want to upload the notebook output and not wait for the AutoML job to complete.

At this point:

  • A new model doesn't exist yet (it is still being trained)
  • needsTrain will continue to return true
  • Since there is no Tekton job running, the controller will launch another job
