@xiang-wuu

DVC (Data Version Control) is a data and ML experiment management tool that builds on the existing DevOps and VCS toolset everyone is already familiar with (Git, IDEs, CI/CD, etc.). There are several reasons to integrate DVC into ML/DL code bases in particular. Git is not designed for keeping training and validation logs from long, complex training experiments in a GitHub repository alongside the trained models: managing binary artifacts is still very challenging, and Git/GitHub does not encourage storing them. DVC makes these things largely hassle-free.

  1. Easy versioning and tracking of all training experiments.
  2. Can be used to track all test/validation logs as data artifacts.
  3. All pre-trained model weights can be maintained as binary artifacts on backend storage remotes.
  4. DVC is storage agnostic: multiple storage backends such as S3 buckets, Azure, SSH, Google Drive, etc. can be used to store and track all binary artifacts.
  5. Binary artifacts maintained with DVC can be kept in sync with the Git version of the code base.
  6. DVC is easy to use and feels similar to Git, as most of its commands mirror Git commands.
  7. It is possible to manage complex ML and data pipelines for all running experiments.
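
To make points 3 to 6 concrete, here is a minimal sketch of the Git-like workflow for tracking a single weights file. It only assembles and prints the commands by default. The artifact path, remote name, and bucket URL are hypothetical placeholders, not values from this repository, and the helper functions are illustrative names rather than part of DVC.

```python
import posixpath
import shlex
import shutil
import subprocess
from typing import List


def dvc_workflow_commands(artifact: str, remote_name: str, remote_url: str) -> List[List[str]]:
    """Build the Git/DVC commands for tracking one binary artifact.

    All three arguments are placeholders supplied by the caller.
    """
    # `dvc add` writes a small pointer file next to the artifact and
    # adds the artifact itself to a .gitignore in the same directory.
    pointer = artifact + ".dvc"
    gitignore = posixpath.join(posixpath.dirname(artifact), ".gitignore")
    return [
        ["dvc", "init"],                                          # one-time setup inside a Git repo
        ["dvc", "remote", "add", "-d", remote_name, remote_url],  # register the default storage remote
        ["dvc", "add", artifact],                                 # start tracking the binary with DVC
        ["git", "add", pointer, gitignore, ".dvc/config"],        # Git tracks only the pointer and config
        ["git", "commit", "-m", "Track " + artifact + " with DVC"],
        ["dvc", "push"],                                          # upload the binary to the remote
    ]


def run_workflow(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    if not dry_run and shutil.which("dvc") is None:
        raise RuntimeError("dvc is not installed or not on PATH")
    for cmd in commands:
        print("$ " + shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical example values; replace with a real path and remote.
    run_workflow(
        dvc_workflow_commands("weights/model.pt", "storage", "s3://example-bucket/dvc"),
        dry_run=True,
    )
```

After this, someone who checks out the same Git commit can run `dvc pull` to fetch the matching binary from the remote, which is how the code and the artifacts stay in sync.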

I can update this PR further if the author is interested in integrating DVC into this code base.
Thank you!

@xiang-wuu
Author

@Chilicyy could you please give an update on this?

@zidanexu zidanexu closed this Sep 21, 2022
@zidanexu zidanexu reopened this Sep 21, 2022