Pipeline Parametrization Merged into Main Codebase #783
yoonspark
announced in
Announcements
We are happy to share that our work on pipeline parametrization has been merged into the main codebase! It has not shipped in a release yet (it will be included in the next one), but you can give it a quick try now if you want.

Overview
Oftentimes, data scientists and engineers need to run the same pipeline with different parameters. For instance, they may want to use a different data set for model training and/or prediction. To produce a parametrized pipeline, we can use the pipeline API's optional `input_parameters` argument.

As a concrete example, consider the following development code:
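(The specifics below are illustrative, not the original example: the URLs, variable names, and artifact name are hypothetical, and the data loading is stubbed out so the sketch stays self-contained.)

```python
# Hypothetical development code. Note that the data sources are plain
# literal assignments -- the form that input_parameters can parametrize.
url1 = "https://example.com/train.csv"    # illustrative URL
url2 = "https://example.com/predict.csv"  # illustrative URL

def fetch_rows(url):
    # Stand-in for real data loading (e.g., pandas.read_csv(url));
    # kept offline so the sketch runs anywhere.
    return [{"source": url, "value": i} for i in range(3)]

train_rows = fetch_rows(url1)
pred_rows = fetch_rows(url2)
result = {"n_train": len(train_rows), "n_pred": len(pred_rows)}

# In a live session, one would then save the outcome as a LineaPy artifact:
# lineapy.save(result, "result")
```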
Now, if we simply run
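a call along these lines (a sketch assuming LineaPy's `lineapy.to_pipeline` API; the artifact name, pipeline name, and output directory are illustrative):

```python
import lineapy

lineapy.to_pipeline(
    artifacts=["result"],         # artifact(s) saved during development
    pipeline_name="my_pipeline",  # illustrative name
    output_dir="./pipeline/",
)
```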
we get an "inflexible" pipeline where data sources are fixed rather than tunable:
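That is, the generated pipeline code would hard-code the sources, roughly like this (a sketch; the actual generated module will differ):

```python
# Sketch of a non-parametrized pipeline module: the data sources are
# frozen as the literals captured during development.
def run_all():
    url1 = "https://example.com/train.csv"    # fixed, not tunable
    url2 = "https://example.com/predict.csv"  # fixed, not tunable
    train_rows = [{"source": url1, "value": i} for i in range(3)]
    pred_rows = [{"source": url2, "value": i} for i in range(3)]
    return {"n_train": len(train_rows), "n_pred": len(pred_rows)}

if __name__ == "__main__":
    print(run_all())
```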
Instead, we can run
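the same call with the variables to expose listed in `input_parameters` (again a sketch; names are illustrative):

```python
import lineapy

lineapy.to_pipeline(
    artifacts=["result"],               # illustrative artifact name
    input_parameters=["url1", "url2"],  # variables to lift into parameters
    pipeline_name="my_pipeline",
    output_dir="./pipeline/",
)
```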
to get a parametrized pipeline, like so:
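A sketch of what the parametrized output might look like (the actual generated module will differ; the CLI wiring here assumes a plain-`argparse` style):

```python
import argparse

# Sketch of a parametrized pipeline module: url1/url2 are now function
# (and command-line) parameters, with the development-time literals
# kept as defaults.
def run_all(url1="https://example.com/train.csv",
            url2="https://example.com/predict.csv"):
    train_rows = [{"source": url1, "value": i} for i in range(3)]
    pred_rows = [{"source": url2, "value": i} for i in range(3)]
    return {"n_train": len(train_rows), "n_pred": len(pred_rows)}

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--url1", default="https://example.com/train.csv")
    parser.add_argument("--url2", default="https://example.com/predict.csv")
    args, _ = parser.parse_known_args()
    print(run_all(args.url1, args.url2))
```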
As shown, we now have `url1` and `url2` factored out as easily tunable parameters of the pipeline, which allows us to run it with various data sources beyond those we started with, increasing the pipeline's utility.

Limitations
Currently, `input_parameters` only accepts variables from literal assignments such as `a = "123"`. For each variable to be parametrized, there must be only one literal assignment across all artifact code for the pipeline. For instance, if both `a = "123"` and `a = "abc"` exist in the pipeline's artifact code, we cannot make `a` an input parameter, since its reference is ambiguous, i.e., we are not sure which literal assignment `a` refers to.

Reference(s)
Related PRs include (listing the latest first):