Conversation
The main goal here is to not have output written directly to stdout and stderr. The capture_output parameter did not make much sense: with capture_output=False the output was written directly to stdout, and with capture_output=True the output was captured, written to the log and returned from the function. Now the output (stdout and stderr) is always captured, written to the log and returned, i.e. it is not written to stdout anymore. When an out stream is passed (stdout parameter != None), the output is not written to the log because it could be binary (e.g. it is used for cpio output); this is no change from the previous implementation. Additionally, stdin is set to subprocess.DEVNULL. This prevents executed processes from changing the terminal settings (e.g. changing \n to cause only a newline instead of a newline plus carriage return). Changing the terminal settings can mess up the output during parallel builds.
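As a rough sketch of the described behaviour (the function name run_cmd, its signature, and the logging calls are illustrative assumptions, not the actual implementation):

```python
import logging
import subprocess
from typing import IO, Optional, Tuple

def run_cmd(cmd: list, stdout: Optional[IO[bytes]] = None) -> Tuple[Optional[bytes], bytes]:
    """Run cmd with output always captured and stdin detached from the terminal."""
    result = subprocess.run(
        cmd,
        stdout=stdout if stdout is not None else subprocess.PIPE,
        stderr=subprocess.PIPE,
        stdin=subprocess.DEVNULL,  # the child cannot change the terminal settings
        check=False,
    )
    if stdout is None:
        # Text output goes to the log; nothing is written to the tool's stdout.
        logging.info('stdout: %s', result.stdout)
    # When an out stream is passed, the (possibly binary) data went there instead
    # and is deliberately not logged.
    logging.info('stderr: %s', result.stderr)
    return result.stdout, result.stderr
```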
The logging changes allow redirecting the output of the tool to a logfile without relying on an external process (e.g. bash) to do the redirection. They also move exception and banner printing to the log. This is required to enable clean parallel builds.
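For reference, a minimal sketch of how such a logfile redirection can be wired up with the standard library; the log_file parameter and the handler setup are assumptions, not the actual init_logging from this PR:

```python
import logging
from typing import Optional

def init_logging(level: str = 'INFO', log_file: Optional[str] = None) -> None:
    """Send the tool's log to a file without relying on shell redirection."""
    handlers: list = [logging.StreamHandler()]
    if log_file:
        # Exceptions, banners and captured command output all end up in this file.
        handlers.append(logging.FileHandler(log_file))
    logging.basicConfig(level=getattr(logging, level.upper(), logging.INFO),
                        handlers=handlers)
```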
Not sure about 1. yet, but 2. makes a lot of sense to me.
I would move the Fake part to a library to reduce the code size and complexity of the build tools and make it reusable; I prepared this as https://github.com/eLiSCHe/libexec. The first topic is already addressed there.
```diff
 if release_version:
-    print(f'Thanks for using EB corbos Linux workspace {release_version}!')
+    logging.info('Thanks for using EB corbos Linux workspace %s!', release_version)
```
Not using logging is intended, to print this at any log level.
```python
def init_logging(level: str = 'INFO') -> None:
def add_loging_arguments(parser: ArgumentParser) -> None:
```
I would not add this and instead use the bash redirection. I don't see an advantage of this parameter.
```python
    ],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    stdin=subprocess.DEVNULL,
```
I would propose to use a dataclass with reasonable defaults, since fake already has far too many parameters.
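Something along these lines, purely as a sketch; the class and field names are invented for illustration and do not reflect the existing fake API:

```python
from dataclasses import dataclass, field
from io import BufferedWriter
from typing import Dict, Optional

@dataclass
class RunOptions:
    """Bundles the many run parameters with reasonable defaults."""
    cwd: Optional[str] = None
    check: bool = True                       # raise if the command fails
    stdout: Optional[BufferedWriter] = None  # pass a stream for binary output (e.g. cpio)
    environment: Dict[str, str] = field(default_factory=dict)

# Callers only override what differs from the defaults, e.g.:
# fake.run(['cpio', '-o'], RunOptions(stdout=archive, check=False))
```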
```diff
 else:
     err = None
-    out = None
+    out: int | BufferedWriter = PIPE
```
PIPE will capture the output; None will print it to stdout and make it instantly visible, which is a good idea for long-running steps, to give the user instant feedback. Always capturing the output is a bad idea from my point of view.
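For illustration, the difference in plain subprocess terms (standard library behaviour, independent of this PR):

```python
import subprocess

# stdout=None: the child inherits the parent's stdout, so long-running steps
# are visible immediately.
subprocess.run(['echo', 'live output'], stdout=None, check=True)

# stdout=subprocess.PIPE: the output is captured and only available after the
# process has finished.
result = subprocess.run(['echo', 'captured output'], stdout=subprocess.PIPE, check=True)
print(result.stdout.decode(), end='')
```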
@MofX I'm against support for parallel builds, since they only make the build more complex and a user can no longer follow it step by step via the live output. The build time advantage is also minimal for local development work, since you will only build from scratch once, and then only rebuild the one changed artifact and its dependencies. Also, I think all steps should print to stdout directly, and not capture the logs, if possible, to provide instant feedback; we can then use tee to pipe the logs to a log file additionally.
To 1.) You can already run multiple builds in parallel. The improvement only helps for the first build, where you need to execute all build steps, and only if you have good enough hardware and build steps which are not already using multiple cores. In the end it's a workaround that makes the build tools more complex and the build process less transparent. A real improvement would be to speed up the single steps, but this is also not mandatory right now. To 2.) You would get the same output only in a log file, not additionally as today, and it will not become more readable. The problem is not that the user sends you the wrong logs, but that the user has no idea what happens in the background and needs your help as soon as something goes wrong.
But there are very good reasons to not have parallel build steps:
And on the pro side, for parallel builds, we have only:
So overall, from my point of view, we have quite some negative aspects for a minimal improvement.
By the way, you can run steps in parallel using Taskfile, we just make no use of it in our examples because we don't want this complexity. The log files will be OK, only the live output will mix up the logs. This redirecting and tee-ing of the live output is done in the tasks, so you can use parallel build steps for your image if you write proper tasks, you get the output for each build step separated in different logs, and you can write the log file hint as the last step of the build step to the console. There is no need at all to do this in the Python code of the build tools. If you want, you can do this as you like on a per-image basis. https://yairdar.github.io/base-tutorials/a-taskfile/c06_deps/
I know that you can run the build in parallel, it even works with make. The only annoying part is the log output, as it gets even more unreadable when being mixed up. What is needed to use parallel builds efficiently is proper log handling, and even for sequential builds this would be desirable. No one is asking for parallel builds from inside the root_generator; it is totally fine to run multiple instances in parallel (orchestrated by some kind of build tooling).
But at the moment, log handling is done in the tasks or in make, not in the build tools, and there is no reason to change this. If you don't want to, you don't have to tee the output, but you can. If you don't want the live output, you can just remove the tee from the CLI of the task, and then you only get the log file. At the moment, log handling is out of scope of the build tools where possible, and from my point of view it shall stay this way.
Changes
Please see this as an RFC, not something I absolutely have to push into the code.
This implements two major changes:
Test results
Developer Checklist:
Reviewer checklist: