-
Notifications
You must be signed in to change notification settings - Fork 60
_download_file_from_run silently does nothing if is_local_rank_zero() returns False #915
Description
This might be dangerous because the way health_azure checks whether is_local_rank_zero might be compatible with Lightning but not with other frameworks, and the files are unexpectedly missing after being "downloaded". This is especially problematic when validate_checksum is True, because one would expect that things are double-checked after downloading.
hi-ml/hi-ml-azure/src/health_azure/utils.py
Lines 1176 to 1195 in c606808
| def _download_file_from_run( | |
| run: Run, filename: str, output_file: Path, validate_checksum: bool = False | |
| ) -> Optional[Path]: | |
| """ | |
| Download a single file from an Azure ML Run, optionally validating the content to ensure the file is not | |
| corrupted during download. If running inside a distributed setting, will only attempt to download the file | |
| onto the node with local_rank==0. This prevents multiple processes on the same node from trying to download | |
| the same file, which can lead to errors. | |
| :param run: The AML Run to download associated file for | |
| :param filename: The name of the file as it exists in Azure storage | |
| :param output_file: Local path to which the file should be downloaded | |
| :param validate_checksum: Whether to validate the content from HTTP response | |
| :return: The path to the downloaded file if local rank is zero, else None | |
| """ | |
| if not is_local_rank_zero(): | |
| return None | |
| run.download_file(filename, output_file_path=str(output_file), _validate_checksum=validate_checksum) | |
| return output_file |
A minimally-invasive change would be logging a warning before returning None. It might also be nice to propagate the paths of the downloaded files up to other functions that call this one. For example, maybe _download_files_from_run and download_files_from_run_id should return the paths of all downloaded files.