MLHub Utilities
The utils module contains common utilities for MLHub.
Configurations
Download Directory
This is the directory where all the files (datasets, checkpoints, etc.) are downloaded. It’s used for dataloaders, saving checkpoints during training, loading checkpoints for models, etc.
It is /tmp by default. However, it can be set in either of the following ways:
- By setting the environment variable MLHUB_DOWNLOAD_DIR to the desired directory.
- Using the mlhub.utils.set_download_dir() function.
You can get the current download directory using the
mlhub.utils.get_download_dir() function. The value of
this variable is referred to as DOWNLOAD_DIR in the docs.
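The lookup order described above can be sketched with the standard library alone. The helper resolve_download_dir below is a hypothetical stand-in, not MLHub's implementation; it assumes only the documented behaviour (the environment variable first, /tmp otherwise, returned as a resolved path).

```python
import os


def resolve_download_dir(environ=None):
    """Hypothetical stand-in that mimics the documented lookup order."""
    environ = os.environ if environ is None else environ
    # Documented behaviour: MLHUB_DOWNLOAD_DIR wins, otherwise /tmp is used.
    raw = environ.get("MLHUB_DOWNLOAD_DIR", "/tmp")
    # Assumption: the result is expanded and resolved, as get_download_dir()
    # is documented to return an absolute/resolved path.
    return os.path.realpath(os.path.expanduser(raw))


# No variable set: falls back to the default.
print(resolve_download_dir({}))
# Variable set: the configured directory is used instead.
print(resolve_download_dir({"MLHUB_DOWNLOAD_DIR": "~/mlhub_data"}))
```

Passing a plain dict instead of os.environ keeps the sketch easy to try without touching the real environment.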
- mlhub.utils.get_download_dir() -> str
Get the download directory (as an absolute, resolved path). Only use mlhub.utils.set_download_dir() to set the download directory.
- Returns:
The fully resolved download directory
- mlhub.utils.set_download_dir(path: str) -> str
Set the download directory. If the directory doesn’t exist, it is created. Use mlhub.utils.get_download_dir() to get the current download directory.
Note
By default, the download directory is set by the environment variable MLHUB_DOWNLOAD_DIR. If it’s not set, then the default is /tmp.
- Parameters:
path – The download directory
- Returns:
The download directory
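A minimal sketch of the set/get pair, assuming only what is documented here: setting creates the directory when it is missing, and getting returns a resolved path. The names set_download_dir_sketch, get_download_dir_sketch and the _state dictionary are hypothetical; whether the real setter returns the resolved path is also an assumption.

```python
import os
import tempfile

# Hypothetical module-level state; MLHub's real storage is not documented here.
_state = {"download_dir": "/tmp"}


def set_download_dir_sketch(path):
    """Store the directory, creating it if it does not exist yet."""
    resolved = os.path.realpath(os.path.expanduser(path))
    os.makedirs(resolved, exist_ok=True)  # documented: created when missing
    _state["download_dir"] = resolved
    return resolved  # assumption: the resolved path is what gets returned


def get_download_dir_sketch():
    """Return the stored directory as an absolute, resolved path."""
    return os.path.realpath(_state["download_dir"])


with tempfile.TemporaryDirectory() as scratch:
    target = os.path.join(scratch, "datasets", "cache")  # does not exist yet
    returned = set_download_dir_sketch(target)
    created = os.path.isdir(returned)
    matches = get_download_dir_sketch() == returned
    print(created, matches)
```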
File Management
- mlhub.utils.ex(x: str) -> str
Expand a path fully (to its realpath). Also expands ~ (tilde) to the home directory.
- Parameters:
x – A path
- Returns:
A fully resolved (absolute) path
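The expansion described for ex can be approximated with two standard-library calls. This is a sketch under the assumption that tilde expansion happens before the path is resolved; ex_sketch is a hypothetical name, not the library function.

```python
import os


def ex_sketch(x):
    """Expand ~ to the home directory, then resolve to a real, absolute path."""
    return os.path.realpath(os.path.expanduser(x))


home = ex_sketch("~")
nested = ex_sketch("~/data/./cifar10")  # the path does not need to exist
print(home)
print(nested)
```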
- mlhub.utils.download_and_extract_archive(url: str, download_root: str | None = None, extract_root: str | None = None, filename: str | None = None, md5: str | None = None, remove_finished: bool = False) -> None
A documented wrapper around PyTorch’s download and extract function. If the file is already downloaded, it is not downloaded again (after an MD5 integrity check).
- Parameters:
url – The download URL (to obtain the file from)
download_root – Root folder where downloaded items are stored. If None, it is taken from mlhub.utils.get_download_dir().
extract_root – Root folder where the downloaded items are extracted. If None, it is the same as download_root.
filename – The filename to use for saving. If None, it is the basename of the URL.
md5 – The checksum to check the downloaded file against (before extracting anything). No check is done if None.
remove_finished – If True, remove the downloaded file after extracting it.
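The way the None defaults above fall back on one another can be sketched as a small helper that downloads nothing. resolve_archive_args and its default_root parameter are hypothetical (default_root stands in for the value of mlhub.utils.get_download_dir()), and the URL is a placeholder.

```python
import os


def resolve_archive_args(url, download_root=None, extract_root=None,
                         filename=None, default_root="/tmp"):
    """Hypothetical helper: fill in the documented defaults only."""
    if download_root is None:
        download_root = default_root  # stand-in for get_download_dir()
    if extract_root is None:
        extract_root = download_root  # documented: same as download_root
    if filename is None:
        filename = os.path.basename(url)  # documented: basename of the URL
    archive_path = os.path.join(download_root, filename)
    return download_root, extract_root, archive_path


url = "https://example.com/files/data.tar.gz"  # placeholder URL
print(resolve_archive_args(url))
print(resolve_archive_args(url, download_root="/data", filename="raw.tar.gz"))
```

Keeping extract_root tied to download_root by default means a single argument relocates both the archive and its extracted contents.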
- mlhub.utils.check_md5(file: str, true_md5: str | None = None) -> str | bool
Return the MD5 checksum of the given file, or check it against an expected checksum.
- Parameters:
file – The file to check (should exist)
true_md5 – The true MD5 checksum of the file. If None, the checksum is not checked and the function returns the MD5 checksum of the file. If an expected (true) hash is passed, the function returns True if the MD5 matches (False otherwise).
- Returns:
The MD5 checksum of the file if true_md5 is None. Otherwise, a bool indicating whether true_md5 matches the MD5 of file.
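The two return modes can be illustrated with hashlib. This is a sketch of the documented contract, not the library's code: check_md5_sketch and its chunk_size parameter are hypothetical, and reading the file in chunks is an assumption made to keep memory use low.

```python
import hashlib
import os
import tempfile


def check_md5_sketch(file, true_md5=None, chunk_size=1024 * 1024):
    """Return the hex MD5 of file, or a bool if true_md5 is given."""
    digest = hashlib.md5()
    with open(file, "rb") as handle:
        # Read in chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    actual = digest.hexdigest()
    if true_md5 is None:
        return actual  # no expected hash: report the checksum
    return actual == true_md5  # expected hash given: report the comparison


# Write a small temporary file to hash.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
path = tmp.name

checksum = check_md5_sketch(path)      # str
ok = check_md5_sketch(path, checksum)  # bool, matches
bad = check_md5_sketch(path, "0" * 32)  # bool, does not match
os.remove(path)
print(checksum, ok, bad)
```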