MLHub Utilities

The utils module contains common utilities for MLHub.

Configurations

Download Directory

This is the directory where all the files (datasets, checkpoints, etc.) are downloaded. It’s used for dataloaders, saving checkpoints during training, loading checkpoints for models, etc.

It is /tmp by default. It can be changed by setting the MLHUB_DOWNLOAD_DIR environment variable or by calling mlhub.utils.set_download_dir().

You can get the current download directory using the mlhub.utils.get_download_dir() function. The value it returns is referred to as DOWNLOAD_DIR in the docs.

mlhub.utils.get_download_dir() → str

Get the download directory as an absolute, resolved path. To change the download directory, use only mlhub.utils.set_download_dir().

Returns:

The fully resolved download directory

mlhub.utils.set_download_dir(path: str) → str

Set the download directory. If the directory doesn’t exist, it is created. Use mlhub.utils.get_download_dir() to get the current download directory.

Note

By default, the download directory is set by the environment variable MLHUB_DOWNLOAD_DIR. If it’s not set, then the default is /tmp.

Parameters:

path – The download directory

Returns:

The download directory
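The lookup order described in the note can be sketched with the standard library alone. This is an illustrative stand-in rather than MLHub's actual implementation: the module-level `_download_dir` override and the exact resolution steps are assumptions, and only the `MLHUB_DOWNLOAD_DIR` variable name and the `/tmp` default come from the text above.

```python
import os
from pathlib import Path
from typing import Optional

ENV_VAR = "MLHUB_DOWNLOAD_DIR"  # variable name taken from the note above
DEFAULT_DIR = "/tmp"            # default stated in the docs

# Hypothetical module-level override; set only through set_download_dir().
_download_dir: Optional[str] = None


def get_download_dir() -> str:
    """Return the download directory as an absolute, resolved path."""
    raw = _download_dir or os.environ.get(ENV_VAR, DEFAULT_DIR)
    return str(Path(raw).expanduser().resolve())


def set_download_dir(path: str) -> str:
    """Set the download directory, creating it if it does not exist."""
    global _download_dir
    resolved = Path(path).expanduser().resolve()
    resolved.mkdir(parents=True, exist_ok=True)
    _download_dir = str(resolved)
    return _download_dir
```

In this sketch an explicit call to `set_download_dir()` takes precedence over the environment variable, which in turn takes precedence over the default.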

File Management

mlhub.utils.ex(x: str) → str

Fully expand a path to its real path. Also expands ~ (tilde) to the home directory.

Parameters:

x – A path

Returns:

A fully resolved (absolute) path
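A plausible equivalent of this behavior, assuming it combines the standard `os.path.expanduser` and `os.path.realpath` calls (the docs do not say how it is implemented), looks like the sketch below. The name `expand_path` is hypothetical.

```python
import os


def expand_path(x: str) -> str:
    """Expand "~" to the home directory, then resolve to an absolute real path."""
    # expanduser handles the tilde; realpath makes the result absolute
    # and resolves any symbolic links along the way.
    return os.path.realpath(os.path.expanduser(x))
```

Relative inputs are resolved against the current working directory, so the result is always absolute.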

mlhub.utils.download_and_extract_archive(url: str, download_root: str | None = None, extract_root: str | None = None, filename: str | None = None, md5: str | None = None, remove_finished: bool = False) → None

A documented wrapper around PyTorch’s download-and-extract function. If the file has already been downloaded and passes the MD5 integrity check, it is not downloaded again. However, the downloaded file is always extracted, and existing files are overwritten.

Parameters:
  • url – The download URL (to obtain the file from)

  • download_root – Root folder where downloaded items must be stored. If None, then it is inferred from the function mlhub.utils.get_download_dir().

  • extract_root – Root folder where the downloaded items are extracted. If None, then it is the same as download_root.

  • filename – The filename to use for saving. It is the basename of the URL if None.

  • md5 – The checksum to check the downloaded file against (before extracting anything). No check is done if None.

  • remove_finished – If True, remove the downloaded file after extracting it.

mlhub.utils.check_md5(file: str, true_md5: str | None = None) → str | bool

Return the MD5 checksum of the given file, or check it against an expected checksum.

Parameters:
  • file – The file to check (should exist)

  • true_md5 – The true MD5 checksum of the file. If None, then the checksum is not checked and the function returns the MD5 checksum of the file. If an expected (true) hash is passed, then the function returns True if the MD5 matches (False otherwise).

Returns:

The MD5 checksum of the file if true_md5 is None. Otherwise, a bool indicating whether true_md5 matches the MD5 of file.
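The two return modes can be illustrated with `hashlib`. The sketch below is an assumption about how such a helper might be written, not MLHub's code; `md5_of_file` and its `chunk_size` parameter are hypothetical.

```python
import hashlib
from typing import Optional, Union


def md5_of_file(file: str, true_md5: Optional[str] = None,
                chunk_size: int = 1 << 20) -> Union[str, bool]:
    """Return the MD5 hex digest, or compare it with true_md5 if one is given."""
    digest = hashlib.md5()
    with open(file, "rb") as f:
        # Read in chunks so large files are never fully loaded into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    computed = digest.hexdigest()
    if true_md5 is None:
        return computed          # no expected hash: return the digest string
    return computed == true_md5  # expected hash given: return True or False
```

Because the return type depends on the arguments, callers that pass `true_md5` should treat the result as a bool and callers that omit it should treat it as a string.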

Images

mlhub.utils.norm_img(img: Tensor | ndarray, eps: float = 1e-12) → Tensor | ndarray

Normalize an image by linearly mapping its value range [min, max] to [0, 1].

Parameters:
  • img – The image to normalize. This is not modified.

  • eps – A small value to avoid division by zero

Returns:

The normalized image.
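A minimal NumPy sketch of min-max normalization is shown below. Where `eps` enters the formula is an assumption (here it is added to the denominator), the name `min_max_normalize` is hypothetical, and the sketch covers only the ndarray case, not Tensor inputs.

```python
import numpy as np


def min_max_normalize(img: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Linearly map the value range [min, max] of img to [0, 1]."""
    arr = np.asarray(img, dtype=np.float64)
    lo, hi = arr.min(), arr.max()
    # The subtraction allocates a new array, so the input is left unmodified.
    # eps keeps the division defined when the image is constant (hi == lo).
    return (arr - lo) / (hi - lo + eps)
```

With `eps` in the denominator, the maximum maps to a value marginally below 1, and a constant image maps to all zeros.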

Miscellaneous

mlhub.utils.random_alnum_str(n: int = 4) → str

Generate a random alphanumeric string of length n. Characters may repeat.

Parameters:

n – The length of the alphanumeric string
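One way such a generator could work, assuming sampling with replacement from ASCII letters and digits (the docs do not specify the alphabet), is sketched here. The name `random_alnum` is hypothetical.

```python
import random
import string


def random_alnum(n: int = 4) -> str:
    """Return a random string of n ASCII letters and digits."""
    alphabet = string.ascii_letters + string.digits
    # random.choices samples with replacement, so characters may repeat.
    return "".join(random.choices(alphabet, k=n))
```

The `random` module is not suitable for security-sensitive tokens; the `secrets` module would be the usual choice there.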