This tutorial demonstrates how to set up centralized storage for ML resources, avoiding redundant downloads and optimizing storage usage across your organization.
This tutorial addresses two common challenges in shared computing environments:
- Dataset Redundancy: Multiple users downloading the same large datasets repeatedly
- Model Storage Overhead: Each user maintaining separate copies of popular pretrained models
By centralizing these resources, you can significantly reduce network bandwidth usage, save storage space, and accelerate research workflows.
The first half of this tutorial demonstrates how to download and store common AI datasets in a centralized location accessible to all users. This eliminates the need for each researcher to download datasets individually, saving time and resources.
We provide a custom Python module called makerspace_ds_mgr that simplifies working with shared datasets. This module:
- Scans and organizes datasets by format (ARRAYRECORD, TFRECORD, PARQUET, PYTORCH) and use case (IMAGE, TEXT, MEDICAL, VIDEO, etc.)
- Provides a simple query interface to locate datasets
- Returns absolute paths to dataset directories for easy integration with existing code
- Supports flexible directory structures adaptable to your organization's needs
Example usage:

```python
from makerspace_ds_mgr import DatasetMgr
# Initialize with your shared dataset directory
mgr = DatasetMgr(base_dir="/path/to/shared/datasets")
# View all available datasets
mgr.show_datasets()
# Get the path to a specific dataset
dataset_path = mgr.query_dataset("imagenette2")
```

The second half of this tutorial focuses on hosting popular pretrained models from the HuggingFace Hub. By maintaining centralized copies of commonly used models, you eliminate the need for each user to download these large files individually.
Important Note: The shared models are intended for inference and evaluation. If you need to fine-tune or continue training a model, you should download a local copy to avoid conflicts with other users.
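If you do need your own writable copy for fine-tuning, a minimal sketch looks like the following (it uses the snapshot_download helper introduced below; the username and target directory are purely illustrative):

```python
from huggingface_hub import snapshot_download

# Download a private, writable copy for fine-tuning
# (the target directory is only an example; use any path you own)
personal_copy = snapshot_download(
    repo_id="bert-base-uncased",
    local_dir="/home/alice/models/bert-base-uncased",
)
print(f"Fine-tune from: {personal_copy}")
```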
The HuggingFace Hub is an online platform that serves as a central repository for sharing and discovering machine learning resources. Think of it like GitHub, but specialized for models, datasets, and ML applications. Users can:
- Upload and share their own models or datasets
- Access thousands of pretrained models and datasets shared by the community
- Track versions and collaborate with other researchers
- Discover state-of-the-art models for various tasks
We use the snapshot_download function from the huggingface_hub Python package to download and store models locally. This function:
- Downloads a complete snapshot of a repository (model or dataset) hosted on the HuggingFace Hub
- Creates an immutable copy of all repository contents at a specific commit or revision
- Stores files locally for consistent, fast access without requiring internet connectivity
- Ensures reproducibility by capturing the exact state of the model at download time
Example:

```python
from huggingface_hub import snapshot_download
local_path = snapshot_download(
    repo_id="microsoft/resnet-50",
    local_dir="/shared/models/resnet-50",
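    # Optionally pass revision="<commit-or-tag>" to pin an exact repository state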
    local_dir_use_symlinks=False
)
```

Using the stored model snapshots is straightforward:
- Select the model you want to use from your shared directory
- Locate its snapshot directory (e.g., /shared/models/resnet-50)
- Visit the model's HuggingFace page (e.g., https://huggingface.co/microsoft/resnet-50)
- Review the usage example on the model card
- Modify the code to point to your local snapshot instead of downloading from the hub
The key is to use the local path and set local_files_only=True where applicable:

```python
from transformers import AutoImageProcessor, ResNetForImageClassification
# Load from shared directory instead of downloading
processor = AutoImageProcessor.from_pretrained(
"/shared/models/resnet-50",
local_files_only=True
)
model = ResNetForImageClassification.from_pretrained(
    "/shared/models/resnet-50",
    local_files_only=True
)
```

This tutorial includes two complete examples:
Example 1: ResNet-50 image classification
- Demonstrates loading the Microsoft ResNet-50 model from a shared directory
- Shows how to use the shared Imagenette2 dataset
- Includes inference on both dataset images and custom images (see the sketch below)
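A rough sketch of the inference step (assuming the shared paths used above and the Pillow package; the image filename is a placeholder):

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, ResNetForImageClassification

# Load processor and model from the shared snapshot (no network access needed)
processor = AutoImageProcessor.from_pretrained("/shared/models/resnet-50", local_files_only=True)
model = ResNetForImageClassification.from_pretrained("/shared/models/resnet-50", local_files_only=True)

# "church.jpg" is a placeholder; any RGB image works
image = Image.open("church.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted = logits.argmax(-1).item()
print(model.config.id2label[predicted])
```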
Example 2: BERT masked language modeling
- Illustrates using the BERT base uncased model for masked language modeling
- Shows the fill-mask pipeline with locally stored models (see the sketch below)
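A minimal sketch of the fill-mask pipeline against a shared copy (the path /shared/models/bert-base-uncased is an assumption, mirroring the ResNet example above):

```python
from transformers import pipeline

# Build a fill-mask pipeline from the shared snapshot
# (/shared/models/bert-base-uncased is an assumed location)
fill_mask = pipeline("fill-mask", model="/shared/models/bert-base-uncased")

# BERT uses [MASK] as its mask token
for prediction in fill_mask("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```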
Prerequisites:
- Python 3.7+
- Virtual environment (recommended)
```bash
# Create and activate a virtual environment
python -m venv my_env
source my_env/bin/activate # On Windows: my_env\Scripts\activate
# Install required packages
pip install -r requirements.txt
```
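As a quick sanity check (assuming requirements.txt includes at least the transformers and huggingface_hub packages used throughout this tutorial):

```python
# Verify the core packages are importable and print their versions
import huggingface_hub
import transformers

print("huggingface_hub", huggingface_hub.__version__)
print("transformers", transformers.__version__)
```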