Container Environments
Deploy NeMo Curator in containerized environments for reproducible, scalable data curation pipelines with pre-configured dependencies and optimized runtime settings.
Overview
NeMo Curator provides official Docker containers with all dependencies pre-installed and optimized for production workloads. Containers offer:
- Reproducible Environments: Consistent software stack across development, testing, and production
- Simplified Deployment: No manual dependency installation or environment configuration
- GPU Acceleration: Pre-configured CUDA, cuDNN, and NVIDIA libraries for optimal performance
- Multi-Modal Support: Built-in support for text, image, video, and audio curation
- Cloud-Ready: Compatible with Kubernetes, Docker Swarm, and cloud container orchestrators
When to use containers:
- Production deployments requiring consistency and reliability
- Multi-node cluster processing with identical environments
- CI/CD pipelines for automated data curation workflows
- Quick prototyping without local environment setup
- GPU-accelerated processing in cloud environments
Available Containers
Main NeMo Curator Container
The primary container includes comprehensive support for all curation modalities:
Container registry: nvcr.io/nvidia/nemo-curator:container_version
Supported modalities:
- ✅ Text curation (CPU/GPU)
- ✅ Image curation (GPU required)
- ✅ Video curation (GPU required, FFmpeg included)
- ✅ Audio curation (GPU required for ASR)
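Because most modalities need GPU access, a launch command usually has to expose the host GPUs to the container. The following is a minimal sketch of how such a command might be assembled, not an official launcher. The helper name `docker_run_command` is hypothetical, the tag is left as a placeholder, and `--gpus all` assumes the NVIDIA Container Toolkit is installed on the host.

```python
import shlex

# Image path taken from the registry line above; the tag is supplied by the caller.
REGISTRY_IMAGE = "nvcr.io/nvidia/nemo-curator"


def docker_run_command(tag: str, use_gpu: bool = True) -> list[str]:
    """Build an argument list for `docker run` (hypothetical helper)."""
    cmd = ["docker", "run", "--rm", "-it"]
    if use_gpu:
        # Expose every host GPU; assumes the NVIDIA Container Toolkit is installed.
        cmd += ["--gpus", "all"]
    cmd.append(f"{REGISTRY_IMAGE}:{tag}")
    return cmd


if __name__ == "__main__":
    # "<container_version>" is a placeholder; substitute a published tag.
    print(shlex.join(docker_run_command("<container_version>")))
```

Passing `use_gpu=False` drops the GPU flags, which may be enough for CPU-only text curation modules.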
Pre-installed components:
- NeMo Curator with all optional dependencies ([all] extras)
- CUDA 12.8.1 with cuDNN
- Python 3.12 with uv package manager
- FFmpeg 8+ with NVENC support (for video processing)
- Ray, Dask, and distributed computing frameworks
- NVIDIA optimized Python packages
Curator Environment
| Property | Value |
|---|---|
| Python Version | 3.12 |
| CUDA Version | 12.8.1 (configurable) |
| Operating System | Ubuntu 24.04 (configurable) |
| Base Image | nvidia/cuda:${CUDA_VER}-cudnn-devel-${LINUX_VER} |
| Package Manager | uv (Ultrafast Python package installer) |
| Installation | NeMo Curator installed with all optional dependencies ([all] extras) using uv with NVIDIA index |
| Environment Path | Virtual environment at /opt/venv. Activate with source /opt/venv/env.sh after entering the container. |
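After activating the environment, it can be useful to confirm that Python is actually running from the container's virtual environment. The sketch below assumes that activation makes `/opt/venv` the interpreter prefix, which is how standard virtual environments behave; the helper name `is_curator_venv` is hypothetical.

```python
import sys
from pathlib import Path

# Environment path from the table above.
CURATOR_VENV = Path("/opt/venv")


def is_curator_venv(prefix: str = sys.prefix) -> bool:
    """Return True if the given interpreter prefix is the container's venv."""
    return Path(prefix) == CURATOR_VENV


if __name__ == "__main__":
    status = "active" if is_curator_venv() else "not active"
    print(f"Curator virtual environment is {status} (sys.prefix={sys.prefix})")
```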
Security Hardening
The container build includes the following security measures:
- ray_dist.jar removal: Ray’s Java support JAR is deleted during the build to remove a bundled jackson-core library affected by GHSA-72hv-8253-57qq (DoS via async JSON parser). NeMo Curator does not use Ray’s Java support, so this has no functional impact. A build-time verification guard fails the build if the JAR is not successfully removed.
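The idea behind a verification guard can be illustrated with a short check that searches a directory tree for the JAR and exits with an error if any copy remains. This is only a sketch of the technique: the container's actual guard is not shown here and may be implemented differently, and the function names below are hypothetical.

```python
from pathlib import Path


def find_forbidden_jars(root: str, name: str = "ray_dist.jar") -> list[Path]:
    """Recursively collect every file called `name` under `root`."""
    return sorted(Path(root).rglob(name))


def verify_removed(root: str) -> None:
    """Exit with a non-zero status if the JAR is still present under `root`."""
    leftovers = find_forbidden_jars(root)
    if leftovers:
        # SystemExit with a message prints it to stderr and sets exit code 1,
        # which is enough to fail a build step that runs this check.
        raise SystemExit(f"ray_dist.jar still present: {[str(p) for p in leftovers]}")
```

Calling `verify_removed` on the environment root (for example `/opt/venv`) right after the deletion step would turn a silent miss into a failed build.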
Container Build Arguments
The main container accepts these build-time arguments for environment customization:
| Argument | Default | Description |
|---|---|---|
| CUDA_VER | 12.8.1 | CUDA version |
| LINUX_VER | ubuntu24.04 | Base OS version |
| CURATOR_ENV | ci | Curator environment type |
| NVIDIA_BUILD_ID | <unknown> | NVIDIA build identifier |
| NVIDIA_BUILD_REF | - | NVIDIA build reference |
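To show how these arguments fit together, the sketch below resolves the base image name from `CUDA_VER` and `LINUX_VER` and assembles a `docker build` command with `--build-arg` overrides. The output tag `nemo-curator:custom`, the build context `.`, and the helper names are assumptions made for illustration; the Dockerfile location in the repository is not specified here.

```python
import shlex

# Defaults copied from the build-argument table above.
DEFAULT_BUILD_ARGS = {
    "CUDA_VER": "12.8.1",
    "LINUX_VER": "ubuntu24.04",
    "CURATOR_ENV": "ci",
}


def base_image(cuda_ver: str, linux_ver: str) -> str:
    """Expand nvidia/cuda:${CUDA_VER}-cudnn-devel-${LINUX_VER}."""
    return f"nvidia/cuda:{cuda_ver}-cudnn-devel-{linux_ver}"


def docker_build_command(overrides=None, tag="nemo-curator:custom", context="."):
    """Build a `docker build` argument list (tag and context are assumed values)."""
    args = {**DEFAULT_BUILD_ARGS, **(overrides or {})}
    cmd = ["docker", "build"]
    for key, value in args.items():
        cmd += ["--build-arg", f"{key}={value}"]
    cmd += ["-t", tag, context]
    return cmd


if __name__ == "__main__":
    print(base_image(DEFAULT_BUILD_ARGS["CUDA_VER"], DEFAULT_BUILD_ARGS["LINUX_VER"]))
    print(shlex.join(docker_build_command({"CURATOR_ENV": "ci"})))
```

Any argument left out of `overrides` keeps its default from the table, which mirrors how Docker treats build arguments that are declared with a default value.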
Environment Usage Examples
Text Curation
Uses the default container environment with CPU or GPU workers depending on the module.
Image Curation
Requires GPU-enabled workers in the container environment.