
Get Started with Video Curation

This guide shows how to install Curator and run your first video curation pipeline.

The example pipeline processes a list of videos, splitting each into 10‑second clips using a fixed stride. It then generates clip‑level embeddings for downstream tasks such as duplicate removal and similarity search.

Overview

This quickstart guide demonstrates how to:

  1. Install NeMo Curator with video processing support
  2. Set up FFmpeg with GPU-accelerated encoding
  3. Configure embedding models (Cosmos-Embed1)
  4. Process videos through a complete splitting and embedding pipeline
  5. Generate outputs ready for duplicate removal, captioning, and model training

What you build: A video processing pipeline that:

  • Splits videos into clips, using either a fixed 10-second stride or TransNetV2 scene detection
  • Generates clip-level embeddings for similarity search and deduplication
  • Optionally creates captions and preview images
  • Outputs results in formats compatible with multimodal training workflows

Prerequisites

System Requirements

To use NeMo Curator’s video curation capabilities, ensure your system meets these requirements:

Operating System

  • Ubuntu 24.04, 22.04, or 20.04 (required for GPU-accelerated video processing)
  • Other Linux distributions may work but are not officially supported

Python Environment

  • Python 3.10, 3.11, or 3.12
  • uv package manager for dependency management
  • Git for model and repository dependencies

GPU Requirements

  • NVIDIA GPU required (CPU-only mode not supported for video processing)
  • Architecture: Volta™ or newer (compute capability 7.0+)
    • Examples: V100, T4, RTX 2080+, A100, H100
  • CUDA: Version 12.0 or above
  • VRAM: Minimum requirements by configuration:
    • Basic splitting + embedding: ~16GB VRAM
    • Full pipeline (splitting + embedding + captioning): ~38GB VRAM
    • Reduced configuration (lower batch sizes, FP8): ~21GB VRAM
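
The GPU requirements above can be checked with a short script before installing anything. The following is a minimal sketch, not part of Curator: it parses the CSV output of nvidia-smi and compares each GPU against the compute capability and VRAM figures listed above. The function names are hypothetical, and the compute_cap query field is assumed to be available, which may not hold on older drivers.

```python
import subprocess

# Thresholds taken from the requirements above; treat them as rough guides.
MIN_COMPUTE_CAPABILITY = 7.0  # Volta or newer
MIN_VRAM_GB = 16              # basic splitting + embedding

# Assumes a driver whose nvidia-smi supports the compute_cap query field.
QUERY = ["nvidia-smi", "--query-gpu=name,compute_cap,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpu_rows(csv_text):
    """Parse nvidia-smi CSV rows into dicts (memory.total is reported in MiB)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split from the right so a comma in the GPU name cannot shift fields.
        name, compute_cap, memory_mib = [f.strip() for f in line.rsplit(",", 2)]
        gpus.append({
            "name": name,
            "compute_cap": float(compute_cap),
            "vram_gb": int(memory_mib) / 1024,
        })
    return gpus


def meets_requirements(gpu, min_vram_gb=MIN_VRAM_GB):
    """True if the GPU satisfies both the architecture and VRAM thresholds."""
    return (gpu["compute_cap"] >= MIN_COMPUTE_CAPABILITY
            and gpu["vram_gb"] >= min_vram_gb)


def query_gpus():
    """Run nvidia-smi and return the parsed rows (requires an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_rows(result.stdout)
```

On a GPU machine, `[g["name"] for g in query_gpus() if meets_requirements(g)]` lists the GPUs that pass. The comparison is strict, and some cards marketed as 16GB report slightly less usable memory, so lower min_vram_gb a little if a borderline card is rejected.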

Software Dependencies

  • FFmpeg 8.0+ with one of the following encoders:
    • GPU encoder: h264_nvenc (recommended for performance; requires a GPU with NVENC hardware, which A100 and H100 do not include)
    • CPU encoder: libvpx-vp9 (for non-NVENC GPUs; produces VP9 in .mp4)

Install

Create and activate a virtual environment, then choose an install option:

PyPI

uv pip install torch wheel_stub psutil setuptools setuptools_scm
uv pip install --no-build-isolation "nemo-curator[video_cuda12]"

Source

git clone https://github.com/NVIDIA-NeMo/Curator.git
cd Curator
uv sync --extra video_cuda12 --all-groups
source .venv/bin/activate

NeMo Curator Container

NeMo Curator is available as a standalone container:

# Set CONTAINER_VERSION to the release tag you want to use
# Pull the container
docker pull "nvcr.io/nvidia/nemo-curator:${CONTAINER_VERSION}"

# Run the container
docker run --gpus all -it --rm "nvcr.io/nvidia/nemo-curator:${CONTAINER_VERSION}"

Install FFmpeg and Encoders

Curator’s video pipelines rely on FFmpeg for decoding and encoding. If you plan to encode clips (using --transcode-encoder h264_nvenc or --transcode-encoder libvpx-vp9), install FFmpeg with NVENC and libvpx-vp9 support. The maintained install script bundles both.

Debian/Ubuntu (Script)

Use the maintained script in the repository to build and install FFmpeg with NVIDIA NVENC and libvpx-vp9 support. The script enables --enable-cuda-nvcc, --enable-libnpp, and --enable-libvpx.

curl -fsSL https://raw.githubusercontent.com/NVIDIA-NeMo/Curator/main/docker/common/install_ffmpeg.sh -o install_ffmpeg.sh
chmod +x install_ffmpeg.sh
sudo bash install_ffmpeg.sh

Verify Installation

Confirm that FFmpeg is on your PATH and that at least one supported encoder is available:

ffmpeg -hide_banner -version | head -n 5
ffmpeg -hide_banner -encoders | grep -E "h264_nvenc|libvpx-vp9"

If encoders are missing, reinstall FFmpeg with the required options or use the Debian/Ubuntu script above.
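
The same check can be scripted so that a launcher picks an encoder automatically. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and the parser relies on the usual `ffmpeg -encoders` row layout (a flags column followed by the encoder name). An encoder being listed only shows that FFmpeg was built with it; h264_nvenc can still fail at run time on a GPU without NVENC hardware.

```python
import subprocess

# Encoders named in this guide, in order of preference.
PREFERRED_ENCODERS = ["h264_nvenc", "libvpx-vp9"]


def find_encoders(encoders_output):
    """Return the preferred encoders that appear in `ffmpeg -encoders` output."""
    listed = set()
    for line in encoders_output.splitlines():
        parts = line.split()
        # Encoder rows look like: " V....D h264_nvenc  NVIDIA NVENC H.264 encoder"
        if len(parts) >= 2:
            listed.add(parts[1])
    return [name for name in PREFERRED_ENCODERS if name in listed]


def pick_encoder(encoders_output):
    """Pick the first preferred encoder, or None if neither is compiled in."""
    found = find_encoders(encoders_output)
    return found[0] if found else None


def ffmpeg_encoders_output():
    """Run FFmpeg and capture its encoder list (requires ffmpeg on PATH)."""
    result = subprocess.run(["ffmpeg", "-hide_banner", "-encoders"],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Calling `pick_encoder(ffmpeg_encoders_output())` returns a value that can be passed to --transcode-encoder, or None when FFmpeg needs to be reinstalled.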

Refer to Clip Encoding to choose encoders and verify NVENC support on your system.

Available Models

Embeddings convert each video clip into a numeric vector that captures visual and semantic content. Curator uses these vectors to:

  • Remove near-duplicate clips during duplicate removal
  • Enable similarity search and clustering
  • Support downstream analysis such as caption verification
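
To make the near-duplicate idea concrete, the sketch below compares embedding vectors by cosine similarity and greedily keeps a clip only if it is not too close to one already kept. It is an illustration of the concept, not Curator's duplicate-removal implementation: the function names and the 0.95 threshold are assumptions, and the pairwise loop is quadratic, so it only suits small examples.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def drop_near_duplicates(embeddings, threshold=0.95):
    """Greedy filter: keep a clip only if it is dissimilar to every kept clip.

    `embeddings` maps clip name -> vector. The threshold is illustrative.
    """
    kept = []
    for name, vector in embeddings.items():
        if all(cosine_similarity(vector, embeddings[k]) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy 2-D vectors; real clip embeddings have many more dimensions.
toy = {"clip_a": [1.0, 0.0], "clip_b": [0.999, 0.01], "clip_c": [0.0, 1.0]}
print(drop_near_duplicates(toy))
```

In the toy data, clip_b points in almost the same direction as clip_a and is dropped, while clip_c is orthogonal and kept.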

Cosmos-Embed1 is the default embedding model family and the only one this guide covers. It is available in three variants (cosmos-embed1-224p, cosmos-embed1-336p, and cosmos-embed1-448p), which differ in input resolution and in the tradeoff between accuracy, speed, and VRAM use. Model weights are downloaded to MODEL_DIR automatically on first run.

| Model Variant | Resolution | VRAM Usage | Speed | Accuracy | Best For |
| --- | --- | --- | --- | --- | --- |
| cosmos-embed1-224p | 224×224 | ~8GB | Fastest | Good | Large-scale processing, initial curation |
| cosmos-embed1-336p | 336×336 | ~12GB | Medium | Better | Balanced performance and quality |
| cosmos-embed1-448p | 448×448 | ~16GB | Slower | Best | High-quality embeddings, fine-grained matching |
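
One way to use the table is to pick the highest-resolution variant that fits the VRAM you can spare for embedding. The sketch below hard-codes the approximate figures from the table; the function name is hypothetical, and actual memory use also depends on batch size and whatever else shares the GPU.

```python
# Approximate VRAM figures copied from the table above (in GB).
VARIANT_VRAM_GB = {
    "cosmos-embed1-224p": 8,
    "cosmos-embed1-336p": 12,
    "cosmos-embed1-448p": 16,
}


def largest_variant_that_fits(available_vram_gb):
    """Return the highest-resolution variant whose estimate fits, else None."""
    fitting = [name for name, need in VARIANT_VRAM_GB.items()
               if need <= available_vram_gb]
    # Variants are listed in ascending resolution, so the last fit is the largest.
    return fitting[-1] if fitting else None


print(largest_variant_that_fits(13))
```

The returned name can be passed to --embedding-algorithm.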

For this quickstart, the following steps set up support for Cosmos-Embed1-224p.

Prepare Model Weights

For most use cases, you only need to create a model directory. The required model files will be downloaded automatically on first run.

  1. Set MODEL_DIR to a local path and create the directory:

    MODEL_DIR=/path/to/models
    mkdir -p "$MODEL_DIR"
  2. No further setup is required. The model weights are downloaded the first time the pipeline uses them.

Set Up Data Directories

Organize input videos and output locations before running the pipeline.

  • Local: For local file processing. Define paths like:

    DATA_DIR=/path/to/videos
    OUT_DIR=/path/to/output_clips
    MODEL_DIR=/path/to/models
  • S3: For cloud storage (AWS S3, MinIO, etc.). Configure credentials in ~/.aws/credentials and use s3:// paths for --video-dir and --output-clip-path.

S3 usage notes:

  • Input videos can be read from S3 paths
  • Output clips can be written to S3 paths
  • Model directory should remain local for performance
  • Ensure IAM permissions allow read/write access to specified buckets
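
For local inputs, a quick listing of the input directory can catch an empty or mistyped DATA_DIR before a long run starts. This is a minimal sketch: the extension set and the recursive search are assumptions made for illustration, and they may not match what the example script itself accepts.

```python
from pathlib import Path

# Assumed set of extensions; adjust to match the formats in your dataset.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".avi", ".webm"}


def list_videos(data_dir):
    """Recursively list files under data_dir that look like videos."""
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Input directory not found: {root}")
    return sorted(p for p in root.rglob("*")
                  if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS)
```

For example, `len(list_videos("/path/to/videos"))` gives a file count to compare against the number of videos the pipeline reports reading.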

Run the Splitting Pipeline Example

Use the example script from https://github.com/NVIDIA-NeMo/Curator/tree/main/tutorials/video/getting-started to read videos, split into clips, and write outputs. This runs a Ray pipeline with XennaExecutor under the hood.

python tutorials/video/getting-started/video_split_clip_example.py \
  --video-dir "$DATA_DIR" \
  --model-dir "$MODEL_DIR" \
  --output-clip-path "$OUT_DIR" \
  --splitting-algorithm fixed_stride \
  --fixed-stride-split-duration 10.0 \
  --embedding-algorithm cosmos-embed1-224p \
  --transcode-encoder h264_nvenc \
  --verbose

What this command does:

  1. Reads all video files from $DATA_DIR
  2. Splits each video into 10-second clips using fixed stride
  3. Generates embeddings using Cosmos-Embed1-224p model
  4. Encodes clips using h264_nvenc codec
  5. Writes output clips and metadata to $OUT_DIR
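
Step 2 can be pictured as cutting the timeline at regular intervals. The sketch below computes the clip boundaries for one video; it is an illustration rather than Curator's code, and it assumes the shorter remainder at the end is kept as its own clip, which the pipeline may handle differently.

```python
def fixed_stride_spans(video_duration, clip_duration=10.0):
    """Return (start, end) spans in seconds covering the video at a fixed stride.

    Assumption: the final, shorter remainder is kept as its own clip.
    """
    if clip_duration <= 0:
        raise ValueError("clip_duration must be positive")
    spans = []
    start = 0.0
    while start < video_duration:
        end = min(start + clip_duration, video_duration)
        spans.append((start, end))
        start += clip_duration
    return spans


print(fixed_stride_spans(25.0))
# [(0.0, 10.0), (10.0, 20.0), (20.0, 25.0)]
```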

Configuration Options Reference

| Category | Option | Values | Description |
| --- | --- | --- | --- |
| Splitting | --splitting-algorithm | fixed_stride, transnetv2 | Method for dividing videos into clips |
| Splitting | --fixed-stride-split-duration | Float (seconds) | Clip length for fixed stride (default: 10.0) |
| Splitting | --transnetv2-frame-decoder-mode | pynvc, ffmpeg_gpu, ffmpeg_cpu | Frame decoding method for TransNetV2 |
| Embedding | --embedding-algorithm | cosmos-embed1-224p, cosmos-embed1-336p, cosmos-embed1-448p | Embedding model to use |
| Encoding | --transcode-encoder | h264_nvenc, libvpx-vp9, libopenh264 | Video encoder for output clips. Use libvpx-vp9 (CPU) on GPUs without NVENC such as A100/H100. libopenh264 is opt-in: run install_h264_support.sh --with-libopenh264 inside the container or provide a system FFmpeg that includes it. See Software H.264/HEVC/AV1 Codec Support. |
| Encoding | --transcode-use-hwaccel | Flag | Enable hardware acceleration for encoding (only valid with h264_nvenc). |
| Optional Features | --generate-captions | Flag | Generate text captions for each clip |
| Optional Features | --generate-previews | Flag | Create preview images for each clip |
| Optional Features | --verbose | Flag | Enable detailed logging output |
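
When the pipeline is launched from a script, the options above can be assembled programmatically. The following sketch builds the argument list for the example script using only flags from this reference and enforces the rule that --transcode-use-hwaccel applies only to h264_nvenc. The build_command helper is hypothetical and not part of Curator.

```python
import shlex

SCRIPT = "tutorials/video/getting-started/video_split_clip_example.py"


def build_command(video_dir, model_dir, out_dir, encoder="h264_nvenc",
                  use_hwaccel=False, captions=False, previews=False):
    """Assemble the example command from the options listed in this reference."""
    if use_hwaccel and encoder != "h264_nvenc":
        raise ValueError("--transcode-use-hwaccel is only valid with h264_nvenc")
    args = [
        "python", SCRIPT,
        "--video-dir", video_dir,
        "--model-dir", model_dir,
        "--output-clip-path", out_dir,
        "--splitting-algorithm", "fixed_stride",
        "--fixed-stride-split-duration", "10.0",
        "--embedding-algorithm", "cosmos-embed1-224p",
        "--transcode-encoder", encoder,
    ]
    if use_hwaccel:
        args.append("--transcode-use-hwaccel")
    if captions:
        args.append("--generate-captions")
    if previews:
        args.append("--generate-previews")
    return args


# Print a shell-quoted command line for inspection before running it.
print(shlex.join(build_command("/path/to/videos", "/path/to/models",
                               "/path/to/output_clips", encoder="libvpx-vp9")))
```

The returned list can be passed directly to subprocess.run from the repository root, which avoids shell quoting problems with paths that contain spaces.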

Understanding Pipeline Output

After successful execution, the output directory will contain:

$OUT_DIR/
├── clips/
│   ├── video1_clip_0000.mp4
│   ├── video1_clip_0001.mp4
│   └── ...
├── embeddings/
│   ├── video1_clip_0000.npy
│   ├── video1_clip_0001.npy
│   └── ...
├── metadata/
│   └── manifest.jsonl
└── previews/  (if --generate-previews enabled)
    ├── video1_clip_0000.jpg
    └── ...

File descriptions:

  • clips/: Encoded video clips (MP4 format)
  • embeddings/: Numpy arrays containing clip embeddings (for similarity search)
  • metadata/manifest.jsonl: JSONL file with one record per clip (source and clip paths, start and end timestamps, and paths to the embedding and preview files)
  • previews/: Thumbnail images for each clip (optional)
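
A quick consistency check after a run is to confirm that every clip has a matching embedding file. The sketch below assumes the directory layout and the shared file stems shown above; the function name is hypothetical.

```python
from pathlib import Path


def clips_missing_embeddings(out_dir):
    """Return clip stems that have an .mp4 in clips/ but no .npy in embeddings/."""
    out = Path(out_dir)
    clip_stems = {p.stem for p in (out / "clips").glob("*.mp4")}
    embedding_stems = {p.stem for p in (out / "embeddings").glob("*.npy")}
    return sorted(clip_stems - embedding_stems)
```

An empty list from `clips_missing_embeddings("/path/to/output_clips")` means the two directories line up; any names returned point to clips worth reprocessing or inspecting in the logs.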

Example manifest entry:

{
  "video_path": "/data/input_videos/video1.mp4",
  "clip_path": "/data/output_clips/clips/video1_clip_0000.mp4",
  "start_time": 0.0,
  "end_time": 10.0,
  "embedding_path": "/data/output_clips/embeddings/video1_clip_0000.npy",
  "preview_path": "/data/output_clips/previews/video1_clip_0000.jpg"
}
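
Because the manifest holds one JSON object per line, it can be read with the standard library alone. The sketch below loads the file and groups clips by source video, using only the field names from the example entry above; the helper names are hypothetical.

```python
import json
from collections import defaultdict


def load_manifest(manifest_path):
    """Read one JSON object per non-empty line."""
    with open(manifest_path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def clips_by_video(entries):
    """Group (clip_path, duration_seconds) pairs under each source video."""
    grouped = defaultdict(list)
    for entry in entries:
        duration = entry["end_time"] - entry["start_time"]
        grouped[entry["video_path"]].append((entry["clip_path"], duration))
    return dict(grouped)
```

For example, `clips_by_video(load_manifest("/path/to/output_clips/metadata/manifest.jsonl"))` gives a per-video view that is convenient for spotting videos that produced no clips or unexpectedly short ones.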

Best Practices

Data Preparation

  • Validate input videos: Ensure videos are not corrupted before processing
  • Consistent formats: Convert videos to a standard format (MP4 with H.264) for consistent results
  • Organize by content: Group similar videos together for efficient processing

Model Selection

  • Start with Cosmos-Embed1-224p: The fastest variant with the lowest VRAM use (~8GB), which suits initial experiments and large-scale runs
  • Upgrade resolution as needed: Use 336p or 448p only when higher precision is required
  • Monitor VRAM usage: Check GPU memory with nvidia-smi during processing

Pipeline Configuration

  • Enable verbose logging: Use --verbose flag for debugging and monitoring
  • Test on small subset: Run pipeline on 5-10 videos before processing large datasets
  • Use GPU encoding where available: Prefer h264_nvenc on GPUs with NVENC; on GPUs without it, such as A100 and H100, use libvpx-vp9
  • Save intermediate results: Keep embeddings and metadata for downstream tasks
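
For the small-subset test, one option is to link a handful of videos into a separate directory and point --video-dir at it. This is a minimal sketch with a hypothetical helper: it uses symlinks to avoid copying large files, and it assumes the pipeline follows symlinks when reading inputs. If it does not, copy the files instead.

```python
import random
from pathlib import Path


def make_subset(data_dir, subset_dir, count=5, seed=0):
    """Symlink a random sample of files from data_dir into subset_dir."""
    files = sorted(p for p in Path(data_dir).iterdir() if p.is_file())
    # A fixed seed keeps the sample reproducible between runs.
    sample = random.Random(seed).sample(files, min(count, len(files)))
    target = Path(subset_dir)
    target.mkdir(parents=True, exist_ok=True)
    for source in sample:
        link = target / source.name
        if not link.exists():
            link.symlink_to(source.resolve())
    return sorted(link.name for link in target.iterdir())
```

After `make_subset("/path/to/videos", "/path/to/videos_subset", count=5)`, run the pipeline with --video-dir set to the subset directory and check the outputs before scaling up.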

Infrastructure

  • Use shared storage: Mount shared filesystem for multi-node processing
  • Allocate sufficient VRAM: Plan for peak usage when captioning and embedding run together (~38GB for the full pipeline)
  • Monitor GPU utilization: Use nvidia-smi dmon to track GPU usage during processing
  • Schedule long-running jobs: Process large video datasets in batch jobs overnight

Next Steps

Explore the Video Curation documentation. For encoding guidance, refer to Clip Encoding.