Infrastructure References
This section provides technical reference documentation for NeMo Curator’s infrastructure components that are used across all modalities (text, image, video).
Infrastructure Components
Memory Management
Optimize memory usage when processing large datasets. Topics: partitioning, batching, monitoring.
GPU Acceleration
Leverage NVIDIA GPUs for faster data processing. Topics: CUDA, RMM, performance.
Resumable Processing
Continue interrupted operations across large datasets. Topics: checkpoints, recovery, batching.
Container Environments
Available environments and configurations in NeMo Curator containers, including build arguments and video-specific environments. Topics: Docker, Conda, environments.
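To make the resumable-processing idea above concrete, here is a minimal sketch of checkpointed batch processing in plain Python. It does not use NeMo Curator's API. The names `batched`, `process_resumable`, and the JSON checkpoint layout (`batches_done`) are hypothetical, chosen only for illustration. The sketch also assumes the input order and batch size are the same on every run.

```python
import json
from pathlib import Path


def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def process_resumable(items, batch_size, checkpoint_path, process_batch):
    """Process items in batches, skipping batches already recorded as done.

    Hypothetical helper: the checkpoint is a small JSON file holding the
    number of batches that completed in earlier runs.
    """
    checkpoint_path = Path(checkpoint_path)
    done = 0
    if checkpoint_path.exists():
        done = json.loads(checkpoint_path.read_text())["batches_done"]

    processed = 0
    for index, batch in enumerate(batched(items, batch_size)):
        if index < done:
            continue  # this batch finished in an earlier run
        process_batch(batch)
        processed += 1
        # Record progress only after the batch has completed, so an
        # interruption during process_batch causes that batch to be retried.
        checkpoint_path.write_text(json.dumps({"batches_done": index + 1}))
    return processed
```

Calling `process_resumable` a second time with the same checkpoint file continues from the first unfinished batch. Because a batch that was interrupted partway through is processed again on resume, `process_batch` should be safe to repeat.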