The Resources dataclass defines compute requirements for processing stages.
Import
from nemo_curator.stages.resources import Resources
Class Definition
from dataclasses import dataclass

@dataclass
class Resources:
    """Define compute requirements for a stage.

    Attributes:
        cpus: Number of CPU cores (default: 1.0).
        gpu_memory_gb: GPU memory in GB for single-GPU stages (default: 0.0).
        gpus: Number of full GPUs (1 or more) for GPU stages (default: 0.0).
    """

    cpus: float = 1.0
    gpu_memory_gb: float = 0.0
    gpus: float = 0.0
Properties
requires_gpu
Check if any GPU resources are requested.
@property
def requires_gpu(self) -> bool:
    """Returns True if any GPU resources are requested (gpus or gpu_memory_gb)."""
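To make the property's behavior concrete, here is a minimal standalone sketch. `ResourcesSketch` is a hypothetical stand-in, not the NeMo Curator class, and it assumes `requires_gpu` simply checks whether either GPU field is positive, as the docstring describes.

```python
from dataclasses import dataclass


@dataclass
class ResourcesSketch:
    """Hypothetical stand-in mirroring the documented Resources fields."""

    cpus: float = 1.0
    gpu_memory_gb: float = 0.0
    gpus: float = 0.0

    @property
    def requires_gpu(self) -> bool:
        # Assumption: any positive GPU request (full GPUs or GPU memory) counts.
        return self.gpus > 0 or self.gpu_memory_gb > 0


print(ResourcesSketch().requires_gpu)                    # False: CPU-only default
print(ResourcesSketch(gpu_memory_gb=16.0).requires_gpu)  # True: partial GPU
print(ResourcesSketch(gpus=2.0).requires_gpu)            # True: full GPUs
```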
Usage Examples
CPU-Only Stage
# Default: 1 CPU core
resources = Resources()
# Multiple CPU cores
resources = Resources(cpus=4.0)
Single-GPU Stage
Use gpu_memory_gb for stages that need a fraction of a GPU:
# Request 16GB of GPU memory
resources = Resources(
    cpus=4.0,
    gpu_memory_gb=16.0,
)
NeMo Curator automatically converts this memory request into a fraction of a GPU based on the memory available on the GPU.
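The arithmetic behind that conversion can be sketched as the requested memory divided by the GPU's total memory. The helper `gpu_fraction` and the 80 GB device size are hypothetical; the library's actual calculation (for example, how it discovers device memory or rounds the result) may differ.

```python
def gpu_fraction(gpu_memory_gb: float, total_gpu_memory_gb: float) -> float:
    """Hypothetical helper: fraction of one GPU needed for a memory request."""
    if total_gpu_memory_gb <= 0:
        raise ValueError("total_gpu_memory_gb must be positive")
    if gpu_memory_gb > total_gpu_memory_gb:
        # A single-GPU request cannot exceed one device's memory.
        raise ValueError("request exceeds the memory of a single GPU")
    return gpu_memory_gb / total_gpu_memory_gb


# 16 GB on an assumed 80 GB GPU -> 0.2 of the device
print(gpu_fraction(16.0, 80.0))  # 0.2
```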
Multi-GPU Stage
Use gpus for stages that need one or more full GPUs:
# Request 2 full GPUs
resources = Resources(
    cpus=8.0,
    gpus=2.0,
)
Important Constraints
Specify either gpus or gpu_memory_gb for a stage, not both:
# ❌ Invalid - cannot specify both
resources = Resources(gpus=1.0, gpu_memory_gb=16.0)
# ✅ Valid - use gpu_memory_gb for partial GPU
resources = Resources(gpu_memory_gb=16.0)
# ✅ Valid - use gpus for full GPUs
resources = Resources(gpus=2.0)
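One way such a constraint can be enforced is a check in `__post_init__`. This is a hypothetical sketch (`ValidatedResources` is not a NeMo Curator class), and the exception type and message NeMo Curator uses may differ.

```python
from dataclasses import dataclass


@dataclass
class ValidatedResources:
    """Hypothetical sketch of the 'gpus or gpu_memory_gb, not both' rule."""

    cpus: float = 1.0
    gpu_memory_gb: float = 0.0
    gpus: float = 0.0

    def __post_init__(self) -> None:
        # Reject ambiguous requests that mix full GPUs with a memory-based fraction.
        if self.gpus > 0 and self.gpu_memory_gb > 0:
            raise ValueError("Specify either gpus or gpu_memory_gb, not both")


ValidatedResources(gpu_memory_gb=16.0)  # accepted
ValidatedResources(gpus=2.0)            # accepted

try:
    ValidatedResources(gpus=1.0, gpu_memory_gb=16.0)
except ValueError as err:
    print(f"rejected: {err}")
```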
Using Resources with Stages
from dataclasses import dataclass, field

from nemo_curator.stages.base import ProcessingStage
from nemo_curator.stages.resources import Resources
from nemo_curator.tasks import DocumentBatch

@dataclass
class GPUClassifierStage(ProcessingStage[DocumentBatch, DocumentBatch]):
    name: str = "GPUClassifier"
    resources: Resources = field(
        default_factory=lambda: Resources(cpus=4.0, gpu_memory_gb=16.0)
    )

    def process(self, task: DocumentBatch) -> DocumentBatch:
        # GPU-accelerated classification goes here (placeholder)
        ...
        return task
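The `field(default_factory=...)` in this example matters because a non-frozen dataclass such as `Resources` is unhashable, and on Python 3.11+ the `dataclasses` module rejects unhashable class-level defaults so that instances do not silently share one mutable object. The standalone sketch below uses hypothetical `Res` and `Stage` classes to show that each instance receives its own resources object.

```python
from dataclasses import dataclass, field


@dataclass
class Res:
    """Hypothetical stand-in for Resources (non-frozen, hence unhashable)."""

    cpus: float = 1.0
    gpu_memory_gb: float = 0.0


@dataclass
class Stage:
    """Hypothetical stage holding a per-instance Res via default_factory."""

    name: str = "GPUClassifier"
    resources: Res = field(default_factory=lambda: Res(cpus=4.0, gpu_memory_gb=16.0))


a, b = Stage(), Stage()
a.resources.cpus = 8.0             # mutate one stage's resources
print(b.resources.cpus)            # 4.0: the other stage is unaffected
print(a.resources is b.resources)  # False
```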
Configuring Resources at Runtime
Use with_() to override resource configurations:
stage = GPUClassifierStage()

# Override with more resources
high_resource_stage = stage.with_(
    resources=Resources(cpus=8.0, gpu_memory_gb=32.0)
)
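The override pattern can be illustrated with `dataclasses.replace`, which builds a new dataclass instance with selected fields changed and leaves the original untouched. This is a hypothetical sketch with stand-in classes, not NeMo Curator's implementation of `with_()`; the library's exact copy semantics may differ.

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass
class Res:
    """Hypothetical stand-in for Resources."""

    cpus: float = 1.0
    gpu_memory_gb: float = 0.0


@dataclass
class Stage:
    """Hypothetical stage with a with_()-style override helper."""

    name: str = "GPUClassifier"
    resources: Res = field(default_factory=lambda: Res(cpus=4.0, gpu_memory_gb=16.0))

    def with_(self, resources: Optional[Res] = None) -> "Stage":
        # Return a modified copy; the original stage keeps its settings.
        if resources is None:
            return replace(self)
        return replace(self, resources=resources)


stage = Stage()
high = stage.with_(resources=Res(cpus=8.0, gpu_memory_gb=32.0))
print(stage.resources.cpus, high.resources.cpus)  # 4.0 8.0
```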