
NeMo ASR Models

Use NeMo Framework’s automatic speech recognition (ASR) models to transcribe audio in your curation pipelines. This guide covers basic usage and configuration.

Model Selection

NeMo Framework provides pre-trained ASR models through the Hugging Face model hub. For the complete list of available models and their specifications, refer to the NeMo Framework ASR documentation.

Example Model Usage

# Example using a test-verified model
example_model = "nvidia/parakeet-tdt-0.6b-v2"

# For production use, select appropriate models from:
# https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/all_chkpt.html

Basic Usage

Simple ASR Inference

from nemo_curator.stages.audio.inference.asr_nemo import InferenceAsrNemoStage
from nemo_curator.stages.resources import Resources

# Create ASR inference stage with a model from NeMo Framework
asr_stage = InferenceAsrNemoStage(
    model_name="your_chosen_model_name",  # Select from NeMo Framework docs
    filepath_key="audio_filepath",
    pred_text_key="pred_text"
)

# Configure for GPU processing
asr_stage = asr_stage.with_(
    resources=Resources(gpus=1.0),
    batch_size=16
)

Custom Configuration

# Example with custom field names
custom_asr = InferenceAsrNemoStage(
    model_name="your_chosen_model_name",
    filepath_key="custom_audio_path",
    pred_text_key="transcription"
).with_(
    batch_size=32,
    resources=Resources(cpus=4.0, gpus=1.0)
)
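To make the role of `filepath_key` and `pred_text_key` concrete, the following is a minimal plain-Python sketch that does not use NeMo Curator. It assumes each input record is a dictionary (for example, one line of a JSONL manifest), and it uses a hypothetical `fake_transcribe` function in place of a real model. The helper name `add_predictions` is also illustrative and is not part of any library.

```python
import json


def fake_transcribe(path: str) -> str:
    """Hypothetical stand-in for a real ASR model call."""
    return f"<transcript of {path}>"


def add_predictions(records, filepath_key="audio_filepath", pred_text_key="pred_text"):
    """Return copies of the records with a transcription stored under pred_text_key."""
    out = []
    for record in records:
        updated = dict(record)  # copy so the input record is left untouched
        # Read the audio path from filepath_key, write the text to pred_text_key
        updated[pred_text_key] = fake_transcribe(record[filepath_key])
        out.append(updated)
    return out


# Records that use the custom field names from the example above
records = [
    {"custom_audio_path": "clips/a.wav"},
    {"custom_audio_path": "clips/b.wav"},
]
result = add_predictions(
    records,
    filepath_key="custom_audio_path",
    pred_text_key="transcription",
)
for row in result:
    print(json.dumps(row))
```

The point of the sketch is only the mapping: the value under `filepath_key` is what gets transcribed, and the output lands under `pred_text_key`. If the key names do not match the fields in your data, the lookup fails, so it is worth checking them before running a full pipeline.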

Model Caching

Models are automatically downloaded and cached when first loaded:

# Models are cached automatically on first use
asr_stage = InferenceAsrNemoStage(model_name="your_chosen_model_name")

# The setup() method handles model downloading and caching
asr_stage.setup()

Resource Configuration

Configure GPU and CPU resources based on your hardware:

from nemo_curator.stages.audio.inference.asr_nemo import InferenceAsrNemoStage
from nemo_curator.stages.resources import Resources

# Single GPU configuration
asr_stage = InferenceAsrNemoStage(
    model_name="your_chosen_model_name"
).with_(
    resources=Resources(
        cpus=4.0,
        gpu_memory_gb=8.0  # Adjust based on your model's requirements
    ),
    batch_size=16
)

# Multi-GPU configuration
multi_gpu_stage = InferenceAsrNemoStage(
    model_name="your_chosen_model_name"
).with_(
    resources=Resources(
        cpus=8.0,
        gpus=2.0  # Use 2 GPUs
    ),
    batch_size=32
)
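As a rough illustration of what a larger `batch_size` means in practice, the sketch below groups a list of audio paths into fixed-size batches using plain Python. It assumes that `batch_size` is the number of items handed over per call; how NeMo Curator actually schedules work across GPUs may differ. The helper `iter_batches` and the file names are hypothetical.

```python
from typing import Iterator, List


def iter_batches(paths: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of paths, each holding at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(paths), batch_size):
        yield paths[start:start + batch_size]


# 70 hypothetical audio files
paths = [f"clip_{i:03d}.wav" for i in range(70)]

batch_sizes_16 = [len(batch) for batch in iter_batches(paths, 16)]
batch_sizes_32 = [len(batch) for batch in iter_batches(paths, 32)]

print(batch_sizes_16)  # [16, 16, 16, 16, 6]
print(batch_sizes_32)  # [32, 32, 6]
```

Under this assumption, larger batches mean fewer calls but more audio held in memory at once, which is why batch size and GPU memory are usually tuned together.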