pocket-tts

Generate high-quality English speech offline on CPU using 8 built-in voices or custom voice cloning with Kyutai's Pocket TTS model.
Install via ClawdBot CLI:
clawdbot install sherajdev/pocket-tts

Fully local, offline text-to-speech using Kyutai's Pocket TTS model. Generate high-quality audio from text without any API calls or internet connection. Features 8 built-in voices, voice cloning support, and runs entirely on CPU.
```shell
# 1. Accept the model license on Hugging Face:
#    https://huggingface.co/kyutai/pocket-tts

# 2. Install the package
pip install pocket-tts

# Or use uv for automatic dependency management
uvx pocket-tts generate "Hello world"
```
```shell
# Basic usage
pocket-tts "Hello, I am your AI assistant"

# With a specific voice
pocket-tts "Hello" --voice alba --output hello.wav

# With a custom voice file (voice cloning)
pocket-tts "Hello" --voice-file myvoice.wav --output output.wav

# Adjust speed
pocket-tts "Hello" --speed 1.2

# Start local server
pocket-tts --serve

# List available voices
pocket-tts --list-voices
```
```python
from pocket_tts import TTSModel
import scipy.io.wavfile

# Load model
tts_model = TTSModel.load_model()

# Get voice state
voice_state = tts_model.get_state_for_audio_prompt(
    "hf://kyutai/tts-voices/alba-mackenna/casual.wav"
)

# Generate audio
audio = tts_model.generate_audio(voice_state, "Hello world!")

# Save to WAV
scipy.io.wavfile.write("output.wav", tts_model.sample_rate, audio.numpy())

# Check sample rate
print(f"Sample rate: {tts_model.sample_rate} Hz")
```
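For longer passages it can help to synthesize sentence-sized chunks and write one WAV per chunk. A minimal sketch building on the API shown above; the `split_text` helper is illustrative and not part of pocket-tts:

```python
import re


def split_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text on sentence boundaries into chunks of at most max_chars."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


def synthesize_long_text(text: str, voice_prompt: str, out_prefix: str = "part"):
    """Generate one WAV per chunk using the pocket_tts API shown above."""
    # Imported lazily so the pure text-splitting helper works without the model.
    from pocket_tts import TTSModel
    import scipy.io.wavfile

    model = TTSModel.load_model()
    state = model.get_state_for_audio_prompt(voice_prompt)
    for i, chunk in enumerate(split_text(text)):
        audio = model.generate_audio(state, chunk)
        scipy.io.wavfile.write(f"{out_prefix}_{i:03d}.wav",
                               model.sample_rate, audio.numpy())
```

Whether per-sentence chunking improves quality or latency depends on the model; treat the 200-character default as a starting point, not a recommendation from Kyutai.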
| Voice | Description |
|-------|-------------|
| alba | Casual female voice |
| marius | Male voice |
| javert | Clear male voice |
| jean | Natural male voice |
| fantine | Female voice |
| cosette | Female voice |
| eponine | Female voice |
| azelma | Female voice |
Or use --voice-file /path/to/wav.wav for custom voice cloning.
| Option | Description | Default |
|--------|-------------|---------|
| text | Text to convert | Required |
| -o, --output | Output WAV file | output.wav |
| -v, --voice | Voice preset | alba |
| -s, --speed | Speech speed (0.5-2.0) | 1.0 |
| --voice-file | Custom WAV for cloning | None |
| --serve | Start HTTP server | False |
| --list-voices | List all voices | False |
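The flags above can be combined programmatically. A hypothetical wrapper that assembles the argv list from the documented options (the `build_cmd` helper is illustrative; run the result with `subprocess.run` once pocket-tts is installed):

```python
def build_cmd(text, voice=None, speed=None, output=None, voice_file=None):
    """Build a pocket-tts argv list from the documented CLI options."""
    cmd = ["pocket-tts", text]
    if voice is not None:
        cmd += ["--voice", voice]
    if speed is not None:
        cmd += ["--speed", str(speed)]  # documented range: 0.5-2.0
    if output is not None:
        cmd += ["--output", output]
    if voice_file is not None:
        cmd += ["--voice-file", voice_file]
    return cmd
```

For example, `subprocess.run(build_cmd("Hello", voice="marius", output="hi.wav"), check=True)` mirrors the CLI invocation shown earlier.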
Generated Mar 1, 2026
Enables text-to-speech in educational apps for students in low-connectivity areas, such as language learning platforms or e-readers. It supports voice cloning for personalized narration without internet dependency.
Integrates into on-premise customer service systems for businesses needing offline voice responses, like retail kiosks or factory assistance tools. Uses built-in voices or clones brand-specific tones.
Provides speech synthesis for accessibility features in software, such as screen readers for visually impaired users in offline environments. Runs on standard CPUs without GPU requirements.
Adds voice output to IoT devices like smart home assistants or industrial sensors that operate offline. Its low latency and CPU-only design suit resource-constrained hardware.
Supports local audio generation for content creators, such as podcasters or video editors needing voiceovers without cloud APIs. Voice cloning allows for custom character voices.
Offer a subscription-based service where businesses integrate Pocket TTS into their software for offline TTS capabilities. Revenue comes from monthly fees based on usage tiers or enterprise licenses.
Bundle the skill with hardware products like educational tablets or IoT devices that require offline speech synthesis. Revenue is generated through product sales and licensing agreements with manufacturers.
Provide a free basic version for developers, with premium features like advanced voice cloning or priority support. Revenue comes from paid upgrades and consulting services for custom integrations.
💬 Integration Tip
Ensure the Hugging Face model license is accepted before installation, and use the CLI for quick testing before Python API integration.
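One way to confirm the license has been accepted before downloading anything is to query the gated repo via `huggingface_hub`. The helper below is an illustrative sketch: it treats any lookup failure as "no access", and the `api` parameter exists only so a stub can be injected for testing:

```python
def can_access_model(repo_id: str, api=None) -> bool:
    """Return True if the (possibly gated) Hugging Face repo is accessible."""
    if api is None:
        # Imported lazily; requires `pip install huggingface_hub` and a logged-in token.
        from huggingface_hub import HfApi
        api = HfApi()
    try:
        api.model_info(repo_id)
        return True
    except Exception:
        # Gated repo without accepted license, missing token, or network error.
        return False
```

For example, `can_access_model("kyutai/pocket-tts")` returning False usually means the license page has not been accepted or no token is configured.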