text-to-speech
The text-to-speech skill enables AI agents to convert text into natural-sounding audio using ElevenLabs voice models. It supports multiple languages, voice customization, streaming for real-time applications, and various output formats.
Is text-to-speech safe to install?
Review the source first: our audit of text-to-speech's source files found 1 shell command, 1 external URL, and file reads and writes (high risk). Every command and URL listed appears verbatim in the skill's source. The skill makes network requests to ElevenLabs API endpoints and writes audio files to the local filesystem.
Who is this skill for?
Developers building voice-enabled AI agents, voice applications, or automated narration systems.
What can you do with it?
- Generating audio from text
- Creating voiceovers
- Building voice-enabled applications
- Synthesizing speech in over 70 languages
- Streaming real-time audio
How good is this skill?
Quality score: 5/10. The documentation is comprehensive, providing clear code examples for Python, JavaScript, and cURL, along with detailed explanations of models, voice settings, and error handling.
What does the skill file contain?
# ElevenLabs Text-to-Speech
Generate natural speech from text: supports 70+ languages and multiple models for quality vs. latency tradeoffs.
> **Setup:** See [Installation Guide](references/installation.md). For JavaScript, use `@elevenlabs/*` packages only.
## Quick Start
### Python
```python
from elevenlabs import ElevenLabs

# Reads the API key from the ELEVENLABS_API_KEY environment variable
client = ElevenLabs()

audio = client.text_to_speech.convert(
    text="Hello, welcome to ElevenLabs!",
    voice_id="JBFqnCBsd6RMkjVDRZzb",  # George
    model_id="eleven_multilingual_v2",
)

# convert() returns the audio as an iterator of byte chunks
with open("output.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)
```
...
Frequently asked questions
What is required to use this skill?
The skill requires an active internet connection and a valid ElevenLabs API key stored in the ELEVENLABS_API_KEY environment variable.
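Checking for the key before making any request turns a confusing authentication failure into a clear error. The following is a minimal stdlib sketch, assuming the REST endpoint `https://api.elevenlabs.io/v1/text-to-speech/{voice_id}` and the `xi-api-key` authentication header. It uses `urllib.request` instead of the `elevenlabs` SDK so it runs without extra packages, and `build_tts_request` is a hypothetical helper name, not part of any ElevenLabs library.

```python
import json
import os
import urllib.request

# Assumed REST endpoint; the voice ID is appended as a path segment.
API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def build_tts_request(text, voice_id, model_id="eleven_multilingual_v2", env=None):
    """Build, but do not send, a POST request authenticated with ELEVENLABS_API_KEY.

    build_tts_request is a hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    api_key = env.get("ELEVENLABS_API_KEY")
    if not api_key:
        # Fail early with a clear message instead of an authentication error from the API.
        raise RuntimeError("Set the ELEVENLABS_API_KEY environment variable first.")
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/{voice_id}",
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Sending requires network access and a valid key:
# with urllib.request.urlopen(build_tts_request("Hello!", "JBFqnCBsd6RMkjVDRZzb")) as resp:
#     audio_bytes = resp.read()
```

Passing `env` explicitly keeps the helper testable without touching the real environment; in normal use it falls back to `os.environ`.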
Can I control the voice characteristics?
Yes, you can adjust stability, similarity boost, style, and speaker boost settings to fine-tune the output.
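These settings travel in a `voice_settings` object alongside the text. As a minimal sketch, assuming the REST field names `stability`, `similarity_boost`, `style`, and `use_speaker_boost` and a 0.0 to 1.0 range for the three sliders, a hypothetical `make_voice_settings` helper can validate values before a request is sent. The defaults shown are illustrative and not necessarily the API's own.

```python
import json


def make_voice_settings(stability=0.5, similarity_boost=0.75, style=0.0,
                        use_speaker_boost=True):
    """Return a voice_settings dict; the three sliders must lie in [0.0, 1.0].

    make_voice_settings is a hypothetical helper; the defaults are illustrative.
    """
    sliders = {"stability": stability, "similarity_boost": similarity_boost, "style": style}
    for name, value in sliders.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    return {**sliders, "use_speaker_boost": bool(use_speaker_boost)}


# Example request body with custom settings (values are illustrative)
payload = {
    "text": "Hello, welcome to ElevenLabs!",
    "model_id": "eleven_multilingual_v2",
    "voice_settings": make_voice_settings(stability=0.3, style=0.2),
}
body = json.dumps(payload).encode("utf-8")
```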
How do I handle long-form text to avoid audio artifacts?
Use the request stitching feature by providing next_text and previous_text parameters to maintain context between sequential audio generation requests.
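For a long document split into paragraphs, stitching means each request body carries its neighbouring paragraphs as `previous_text` and `next_text`. The sketch below builds those bodies; `split_paragraphs` and `stitched_payloads` are hypothetical helpers, and it assumes the bodies are then sent one after another, in order, with the same voice and model.

```python
def split_paragraphs(long_text):
    """Split text on blank lines, dropping empty segments (hypothetical helper)."""
    return [p.strip() for p in long_text.split("\n\n") if p.strip()]


def stitched_payloads(segments, model_id="eleven_multilingual_v2"):
    """Build one request body per segment, passing neighbouring text as context.

    stitched_payloads is a hypothetical helper for illustration only.
    """
    payloads = []
    for i, text in enumerate(segments):
        body = {"text": text, "model_id": model_id}
        if i > 0:
            body["previous_text"] = segments[i - 1]  # text spoken just before this one
        if i < len(segments) - 1:
            body["next_text"] = segments[i + 1]  # text that will be spoken next
        payloads.append(body)
    return payloads
```

The first segment has no `previous_text` and the last has no `next_text`, so a single-paragraph input produces an ordinary, unstitched request.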
Does the skill support real-time audio generation?
Yes, the stream method returns audio chunks as they are generated, which is suitable for real-time applications.
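When consuming a streaming HTTP response directly, the key step is writing each chunk as soon as it is read rather than buffering the whole clip. This is a minimal sketch assuming the `/v1/text-to-speech/{voice_id}/stream` REST endpoint; `write_stream` is a hypothetical helper that works with any file-like response, including the object returned by `urllib.request.urlopen`.

```python
# Assumed streaming endpoint; fill in the voice ID before sending.
STREAM_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}/stream"


def write_stream(response, out, chunk_size=4096):
    """Copy audio from a file-like response to `out` as each chunk arrives.

    Writing chunk by chunk lets playback or forwarding begin before the whole
    clip has been generated. Returns the number of bytes written.
    write_stream is a hypothetical helper for illustration only.
    """
    total = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:  # empty bytes means the stream has ended
            break
        out.write(chunk)
        total += len(chunk)
    return total


# Typical use (network access and an API key required): build a POST request
# like the non-streaming one, but to STREAM_URL, then:
# with urllib.request.urlopen(req) as resp, open("output.mp3", "wb") as f:
#     write_stream(resp, f)
```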