Preset: Deploy Speech Service {#default}
Deploy a GPU-accelerated speech recognition (ASR) and voice synthesis (TTS) service on your Jetson device.
| Device | Purpose |
|---|---|
| NVIDIA Jetson Orin NX 16GB | Runs GPU-accelerated ASR + TTS with dual language mode support |
What you'll get:
- Real-time streaming speech recognition
- Low-latency voice synthesis (multiple speakers)
- Two language modes: Chinese + English (Matcha TTS + Paraformer ASR) or English-only (Kokoro TTS + Zipformer ASR)
- HTTP + WebSocket API on port 8621
Requirements: Jetson with JetPack 6.x · SSH access · Internet to pull image (~8GB)
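The requirements above can be checked on the Jetson before deploying. The following is a minimal sketch, not part of the deployment tool: the function names are hypothetical, it assumes JetPack 6.x reports L4T release 36 in /etc/nv_tegra_release, and the 16 GB free-space default is an arbitrary allowance over the ~8GB compressed image rather than a documented figure.

```shell
#!/bin/sh
# Preflight sketch. Function names and the 16 GB default are illustrative
# assumptions, not values defined by this guide.

# Print the L4T major release from text shaped like /etc/nv_tegra_release,
# e.g. "# R36 (release), REVISION: 4.0, ..." prints 36.
l4t_major() {
  sed -n 's/^# R\([0-9][0-9]*\) (release).*/\1/p' | head -n 1
}

# Print whole gigabytes available on the filesystem that holds the given path.
free_gb() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1048576) }'
}

# Run on the Jetson. Assumes JetPack 6.x corresponds to L4T R36.
preflight() {
  pf_need_gb="${1:-16}"
  pf_dir=/var/lib/docker            # default Docker data directory
  [ -d "$pf_dir" ] || pf_dir=/
  if [ -r /etc/nv_tegra_release ]; then
    pf_major=$(l4t_major < /etc/nv_tegra_release)
  else
    pf_major=unknown
  fi
  [ "$pf_major" = "36" ] || { echo "L4T release is $pf_major, expected 36"; return 1; }
  pf_free=$(free_gb "$pf_dir")
  [ "$pf_free" -ge "$pf_need_gb" ] || { echo "only ${pf_free} GB free on $pf_dir"; return 1; }
  echo "preflight ok"
}
```

Source the file on the Jetson and run preflight, or preflight 20 to require 20 GB free.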
Step 1: Deploy Speech Service {#speech_service type=docker_deploy required=true config=devices/docker_remote.yaml}
Deploy the speech recognition and voice synthesis service to your Jetson device. The pre-built image includes all dependencies and models — just pull and run.
Target: Remote Deployment {#speech_remote type=remote config=devices/docker_remote.yaml default=true}
Deploy to your Jetson over SSH with one click.
Wiring
- Connect your Jetson to the network
- Enter the Jetson's IP address and SSH credentials
- Click Deploy — the system will pull the pre-built image and start the service automatically
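Before clicking Deploy, it can help to confirm that the credentials work and that Docker is reachable over SSH. A minimal sketch, assuming Docker is already installed on the Jetson; the function name and the 5-second timeout are hypothetical choices.

```shell
#!/bin/sh
# Sketch: verify SSH access and Docker availability on the Jetson.
# remote_check and the timeout value are illustrative assumptions.

# remote_check USER HOST -> exit 0 if SSH login works and docker responds.
# Prompts for a password when no SSH key is configured.
remote_check() {
  ssh -o ConnectTimeout=5 "$1@$2" docker --version
}
```

For example, remote_check username 192.168.1.50 prints the remote Docker version on success and returns a non-zero status otherwise.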
Deployment Complete
Service is running at http://<jetson-ip>:8621. Quick test:
# Check service health
curl http://<jetson-ip>:8621/health
# Expected: {"asr": true, "tts": true, "streaming_asr": true}
# Test TTS
curl -X POST http://<jetson-ip>:8621/tts \
-H "Content-Type: application/json" \
-d '{"text": "Hello, I am your voice assistant.", "sid": 0}' \
--output test.wav
# Test ASR
curl -X POST http://<jetson-ip>:8621/asr \
-F "file=@test.wav"
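To compare voices, the TTS test above can be repeated with different sid values. The sketch below assumes the /tts body shape shown in the quick test; the function names and output file names are hypothetical, and the valid speaker range may depend on the language mode.

```shell
#!/bin/sh
# Sketch: render one sentence with several speaker IDs for comparison.
# Function names and output file naming are illustrative assumptions.

# Escape backslashes and double quotes so the text fits in a JSON string.
# Newlines and other control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# tts_payload TEXT SID -> JSON body in the shape used by the /tts quick test.
tts_payload() {
  printf '{"text": "%s", "sid": %d}' "$(json_escape "$1")" "$2"
}

# sample_speakers HOST TEXT FIRST LAST -> writes speaker_<sid>.wav files.
sample_speakers() {
  ss_sid="$3"
  while [ "$ss_sid" -le "$4" ]; do
    curl -fsS -X POST "http://$1:8621/tts" \
      -H "Content-Type: application/json" \
      -d "$(tts_payload "$2" "$ss_sid")" \
      --output "speaker_${ss_sid}.wav" || return 1
    ss_sid=$((ss_sid + 1))
  done
}
```

For example, sample_speakers <jetson-ip> "Hello there." 0 3 writes speaker_0.wav through speaker_3.wav in the current directory.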
Troubleshooting
| Issue | Solution |
|---|---|
| SSH connection failed | Verify the IP address and credentials. Try ssh username@ip from your computer first |
| Image pull slow | The image is ~8GB compressed. Ensure stable internet on the Jetson |
| Service not starting | Check logs: ssh user@ip "cd jetson-voice && docker compose logs" |
| Health check fails | First startup takes ~40 seconds for model warmup. Wait and retry |
| Out of memory | Ensure Jetson has 16GB RAM and no other GPU-intensive tasks running |
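Because the first startup needs about 40 seconds of model warmup, a single health check right after deployment can fail. A minimal polling sketch follows; the function names, the 30-attempt default, and the 2-second delay are hypothetical choices, and the readiness test assumes the /health response shape shown in the quick test.

```shell
#!/bin/sh
# Sketch: poll /health until asr, tts and streaming_asr all report true.
# Function names, attempt count and delay are illustrative assumptions.

# health_ready BODY -> exit 0 when all three fields are true.
health_ready() {
  for hr_key in asr tts streaming_asr; do
    printf '%s' "$1" |
      grep -Eq "\"$hr_key\"[[:space:]]*:[[:space:]]*true" || return 1
  done
}

# wait_for_health HOST [ATTEMPTS] [DELAY_SECONDS]
wait_for_health() {
  wfh_url="http://$1:8621/health"
  wfh_tries="${2:-30}"
  wfh_delay="${3:-2}"
  wfh_i=0
  while [ "$wfh_i" -lt "$wfh_tries" ]; do
    wfh_body=$(curl -fsS --max-time 5 "$wfh_url" 2>/dev/null) &&
      health_ready "$wfh_body" && return 0
    wfh_i=$((wfh_i + 1))
    sleep "$wfh_delay"
  done
  return 1
}
```

With the defaults this waits up to roughly a minute: wait_for_health <jetson-ip> && echo "service ready".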
Target: Local Deployment {#speech_local type=local config=devices/docker_local.yaml}
Deploy directly on the current machine (requires NVIDIA GPU).
Wiring
- Ensure Docker and NVIDIA Container Toolkit are installed
- Click Deploy to start installation
Note: First startup may take 10-15 minutes for Docker image download and model initialization.
Deployment Complete
Service is running at http://localhost:8621. Quick test:
# Check service health
curl http://localhost:8621/health
# Expected: {"asr": true, "tts": true, "streaming_asr": true}
# Test TTS
curl -X POST http://localhost:8621/tts \
-H "Content-Type: application/json" \
-d '{"text": "Hello, I am your voice assistant.", "sid": 0}' \
--output test.wav
# Test ASR
curl -X POST http://localhost:8621/asr \
-F "file=@test.wav"
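The two tests above can be chained into one round trip: synthesize a sentence, confirm the result is a WAV file, then send it to /asr. A minimal sketch with hypothetical function names; the /asr response format is not documented here, so it is printed as received.

```shell
#!/bin/sh
# Sketch: TTS -> WAV -> ASR round trip. Function names are illustrative.

# is_wav FILE -> exit 0 if FILE starts with a RIFF/WAVE header.
is_wav() {
  [ "$(dd if="$1" bs=1 count=4 2>/dev/null)" = "RIFF" ] &&
    [ "$(dd if="$1" bs=1 skip=8 count=4 2>/dev/null)" = "WAVE" ]
}

# roundtrip [HOST] -> synthesize, validate, transcribe; HOST defaults to localhost.
roundtrip() {
  rt_base="http://${1:-localhost}:8621"
  rt_dir=$(mktemp -d) || return 1
  rt_wav="$rt_dir/test.wav"
  curl -fsS -X POST "$rt_base/tts" \
    -H "Content-Type: application/json" \
    -d '{"text": "Hello, I am your voice assistant.", "sid": 0}' \
    --output "$rt_wav" || { rm -rf "$rt_dir"; return 1; }
  if is_wav "$rt_wav"; then
    # Response schema is unknown here, so the body is printed verbatim.
    curl -fsS -X POST "$rt_base/asr" -F "file=@$rt_wav"
    rt_status=$?
  else
    echo "TTS response is not a WAV file" >&2
    rt_status=1
  fi
  rm -rf "$rt_dir"
  return "$rt_status"
}
```

If the printed transcript resembles the input sentence, both synthesis and recognition are working.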
Troubleshooting
| Issue | Solution |
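|---|---|
| NVIDIA runtime not found | Install the NVIDIA Container Toolkit and register it with Docker: sudo apt install nvidia-container-toolkit && sudo nvidia-ctk runtime configure --runtime=docker && sudo systemctl restart docker |
| Port 8621 already in use | Find the process holding the port with sudo ss -ltnp 'sport = :8621', then stop it |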
| Container keeps restarting | Check logs: docker logs jetson-voice-speech-1 |
| Health check fails | First startup takes ~40 seconds for model warmup. Wait and retry |
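The first two rows of this table can be checked from the command line. The helpers below are a sketch with hypothetical names; they read command output from stdin, and the second one assumes docker info lists registered runtimes under the Runtimes field.

```shell
#!/bin/sh
# Sketch: local diagnostics that parse command output from stdin.
# Function names are illustrative assumptions.

# port_listening PORT < output of `ss -ltn`
# Exit 0 if any listening socket uses PORT.
port_listening() {
  awk -v port="$1" \
    'NR > 1 && $4 ~ (":" port "$") { found = 1 } END { exit !found }'
}

# has_nvidia_runtime < output of `docker info --format '{{json .Runtimes}}'`
# Exit 0 if a runtime named "nvidia" is registered.
has_nvidia_runtime() {
  grep -q '"nvidia"'
}

# Usage:
#   ss -ltn | port_listening 8621 && echo "port 8621 is taken"
#   docker info --format '{{json .Runtimes}}' | has_nvidia_runtime \
#     || echo "NVIDIA runtime missing"
```

Reading from stdin keeps the helpers usable on saved output, for example from a machine you cannot log in to directly.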
Step 2: Voice Demo {#voice_demo type=voice_demo required=false config=devices/voice_demo.yaml}
Try the deployed speech service directly from this page. Enter the Jetson IP address, then use the panels below to test speech recognition and voice synthesis.
Speech Recognition (ASR)
Press and hold the Record button to speak. Your speech is recognized in real time, and the transcribed text appears on screen.
Text to Speech (TTS)
Type any text and click Generate to hear it spoken. The audio will play with a waveform visualization.
Troubleshooting
| Issue | Solution |
|---|---|
| Microphone not working | Allow microphone access when prompted by your browser |
| ASR shows no results | Verify the service is running: curl http://<jetson-ip>:8621/health |
| TTS playback silent | Check browser audio is not muted. Try a shorter text first |
Deployment Complete
Congratulations! Your local voice assistant service is running.
Quick Verification
- Open http://<jetson-ip>:8621/health in your browser; all fields should show true
- Test voice synthesis with the curl command above
- Connect your application to the API endpoints
API Reference
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Service health check |
| /asr/stream | WebSocket | Real-time streaming speech recognition |
| /tts | POST | Text-to-speech (returns WAV) |
| /tts/stream | POST | Streaming text-to-speech (returns raw PCM) |
| /asr | POST | Offline speech recognition (upload WAV file) |
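When scripting against these endpoints, a small lookup keeps the host and port in one place. This is a sketch: the function name and the short endpoint keys are hypothetical, and the ws:// scheme for /asr/stream is an assumption based on the API being served over plain HTTP on port 8621.

```shell
#!/bin/sh
# Sketch: map the endpoints in the table to full URLs.
# api_url and its endpoint keys are illustrative assumptions.

# api_url HOST ENDPOINT -> prints the URL, or returns 1 for an unknown key.
api_url() {
  case "$2" in
    health)     echo "http://$1:8621/health" ;;
    asr)        echo "http://$1:8621/asr" ;;
    asr_stream) echo "ws://$1:8621/asr/stream" ;;   # assumed ws:// scheme
    tts)        echo "http://$1:8621/tts" ;;
    tts_stream) echo "http://$1:8621/tts/stream" ;;
    *)          echo "unknown endpoint: $2" >&2; return 1 ;;
  esac
}
```

For example, curl "$(api_url localhost health)" runs the same health check as the quick test.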
Next Steps
- Connect your LLM to complete the voice assistant pipeline: ASR → LLM → TTS
- Adjust TTS speaker ID (0-9) from the Devices page after deployment
- Jetson Voice GitHub
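One turn of the ASR → LLM → TTS pipeline can be sketched with the two POST endpoints. Everything named here is hypothetical: extract_text and llm_reply are placeholder hooks to replace with your own response parsing and LLM call, since this guide documents neither the /asr response schema nor an LLM interface.

```shell
#!/bin/sh
# Sketch: one voice-assistant turn (recorded WAV in, spoken reply out).
# voice_turn and both hooks are illustrative assumptions.

# Placeholder hooks: replace both with real implementations.
extract_text() { cat; }                       # raw /asr body -> transcript
llm_reply() { printf 'You said: %s' "$1"; }   # transcript -> reply text

# voice_turn HOST IN_WAV OUT_WAV
voice_turn() {
  vt_host="$1"; vt_in="$2"; vt_out="$3"
  # 1. Speech to text
  vt_raw=$(curl -fsS -X POST "http://$vt_host:8621/asr" -F "file=@$vt_in") || return 1
  vt_text=$(printf '%s' "$vt_raw" | extract_text)
  # 2. Text to reply
  vt_reply=$(llm_reply "$vt_text")
  # 3. Reply to speech; escape backslashes and quotes for the JSON body
  vt_json=$(printf '%s' "$vt_reply" | sed 's/\\/\\\\/g; s/"/\\"/g')
  curl -fsS -X POST "http://$vt_host:8621/tts" \
    -H "Content-Type: application/json" \
    -d "{\"text\": \"$vt_json\", \"sid\": 0}" \
    --output "$vt_out"
}
```

For example, voice_turn <jetson-ip> question.wav answer.wav writes the spoken reply to answer.wav. For low-latency use, the streaming endpoints would replace both POST calls.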