Jetson Local Voice Assistant

GPU-accelerated local speech service on Jetson — sub-180ms ASR + TTS latency, fully offline, no cloud dependency.

Beginner · 15 min · Voice AI
Tags: Voice · Jetson · asr · tts · local

What This Solution Does

Building a voice assistant that responds fast enough for natural conversation is hard — cloud speech APIs add 200-500ms of network latency on every turn. This solution runs speech recognition and voice synthesis directly on your Jetson GPU, cutting total voice processing time to under 180ms.

Core Benefits

| Benefit | Details |
| --- | --- |
| Ultra-low latency | Speech recognition + voice synthesis completes in under 180ms, fast enough for real-time conversation |
| Fully local processing | All voice processing runs on-device, no internet required, no data leaves your network |
| Ready-to-use API | Standard HTTP + WebSocket endpoints; any device on your network can send audio and get results |
| Real-time streaming | Get partial speech recognition results as the user speaks instead of waiting for them to finish |

Use Cases

| Scenario | How It Works |
| --- | --- |
| Voice-controlled robots | Robot microphone captures speech → Jetson transcribes in real-time → your LLM generates a response → Jetson speaks it back |
| Smart kiosks | Visitor speaks a question → instant transcription → connect to your knowledge base → voice reply in under a second |
| Industrial voice commands | Operator gives hands-free commands → Jetson recognizes speech locally → triggers actions without cloud dependency |
| Voice IoT gateway | Multiple devices send audio to the Jetson API → centralized speech processing with GPU acceleration |

What You Need

Hardware

| Device | Purpose | Required |
| --- | --- | --- |
| NVIDIA Jetson Orin NX 16GB | Runs speech recognition and voice synthesis with GPU acceleration | Yes |

Network

  • Jetson must be reachable via SSH from the deployment computer
  • Internet required during first deployment (downloads ~900MB of AI models)
  • After setup, works fully offline

Integration Interfaces

websocket

Real-time ASR transcription stream

/asr/stream · Port: 8621
Example message: {"text":"hello world","is_final":true,"language":"en"}

http_stream

TTS audio synthesis stream

/tts/stream · Port: 8621 · Method: POST
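
A client of /asr/stream has to decide what to do with each incoming message. The sketch below shows one way to track partial and final results, based only on the example message shown above. It assumes that a message with "is_final": false carries the full in-progress text (not a delta) and that a final message replaces it; the `Transcript` class and its method names are hypothetical, not part of the service.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Transcript:
    """Accumulates streaming ASR results for one session (hypothetical helper)."""
    finals: list = field(default_factory=list)  # finalized segments, in order
    partial: str = ""                            # latest not-yet-final text

    def feed(self, raw_message: str) -> None:
        """Apply one JSON message shaped like the /asr/stream example."""
        msg = json.loads(raw_message)
        text = msg.get("text", "")
        if msg.get("is_final", False):
            # Assumption: a final result supersedes the in-progress partial.
            if text:
                self.finals.append(text)
            self.partial = ""
        else:
            # Assumption: partials carry the whole in-progress text, not a delta.
            self.partial = text

    def display(self) -> str:
        """Finalized text followed by the current partial, if any."""
        parts = self.finals + ([self.partial] if self.partial else [])
        return " ".join(parts)
```

In a real client, each text frame received from the WebSocket would be passed to `feed()`, and `display()` would be re-rendered after every message.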

Deployment Options

Download & Install

Preset: Deploy Speech Service {#default}

Deploy a GPU-accelerated speech recognition (ASR) and voice synthesis (TTS) service on your Jetson device.

| Device | Purpose |
| --- | --- |
| NVIDIA Jetson Orin NX 16GB | Runs GPU-accelerated ASR + TTS with dual language mode support |

What you'll get:

  • Real-time streaming speech recognition
  • Low-latency voice synthesis (multiple speakers)
  • Two language modes: Chinese + English (Matcha TTS + Paraformer ASR) or English-only (Kokoro TTS + Zipformer ASR)
  • HTTP + WebSocket API on port 8621

Requirements: Jetson with JetPack 6.x · SSH access · Internet to pull image (~8GB)

Step 1: Deploy Speech Service {#speech_service type=docker_deploy required=true config=devices/docker_remote.yaml}

Deploy the speech recognition and voice synthesis service to your Jetson device. The pre-built image includes all dependencies and models — just pull and run.

Target: Remote Deployment {#speech_remote type=remote config=devices/docker_remote.yaml default=true}

Deploy to your Jetson over SSH with one click.

Wiring

  1. Connect your Jetson to the network
  2. Enter the Jetson's IP address and SSH credentials
  3. Click Deploy — the system will pull the pre-built image and start the service automatically

Deployment Complete

Service is running at http://<jetson-ip>:8621. Quick test:

# Check service health
curl http://<jetson-ip>:8621/health
# Expected: {"asr": true, "tts": true, "streaming_asr": true}

# Test TTS
curl -X POST http://<jetson-ip>:8621/tts \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, I am your voice assistant.", "sid": 0}' \
  --output test.wav

# Test ASR
curl -X POST http://<jetson-ip>:8621/asr \
  -F "file=@test.wav"
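
For applications that call the service from code rather than from a shell, the health and TTS curl commands above can be mirrored with Python's standard library. This is a minimal sketch: the function names are hypothetical, and the only facts taken from this page are the endpoint paths, the JSON fields "text" and "sid", and the three keys in the expected /health response.

```python
import json
import urllib.request


def is_healthy(health: dict) -> bool:
    """True only if every component in the /health response reports true."""
    return all(health.get(key) is True for key in ("asr", "tts", "streaming_asr"))


def build_tts_request(base_url: str, text: str, sid: int = 0) -> urllib.request.Request:
    """Build the same POST /tts request as the curl command above."""
    body = json.dumps({"text": text, "sid": sid}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/tts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(base_url: str, text: str, path: str, sid: int = 0) -> None:
    """Send the request and save the returned WAV bytes (needs a running service)."""
    request = build_tts_request(base_url, text, sid)
    with urllib.request.urlopen(request, timeout=30) as resp:
        with open(path, "wb") as out:
            out.write(resp.read())
```

With the service running, `synthesize_to_file("http://<jetson-ip>:8621", "Hello, I am your voice assistant.", "test.wav")` should produce the same file as the curl test.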

Troubleshooting

| Issue | Solution |
| --- | --- |
| SSH connection failed | Verify the IP address and credentials. Try `ssh username@ip` from your computer first |
| Image pull slow | The image is ~8GB compressed. Ensure stable internet on the Jetson |
| Service not starting | Check logs: `ssh user@ip "cd jetson-voice && docker compose logs"` |
| Health check fails | First startup takes ~40 seconds for model warmup. Wait and retry |
| Out of memory | Ensure Jetson has 16GB RAM and no other GPU-intensive tasks running |
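
The "wait and retry" advice for a failing health check can be automated by polling /health until every field reports true. The sketch below is one possible approach: the function names, the 90-second timeout, and the 3-second interval are assumptions chosen to comfortably cover the ~40 second warmup, not values defined by the service.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_health(base_url: str) -> dict:
    """GET /health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return json.load(resp)


def wait_until_healthy(base_url, timeout_s=90.0, interval_s=3.0,
                       fetch=fetch_health, sleep=time.sleep, clock=time.monotonic):
    """Poll /health until every field is true; return False on timeout."""
    deadline = clock() + timeout_s
    while True:
        try:
            status = fetch(base_url)
            if status and all(value is True for value in status.values()):
                return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # service not reachable yet, or the body was not valid JSON
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The `fetch`, `sleep`, and `clock` parameters exist only so the loop can be exercised without a live service; a normal call is `wait_until_healthy("http://<jetson-ip>:8621")`.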

Target: Local Deployment {#speech_local type=local config=devices/docker_local.yaml}

Deploy directly on the current machine (requires NVIDIA GPU).

Wiring

  1. Ensure Docker and NVIDIA Container Toolkit are installed
  2. Click Deploy to start installation

Note: First startup may take 10-15 minutes for Docker image download and model initialization.

Deployment Complete

Service is running at http://localhost:8621. Quick test:

# Check service health
curl http://localhost:8621/health
# Expected: {"asr": true, "tts": true, "streaming_asr": true}

# Test TTS
curl -X POST http://localhost:8621/tts \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, I am your voice assistant.", "sid": 0}' \
  --output test.wav

# Test ASR
curl -X POST http://localhost:8621/asr \
  -F "file=@test.wav"
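
The ASR curl command above uploads the file as multipart/form-data under the field name "file". A standard-library Python sketch of the same upload follows. The function names are hypothetical, the part's content type of audio/wav is an assumption, and because this page does not document the response format of /asr, the body is returned as raw text.

```python
import urllib.request
import uuid


def build_asr_request(base_url: str, wav_bytes: bytes,
                      filename: str = "test.wav") -> urllib.request.Request:
    """Build a multipart/form-data POST /asr request with one 'file' field."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"  # assumption: the server accepts this type
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/asr",
        data=head + wav_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe_file(base_url: str, path: str) -> str:
    """Upload a WAV file and return the raw response body (needs a running service)."""
    with open(path, "rb") as f:
        request = build_asr_request(base_url, f.read(), filename=path.rsplit("/", 1)[-1])
    with urllib.request.urlopen(request, timeout=30) as resp:
        return resp.read().decode("utf-8")
```

For example, `print(transcribe_file("http://localhost:8621", "test.wav"))` sends the file produced by the TTS test back through speech recognition.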

Troubleshooting

| Issue | Solution |
| --- | --- |
| NVIDIA runtime not found | Install NVIDIA Container Toolkit: `sudo apt install nvidia-container-toolkit && sudo systemctl restart docker` |
| Port 8621 already in use | Stop existing services on port 8621 |
| Container keeps restarting | Check logs: `docker logs jetson-voice-speech-1` |
| Health check fails | First startup takes ~40 seconds for model warmup. Wait and retry |

Step 2: Voice Demo {#voice_demo type=voice_demo required=false config=devices/voice_demo.yaml}

Try the deployed speech service directly from this page. Enter the Jetson IP address, then use the panels below to test speech recognition and voice synthesis.

Speech Recognition (ASR)

Press and hold the Record button to speak. Your speech is recognized in real time, and the transcribed text appears on screen as you talk.

Text to Speech (TTS)

Type any text and click Generate to hear it spoken. The audio will play with a waveform visualization.

Troubleshooting

| Issue | Solution |
| --- | --- |
| Microphone not working | Allow microphone access when prompted by your browser |
| ASR shows no results | Verify the service is running: `curl http://<ip>:8621/health` |
| TTS playback silent | Check browser audio is not muted. Try a shorter text first |

Deployment Complete

Congratulations! Your local voice assistant service is running.

Quick Verification

  1. Open http://<jetson-ip>:8621/health in your browser — all fields should show true
  2. Test voice synthesis with the curl command above
  3. Connect your application to the API endpoints

API Reference

| Endpoint | Method | Description |
| --- | --- | --- |
| /health | GET | Service health check |
| /asr/stream | WebSocket | Real-time streaming speech recognition |
| /tts | POST | Text-to-speech (returns WAV) |
| /tts/stream | POST | Streaming text-to-speech (returns raw PCM) |
| /asr | POST | Offline speech recognition (upload WAV file) |
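
Because /tts/stream returns raw PCM rather than WAV, the bytes have no header and most players cannot open them directly. The sketch below wraps collected PCM in a WAV container using Python's `wave` module. This page does not state the stream's sample rate, bit depth, or channel count, so the caller must supply the sample rate, and the 16-bit mono defaults are assumptions to verify against the service. `pcm_to_wav` is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container so ordinary players can open them."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)       # assumption: mono
        wav.setsampwidth(sample_width)   # bytes per sample; 2 = 16-bit (assumption)
        wav.setframerate(sample_rate)    # must match the service's output rate
        wav.writeframes(pcm)
    return buf.getvalue()
```

If playback sounds too fast, too slow, or like noise, the assumed format does not match what the service actually sends.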

Next Steps

  • Connect your LLM to complete the voice assistant pipeline: ASR → LLM → TTS
  • Adjust TTS speaker ID (0-9) from the Devices page after deployment
  • Jetson Voice GitHub
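
The ASR → LLM → TTS pipeline mentioned above can be sketched as a single function that chains three callables. Everything here is illustrative: `voice_turn` and its parameter names are hypothetical, and the three stages are passed in as functions so that any ASR client, LLM, and TTS client can be plugged in.

```python
from typing import Callable


def voice_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    generate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Run one conversational turn: ASR -> LLM -> TTS."""
    user_text = transcribe(audio).strip()
    if not user_text:
        return b""  # nothing recognized, so there is nothing to answer
    reply_text = generate(user_text)
    return synthesize(reply_text)
```

In practice, `transcribe` and `synthesize` would wrap calls to the /asr and /tts endpoints, and `generate` would call whichever LLM you choose.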