
Edge VLM: Qwen2.5-VL on RK3576

Run the Qwen2.5-VL vision-language model locally on a reComputer RK3576. It understands both images and text and serves an OpenAI-compatible API.

Level: Beginner · Time: 15 min · Category: AI
Tags: rk3576, vlm, qwen, multimodal, edge-ai, npu

What It Does

Turn your reComputer RK3576 into a multimodal AI that understands both images and text. Qwen2.5-VL can describe photos, answer questions about images, and process visual information — all running locally on your device.

Core Value

  • See and understand — send an image and ask questions about it in natural language
  • Privacy first — all visual processing stays on your device, nothing leaves your network
  • Standard API — OpenAI-compatible vision API works with existing tools and libraries
  • NPU accelerated — Rockchip NPU handles multimodal inference on low-power hardware

Use Cases

| Scenario | Description |
| --- | --- |
| Image captioning | Automatically describe photos for accessibility or cataloging |
| Visual Q&A | Ask questions about surveillance footage or product images |
| Document understanding | Extract information from scanned documents or forms |
| Scene description | Generate text descriptions of camera feeds for logging |

Good to Know

  • Requires 8GB+ device memory for reliable operation
  • Supports image input via URL or Base64-encoded data
  • First startup takes 60-120 seconds for model loading
  • Interactive API documentation available at /docs endpoint

Integration Interfaces

HTTP — OpenAI-compatible vision chat API (accepts text + image input)

POST /v1/chat/completions · Port: 8002

Example request body:

{
  "model": "rkllm-vision",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "Describe this image"},
      {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
    ]
  }],
  "stream": false
}

HTTP — List available models

GET /v1/models · Port: 8002

HTTP — Interactive API documentation (Swagger UI)

GET /docs · Port: 8002

Usage Requirements

  • Network — connection required for the initial Docker image pull

Deployment Options

Download & Install

Preset: RK3576 Vision-Language Model {#rk3576_vlm}

Deploy Qwen2.5-VL vision-language model to your reComputer RK3576 with one click.

| Device | Purpose |
| --- | --- |
| reComputer RK3576 | Runs Qwen2.5-VL with NPU acceleration |

What you'll get:

  • Multimodal AI that understands both images and text
  • OpenAI-compatible vision API running locally
  • Image captioning, visual Q&A, and more — all on-device
  • Interactive API documentation at /docs

Requirements: RK3576 device (8GB+ RAM) with SSH access + Docker installed

Step 1: Deploy Qwen2.5-VL {#deploy_vlm type=docker_deploy required=true config=devices/rk3576.yaml}

Deploy the vision-language model container to your RK3576 device.

Target: Remote Deployment {#rk3576_remote type=remote config=devices/rk3576.yaml default=true}

Deploy to your RK3576 over SSH with one click.

Wiring

  1. Connect RK3576 to the same network as your computer
  2. Fill in device IP, SSH username, and password
  3. Click Deploy

Deployment Complete

  1. The VLM container is running on your RK3576
  2. Vision chat API: http://<device-ip>:8002/v1/chat/completions
  3. API docs: http://<device-ip>:8002/docs
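Because the model takes 60-120 seconds to load on first startup, a script that fires requests immediately after deployment will see connection errors. A small polling helper avoids that by waiting until `GET /v1/models` answers; the function names here are ours, not part of the deployment:

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(check, timeout=180, interval=5):
    """Poll check() until it returns True or timeout seconds elapse.

    Returns True on success, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False


def model_list_ok(base_url):
    """Return True if GET <base_url>/v1/models answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


# Usage (replace <device-ip> with your device's address):
# wait_until_ready(lambda: model_list_ok("http://<device-ip>:8002"))
```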

Troubleshooting

| Issue | Solution |
| --- | --- |
| SSH connection failed | Verify IP address, username, password |
| NPU not detected | Ensure device is RK3576 with RKNPU kernel module loaded |
| Out of memory | VLM requires 8GB+ RAM. Close other services to free memory |
| Image pull slow | Check network connection. Image is about 3GB |

Step 2: Try Vision Chat {#verify_vlm type=image_text_chat}

Test the VLM by sending an image or text.

Mode: Image Understanding {#vision_mode config=devices/vlm_chat.yaml default=true}

Upload an image and ask a question about it.

Troubleshooting

| Issue | Solution |
| --- | --- |
| Connection refused | Wait 60-120 seconds for the model to load |
| Timeout | The VLM model is large; the initial load takes time |

Mode: Text Chat {#text_mode config=devices/vlm_text.yaml}

Chat with the model using text only.

Troubleshooting

| Issue | Solution |
| --- | --- |
| Empty response | Check container logs: docker logs ai_lab_vlm |

Deployment Complete

Qwen2.5-VL is running on your RK3576 device.

Text Chat Example

curl -X POST http://<device-ip>:8002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "rkllm-vision", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 256}'

Image Understanding Example

curl -X POST http://<device-ip>:8002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "rkllm-vision",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
      ]
    }],
    "max_tokens": 256
  }'

Python Example

import openai
client = openai.OpenAI(base_url="http://<device-ip>:8002/v1", api_key="dummy")
response = client.chat.completions.create(
    model="rkllm-vision",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
        ]
    }],
    max_tokens=256
)
print(response.choices[0].message.content)