vahnxu (@vahnxu)
Doubao ASR
Transcribe audio via Doubao Seed-ASR 2.0 (豆包录音文件识别模型2.0) API from ByteDance/Volcengine. Best-in-class Chinese speech recognition. 调用字节跳动火山引擎「豆包录音文件识别模型2.0」转写...
Installation
clawhub install doubao-asr
Requires the ClawHub CLI: npm i -g clawhub
Doubao ASR / 豆包语音转写
Transcribe audio files via ByteDance Volcengine's Seed-ASR 2.0 Standard (豆包录音文件识别模型2.0-标准版) API. It offers best-in-class accuracy for Chinese (Mandarin and dialects such as Cantonese and Sichuanese) and supports 13+ languages.
调用字节跳动火山引擎豆包录音文件识别模型2.0-标准版(Seed-ASR 2.0 Standard)转写音频文件。中文识别(普通话、粤语、四川话等方言)准确率业界领先,支持 13+ 种语言。
Sending audio to OpenClaw
Currently, audio files can be sent to OpenClaw via Discord or WhatsApp. Send the audio file in a chat message and ask the bot to transcribe it.
目前可通过 Discord 或 WhatsApp 向 OpenClaw 发送音频文件,发送后让 bot 转写即可。
Note: Direct voice recording in the OpenClaw web UI is not yet supported. Use a messaging app to send pre-recorded audio files.
提示:OpenClaw 网页端暂不支持直接录音,请通过即时通讯应用发送预录制的音频文件。
Quick start
python3 {baseDir}/scripts/transcribe.py /path/to/audio.m4a
Defaults:
- Model: Seed-ASR 2.0 Standard / 豆包录音文件识别模型2.0-标准版
- Speaker diarization: enabled / 说话人分离:默认开启
- Output: stdout (transcript text with speaker labels / 带说话人标签的转写文本)
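Because the default output is plain text with speaker labels, downstream tools may want it as structured data. The sketch below assumes one utterance per line, prefixed with a label such as `Speaker 1:`; that exact layout is an assumption, and `parse_transcript` is a hypothetical helper, not part of the skill.

```python
import re
from typing import List, Tuple

# Assumed layout: one utterance per line, e.g. "Speaker 1: 你好".
# Lines without a label are treated as continuations of the previous utterance.
SPEAKER_LINE = re.compile(r"^Speaker\s+(\S+?):\s*(.*)$")


def parse_transcript(text: str) -> List[Tuple[str, str]]:
    """Split speaker-labelled transcript text into (speaker, utterance) pairs."""
    utterances: List[Tuple[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = SPEAKER_LINE.match(line)
        if match:
            utterances.append((match.group(1), match.group(2)))
        elif utterances:
            # Unlabelled line: append it to the previous speaker's utterance.
            speaker, prev = utterances[-1]
            utterances[-1] = (speaker, f"{prev} {line}".strip())
        else:
            # Text before any label (e.g. output produced with --no-speakers).
            utterances.append(("", line))
    return utterances


if __name__ == "__main__":
    sample = "Speaker 1: 你好\nSpeaker 2: Hi there\nhow are you?"
    for speaker, said in parse_transcript(sample):
        print(speaker, "|", said)
```

With `--no-speakers`, everything lands under an empty speaker label, so the same helper works for both modes.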
Useful flags
python3 {baseDir}/scripts/transcribe.py /path/to/audio.m4a --out /tmp/transcript.txt
python3 {baseDir}/scripts/transcribe.py /path/to/audio.mp3 --format mp3
python3 {baseDir}/scripts/transcribe.py /path/to/audio.m4a --json --out /tmp/result.json
python3 {baseDir}/scripts/transcribe.py /path/to/audio.m4a --no-speakers # disable speaker diarization / 关闭说话人分离
python3 {baseDir}/scripts/transcribe.py https://example.com/audio.mp3 # direct URL (skip upload)
How it works
The Doubao API accepts audio by URL only (no direct file upload), so the script:
- Uploads the audio to Volcengine TOS (object storage) via a presigned URL; the audio stays within Volcengine infrastructure and no third-party services are involved
- Submits a transcription task to Seed-ASR 2.0
- Polls until the task completes (typically 1-3 minutes for a 10-minute recording)
- Returns the transcript text
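The submit-then-poll step can be pictured as a generic loop with a timeout. This is a sketch, not the script's code: `query_status` is a hypothetical callable standing in for whatever status request the script sends to the Doubao API, and the status strings `"done"` and `"failed"` are placeholders rather than real API values.

```python
import time
from typing import Callable


def poll_until_done(
    query_status: Callable[[], dict],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call query_status() until it reports completion, failure, or the timeout expires.

    query_status is hypothetical: it should return a dict with a "status" key
    ("done", "failed", or anything else meaning "still running").
    """
    deadline = clock() + timeout_s
    while True:
        result = query_status()
        status = result.get("status")
        if status == "done":
            return result
        if status == "failed":
            raise RuntimeError(f"transcription failed: {result}")
        if clock() >= deadline:
            raise TimeoutError(f"task still '{status}' after {timeout_s} s")
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting. Since a 10-minute recording typically finishes in 1-3 minutes, a timeout of several minutes leaves headroom for longer files.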
Privacy: By default, audio is uploaded to your own Volcengine TOS bucket via a presigned URL. No data is sent to third-party services.
Custom upload endpoint
If you prefer to use a different storage service (e.g. Aliyun OSS, AWS S3, your own server), set DOUBAO_ASR_UPLOAD_URL to your upload endpoint. The script will POST the file as multipart form data and expect a JSON response with a url field.
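A self-hosted endpoint only has to honor that contract: accept a multipart POST and answer with JSON containing a `url` field. The sketch below shows just the request-handling core using the standard library. The multipart field name the script uses is not documented here, so it takes the first part that carries a filename; `STORAGE_DIR` and `PUBLIC_BASE` are hypothetical settings. Whatever URL you return must be reachable by the Doubao service, since that is how the API fetches the audio.

```python
import json
import os
import tempfile
import uuid
from email.parser import BytesParser
from email.policy import default
from typing import Tuple

# Hypothetical settings: where files are stored and the public URL prefix that serves them.
STORAGE_DIR = os.path.join(tempfile.gettempdir(), "doubao-uploads")
PUBLIC_BASE = "https://your-server.com/files"


def handle_upload(content_type: str, body: bytes) -> Tuple[int, bytes]:
    """Store the first uploaded file from a multipart/form-data body.

    Returns (HTTP status, JSON response body). On success the JSON has a "url" field.
    """
    # Re-attach the Content-Type header so the email parser can split the parts.
    message = BytesParser(policy=default).parsebytes(
        b"Content-Type: " + content_type.encode() + b"\r\n\r\n" + body
    )
    if not message.is_multipart():
        return 400, json.dumps({"error": "expected multipart/form-data"}).encode()
    for part in message.iter_parts():
        filename = part.get_filename()
        if not filename:
            continue  # skip plain form fields
        # Random name avoids collisions; keep the original extension.
        ext = os.path.splitext(filename)[1].lower()
        stored_name = uuid.uuid4().hex + ext
        os.makedirs(STORAGE_DIR, exist_ok=True)
        with open(os.path.join(STORAGE_DIR, stored_name), "wb") as fh:
            fh.write(part.get_payload(decode=True))
        return 200, json.dumps({"url": f"{PUBLIC_BASE}/{stored_name}"}).encode()
    return 400, json.dumps({"error": "no file part found"}).encode()
```

To serve it, wrap `handle_upload` in any HTTP framework (or `http.server`) and pass it the request's `Content-Type` header and raw body.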
You can also pass a direct audio URL as the argument to skip upload entirely:
python3 {baseDir}/scripts/transcribe.py https://your-bucket.tos.volces.com/audio.m4a
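Choosing between the two modes only requires checking whether the argument is an HTTP(S) URL. A minimal sketch with a hypothetical helper name; the script's own check may differ.

```python
from urllib.parse import urlparse


def is_remote_audio(arg: str) -> bool:
    """True if arg is an http(s) URL the ASR service can fetch directly (upload skipped)."""
    parsed = urlparse(arg)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


if __name__ == "__main__":
    for arg in ["https://your-bucket.tos.volces.com/audio.m4a", "/path/to/audio.m4a"]:
        print(arg, "->", "skip upload" if is_remote_audio(arg) else "upload first")
```

Requiring a host (`netloc`) rejects strings like `https:audio.m4a`, and Windows paths such as `C:\audio.m4a` parse with scheme `c`, so they are treated as local files.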
Dependencies
- Python 3.9+
- requests: pip install requests
Credentials
Step 1: Doubao ASR API Key / 第一步:豆包 ASR API Key
Get your API key from the Volcengine Speech console:
从火山引擎语音控制台获取 API Key:
- Open https://console.volcengine.com/speech/app
- Find "豆包录音文件识别模型2.0" and create an API key
- Copy the API key (UUID format, e.g. xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
export VOLCENGINE_API_KEY="your_api_key"
Step 2: Volcengine TOS Bucket / 第二步:火山引擎 TOS 存储桶
The Doubao API requires audio to be accessible via URL. TOS provides secure, private temporary storage for the upload, and the data stays within Volcengine.
豆包 API 要求音频通过 URL 访问。TOS 对象存储提供安全的临时上传,数据留在火山引擎内部。
Create a TOS bucket / 创建 TOS 存储桶:
- Open https://console.volcengine.com/tos
- Create a bucket in the right region (see below) / 选择正确的区域(见下方)
Region selection / 区域选择:
| Server location / 服务器位置 | Recommended TOS region / 推荐 TOS 区域 | Region code |
|---|---|---|
| China mainland / 中国内地 | cn-beijing, cn-shanghai, cn-guangzhou | cn-beijing |
| Hong Kong / 香港 | cn-hongkong | cn-hongkong |
| Southeast Asia / 东南亚 | ap-southeast-1 (Singapore) | ap-southeast-1 |
| US, Europe, other overseas / 美国、欧洲等海外 | cn-hongkong (recommended) | cn-hongkong |
Important: If your server is outside China mainland, do NOT use cn-beijing or cn-shanghai; cross-border uploads are extremely slow (~15KB/s). Use cn-hongkong instead.
重要:如果你的服务器在中国大陆以外,不要用 cn-beijing/cn-shanghai,跨境上传会非常慢(约 15KB/s)。请使用 cn-hongkong。
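The region table and the cross-border warning can be turned into a quick check. This sketch encodes the table as a lookup keyed by hypothetical location labels and estimates upload time at a given throughput. The ~15KB/s figure is the cross-border speed quoted in the warning; real speeds vary. At that rate a 10 MB recording needs roughly 11 minutes just to upload.

```python
# Region choices from the table, keyed by hypothetical server-location labels.
RECOMMENDED_TOS_REGION = {
    "china-mainland": "cn-beijing",  # cn-shanghai / cn-guangzhou also fit
    "hong-kong": "cn-hongkong",
    "southeast-asia": "ap-southeast-1",
    "overseas": "cn-hongkong",  # US, Europe, other overseas locations
}

MAINLAND_REGIONS = {"cn-beijing", "cn-shanghai", "cn-guangzhou"}


def recommended_region(server_location: str) -> str:
    """Return the table's suggested TOS region; unknown labels fall back to cn-hongkong (assumption)."""
    return RECOMMENDED_TOS_REGION.get(server_location, "cn-hongkong")


def warn_cross_border(server_location: str, region: str) -> bool:
    """True when a server outside mainland China targets a mainland region (slow upload)."""
    return server_location != "china-mainland" and region in MAINLAND_REGIONS


def estimated_upload_seconds(size_bytes: int, kb_per_s: float = 15.0) -> float:
    """Rough upload time at a given throughput (1 KB = 1024 bytes here)."""
    return size_bytes / 1024 / kb_per_s


if __name__ == "__main__":
    size = 10 * 1024 * 1024  # a 10 MB recording
    print(f"{estimated_upload_seconds(size) / 60:.1f} min at ~15 KB/s")  # prints 11.4 min at ~15 KB/s
```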
Step 3: IAM Access Key / 第三步:IAM 访问密钥
Get your IAM access key for TOS upload:
获取 TOS 上传所需的 IAM 访问密钥:
- Open https://console.volcengine.com/iam/keymanage/
- Create an Access Key (or use an existing one)
- If using a sub-user (IAM user), make sure it has TOSFullAccess permission
如果使用子用户(IAM 用户),请确保已授权 TOSFullAccess 权限。
export VOLCENGINE_ACCESS_KEY_ID="your_ak"
export VOLCENGINE_SECRET_ACCESS_KEY="your_sk"
export VOLCENGINE_TOS_BUCKET="your_bucket_name"
export VOLCENGINE_TOS_REGION="cn-hongkong" # see region table above / 见上方区域表
Summary of all environment variables / 环境变量汇总
| Variable | Required | Description |
|---|---|---|
| VOLCENGINE_API_KEY | Yes | ASR API key (UUID format) from the Speech console / 语音控制台的 API Key |
| VOLCENGINE_ACCESS_KEY_ID | Yes, unless DOUBAO_ASR_UPLOAD_URL is set | IAM Access Key ID (starts with AKLT) / IAM 访问密钥 ID |
| VOLCENGINE_SECRET_ACCESS_KEY | Yes, unless DOUBAO_ASR_UPLOAD_URL is set | IAM Secret Access Key / IAM 访问密钥 |
| VOLCENGINE_TOS_BUCKET | Yes, unless DOUBAO_ASR_UPLOAD_URL is set | TOS bucket name / TOS 存储桶名称 |
| VOLCENGINE_TOS_REGION | No | TOS region (default: cn-beijing) / TOS 区域 |
| DOUBAO_ASR_UPLOAD_URL | No | Custom upload endpoint (skips TOS) / 自定义上传地址(跳过 TOS) |
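Before a long transcription run, it can help to confirm these variables are set. The sketch below applies the rules stated in this section: TOS credentials are unnecessary when DOUBAO_ASR_UPLOAD_URL is set (or when you pass a direct audio URL), and VOLCENGINE_TOS_REGION defaults to cn-beijing. It also checks that the API key parses as a UUID. `check_env` and `tos_region` are hypothetical helpers, not part of the script.

```python
import os
import uuid
from typing import List, Mapping, Optional

TOS_VARS = [
    "VOLCENGINE_ACCESS_KEY_ID",
    "VOLCENGINE_SECRET_ACCESS_KEY",
    "VOLCENGINE_TOS_BUCKET",
]


def check_env(env: Optional[Mapping[str, str]] = None, needs_upload: bool = True) -> List[str]:
    """Return a list of configuration problems (an empty list means OK).

    needs_upload=False models passing a direct audio URL, which skips upload.
    """
    env = os.environ if env is None else env
    problems: List[str] = []

    api_key = env.get("VOLCENGINE_API_KEY", "")
    if not api_key:
        problems.append("VOLCENGINE_API_KEY is not set")
    else:
        try:
            uuid.UUID(api_key)  # the Speech console issues keys in UUID format
        except ValueError:
            problems.append("VOLCENGINE_API_KEY does not look like a UUID")

    # TOS credentials are only needed when uploading without a custom endpoint.
    if needs_upload and not env.get("DOUBAO_ASR_UPLOAD_URL"):
        for name in TOS_VARS:
            if not env.get(name):
                problems.append(f"{name} is not set (or set DOUBAO_ASR_UPLOAD_URL)")
    return problems


def tos_region(env: Optional[Mapping[str, str]] = None) -> str:
    """Region used for TOS uploads; falls back to the documented default cn-beijing."""
    env = os.environ if env is None else env
    return env.get("VOLCENGINE_TOS_REGION") or "cn-beijing"


if __name__ == "__main__":
    for problem in check_env():
        print("config:", problem)
```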
Alternative: Custom upload endpoint / 替代方案:自定义上传地址
Skip TOS setup entirely by providing your own upload endpoint:
无需配置 TOS,使用自己的上传服务:
export DOUBAO_ASR_UPLOAD_URL="https://your-server.com/upload"
Supported formats
WAV, MP3, MP4, M4A, OGG, FLAC; up to 5 hours and 512MB per file.
支持格式:WAV、MP3、MP4、M4A、OGG、FLAC,最长 5 小时,最大 512MB。
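A local pre-check can reject files the API will not accept before any upload starts. This sketch checks only the extension and size. It assumes 512MB means 512 * 1024 * 1024 bytes and leaves the 5-hour duration limit unchecked, because measuring duration requires decoding the audio. The helper name is hypothetical.

```python
import os

SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".mp4", ".m4a", ".ogg", ".flac"}
MAX_BYTES = 512 * 1024 * 1024  # assumption: "512MB" read as 512 MiB


def check_audio_file(path: str) -> None:
    """Raise ValueError if the file's extension or size falls outside the documented limits."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported format {ext or '(none)'}; expected one of {sorted(SUPPORTED_EXTENSIONS)}")
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        raise ValueError(f"{path} is {size / 1024 / 1024:.0f} MB; the limit is 512 MB")
    # Duration (max 5 hours) is not checked here: it needs an audio decoder.
```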
Latest Changes
v0.6.0 · Feb 25, 2026
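Speaker diarization is now enabled by default (output labelled "Speaker X:"); the title is now bilingual (Chinese/English); removed an incorrect Telegram description; corrected the upload description from SDK to presigned URL.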