vahnxu

Doubao Asr

Transcribe audio via the Doubao Seed-ASR 2.0 (豆包录音文件识别模型2.0) API from ByteDance/Volcengine. Best-in-class Chinese speech recognition.

v0.6.0

Installation

clawhub install doubao-asr

Requires npm i -g clawhub


Downloads: 35
Stars: 0
Current installs: 0
All-time installs: 0
Versions: 11

Doubao ASR / 豆包语音转写

Transcribe audio files via ByteDance Volcengine's Seed-ASR 2.0 Standard (豆包录音文件识别模型2.0-标准版) API. It offers best-in-class accuracy for Chinese (Mandarin, Cantonese, Sichuan dialect, and other dialects) and supports 13+ languages.

Sending audio to OpenClaw

Currently, audio files can be sent to OpenClaw via Discord or WhatsApp. Send the audio file in a chat message and ask the bot to transcribe it.

Note: Direct voice recording in the OpenClaw web UI is not yet supported. Use a messaging app to send pre-recorded audio files.

Quick start

python3 {baseDir}/scripts/transcribe.py /path/to/audio.m4a

Defaults:

Model: Seed-ASR 2.0 Standard (豆包录音文件识别模型2.0-标准版)
Speaker diarization: enabled
Output: stdout (transcript text with speaker labels)

Useful flags

python3 {baseDir}/scripts/transcribe.py /path/to/audio.m4a --out /tmp/transcript.txt
python3 {baseDir}/scripts/transcribe.py /path/to/audio.mp3 --format mp3
python3 {baseDir}/scripts/transcribe.py /path/to/audio.m4a --json --out /tmp/result.json
python3 {baseDir}/scripts/transcribe.py /path/to/audio.m4a --no-speakers  # disable speaker diarization
python3 {baseDir}/scripts/transcribe.py https://example.com/audio.mp3  # direct URL (skip upload)
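When the script is called from another Python program, the flags listed above can be assembled programmatically. The following is a minimal sketch, not part of the skill: `build_transcribe_command` and `run_transcribe` are hypothetical helper names, `script_path` stands in for `{baseDir}/scripts/transcribe.py`, and only the flags documented above are used.

```python
import subprocess
from typing import List, Optional


def build_transcribe_command(
    script_path: str,
    audio: str,
    out: Optional[str] = None,
    audio_format: Optional[str] = None,
    as_json: bool = False,
    speakers: bool = True,
) -> List[str]:
    """Assemble an argv list using only the flags documented above."""
    cmd = ["python3", script_path, audio]
    if audio_format:
        cmd += ["--format", audio_format]
    if as_json:
        cmd.append("--json")
    if not speakers:
        # Speaker diarization is on by default; this flag turns it off.
        cmd.append("--no-speakers")
    if out:
        cmd += ["--out", out]
    return cmd


def run_transcribe(cmd: List[str]) -> str:
    """Run the command and return stdout (the transcript when --out is not given)."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.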
            

How it works

The Doubao API accepts audio via URL (not direct file upload). The script:

1. Uploads the audio to Volcengine TOS (object storage) via a presigned URL. The audio stays within Volcengine infrastructure, with no third-party services involved.
2. Submits a transcription task to Seed-ASR 2.0.
3. Polls until the task is complete (typically 1-3 minutes for 10 minutes of audio).
4. Returns the transcript text.

Privacy: By default, audio is uploaded to your own Volcengine TOS bucket via presigned URL. No data is sent to third-party services.
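The submit-then-poll pattern described above can be sketched generically. This is an illustration only, not the script's actual code: `poll_until_complete` is a hypothetical helper, `check_status` stands for whatever call queries the task state (the real Doubao API request is not shown here), and the interval and timeout defaults are assumptions.

```python
import time
from typing import Any, Callable, Optional


def poll_until_complete(
    check_status: Callable[[], Optional[Any]],
    interval_s: float = 5.0,      # assumed polling interval
    timeout_s: float = 1800.0,    # assumed overall timeout
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Any:
    """Call check_status() until it returns a non-None result or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        result = check_status()
        if result is not None:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"transcription not finished after {timeout_s} seconds")
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without waiting in real time.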

Custom upload endpoint

If you prefer to use a different storage service (e.g. Aliyun OSS, AWS S3, your own server), set DOUBAO_ASR_UPLOAD_URL to your upload endpoint. The script will POST the file as multipart form data and expect a JSON response with a url field.
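As a rough illustration of that contract (a multipart POST answered by JSON containing a `url` field), a client could look like the sketch below. It is not the script's actual code: `upload_audio` is a hypothetical name, the form field name `file` and the 300-second timeout are assumptions, and `requests` is imported lazily so that a stand-in `post` callable can be supplied.

```python
import os
from typing import Any, Callable, Optional


def upload_audio(
    path: str,
    endpoint: Optional[str] = None,
    post: Optional[Callable[..., Any]] = None,
    field_name: str = "file",  # assumption: the endpoint's expected form field
) -> str:
    """POST a file as multipart form data and return the `url` from the JSON reply."""
    endpoint = endpoint or os.environ["DOUBAO_ASR_UPLOAD_URL"]
    if post is None:
        import requests  # listed under Dependencies
        post = requests.post
    with open(path, "rb") as fh:
        resp = post(
            endpoint,
            files={field_name: (os.path.basename(path), fh)},
            timeout=300,
        )
    resp.raise_for_status()
    url = resp.json().get("url")
    if not isinstance(url, str) or not url.startswith(("http://", "https://")):
        raise ValueError("upload endpoint did not return a usable 'url' field")
    return url
```

The returned URL must be reachable by the Doubao API, so an endpoint that hands back a private or localhost address would not work.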

You can also pass a direct audio URL as the argument to skip upload entirely:

python3 {baseDir}/scripts/transcribe.py https://your-bucket.tos.volces.com/audio.m4a
            

Dependencies

Python 3.9+
requests: pip install requests

Credentials

Step 1: Doubao ASR API Key

Get your API key from the Volcengine Speech console:

1. Open https://console.volcengine.com/speech/app
2. Find "豆包录音文件识别模型2.0" and create an API key
3. Copy the API key (UUID format, e.g. xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)

export VOLCENGINE_API_KEY="your_api_key"

Step 2: Volcengine TOS Bucket

The Doubao API requires audio to be accessible via URL. TOS provides secure, private temporary upload, and the data stays within Volcengine.

Create a TOS bucket:

1. Open https://console.volcengine.com/tos
2. Create a bucket and choose the right region (see below)

Region selection:

Server location	Recommended TOS region	Region code
China mainland	cn-beijing, cn-shanghai, cn-guangzhou	`cn-beijing`
Hong Kong	cn-hongkong	`cn-hongkong`
Southeast Asia	ap-southeast-1 (Singapore)	`ap-southeast-1`
US, Europe, other overseas	cn-hongkong (recommended)	`cn-hongkong`

Important: If your server is outside China mainland, do NOT use cn-beijing or cn-shanghai. Cross-border upload will be extremely slow (~15KB/s). Use cn-hongkong instead.
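The region guidance above can be encoded as a small lookup plus a sanity check. This is only a sketch: the location keys (`china-mainland`, `hong-kong`, `southeast-asia`, `overseas`) and both function names are hypothetical labels introduced here, while the region codes come from the table.

```python
# Region codes from the table above; the dictionary keys are hypothetical labels.
RECOMMENDED_REGION = {
    "china-mainland": "cn-beijing",
    "hong-kong": "cn-hongkong",
    "southeast-asia": "ap-southeast-1",
    "overseas": "cn-hongkong",
}

# Regions the warning above names as slow for servers outside China mainland.
SLOW_FROM_OUTSIDE_MAINLAND = {"cn-beijing", "cn-shanghai"}


def recommend_region(server_location: str) -> str:
    """Return the region code suggested for a server location label."""
    try:
        return RECOMMENDED_REGION[server_location]
    except KeyError:
        raise ValueError(f"unknown server location: {server_location!r}") from None


def is_slow_combination(server_location: str, region: str) -> bool:
    """True when a non-mainland server would upload to a region flagged as slow."""
    return server_location != "china-mainland" and region in SLOW_FROM_OUTSIDE_MAINLAND
```

A check like this could run before the upload starts, so a misconfigured `VOLCENGINE_TOS_REGION` is caught early rather than after a very slow transfer.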

Step 3: IAM Access Key

Get your IAM access key for TOS upload:

1. Open https://console.volcengine.com/iam/keymanage/
2. Create an Access Key (or use an existing one)
3. If using a sub-user (IAM user), make sure it has the TOSFullAccess permission

export VOLCENGINE_ACCESS_KEY_ID="your_ak"
export VOLCENGINE_SECRET_ACCESS_KEY="your_sk"
export VOLCENGINE_TOS_BUCKET="your_bucket_name"
export VOLCENGINE_TOS_REGION="cn-hongkong"  # see region table above
            

Summary of all environment variables

Variable	Required	Description
`VOLCENGINE_API_KEY`	Yes	ASR API key (UUID format) from the Speech console
`VOLCENGINE_ACCESS_KEY_ID`	Yes	IAM Access Key ID (starts with `AKLT`)
`VOLCENGINE_SECRET_ACCESS_KEY`	Yes	IAM Secret Access Key
`VOLCENGINE_TOS_BUCKET`	Yes	TOS bucket name
`VOLCENGINE_TOS_REGION`	No	TOS region (default: `cn-beijing`)
`DOUBAO_ASR_UPLOAD_URL`	No	Custom upload endpoint (skips TOS)
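Checking these variables up front gives a clearer error than a failed upload later. The sketch below is hypothetical (`load_config` is not part of the skill) and rests on one assumption: the IAM and TOS variables are needed only when `DOUBAO_ASR_UPLOAD_URL` is not set, since the custom endpoint is described as skipping TOS.

```python
import os
from typing import Dict, Mapping, Optional

# Variables used for the TOS upload path.
TOS_VARS = (
    "VOLCENGINE_ACCESS_KEY_ID",
    "VOLCENGINE_SECRET_ACCESS_KEY",
    "VOLCENGINE_TOS_BUCKET",
)


def load_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect settings from the environment and fail early on missing values."""
    env = os.environ if env is None else env
    required = ["VOLCENGINE_API_KEY"]
    upload_url = env.get("DOUBAO_ASR_UPLOAD_URL", "")
    if not upload_url:
        # Assumption: TOS credentials are only needed without a custom endpoint.
        required += TOS_VARS
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    config = {name: env[name] for name in required}
    if upload_url:
        config["DOUBAO_ASR_UPLOAD_URL"] = upload_url
    else:
        # Default region taken from the table above.
        config["VOLCENGINE_TOS_REGION"] = env.get("VOLCENGINE_TOS_REGION") or "cn-beijing"
    return config
```

Accepting a mapping instead of reading `os.environ` directly makes the check easy to exercise with test values.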

Alternative: Custom upload endpoint

Skip TOS setup entirely by providing your own upload endpoint:

export DOUBAO_ASR_UPLOAD_URL="https://your-server.com/upload"

Supported formats

WAV, MP3, MP4, M4A, OGG, FLAC. Maximum duration is 5 hours; maximum file size is 512MB.
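A local pre-check against these limits can avoid uploading a file that would be rejected. The sketch below is hypothetical (`check_audio_file` is not part of the skill); it checks only the extension and file size, treats "512MB" as 512 × 1024 × 1024 bytes (an assumption), and does not verify the 5-hour duration limit because that would require decoding the audio.

```python
import os

# Extensions matching the formats listed above.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".mp4", ".m4a", ".ogg", ".flac"}
# Assumption: "512MB" is interpreted as 512 MiB.
MAX_BYTES = 512 * 1024 * 1024


def check_audio_file(path: str) -> None:
    """Raise ValueError if the file extension or size is outside the stated limits."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {ext or '(no extension)'}")
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        raise ValueError(f"file is {size} bytes, above the {MAX_BYTES}-byte limit")
```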

Statistics

Comments 0

Created Feb 25, 2026

Updated Feb 25, 2026

Author

vahnxu

@vahnxu

Latest Changes

v0.6.0 · Feb 25, 2026

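Speaker diarization is now enabled by default ("Speaker X:" label output); the title is now bilingual (Chinese and English); the incorrect Telegram description was removed; the upload description was corrected from SDK to presigned URL.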
