Free & Open Source

SIP and WebRTC Voice Framework.
Built-in AI support.

VoiceBlender is an open source Go service that bridges SIP and WebRTC voice calls with multi-party audio mixing, a REST API, and real-time webhooks. Plug in your own TTS, STT, and AI agent models.

go run ./cmd/voiceblender

Features

A complete toolkit for building voice applications, built in the open.

SIP Inbound & Outbound

Originate and receive SIP calls. Multiple codecs are supported, including PCMA, PCMU, and Opus.

WebRTC

Browser-based voice via SDP offer/answer with trickle ICE. Connect users directly from the browser with no plugins required.

Multi-Party Rooms

Mix multiple participants in a single room. Join via SIP, WebRTC, or WebSocket.

TTS & STT

Built-in TTS support for ElevenLabs, Google Cloud, and AWS Polly. Real-time STT with partial transcripts.

AI Agent

Attach a conversational AI agent to any leg or room with barge-in support. Supports ElevenLabs, VAPI, and Pipecat out of the box.

REST API & Webhooks

Full REST API for legs, rooms, playback, recording, and more. Real-time event delivery with HMAC-SHA256 signing and retry.
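On the receiving side, a webhook consumer would typically recompute the HMAC-SHA256 of the raw request body with a shared secret and compare it to the delivered signature in constant time. The Go sketch below assumes the signature is hex-encoded and computed over the body alone; the secret handling, header name, and encoding are assumptions, not VoiceBlender's documented scheme.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign returns the hex-encoded HMAC-SHA256 of body under secret.
// Assumption: the signature covers the raw request body only.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify reports whether sigHex matches the expected signature.
// hmac.Equal compares in constant time to avoid leaking timing information.
func verify(secret, body []byte, sigHex string) bool {
	got, err := hex.DecodeString(sigHex)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("example-secret") // hypothetical shared secret
	body := []byte(`{"type":"leg.ringing","leg_id":"abc"}`)
	sig := sign(secret, body)
	fmt.Println(verify(secret, body, sig))                     // true
	fmt.Println(verify(secret, []byte(`{"tampered":1}`), sig)) // false
}
```

Always verify against the raw bytes as received; re-encoding parsed JSON can change whitespace or key order and break the comparison.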

Getting Started

Up and running in minutes.

1. Build & Run

Use go build or go run, or pull the Docker image. The REST API listens on :8080 and SIP on :5060.

2. Configure

Set environment variables for SIP, ICE servers, webhooks, and your TTS/STT/AI provider API keys.

3. Connect

Originate SIP calls, accept inbound calls via webhooks, or connect browsers over WebRTC.

Documentation

Everything you need to get started.

Quick Start

```bash
# Build and run
go build -o voiceblender ./cmd/voiceblender
./voiceblender

# Or run directly
go run ./cmd/voiceblender

# REST API on :8080, SIP on 127.0.0.1:5060
```

Typical Workflow

```text
1. Register a webhook        POST /v1/webhooks
2. Receive inbound call      --> webhook: leg.ringing {leg_id, from, to}
3. Answer                    POST /v1/legs/{id}/answer
4. Create a room             POST /v1/rooms
5. Add legs to room          POST /v1/rooms/{id}/legs
6. Attach AI agent           POST /v1/legs/{id}/agent
7. Start recording           POST /v1/legs/{id}/record
8. Hang up                   DELETE /v1/legs/{id}
```

Legs API

```text
POST   /v1/legs                    # Originate outbound SIP call
GET    /v1/legs                    # List all legs
POST   /v1/legs/{id}/answer        # Answer ringing inbound leg
POST   /v1/legs/{id}/early-media   # Enable early media (183)
DELETE /v1/legs/{id}               # Hang up
POST   /v1/legs/{id}/dtmf          # Send DTMF digits
POST   /v1/legs/{id}/play          # Play audio or tone
POST   /v1/legs/{id}/tts           # Text-to-speech
POST   /v1/legs/{id}/record        # Start recording
POST   /v1/legs/{id}/stt           # Start speech-to-text
POST   /v1/legs/{id}/agent         # Attach AI agent
```

Rooms API

```text
POST   /v1/rooms                   # Create room
GET    /v1/rooms                   # List rooms
DELETE /v1/rooms/{id}              # Delete room (hangs up all legs)
POST   /v1/rooms/{id}/legs         # Add leg to room
GET    /v1/rooms/{id}/ws           # Join room via WebSocket
POST   /v1/rooms/{id}/play         # Play audio or tone to room
POST   /v1/rooms/{id}/tts          # TTS to room
POST   /v1/rooms/{id}/record       # Record room mix
POST   /v1/rooms/{id}/agent        # Attach AI agent to room
```

Configuration

```bash
export HTTP_ADDR=:8080              # REST API listen address
export SIP_BIND_IP=127.0.0.1        # IP for SDP/Contact/Via headers
export SIP_PORT=5060                # SIP listen port
export ICE_SERVERS=stun:stun.l.google.com:19302   # ICE (STUN) server for WebRTC
export RECORDING_DIR=/tmp/recordings # Local directory for recordings
export LOG_LEVEL=info               # debug, info, warn, error
export WEBHOOK_URL=https://example.com/hooks   # Webhook event endpoint
export ELEVENLABS_API_KEY=sk-...    # TTS, STT, Agent
export VAPI_API_KEY=...             # VAPI Agent provider
export S3_BUCKET=my-recordings      # Optional S3 upload
```

WebRTC & Webhooks

```text
# WebRTC
POST   /v1/webrtc/offer                 # SDP offer/answer exchange
POST   /v1/legs/{id}/ice-candidates     # Add trickle ICE candidate
GET    /v1/legs/{id}/ice-candidates     # Get gathered ICE candidates

# Webhooks
POST   /v1/webhooks             # Register webhook
GET    /v1/webhooks             # List webhooks
DELETE /v1/webhooks/{id}        # Unregister webhook
```

Performance

Measured end-to-end with real SIP calls using the built-in benchmark suite.

Audio latency: 20 ms average leg-to-leg at 100 rooms

Room setup: about 27 rooms/sec concurrent setup throughput

Heap: 19 MB at 100 rooms with 200 active calls

p99 audio latency: 64 ms at full load (100 rooms)

Run it yourself

```bash
# Run the benchmark (default scales: 5, 10, 25, 50, 100 rooms)
go test -tags integration -v -timeout 300s \
  -run TestConcurrentRoomsScale ./tests/integration/

# Example output at 100 rooms:
# Phase 1 — Setup: 100 rooms in 3.7s (26.9 rooms/sec)
#   call+room setup latency: avg=570ms p50=615ms p95=728ms p99=751ms
#   Goroutines: 1914  |  Heap alloc: 19.0 MB
# Phase 2 — Sustaining 100 rooms for 3s... All 200 calls connected
# Phase 3 — Audio latency: avg=20ms p50=10ms p95=62ms p99=64ms
# Phase 4 — Teardown: 100 rooms in 5.6ms (17782 rooms/sec)
```

Contribute to VoiceBlender

VoiceBlender is built by the community. Whether you write code, report bugs, or improve docs, every contribution matters.