Now in production - API keys available

Voice in.
Intelligence out.

Transcription, diarization, PCI compliance detection, financial extraction, and sentiment analysis - one API call, from $0.15/hr (Lite) | $0.39/hr (Pro).

97+ languages supported
Lite starts at $0.15/hr
216× faster than realtime
12s for a 46-min call

Enterprise-grade security
99.9% uptime target
TLS 1.3 encryption
PCI-DSS compliant masking
No audio stored post-processing

Pro + Lite features

One API call.
Full intelligence.

Upload audio, get a structured JSON response. Lite ($0.15/hr) includes diarization, cleanup, and timestamps. Pro ($0.39/hr) adds AI analysis, compliance, sentiment, and more.

AI Diarization

Automatically identifies Agent vs. Customer and labels every line. Handles cross-talk, hold music, and accented speech.

Pro + Lite

Word-Level Timestamps

Every word gets a precise start/end timestamp. Power keyword search, compliance audits, and QA review with millisecond accuracy.

Pro + Lite

Structured JSON Output

Every response is a structured JSON object. Pro responses add the AI analysis fields: call summary, type, outcome, customer info, agent info, agreements, action items, and key issues.

Pro + Lite

PCI Compliance

Detects credit card numbers, CVVs, and expiration dates spoken during calls. Flags sensitive data for automatic redaction or audit.

Pro only

Financial Extraction

Pulls payment amounts, recurring charges, pending balances, card types, and billing dates directly from conversation context.

Pro only

Sentiment Analysis

Customer sentiment (positive/negative/neutral) and agent performance scoring on every call. Track quality at scale.

Pro only

AI Summary

Automatic call summary, type classification (sale, support, billing), and outcome detection on every transcription. Skip the full read.

Pro only

Action Items

Extracts key issues raised, agreements made, and follow-up actions from every call. Feed directly into your CRM or ticketing system.

Pro only

PII Redaction

Automatically masks Social Security numbers, credit cards, phone numbers, and addresses. HIPAA and PCI-ready output by default.

Pro only
Modular & Optional

Need more? Enable per request.

All optional features are included at no extra cost. Enable any combination with a simple boolean flag.

Topic Detection

Automatically categorize every call by topic — billing, cancellation, technical support, product inquiry. Build trend dashboards at scale.

optional

Entity Detection

Extract named entities — people, organizations, products, and locations — for CRM auto-population and compliance cross-referencing.

optional

Content Moderation

Detect profanity, hostility, and hate speech. Identify the speaker and severity level. Essential for HR compliance and agent coaching.

optional

Auto Chapters

Segment long calls into titled chapters with summaries and timestamps. Reviewers jump directly to the section they need.

optional

Translation

Translate the full transcript into any of 50+ languages. The original is preserved and the translation is added as a separate field, so one API call returns both languages.

optional

Custom Vocabulary

Pass brand names, industry jargon, and acronyms to improve ASR accuracy. No training required — just a comma-separated list per request.

optional
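
As a rough sketch, the per-request flags could be assembled into form fields like this. The page confirms boolean flags and a comma-separated vocabulary list, but not the field names, so `topic_detection`, `custom_vocabulary`, and the rest are hypothetical placeholders, as is sending booleans as the string "true". Check the API reference for the real names.

```python
# Hypothetical flag names: the page confirms boolean flags and a
# comma-separated vocabulary list, but not the exact field names.
OPTIONAL_FEATURES = {
    "topic_detection",
    "entity_detection",
    "content_moderation",
    "auto_chapters",
    "translation",
}


def build_form_fields(plan, features=(), vocabulary=()):
    """Return form fields for one request with the chosen optional features."""
    unknown = set(features) - OPTIONAL_FEATURES
    if unknown:
        raise ValueError(f"Unknown optional features: {sorted(unknown)}")

    fields = {"plan": plan}
    for name in sorted(features):
        # Assumption: booleans travel as the string "true" in form data.
        fields[name] = "true"
    if vocabulary:
        # Custom vocabulary is described as a comma-separated list per request.
        fields["custom_vocabulary"] = ",".join(term.strip() for term in vocabulary)
    return fields


fields = build_form_fields(
    "pro",
    features=["auto_chapters", "topic_detection"],
    vocabulary=["VoxParse", "PCI-DSS"],
)
print(fields)
```

The resulting dictionary would be passed as the `data` argument of the upload request. Rejecting unknown names locally catches typos before any audio is sent.
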
How it works

Three steps.
No infrastructure required.

Upload your audio

Send an audio file in WAV, MP3, OGG, or FLAC format. The language is detected automatically. Files can be stereo or mono, up to 4 hours long and up to 95 MB.

Get everything back - instantly

In one synchronous response: diarized transcript, word timestamps, call summary, financial data, compliance flags, sentiment, action items. A 46-minute call returns in ~12 seconds.

Build on top

Pipe the structured JSON into your CRM, dashboard, coaching tool, or compliance system. Every field is machine-readable and ready to store.
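
Before step one, a client can check a file against the limits quoted above. This is a minimal sketch: `check_upload` is a hypothetical helper, the extension list only mirrors the four formats named on this page, and "95 MB" is read conservatively as 95,000,000 bytes. The 4-hour duration limit is not checked, since that would require decoding the audio.

```python
from pathlib import Path

# Limits quoted on this page. "95 MB" is read as 95 * 10^6 bytes, the
# stricter interpretation; the API may count bytes differently.
MAX_BYTES = 95_000_000
ALLOWED_SUFFIXES = {".wav", ".mp3", ".ogg", ".flac"}


def check_upload(path, size_bytes=None):
    """Return a list of problems that would likely make the upload fail."""
    path = Path(path)
    if size_bytes is None:
        size_bytes = path.stat().st_size  # read the size from disk
    problems = []
    if path.suffix.lower() not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported extension: {path.suffix or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_BYTES}")
    return problems


# Sizes are passed explicitly so the example runs without real files.
print(check_upload("call_recording.mp3", size_bytes=40 * 1024 * 1024))
print(check_upload("call_recording.aac", size_bytes=120 * 1024 * 1024))
```

An empty list means the file passes both checks. Returning every problem at once, rather than raising on the first, makes the helper easier to use in batch jobs.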

Developer experience

One API call. Full intelligence.

Standard REST. Upload a file, get structured JSON. Synchronous by default, with optional async mode and HMAC-signed webhooks. Works with cURL, Python, Node, anything.

python
import requests

# Pro: full AI analysis ($0.39/hr)
# Open the file in a with-block so the handle is closed after the upload.
with open("call_recording.mp3", "rb") as audio:
    response = requests.post(
        "https://api.voxparse.com/v1/transcribe",
        headers={"X-API-Key": "vxp_..."},
        files={"file": audio},
        data={"plan": "pro"},  # or "lite" for $0.15/hr
        timeout=120,  # client-side limit so a stalled upload cannot hang forever
    )

# Raise on HTTP errors (for example 402 Payment Required) before parsing.
response.raise_for_status()

data = response.json()
ai = data["ai_analysis"]

print(f"Summary: {ai['call_summary']}")
print(f"Customer: {ai['customer']['name']}")
print(f"Sentiment: {ai['sentiment']['customer_sentiment']}")
print(f"PCI flags: {ai['compliance']['sensitive_data_shared']}")
print(f"Payment today: {ai['financial']['payment_today']}")

Real output

Here's what you get back.

Actual response from a 46-minute customer service call. Processed in 12 seconds.

json - ai_analysis (excerpt)
{
  "call_summary": "Customer called about a billing discrepancy on March invoice. Agent issued a $75 credit and adjusted recurring rate to $149.99/mo.",
  "call_type": "billing",
  "call_outcome": "resolved",
  "customer": { "name": "James Rivera", "company": "Greenfield Dental Group", "email": "[email protected]" },
  "financial": {
    "credit_issued": "$75.00",
    "recurring_amount": "$149.99",
    "pending_balance": "$0.00",
    "payment_method": "Visa ending in 8831"
  },
  "compliance": {
    "recording_disclosure": true,
    "sensitive_data_shared": ["credit card", "mailing address"]
  },
  "sentiment": { "customer_sentiment": "neutral", "agent_performance": "excellent" }
}
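
To store these fields in a CRM or database, the dollar strings usually need converting to numbers. The sketch below assumes amounts always arrive as USD strings with a leading "$", as in the excerpt. `parse_usd`, the `record` layout, and the `needs_pci_review` flag are hypothetical names chosen for illustration.

```python
import json
from decimal import Decimal

# Subset of the ai_analysis excerpt shown above.
raw = """
{
  "call_type": "billing",
  "call_outcome": "resolved",
  "financial": {
    "credit_issued": "$75.00",
    "recurring_amount": "$149.99",
    "pending_balance": "$0.00",
    "payment_method": "Visa ending in 8831"
  },
  "compliance": {
    "recording_disclosure": true,
    "sensitive_data_shared": ["credit card", "mailing address"]
  }
}
"""


def parse_usd(text):
    """Convert a string such as "$1,149.99" to a Decimal."""
    return Decimal(text.replace("$", "").replace(",", ""))


ai = json.loads(raw)
financial = ai["financial"]

# Flatten the nested response into one row ready for storage.
record = {
    "call_type": ai["call_type"],
    "resolved": ai["call_outcome"] == "resolved",
    "credit_issued": parse_usd(financial["credit_issued"]),
    "recurring_amount": parse_usd(financial["recurring_amount"]),
    "needs_pci_review": "credit card" in ai["compliance"]["sensitive_data_shared"],
}
print(record)
```

`Decimal` is used instead of `float` so that amounts such as 149.99 are stored exactly.
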
Head-to-head

VoxParse vs AssemblyAI

Same 46-minute customer service call. Same day. Real results.

Metric | VoxParse (winner) | AssemblyAI Universal-3 Pro
Processing time | 12.1 seconds | ~30 seconds
Total cost (all features) | $0.39/hr (Pro) | $0.51+/hr
Output format | Structured JSON | Raw text
Speaker diarization | ✓ AI-powered | ✓ Built-in
Name accuracy | ✓ "Jesús" (accent preserved) | ⚠ "Jus" (truncated)
Email correction | ✓ Auto-fixed | ✗ Not available
PCI masking | ✓ Included | $ PII Redaction add-on
Sentiment analysis | ✓ Included | $ +$0.02/hr
Financial extraction | ✓ Included | ✗ Not available
Custom instructions | ✓ Included | $ LeMUR (token cost)
Processing speed (46-min call): VoxParse 12s, AssemblyAI ~30s
Cost per 1,000 audio hours (all features): VoxParse $390, AssemblyAI $510+

Benchmark conducted April 2026 on a 46-minute English-language customer service recording. Both providers tested with the same audio file within the same hour.

Simple pricing

Two plans. Choose per call.
No subscriptions.

Pro $0.39/hr for full AI analysis, or Lite $0.15/hr for diarization + transcript cleanup. Select the right plan for every call.

Lite $0.15/hr | Pro $0.39/hr
Included in both Pro + Lite
  • 97+ languages, 216× real-time speed
  • AI speaker diarization (Agent / Customer)
  • Word-level timestamps on every word
  • Transcript cleanup (verbatim or polished)
  • Structured JSON output - ready for your CRM
Pro only ($0.39/hr)
  • PCI compliance detection (cards, CVV, expiry)
  • Financial extraction (payments, balances, billing)
  • Sentiment analysis & agent performance scoring
  • Call summary, type, outcome classification
  • Action items & agreements extraction
  • Custom AI instructions (up to 2,000 chars)

No subscriptions. No feature gates. Per-call plan selection — choose Pro or Lite per request.

Estimate your cost

At 100 hrs/mo:
  • Pro: $0.39/hr (full analysis) = $39.00/month
  • Lite: $0.15/hr (diarization only) = $15.00/month

Per-call plan selection. Mix Pro and Lite as needed.
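
The estimate is simple arithmetic: hours on each plan multiplied by that plan's hourly rate. The sketch below uses the two rates quoted on this page and assumes billing is linear in audio hours. The page does not state any per-call rounding or minimum increment, so actual invoices may differ slightly. `monthly_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Per-hour rates quoted on this page.
RATES = {"pro": Decimal("0.39"), "lite": Decimal("0.15")}


def monthly_cost(hours_by_plan):
    """Return the estimated monthly cost in dollars for a mix of plans."""
    total = sum(
        (RATES[plan] * Decimal(str(hours)) for plan, hours in hours_by_plan.items()),
        Decimal("0"),
    )
    return total.quantize(Decimal("0.01"))  # round to whole cents


print(monthly_cost({"pro": 100}))             # 39.00
print(monthly_cost({"lite": 100}))            # 15.00
print(monthly_cost({"pro": 40, "lite": 60}))  # 24.60
```

The last line shows a mixed month: 40 Pro hours ($15.60) plus 60 Lite hours ($9.00).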

Start Free — 5 Lite Hours on Us

How we compare

Provider | Base Price | All Features | AI Analysis | PCI Compliance | Custom Instructions | Speed (46-min call)
VoxParse Pro | $0.39/hr | $0.39/hr | Included | Included | Included | ~12 seconds
VoxParse Lite | $0.15/hr | $0.15/hr | Not included | Not included | Not included | ~8 seconds
AssemblyAI | $0.15/hr +$0.02 diarization | $0.51+/hr* | +$0.28/hr add-ons | Extra (PII Redaction) | LeMUR (token cost) | ~30 seconds
Deepgram | $0.46/hr | $0.60+/hr | Extra cost | Not available | Not available | ~15 seconds
Google Cloud STT | $0.96/hr | $0.96/hr | Not available | Not available | Not available | ~60 seconds
AWS Transcribe | $1.44/hr | $1.60+/hr | Extra cost | Extra cost | Not available | ~120 seconds

FAQ

Common questions

How much does VoxParse cost?

Two tiers: Pro $0.39/hr (full 11-feature AI analysis including diarization, compliance, sentiment, financial extraction) and Lite $0.15/hr (diarization + transcript cleanup). Choose per API call. No subscriptions or minimums.

How does VoxParse compare to AssemblyAI?

VoxParse provides transcription, diarization, PCI compliance, financial extraction, sentiment analysis, and call classification in a single API call for $0.39/hr (Pro) or $0.15/hr (Lite). AssemblyAI charges separately for each feature, totaling $0.51+/hr for comparable functionality. VoxParse also returns structured JSON with labeled fields instead of raw text.

What languages does VoxParse support?

97+ languages with automatic language detection. Upload audio in any supported language and the API detects and transcribes it automatically.

Is VoxParse synchronous or asynchronous?

Synchronous by default — upload an audio file and receive the complete response in seconds. For long-running jobs or background processing, enable async mode with async=true to get an immediate 202 Accepted response and receive results via HMAC-signed webhooks. See the Webhooks documentation for setup details.
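
On the receiving side, an HMAC-signed webhook is typically verified by recomputing the signature over the raw request body and comparing it with the one received. The sketch below assumes HMAC-SHA256 with a hex-encoded signature. This page does not specify the algorithm, encoding, or header name, so treat those as placeholders and follow the Webhooks documentation. The secret and payload are made-up examples.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: bytes, body: bytes, received: str) -> bool:
    """Compare signatures in constant time to avoid timing leaks."""
    return hmac.compare_digest(sign(secret, body), received)


secret = b"example-webhook-secret"  # placeholder, not a real key format
body = b'{"status": "completed"}'   # hypothetical webhook payload
good = sign(secret, body)

print(is_valid_signature(secret, body, good))      # True
print(is_valid_signature(secret, body, "0" * 64))  # False
```

Verify against the exact bytes received, before any JSON parsing, because re-serializing the payload can change whitespace and break the signature.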

Does VoxParse handle PCI compliance?

Yes. VoxParse automatically detects credit card numbers, CVVs, and expiration dates in call recordings and masks them in the response. This helps businesses meet PCI-DSS compliance requirements for recorded customer calls.

What format does VoxParse return?

Structured JSON with labeled fields: call summary, call type, outcome, customer info, agent info, financial data, compliance flags, sentiment scores, key issues, action items, and a cleaned transcript with speaker labels.

How secure is my audio data?

Audio files are encrypted in transit via TLS 1.3 and processed in isolated containers. Files are automatically deleted after processing completes. VoxParse never stores your audio permanently and does not use customer data for model training. All infrastructure runs on a globally distributed edge network with enterprise-grade security controls.

What happens if my balance runs out?

VoxParse checks your balance before processing each request. If your balance is insufficient, the API returns a 402 Payment Required error before any processing begins. You are never charged for partial work. You can enable auto-recharge in the dashboard to automatically top up when your balance drops below a threshold you set.
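
A client can branch on the status code to separate an empty balance from other failures. In this minimal sketch, 202 and 402 come from this page, while treating 200 as the synchronous success code is an assumption. `next_step` and its return values are hypothetical.

```python
def next_step(status_code: int) -> str:
    """Map a response status to what the caller should do next."""
    if status_code == 200:
        return "parse"          # assumed success code: result is in the body
    if status_code == 202:
        return "await_webhook"  # async mode: results arrive by webhook
    if status_code == 402:
        return "top_up"         # insufficient balance: nothing processed or charged
    return "raise"              # anything else is treated as an error here


for code in (200, 202, 402, 500):
    print(code, next_step(code))
```

Because a 402 is returned before any processing begins, the same file can be resubmitted unchanged once the balance is topped up.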

Do you offer an SLA or uptime guarantee?

VoxParse targets 99.9% API uptime. The API runs on globally distributed edge infrastructure with automatic failover across 300+ data centers.

Do you offer per-call plan selection?

Yes. Choose Pro ($0.39/hr, full 11-feature analysis) or Lite ($0.15/hr, diarization + transcript cleanup) on each API call via the plan parameter. Enterprise plans are available on request.

Start processing audio today.

Sign up and get 5 free Lite hours instantly. No credit card, no commitments. Get your API key in under a minute.

Start Free — 5 Lite Hours Included