
Tutorial

Build an Automated Meeting Note-Taker with Python and VoxParse

April 27, 2026 · 8 min read

Meeting notes are one of those tasks that everyone agrees is important and nobody wants to do. Someone scribbles partial notes, misses half the action items, and the recap email goes out two days late with gaps nobody can fill.

In this tutorial, you will build a Python script that takes a meeting recording, transcribes it with speaker labels, and extracts a structured summary with action items, decisions, and owners. The entire thing is well under 100 lines of code.

What we are building

By the end of this tutorial, you will have a script that:

Accepts any audio file (MP3, WAV, M4A, FLAC, OGG, WebM)
Transcribes it with speaker diarization (who said what)
Extracts action items, decisions, and owners using custom AI instructions
Outputs clean Markdown meeting notes ready for Slack, Notion, or email

Prerequisites

Python 3.10+ installed
A VoxParse API key (sign up at voxparse.com/dashboard for 5 free hours)
The requests library: pip install requests
A meeting recording in any supported format

No other dependencies. No SDKs, no model downloads, no GPU required.

1 Transcribe with speaker diarization

The core of the note-taker is a single API call. VoxParse returns a diarized transcript and AI analysis in one synchronous response:

import sys

import requests

API_KEY = "YOUR_API_KEY"
API_URL = "https://api.voxparse.com/v1/transcribe"

def transcribe_meeting(audio_path):
    """Transcribe a meeting recording with speaker labels
    and extract action items."""

    with open(audio_path, "rb") as f:
        response = requests.post(
            API_URL,
            headers={"X-API-Key": API_KEY},
            files={"file": f},
            data={
                "custom_instructions": (
                    "This is a business meeting recording. "
                    "Extract: 1) Key decisions made, "
                    "2) Action items with owners and deadlines, "
                    "3) Open questions that need follow-up, "
                    "4) A 3-sentence executive summary."
                )
            }
        )

    response.raise_for_status()
    return response.json()

The custom_instructions parameter tells the AI analysis layer exactly what to extract. You can customize this for different meeting types: standups, sprint reviews, client calls, board meetings.
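One practical note on the snippet above: a hardcoded key is fine for a first run, but a script you share with teammates is safer if it reads the key from the environment. A minimal sketch, assuming an environment variable named VOXPARSE_API_KEY (the name is an arbitrary choice for this tutorial, not something the API requires):

```python
import os


def load_api_key(env_var="VOXPARSE_API_KEY"):
    """Return the API key from the environment, or fail with a clear message.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the script."
        )
    return key


def auth_headers(api_key):
    """Build the X-API-Key header used in transcribe_meeting()."""
    return {"X-API-Key": api_key}
```

With this in place, API_KEY = load_api_key() can replace the hardcoded string, and the key stays out of version control.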

2 Format the output as Markdown

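The formatter reads three fields from the JSON response (duration_seconds at the top level, plus call_summary and transcript_cleaned under ai_analysis) and renders them as clean, readable Markdown: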

def format_notes(result):
    """Convert API response to Markdown meeting notes."""

    analysis = result.get("ai_analysis", {})
    duration = result.get("duration_seconds", 0)
    minutes = round(duration / 60)

    lines = []
    lines.append("# Meeting Notes")
    lines.append(f"**Duration:** {minutes} minutes")
    lines.append("")

    # Executive summary
    summary = analysis.get("call_summary", "No summary available.")
    lines.append("## Summary")
    lines.append(summary)
    lines.append("")

    # Full transcript with speaker labels
    transcript = analysis.get("transcript_cleaned", "")
    if transcript:
        lines.append("## Transcript")
        lines.append("")
        for line in transcript.split("\n"):
            line = line.strip()
            if line:
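                # Bold the speaker label. This assumes each line looks like
                # "Speaker 1: text"; everything before the first colon is
                # treated as the label.
                if ":" in line:
                    speaker, text = line.split(":", 1)
                    lines.append(f"**{speaker.strip()}:** {text.strip()}")
                else:
                    lines.append(line)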
                lines.append("")

    return "\n".join(lines)
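The colon check in format_notes treats everything before the first colon as a speaker label, so a continuation line such as "The deadline is 5:30 on Friday" would come out bolded up to the "5". If that shows up in your notes, a stricter match helps. The sketch below assumes labels look like "Speaker 1:" or "Alice:" at the start of a line; that shape is a guess for illustration, so check a real response and adjust the pattern:

```python
import re

# Assumed label shape: a short name (letters, digits, spaces, dots, hyphens,
# apostrophes) followed by a colon, whitespace, and the spoken text.
SPEAKER_LINE = re.compile(r"^(?P<speaker>\w[\w .'-]{0,30}):\s+(?P<text>\S.*)$")


def bold_speaker(line):
    """Return the line with its speaker label in bold, or unchanged."""
    stripped = line.strip()
    match = SPEAKER_LINE.match(stripped)
    if match is None:
        return stripped
    return f"**{match['speaker']}:** {match['text']}"
```

This is still a heuristic: a sentence like "Note that we agreed: ship Friday" would match too. It only rules out the common case of times and ratios inside ordinary sentences.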

3 Put it all together

Finally, add an entry point that calls both functions and writes the notes to a file:

from pathlib import Path  # can live with the other imports at the top

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python meeting_notes.py <audio_file>")
        sys.exit(1)

    audio_file = Path(sys.argv[1])
    print(f"Transcribing {audio_file}...")

    result = transcribe_meeting(audio_file)
    notes = format_notes(result)

    # Save next to the recording: weekly-standup.mp3 -> weekly-standup_notes.md
    # (Path.stem also copes with file names that have no extension)
    output_file = audio_file.with_name(audio_file.stem + "_notes.md")
    output_file.write_text(notes, encoding="utf-8")

    print(f"Notes saved to {output_file}")
    print(notes)

Run it:

python meeting_notes.py weekly-standup.mp3

That is it. The entire script is well under 100 lines. The API handles the transcription, diarization, and AI analysis. Your code just formats the output.
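If you record more than one meeting a week, you will probably want to point the script at a folder rather than a single file. A minimal sketch using the formats listed earlier; the helper name and the rule of skipping recordings that already have a notes file are choices made for this example:

```python
from pathlib import Path

# Extensions taken from the supported formats listed at the top of the tutorial.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".webm"}


def find_unprocessed_recordings(folder):
    """Return audio files in `folder` that do not have a *_notes.md file yet."""
    recordings = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.suffix.lower() not in SUPPORTED_EXTENSIONS:
            continue
        notes_path = path.with_name(path.stem + "_notes.md")
        if not notes_path.exists():
            recordings.append(path)
    return recordings
```

Each returned path can then be passed to transcribe_meeting() and format_notes() in a loop.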

Customizing for different meeting types

The power of this approach is in the custom_instructions parameter. Change the prompt to get different outputs for different meetings:

Meeting Type	Custom Instructions
Daily standup	"Extract: what each person completed yesterday, what they are working on today, and any blockers mentioned."
Sprint review	"Extract: features demoed, stakeholder feedback, scope changes, and sprint velocity discussed."
Client call	"Extract: client requirements, agreed deliverables with deadlines, budget discussions, and follow-up items."
1-on-1	"Extract: performance feedback given, career development topics, concerns raised, and agreed next steps."
Board meeting	"Extract: motions proposed, votes taken with results, financial figures discussed, and strategic decisions."
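Rather than editing the prompt by hand each time, you can keep the prompts from the table in a dictionary and pick one per recording. A minimal sketch; the keys and the helper name are made up for this example, and the fallback is the generic prompt used elsewhere in this tutorial:

```python
# Prompts copied from the table above; the dictionary keys are arbitrary labels.
MEETING_PROMPTS = {
    "standup": (
        "Extract: what each person completed yesterday, what they are "
        "working on today, and any blockers mentioned."
    ),
    "sprint_review": (
        "Extract: features demoed, stakeholder feedback, scope changes, "
        "and sprint velocity discussed."
    ),
    "client_call": (
        "Extract: client requirements, agreed deliverables with deadlines, "
        "budget discussions, and follow-up items."
    ),
    "one_on_one": (
        "Extract: performance feedback given, career development topics, "
        "concerns raised, and agreed next steps."
    ),
    "board_meeting": (
        "Extract: motions proposed, votes taken with results, financial "
        "figures discussed, and strategic decisions."
    ),
}

DEFAULT_PROMPT = "Extract action items and decisions."


def instructions_for(meeting_type):
    """Look up the prompt for a meeting type, falling back to a generic one."""
    return MEETING_PROMPTS.get(meeting_type.strip().lower(), DEFAULT_PROMPT)
```

The result of instructions_for("standup") goes straight into the custom_instructions field of the request.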

Adding translation for multilingual teams

If your team meetings include non-English speakers, add the translate parameter to get both the original transcript and a translation in the same response:

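# Drop-in replacement for the requests.post call inside transcribe_meeting(),
# where the recording is already open as f
response = requests.post(
    API_URL,
    headers={"X-API-Key": API_KEY},
    files={"file": f},
    data={
        "translate": "es",  # Spanish translation
        "custom_instructions": "Extract action items and decisions."
    }
)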

The response includes both the original transcript and the Spanish translation. No separate API call, no extra cost. See our translation guide for all 50+ supported languages.

Adding PII redaction

If your meetings discuss sensitive information (client names, contract values, personal details), add redact_pii=true to strip PII from the output before sharing:

data={
    "redact_pii": "true",
    "custom_instructions": "Extract action items and decisions."
}

Names become [REDACTED_NAME], phone numbers become [REDACTED_PHONE], and so on. The meeting structure and action items are preserved, but identifying details are removed. Read more about how PII redaction works.
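Translation and redaction are both just extra form fields, so a small helper can keep the optional ones out of the request when you do not need them. This sketch covers only the three parameters shown in this tutorial, and it assumes the two options can be sent together; check the API documentation before relying on that:

```python
def build_form_data(instructions, translate=None, redact_pii=False):
    """Assemble the form fields sent alongside the audio file.

    The helper itself is hypothetical; the field names are the ones used
    in the snippets above.
    """
    data = {"custom_instructions": instructions}
    if translate:
        data["translate"] = translate  # target language code, e.g. "es"
    if redact_pii:
        data["redact_pii"] = "true"  # sent as the string "true", as above
    return data
```

The returned dictionary can be passed as the data argument of requests.post in transcribe_meeting().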

Cost breakdown

Meeting Length	VoxParse Cost	Otter.ai Pro Cost
30 minutes	$0.25	$16.99/mo (subscription)
60 minutes	$0.49	$16.99/mo (subscription)
12 one-hour meetings/month	$5.88	$16.99/mo

With VoxParse, you pay only for what you use. No monthly subscription, no per-seat pricing, no feature tiers. A typical team doing 12 one-hour meetings per month pays under $6.

New accounts get 5 free hours. That is enough for 5 one-hour meetings or 10 half-hour standups.
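To budget for your own meeting load, the arithmetic behind the table is easy to script. The sketch below assumes a flat rate of $0.49 per audio hour, which is inferred from the 60-minute row rather than quoted from a price list, and it does not know how partial minutes are actually billed, so treat the result as an estimate:

```python
from decimal import Decimal, ROUND_HALF_UP

# Inferred from the table above (60 minutes -> $0.49); check current pricing.
RATE_PER_HOUR = Decimal("0.49")
FREE_HOURS = Decimal("5")


def estimate_cost(total_minutes, free_hours_left=Decimal("0")):
    """Estimate the charge in dollars for `total_minutes` of audio."""
    billable_hours = Decimal(total_minutes) / Decimal(60) - free_hours_left
    if billable_hours < 0:
        billable_hours = Decimal("0")
    cost = billable_hours * RATE_PER_HOUR
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_cost(30))                   # 0.25
print(estimate_cost(12 * 60))              # 5.88
print(estimate_cost(12 * 60, FREE_HOURS))  # 3.43 once the free hours are applied
```

Decimal is used instead of float so that half-cent amounts such as $0.245 round up to $0.25, matching the table.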

Going further

Once you have the basic note-taker working, consider these extensions:

Slack integration: Post the formatted notes directly to a Slack channel using the Slack API after transcription
Notion integration: Create a new Notion page for each meeting using the Notion API and the Markdown output
Calendar automation: Watch your Google Calendar for completed meetings and automatically process the recordings
Sentiment tracking: Use the sentiment analysis data to track meeting health over time (are your standups getting more frustrating?)
Searchable archive: Store all meeting notes in a database with full-text search for instant recall of any past discussion
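As an example of the first extension, Slack incoming webhooks accept a JSON body with a text field. The sketch below uses only the standard library; the webhook URL is something you create in your own Slack workspace, and the 3000-character cap is an arbitrary choice for this example so that long transcripts do not flood the channel:

```python
import json
import urllib.request


def build_slack_payload(notes, max_chars=3000):
    """Wrap the notes in the JSON body an incoming webhook expects."""
    if len(notes) > max_chars:
        notes = notes[:max_chars] + "\n... (truncated)"
    return {"text": notes}


def post_to_slack(webhook_url, notes):
    """POST the notes to a Slack incoming webhook URL."""
    body = json.dumps(build_slack_payload(notes)).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises an HTTPError for non-2xx responses
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status == 200
```

Slack renders its own mrkdwn dialect rather than standard Markdown, so the # headings and **bold** markers from format_notes() will appear as literal characters unless you convert them first.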

Get 5 free hours

That is enough for about 5 one-hour meetings or 10 half-hour standups. No subscription, no credit card required to start.

Get your API key →

Bottom line

Building a meeting note-taker used to require a real-time speech-to-text engine, a speaker diarization model, an NLP pipeline for summarization, and a lot of infrastructure glue. In 2026, it is a short Python script and a single API call.

The value is not in the code. It is in never losing an action item, never missing a decision, and never spending 30 minutes writing a recap email again.

Check the API documentation for the full parameter reference, or read about verbatim vs. polished transcription modes to choose the right mode for your meetings.