How I connected a Telegram bot to my backend for real-time prosody analysis

2026-02-22 · 5 min read

I've been building Acento, a pronunciation coaching tool that analyzes how you speak English. It started as a web app, but the most natural way to practice speaking is to just send a voice message. So I connected the backend to a Telegram bot. You record, it analyzes, you get feedback. No app to install, no UI to navigate.

Here's how the pieces fit together, from the webhook and API to the audio pipeline and the authorization layer that ties Telegram users to real accounts.

The API: FastAPI on Cloud Run

The backend is a FastAPI application deployed to Google Cloud Run. It exposes a few key endpoints, but two matter most.

  • POST /api/analyze, the main analysis endpoint used by the web app
  • POST /api/telegram/webhook, where Telegram sends every message the bot receives

Cloud Run handles scaling automatically. When nobody's sending voice messages at 3 AM, it scales to zero. When a dozen people are practicing during lunch, it spins up more instances. Because the backend is stateless, any instance can handle any request, so scaling up and down needs no extra coordination.

The Telegram bot itself is registered through BotFather, and I set its webhook URL to point at my Cloud Run service. From then on, every message, voice note, or button press in the bot chat is forwarded to my endpoint as a JSON payload.
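
Registering the webhook is a single Bot API call, setWebhook. The sketch below builds that request with the standard library but doesn't send it. The bot token and service URL are placeholders. Restricting allowed_updates to messages and callback queries is my assumption about what this bot needs. The secret_token field is what makes the header check described below possible.

```python
import json
import secrets
import urllib.request

TELEGRAM_API = "https://api.telegram.org"


def build_set_webhook_request(
    bot_token: str, webhook_url: str, secret_token: str
) -> urllib.request.Request:
    """Build (but don't send) a Bot API setWebhook request."""
    payload = {
        "url": webhook_url,
        # Telegram sends this value in the X-Telegram-Bot-Api-Secret-Token
        # header of every webhook request.
        "secret_token": secret_token,
        # Assumption: the bot only needs these two update types.
        "allowed_updates": ["message", "callback_query"],
    }
    return urllib.request.Request(
        f"{TELEGRAM_API}/bot{bot_token}/setWebhook",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Telegram accepts 1-256 characters from A-Z, a-z, 0-9, "_" and "-".
# token_urlsafe(32) yields 43 characters from exactly that alphabet.
webhook_secret = secrets.token_urlsafe(32)
req = build_set_webhook_request(
    "123456:ABC-placeholder",  # placeholder bot token
    "https://YOUR-SERVICE.run.app/api/telegram/webhook",  # placeholder URL
    webhook_secret,
)
```

Sending it is a one-off urllib.request.urlopen(req) at deploy time. The same secret then has to be available to the service so it can check incoming requests.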

Receiving and routing Telegram updates

When Telegram hits the webhook endpoint, the first thing the backend does is verify the request. Telegram sends the secret token I configured when registering the webhook in the X-Telegram-Bot-Api-Secret-Token header. I compare it against the expected value using a constant-time check to prevent timing attacks. If the secret doesn't match, the request is rejected immediately.
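
The check itself can be very small. This is a sketch that assumes the expected secret comes from configuration, such as an environment variable. hmac.compare_digest is the standard library's comparison designed to resist timing analysis. Comparing bytes rather than str avoids the TypeError it raises when someone sends non-ASCII text in the header.

```python
import hmac
from typing import Optional


def is_valid_telegram_secret(header_value: Optional[str], expected_secret: str) -> bool:
    """Return True only if the X-Telegram-Bot-Api-Secret-Token header matches."""
    if not header_value:
        # Missing or empty header: reject without comparing.
        return False
    # compare_digest does not short-circuit on the first mismatching byte.
    # Encoding to bytes avoids the TypeError it raises for non-ASCII str input.
    return hmac.compare_digest(
        header_value.encode("utf-8"), expected_secret.encode("utf-8")
    )
```

In a FastAPI handler, a False result would typically become raise HTTPException(status_code=403) before any further work is done.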

After verification, the handler inspects what kind of update it is.

  • Text commands like /start, /help, /practice, or /setkey are routed to their respective handlers
  • Voice messages trigger the full analysis pipeline
  • Callback queries handle inline button presses, like when a user taps "Details" to expand their results
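
A dispatcher for these three update types might look like the sketch below. It returns a handler name instead of calling handlers, which keeps it easy to test. The field names (message, voice, text, callback_query) are the Bot API's Update and Message fields; the handler names are hypothetical. Commands can arrive as /start@BotName, for instance in group chats, so the sketch strips that suffix.

```python
from typing import Any

KNOWN_COMMANDS = {"/start", "/help", "/practice", "/setkey"}


def route_update(update: dict[str, Any]) -> str:
    """Return the (hypothetical) handler name for a Telegram update."""
    if "callback_query" in update:
        # Inline button press, e.g. the user tapped "Details".
        return "callback"
    message = update.get("message")
    if not message:
        # Edited messages, channel posts, etc. are not handled.
        return "ignore"
    if "voice" in message:
        return "voice"  # hands off to the analysis pipeline
    text = message.get("text", "")
    if text.startswith("/"):
        # "/setkey@SomeBot acento_x" -> "/setkey"
        command = text.split()[0].split("@")[0]
        return command if command in KNOWN_COMMANDS else "unknown_command"
    return "ignore"
```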

The webhook always returns 200 OK quickly. The actual processing happens in the background so Telegram doesn't time out waiting for a response.

From voice message to prosody scores

When a user sends a voice message, the backend goes through several steps.

  1. Downloads the audio from Telegram's servers using the file ID from the webhook payload
  2. Converts it from OGG/Opus (Telegram's default) to WAV at 16kHz mono using pydub and ffmpeg
  3. Runs prosody analysis with Parselmouth (a Python wrapper around Praat) across five dimensions: pitch range in semitones, volume dynamics, speaking tempo, rhythm patterns (nPVI), and pause distribution
  4. Generates AI coaching by sending the audio to Google Gemini for transcription and pronunciation tips
  5. Formats and sends a compact response back through the Telegram Bot API
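
Two of these dimensions are formulas rather than raw measurements, so they are worth spelling out.

- The pitch range between two frequencies, in semitones, is 12 × log2(f_high / f_low).
- nPVI averages the normalized difference between each pair of neighboring durations: 100 / (m − 1) × Σ |d_k − d_(k+1)| / ((d_k + d_(k+1)) / 2).

The sketch below makes two assumptions the post doesn't state. It uses the 5th and 95th percentiles of F0 instead of the absolute min and max, to blunt octave-jump errors. It also leaves open which intervals feed nPVI (vocalic intervals, syllables, and so on). The F0 values would come from something like parselmouth.Sound(path).to_pitch().selected_array["frequency"], where unvoiced frames are reported as 0 Hz.

```python
import math
from statistics import quantiles


def pitch_range_semitones(f0_hz: list[float]) -> float:
    """Pitch range between the 5th and 95th percentile of voiced F0, in semitones."""
    # Praat/Parselmouth report unvoiced frames as 0 Hz, so drop them.
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        return 0.0
    # "inclusive" interpolates within the data and never extrapolates past it.
    cuts = quantiles(voiced, n=20, method="inclusive")  # 5%, 10%, ..., 95%
    low, high = cuts[0], cuts[-1]
    # 12 semitones per octave; an octave is a doubling of frequency.
    return 12 * math.log2(high / low)


def npvi(durations: list[float]) -> float:
    """Normalized Pairwise Variability Index of successive (positive) durations."""
    if len(durations) < 2:
        return 0.0
    pairs = zip(durations, durations[1:])
    terms = [abs(a - b) / ((a + b) / 2) for a, b in pairs]
    return 100 * sum(terms) / len(terms)
```

An nPVI near 0 means evenly timed intervals. Higher values mean more alternation between long and short intervals, which is the stress-timed pattern typical of English.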

Each dimension gets a score from 1 to 10, and the scores are combined into a weighted overall score: pitch 25%, volume 20%, tempo 20%, rhythm 20%, and pauses 15%. The user gets a quick summary with their top issue and one actionable tip, plus a "Details" button they can tap to see the full breakdown.
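
The weighting is a plain weighted sum, and the weights add up to 1.0, so the result stays on the 1-10 scale. The sketch below makes several assumptions of its own. It treats the top issue as the lowest-scoring dimension. It rounds to one decimal for display. The message text and the details:<id> callback format are hypothetical. The reply_markup and inline_keyboard structure is the Bot API's sendMessage format.

```python
from typing import Any

# Weights from the post; they sum to 1.0.
WEIGHTS = {"pitch": 0.25, "volume": 0.20, "tempo": 0.20, "rhythm": 0.20, "pauses": 0.15}


def overall_score(scores: dict[str, float]) -> float:
    """Weighted overall score on the same 1-10 scale, rounded for display."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    return round(sum(w * scores[dim] for dim, w in WEIGHTS.items()), 1)


def top_issue(scores: dict[str, float]) -> str:
    """Assumption: the 'top issue' is the lowest-scoring dimension."""
    return min(WEIGHTS, key=lambda dim: scores[dim])


def summary_message(
    chat_id: int, scores: dict[str, float], tip: str, session_id: str
) -> dict[str, Any]:
    """Bot API sendMessage payload with a "Details" inline button."""
    text = (
        f"Overall: {overall_score(scores)}/10\n"
        f"Focus on: {top_issue(scores)}\n"
        f"Tip: {tip}"
    )
    return {
        "chat_id": chat_id,
        "text": text,
        "reply_markup": {
            # callback_data is limited to 64 bytes: send an ID, not the results.
            "inline_keyboard": [
                [{"text": "Details", "callback_data": f"details:{session_id}"}]
            ]
        },
    }
```

Because callback_data is so small, the "Details" handler has to look the full breakdown up again by session ID rather than carry it in the button.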

Two paths, one auth system

The backend supports two authentication methods.

Firebase ID tokens are what the web app uses. The frontend sends a Bearer token in the Authorization header, the backend verifies it with Firebase Admin SDK, and the user gets full access to their history, progress, and AI coaching.

API keys are what the Telegram bot uses. When a user creates an API key through the web app, the backend generates a random key, shows it once, and stores only its SHA-256 hash in Firestore. The key is sent in an X-Acento-Key header. Each key has a configurable daily limit to prevent abuse.
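
Both auth paths can end in the same place: a user ID. The sketch below makes three assumptions. The acento_ prefix comes from the /setkey example, while the key length is a guess. Headers arrive with lowercase names. A plain dict stands in for the Firestore lookup from key hash to user ID. The verify_id_token parameter is where firebase_admin.auth.verify_id_token would be passed in production; it returns the decoded claims, including "uid".

```python
import hashlib
import secrets
from typing import Callable, Optional

KEY_PREFIX = "acento_"  # matches the key format used with /setkey


def hash_api_key(key: str) -> str:
    """SHA-256 hex digest; this is the only form that gets stored."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def generate_api_key() -> tuple[str, str]:
    """Return (plaintext key to show the user once, hash to persist)."""
    key = KEY_PREFIX + secrets.token_urlsafe(24)  # 24 random bytes = 192 bits
    return key, hash_api_key(key)


def resolve_user_id(
    headers: dict[str, str],
    verify_id_token: Callable[[str], dict],
    key_owners: dict[str, str],
) -> Optional[str]:
    """Map a request to a user ID via either auth path, or None if neither matches.

    Assumes lowercase header names; key_owners maps key hash -> user ID and
    stands in for the Firestore lookup.
    """
    auth = headers.get("authorization", "")
    if auth.startswith("Bearer "):
        # In production: firebase_admin.auth.verify_id_token, which raises on
        # an invalid token and returns claims including "uid".
        return verify_id_token(auth[len("Bearer "):])["uid"]
    api_key = headers.get("x-acento-key")
    if api_key:
        return key_owners.get(hash_api_key(api_key))
    return None
```

A fast, unsalted SHA-256 is reasonable here, unlike for passwords. The keys are long random strings with nothing to guess from a dictionary, and a deterministic hash lets the lookup be a direct index.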

But Telegram users don't send HTTP headers. They just send voice messages. So I needed a way to link a Telegram chat to an Acento account.

Linking Telegram to your account

The /setkey command bridges the gap. A user types /setkey acento_xxxxx in the bot chat. The backend immediately deletes that message, so the key doesn't sit in chat history, then hashes the key and looks it up in the api_keys collection. If it matches, the backend creates a telegram_links document that maps the Telegram chat ID to the user's Acento user ID and API key.
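
Here is a sketch of that flow. In-memory dicts stand in for the api_keys and telegram_links collections. The delete_message callable stands in for the Bot API's deleteMessage method, which takes a chat_id and message_id. A few details are assumptions: the user_id field name, the reply texts, and storing the key's hash in the link rather than the key itself.

```python
import hashlib
from typing import Any, Callable


def handle_setkey(
    message: dict[str, Any],
    api_keys: dict[str, dict[str, Any]],
    telegram_links: dict[str, dict[str, Any]],
    delete_message: Callable[[int, int], None],
) -> str:
    """Link a Telegram chat to an Acento account; returns the reply text."""
    chat_id = message["chat"]["id"]
    # Delete first, so the key leaves chat history even if it turns out invalid.
    delete_message(chat_id, message["message_id"])
    parts = message.get("text", "").split(maxsplit=1)
    if len(parts) < 2:
        return "Usage: /setkey <your API key>"
    key_hash = hashlib.sha256(parts[1].strip().encode("utf-8")).hexdigest()
    key_doc = api_keys.get(key_hash)  # stand-in for the api_keys lookup
    if key_doc is None:
        return "That key wasn't recognized. Check it in the web app and try again."
    # Stand-in for creating the telegram_links document, keyed by chat ID.
    telegram_links[str(chat_id)] = {"user_id": key_doc["user_id"], "key_hash": key_hash}
    return "Linked! Send a voice message to start practicing."
```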

From then on, every voice message from that chat automatically gets associated with their account. The backend checks their API key quota, saves analysis sessions to Firestore, and tracks their practice streak.

Users who haven't linked an account can still use the bot. They get prosody analysis (no AI coaching) with a cap of 10 analyses per day, tracked by chat ID in a separate telegram_anon_usage collection.
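
Both limits, the per-key daily limit for linked users and the 10-per-day cap for anonymous chats, can share one counter shape: a count per subject per day. In the sketch below, the subject strings (for example "key:<hash>" or "chat:<id>") are hypothetical, and a dict stands in for the Firestore documents. The post doesn't say which time zone defines "a day", so the caller passes today in, for example as the UTC date. In Firestore, the read-check-write should run inside a transaction so that two voice messages arriving together can't both slip past the limit.

```python
from datetime import date

ANON_DAILY_LIMIT = 10  # from the post


def try_consume(
    usage: dict[tuple[str, str], int], subject: str, limit: int, today: date
) -> bool:
    """Count one analysis against subject's daily limit; False if it's used up."""
    # One bucket per subject per day, so a new day starts from zero.
    bucket = (subject, today.isoformat())
    used = usage.get(bucket, 0)
    if used >= limit:
        return False
    usage[bucket] = used + 1
    return True
```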

Daily reminders and streaks

Once a user is linked, they can opt into daily practice reminders. A Cloud Scheduler job hits POST /api/telegram/send-reminders every hour with a secret header. The backend checks which users have reminders enabled for that hour and sends them a message with a practice sentence.
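
Cloud Scheduler uses unix-cron syntax, so "0 * * * *" fires at the top of every hour. The endpoint's secret header can be checked the same constant-time way as the webhook's. Selecting recipients is then a filter on the current hour. In the sketch below, storing each user's reminder hour as a UTC hour is my simplifying assumption; a per-user IANA time zone would be friendlier. The field names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any


def chats_due_for_reminder(users: list[dict[str, Any]], now: datetime) -> list[int]:
    """Chat IDs whose reminder hour matches the hour this job is running in."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    hour = now.astimezone(timezone.utc).hour
    return [
        u["chat_id"]
        for u in users
        if u.get("reminders_enabled") and u.get("reminder_hour_utc") == hour
    ]
```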

The reminder message adapts to context. If you have an active streak and missed yesterday, you get a "don't break your streak" nudge. If you've been inactive for three or more days, it's a gentler re-engagement prompt. The streak itself increments when you practice on consecutive days and resets if you skip one.
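
The streak rule fits in a few lines. The reminder choice is harder to pin down. The thresholds below are my literal reading of the post, and the real cut-offs may differ:

- "Inactive" is counted as days since the last practice.
- "Missed yesterday" means the last practice was two days ago.

The message keys are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def update_streak(streak: int, last_practice: Optional[date], today: date) -> int:
    """New streak value after a practice session today."""
    if last_practice == today:
        return streak  # already counted today
    if last_practice == today - timedelta(days=1):
        return streak + 1  # consecutive day
    return 1  # first session ever, or at least one day was skipped


def pick_reminder(streak: int, last_practice: Optional[date], today: date) -> str:
    """Choose a reminder variant (literal reading of the post's rules)."""
    if last_practice is None:
        return "default"
    days_inactive = (today - last_practice).days
    if days_inactive >= 3:
        return "re_engage"  # gentler prompt after a longer gap
    if streak > 0 and days_inactive == 2:
        return "streak_nudge"  # active streak, no session yesterday
    return "default"
```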

Putting it all together

The full flow looks like this.

[Figure: Telegram bot backend architecture flow]

No app store, no sign-up form, no loading screens. Just open Telegram, record your voice, and get feedback in seconds. The architecture keeps things simple: one webhook endpoint, one analysis pipeline, and a lightweight linking mechanism that bridges Telegram with the rest of the system.

If you want to try it yourself, head over to accent.learnenglishsounds.com and record something. Or just open the bot on Telegram and send a voice message. It's free to start, no account required.
