Unlike many video and voice call services, Discord doesn’t have captions for calls. Not only does that suck when you can’t listen in, but it also makes calls inaccessible to people who are hard of hearing.
So I built Echo, a Discord bot that provides real-time caption generation for voice calls.
Echo uses discord.js, a JavaScript wrapper for Discord’s API, to receive audio data from calls. The audio stream is then fed into Vosk, a lightweight, offline speech recognition toolkit, and the resulting captions are shown both in chat and in a web UI built with SvelteKit.
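To make the audio hand-off more concrete, here is a minimal sketch of one step such a pipeline usually needs. It is not taken from Echo’s source. It assumes the call audio has already been decoded from Opus into interleaved 48 kHz stereo 16-bit PCM (the format Discord voice uses) and that the recognizer is fed 16 kHz mono PCM, which is what many Vosk models are trained on. The function name `toMono16k` is hypothetical, and the averaging used here is a crude stand-in for a proper resampling filter.

```javascript
// Hypothetical helper, not part of Echo, discord.js, or Vosk.
// Input:  interleaved stereo 16-bit PCM at 48 kHz (L, R, L, R, ...).
// Output: mono 16-bit PCM at 16 kHz (assumed recognizer input format).
function toMono16k(stereo48k) {
  const FRAMES_PER_OUTPUT = 3; // 48000 / 16000
  const frameCount = Math.floor(stereo48k.length / 2);
  const outLength = Math.floor(frameCount / FRAMES_PER_OUTPUT);
  const out = new Int16Array(outLength);

  for (let i = 0; i < outLength; i++) {
    let sum = 0;
    // Average three stereo frames (six values) into one mono sample.
    // This box average downmixes and decimates in a single pass.
    for (let j = 0; j < FRAMES_PER_OUTPUT; j++) {
      const frame = (i * FRAMES_PER_OUTPUT + j) * 2;
      sum += stereo48k[frame] + stereo48k[frame + 1];
    }
    out[i] = Math.round(sum / (FRAMES_PER_OUTPUT * 2));
  }
  return out;
}

// A 20 ms stereo chunk at 48 kHz holds 960 sample pairs (1920 values).
const chunk = new Int16Array(1920).fill(1000);
const mono = toMono16k(chunk);
console.log(mono.length); // 320 samples, i.e. 20 ms at 16 kHz
```

Averaging keeps the example dependency-free and cheap enough for a Raspberry Pi, but a real deployment might prefer a dedicated resampler for better accuracy.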
Echo is designed to be lightweight and self-hostable. It can run on a Raspberry Pi, which lowers the barrier to entry, and because transcription happens offline on your own hardware, it also improves privacy.
Note: Echo is a proof-of-concept and is not intended for production use. It is not affiliated with Discord.