Show HN: Trace – Offline Mac meeting transcripts you can flag mid-call
Posted by AG342 3 days ago
I'm the developer of Trace, a non-intrusive, shortcut-driven Mac app that records and transcribes your meetings on-device. I know, another meeting transcription app. Please bear with me though, I'm confident that this is at least a little novel.
I primarily built Trace for myself. I'd been using MacWhisper, but there was enough fiddling before each call that I'd forget to start it and walk out of an hour-long meeting with nothing written down. So the things I cared about most were that it's quick to activate and stays out of the way. You activate Trace by pressing a global shortcut (configurable), which reveals a small bar at the bottom of your screen (there's also a keystroke and/or option to hide it entirely if you'd rather not see it at all).
As I was building it I wanted to bake in a couple of workflows I'd wished for in other transcription apps.
1. Mid-meeting you can press another global shortcut to mark a "key moment" and type a note. The note shows up in the resulting transcript inline at that timestamp. I wanted to add this because I kept catching myself thinking "wait, that bit matters" in meetings and reaching to jot it down in a separate app like Obsidian, which I then needed to add context to, which took me out of the meeting. I use it all the time. If I paste the transcript into an LLM afterwards (which I find myself doing more and more these days) the important moments are flagged so it doesn't gloss over them. This is more noticeable in longer meetings with lots of topics.
2. With another keyboard shortcut you can summon a rough live recap (subtitles, basically) to quickly review what's just been said.
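To make the first workflow concrete, here is a minimal sketch of how typed notes could be merged into a transcript by timestamp. It is not Trace's code: the `Segment` and `KeyMoment` types, the `render_markdown` function, and the `> **Key moment**` line format are all assumptions made for illustration. Only the `**Speaker** [mm:ss] text` line shape is taken from a transcript sample posted further down the thread.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float   # seconds from the start of the recording
    speaker: str
    text: str


@dataclass
class KeyMoment:
    at: float      # seconds from the start of the recording
    note: str


def fmt_ts(seconds: float) -> str:
    """Format a second offset as mm:ss."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def render_markdown(segments, moments):
    """Interleave transcript segments and key-moment notes in time order."""
    # The 0/1 tiebreak puts a segment ahead of a note with the same timestamp.
    items = [
        (seg.start, 0, f"**{seg.speaker}** [{fmt_ts(seg.start)}] {seg.text}")
        for seg in segments
    ]
    items += [
        (m.at, 1, f"> **Key moment** [{fmt_ts(m.at)}] {m.note}")  # hypothetical format
        for m in moments
    ]
    items.sort(key=lambda item: (item[0], item[1]))
    return "\n\n".join(line for _, _, line in items)
```

Because the note sits next to the speech it refers to, anything that later reads the transcript (a person or an LLM) sees the flag in context rather than in a separate notes file.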
Trace uses standard macOS microphone and system recording APIs to capture both sides of the conversation as two separate tracks and then runs the system side through on-device diarization to identify speakers. Right now we only label them as "Speaker 1", "Speaker 2", etc., but there are plans for speaker labelling in the future. You can also show a "live recap" as the call is happening to review what someone just said.
All transcription models run on your machine. To be clear though, Trace doesn't do any of the summarising itself, it just produces a markdown transcript, so if you want summaries then you need to pass the output to an AI.
The app is sandboxed and your audio/transcripts are never uploaded anywhere - they just exist as audio files and markdown on disk. The only network call Trace is required to make is on the first run to download the speech and speaker models (around 500MB) from Hugging Face, and after that it can be used fully offline. If enabled, a Google Calendar integration can auto-name sessions but that needs a network connection.
The app is £9.99 on the macOS App Store. I've been using it every day for months now and I'm super happy with how it's improved my workflow. Feedback very welcome.
Comments
Comment by blopker 2 days ago
I'm making my own, for personal use. I did a survey of many and they all (that I could find) skip the fundamentals.
The major issues that I've run into:
- Crash recovery. Most of these apps are incredibly buggy and crash all the time, taking the recorded audio with them. MacWhisper is incredibly bad at this.
- Disk space. Many of these apps save wav files to disk. After a few hours of meetings, you may end up with gigabytes eaten.
- Microphone bleed. People don't always use headphones, the system mic will pick up the speaker sounds, causing duplicate (approximately) transcriptions.
I've yet to find a solution that handles all these correctly, let alone having high quality transcriptions.
Anyway, most of these apps are built around https://github.com/FluidInference/FluidAudio, if anyone is curious. Their readme has a big list of similar apps as well.
Comment by AG342 2 days ago
I think I've got the other two bits covered. I pushed an update yesterday that adds active echo cancellation so that audio playing through the speakers (or leaky headphones) won't get transcribed twice if it is picked up by the microphone. It can be disabled in preferences, but it's on by default.
The disk space issue is one that I considered as well. By default, Trace deletes the actual audio recordings as soon as transcription is successfully completed, so the idea is you keep just the markdown transcript rather than the gigabytes of raw audio. If you want, there's a preference to disable the auto-deletion. There's a bit more on the support page here https://traceapp.info/support (search for "Auto-deletion of audio").
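A minimal sketch of the "delete audio only after a successful transcript" idea described above. This is not Trace's implementation: `finalize_session` and its parameters are hypothetical names, and the sketch assumes the transcript should be on disk before the audio is removed.

```python
import os
import tempfile


def finalize_session(audio_path, transcript_path, transcript_text, keep_audio=False):
    """Write the transcript atomically, then delete the audio unless asked to keep it."""
    if not transcript_text.strip():
        return False  # nothing usable was transcribed, so keep the audio

    directory = os.path.dirname(os.path.abspath(transcript_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(transcript_text)
            f.flush()
            os.fsync(f.fileno())             # push the bytes to disk first
        os.replace(tmp_path, transcript_path)  # atomic rename into place
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

    # Only reached once the transcript is safely in place.
    if not keep_audio and os.path.exists(audio_path):
        os.remove(audio_path)
    return True
```

The ordering is the point: if the process dies at any step before the rename, the audio file is still there to transcribe again.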
FluidAudio is a big part of this and is actually used in two places during a session. It runs the Parakeet EOU model for the instant recap (which isn't hugely accurate, but it's good enough for the job) and after the call it's also used to transcribe the recording, depending on which engine you've selected (Trace offers a fast and an accurate one). If the fast engine is selected, we use FluidAudio with the Parakeet-TDT 0.6b v3 model for transcription, which then goes through Pyannote and WeSpeaker for diarization. If the accurate engine is selected, we use WhisperKit with the Whisper large-v3-turbo model for transcription, and SpeakerKit for diarization.
Comment by kstenerud 2 days ago
- Journaling file structures (telegraph what you're about to write, then write it, then signal completion)
- memmap your important data structures to a file (they will be flushed to disk no matter how your app dies - short of a power loss)
- post-crash dump (put last-minute writers in a crash handler to save it to disk)
A journaling file structure is the most secure, because it's designed with the assumption that writing will eventually fail. memmapped structs are easy and cheap, and get you 99% of the way there (only power loss will lose your data). Crash-time writing is doable with a crash handler like KSCrash, but there are many ways an app can crash without triggering a crash handler (thermal kill, exceeding quota, memory jetsam, etc). You also need to write your data in a signal-safe manner.
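As a rough illustration of the journaling option, here is a small append-only log in Python. It is a simplification rather than the exact scheme described above: each record's header announces the payload length, and a CRC32 stands in for the completion signal, so a record that was cut off mid-write fails validation on recovery. The names `append_record`, `recover`, and the header layout are assumptions for this sketch.

```python
import os
import struct
import zlib

# Header: payload length and CRC32 of the payload, both little-endian uint32.
HEADER = struct.Struct("<II")


def append_record(path, payload: bytes) -> None:
    """Append one record and force it to disk before returning."""
    with open(path, "ab") as f:
        f.write(HEADER.pack(len(payload), zlib.crc32(payload)))
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())


def recover(path):
    """Return every complete record, stopping at the first torn or corrupt one."""
    records = []
    try:
        with open(path, "rb") as f:
            data = f.read()
    except FileNotFoundError:
        return records

    offset = 0
    while offset + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        payload = data[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # the write was interrupted here; discard the tail
        records.append(payload)
        offset = start + length
    return records
```

On the next launch, `recover` returns whatever audio chunks were fully written, so a crash costs at most the record that was in flight.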
Comment by jamesbagley849 1 day ago
Comment by scosman 2 days ago
- crash recovery: part one is use ADTS aac (even if process crashes, audio is saved up until it does). Part two is isolating the transcription/summaries in separate XPC services.
- disk space: AAC 64kbps mono solves it. Could use Opus for further reduction but both are small.
- speaker bleed: macOS voice isolation processing solves this. It’s a nightmare to get set up, but works great once done.
- library: using argmax SDK - by a bunch of ex-Apple on device AI folks.
If it wasn’t for CoreAudio, I’d say it was easy to make. Argmax, Whisper, and llama.cpp - wrapped in the right architecture, mostly just work.
I’m having fun nerding out on the details like custom vocabulary (get the names of the people in the meeting right), inferring speaker names from transcript, calendar integration, nice UI, etc.
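For a sense of scale on the disk space point, a quick back-of-the-envelope calculation. The WAV parameters (48 kHz, 16-bit, stereo) are an assumption, since the thread does not say what format the other apps record in, and container overhead is ignored.

```python
def bytes_per_hour_cbr(bitrate_kbps: float) -> float:
    """Size of one hour of constant-bitrate audio, ignoring container overhead."""
    return bitrate_kbps * 1000 / 8 * 3600


def bytes_per_hour_pcm(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Size of one hour of uncompressed PCM audio (e.g. a WAV file's data chunk)."""
    return sample_rate_hz * (bit_depth // 8) * channels * 3600


aac = bytes_per_hour_cbr(64)             # 28.8 MB per hour
wav = bytes_per_hour_pcm(48_000, 16, 2)  # 691.2 MB per hour (assumed format)

print(f"AAC 64 kbps: {aac / 1e6:.1f} MB/hour")
print(f"WAV 48 kHz/16-bit/stereo: {wav / 1e6:.1f} MB/hour")
print(f"ratio: {wav / aac:.0f}x")
```

Under those assumptions an hour of 64 kbps AAC is about 29 MB against roughly 690 MB of WAV, which is consistent with the "gigabytes after a few hours" complaint earlier in the thread.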
Comment by jv22222 2 days ago
Comment by victorbjorklund 1 day ago
Comment by highmastdon 2 days ago
Comment by Folcon 2 days ago
Wait, really? I honestly would have thought this was a solved problem by now, especially the high quality transcriptions bit. Just out of curiosity, is the problem that the quality isn't high enough?
Comment by blopker 2 days ago
Comment by sofixa 2 days ago
If I had to guess, all of those apps are probably vibecoded, hence the variable quality.
Comment by kexelion 21 hours ago
Comment by scimonk 2 days ago
Due to audio quality, transcription sometimes produces garbled output or mishears something. FluidVoice offers the option to use an LLM to "interpret" the text and rescue garbled audio through context. Do you also plan to support something like this? This would be a great feature!
Comment by ahamez 2 days ago
> Which languages does Trace support? English only, for now. Both transcription models, Fast and Accurate, are built for English audio. A recording in another language will still produce a transcript, but it won’t be accurate: the model maps whatever it hears onto English words, so the result comes out garbled rather than failing outright.
> If transcribing other languages matters to you, get in touch (see Contact below).
Comment by scimonk 2 days ago
Comment by denbyc 2 days ago
Comment by AG342 2 days ago
Comment by addozhang 2 days ago
Comment by tillcarlos 2 days ago
Comment by thenipper 2 days ago
Comment by watchlight 2 days ago
What's your diarization pipeline? Pyannote?
I'd taken a different approach that used an LLM clean-up pass to summarize and progressively compress the transcript for ultra-long content, but I like the idea of targeted "pay attention here" flags.
Comment by tillcarlos 2 days ago
I just purchased it. What's the best way to give you feedback? (Do you want any?)
Off the top of my head:
- Will the mic switch automatically when I am at my office? Or do I have to change settings every time? Maybe a preference of what's available + auto switch would be good.
- I personally don't need the hot key. Menu bar icon would be fine.
- Downloading the model is a long process. Put it into the installer, not into the bar on the bottom.
- Speaker correction would be amazing. If it could "learn" the speakers based on voice.
- Overall neat app. Good animations and UX.
**Speaker 1** [00:00] What if I fell to the floor?
**Microphone** [00:02] Yes, this is Phil, I'm just speaking, this should be my voice, and there's music in the
**Speaker 1** [00:05] Couldn't tell this anymore
Comment by AG342 2 days ago
For the switching, do you mean if you hot-swap during a call? The mic should auto-switch if you've got System default selected, but feel free to give it a go and report back. If it doesn't do what we expect I can absolutely take a look at changing the behaviour.
Learning speakers is also on the to-do list.
P.S. Great choice in test audio. What a banger.
Comment by zmmmmm 2 days ago
- record and separate two sides of the conversation
- save meetings in a simple transcription format in a local folder
- connect with my calendar (Outlook, Google Calendar) and name meeting transcripts accordingly
- for recurring meetings, append rather than create a new transcript
- let me label speaker voices and recognise those voices across different meetings
A tool that did all this and then ALSO built a knowledge base to let me RAG query my meetings would be the holy grail for me.
Comment by mrkn1 2 days ago
Comment by addozhang 2 days ago
Comment by geniium 2 days ago
Comment by sofixa 2 days ago
Comment by dsl 2 days ago
Comment by z3ugma 1 day ago
It's very cyberpunk eventually...the human operator of the console needs to be able to see and hear the screen and sound, there will always be an interface that can be adapted to a machine, however low-fidelity
Comment by robertkarl 2 days ago
I would be more willing to purchase if it was open source and I could build from source to try it first.
Comment by satvikpendem 2 days ago
Comment by PufPufPuf 2 days ago
Comment by addozhang 2 days ago
Comment by anonymouse008 2 days ago
Not the subsidized subs
Comment by plaguuuuuu 2 days ago
Comment by littlecranky67 2 days ago
Comment by nkmnz 2 days ago
Comment by AG342 2 days ago
Comment by mushufasa 2 days ago
Comment by transitKnox 1 day ago
Comment by AG342 19 hours ago
Comment by usernametaken29 2 days ago
Comment by shireboy 1 day ago
Comment by AG342 1 day ago
Comment by AG342 1 day ago
I’ll take a look as my top priority.
Comment by lee_ars 1 day ago
Comment by nightpool 2 days ago
Comment by AG342 2 days ago
Comment by Gigachad 2 days ago
Comment by ectoloph 2 days ago
Minor complaint is that it steals Cmd-Shift-P (Firefox Private Browsing shortcut) by default.
Easy to change in the UI though, so no big deal.
Comment by Myrmornis 2 days ago
Comment by AG342 2 days ago
Comment by Myrmornis 1 day ago
I'll be wanting to find a good workflow to get the markdown transcripts into a git repo with file names that define a suitable sort order and also indicate what the meeting was. So I would welcome your suggestions there. Not blocked of course, you make it easy to copy from the clipboard or from the disk location and rename, but it might be nice to have more control over where and how the .md lands.
I might email the support address on the off-chance that you're happy to have support/feature conversations like this. Thanks!
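One possible naming scheme for the git repo workflow above, as a sketch: put an ISO-style date and time first so a plain alphabetical listing is also chronological, then a slug of the meeting title. `transcript_filename` is a hypothetical helper, not something Trace provides.

```python
import re
import unicodedata
from datetime import datetime


def transcript_filename(started: datetime, title: str) -> str:
    """Build a name like 2025-03-07-0905-weekly-sync.md that sorts chronologically."""
    # Drop accents and other non-ASCII characters so names stay portable.
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-") or "untitled"
    return f"{started:%Y-%m-%d-%H%M}-{slug}.md"
```

For example, `transcript_filename(datetime(2025, 3, 7, 9, 5), "Weekly Sync: Q2 Planning!")` gives `2025-03-07-0905-weekly-sync-q2-planning.md`.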
Comment by AG342 1 day ago
Comment by haaz 1 day ago
Comment by AG342 1 day ago
Yep, based in Sheffield, UK.
Comment by yilugurlu 1 day ago
Comment by AG342 1 day ago
Comment by frabia 2 days ago
Comment by watchlight 2 days ago
In my experience, medium is often the sweet spot for English accuracy vs speed, especially if following-up with a post-processing pass. The large options are all fine, but can severely slow it down. There are some speed checks on my website if you're curious (link not posted because I don't want to hijack another post's app).
Comment by fandorin 2 days ago
Comment by scosman 2 days ago
Comment by AG342 2 days ago
Comment by scosman 1 day ago
Comment by iorinu 1 day ago
Comment by nazca 2 days ago
Comment by chid 2 days ago
Comment by overflowy 2 days ago
Comment by triyambakam 2 days ago
Comment by Zhite_Panther 17 hours ago
Comment by kexelion 1 day ago
Comment by mantlemd 1 day ago
Comment by ipotapov 3 days ago
Comment by _onecookie 1 day ago
Comment by beefmumbai 2 days ago
Comment by JohnBizBiz 3 days ago
Comment by ZoneZealot 2 days ago
Comment by satvikpendem 2 days ago
Comment by hmokiguess 2 days ago
Comment by infl8ed 2 days ago
Comment by satvikpendem 2 days ago
Add "open source" if you wish as well.
Comment by hmokiguess 2 days ago
Comment by jv22222 2 days ago
Comment by vermilingua 2 days ago
Comment by jv22222 2 days ago
Comment by hmokiguess 2 days ago
Comment by nl 2 days ago
Comment by hmokiguess 2 days ago
Comment by jv22222 2 days ago
Comment by satvikpendem 2 days ago