Dictate & AI: WhisperShortcut
Speech to Text & Transcribe
Only for Mac
Free
Turn your voice into text anywhere on your Mac. WhisperShortcut lives in your menu bar — press a shortcut, speak, and your words are ready to paste in any app.
Open source. No subscription, no account. Bring your own API key, or run fully offline with local Whisper.
WHAT YOU CAN DO
• Dictate — Speak and get clean text on your clipboard. Use cloud models or local Whisper, fully offline and private.
• Voice editing — Copy any text, then say an instruction like "make this shorter" or "translate to English" to rewrite it.
• Read Aloud — Select text anywhere and hear it in natural AI voices, at the speed you choose.
• AI Chat — A built-in chat with multiple models, screenshots, image and file attachments, web search and slash commands.
• Screenshots — Capture your screen and attach it to a voice prompt or chat.
• Live Meeting — Record and transcribe meetings as they happen, with a local transcript.
• Smart Improvement — The app learns your vocabulary and preferences over time.
PRIVATE BY DESIGN
Local-first, with no backend and no sign-up. Your audio and text go only to the AI provider you choose with your own key — or never leave your Mac with local Whisper.
WORKS WITH YOUR TOOLS
Google Gemini, OpenAI (GPT), and xAI (Grok). Optional Google (Calendar, Tasks, Gmail) and Trello integrations turn chat into a hands-free assistant.
Requires macOS 15.5 or later.
- New: Run Dictate Prompt fully offline with a local OpenAI-compatible server such as Ollama or LM Studio. Just set the endpoint and model in Settings.
- Chat: email questions now get a real answer instead of dead-ending, replies keep a consistent tone, and the sidebar shows Today and Yesterday expanded by default.
- Fix: resolved a Gemini 3 chat error that could break tool calls.
7.74 2d ago
- Improved welcome tour and onboarding flow, featuring a persistent floating window, a unified permissions hub, and reliable system settings integration.
- Enhanced chat functionality with image downloads, real-time Google integration (Calendar, Tasks, Gmail), and fixes for streaming freezes and Keychain-related stalls.
- Added clearer error handling when a non-vision model tries to read a screenshot, plus overall UI polish.
7.73 Jun 24
- Launch at Login is now off by default — enable it yourself in Settings.
- Auto-paste is now optional; dictation works without extra permissions.
- Clearer setup prompts.
- More reliable AI chat with automatic retries on temporary server errors.
7.68 Jun 18
- More reliable chat: Fewer freezes during long conversations with generated images; smoother streaming and a more stable message list when sending or resizing the window.
- Meeting summaries for every provider: Summaries and titles now work with Gemini, OpenAI, and Grok; missing summaries regenerate when you open the Summary tab.
- Smarter transcription & calendar: Your glossary is applied to instructable speech-to-text models; Google Calendar lookups can include past events with normalized dates.
7.55 Jun 10
- Generate images in AI Chat with Gemini - ask for a picture or let the assistant use the new image tool.
- Chat scrolls more smoothly in long conversations; fixes rare 100% CPU freezes when scrolling.
- Read Aloud shows a speaking indicator in the menu bar; retry your last chat message or copy pasted text with your message.
7.50 Jun 4
- Fixes chat freezing during long responses — especially when switching chats while a response was still generating.
- Up to 10 file attachments per message (previously 5).
7.45 Jun 1
• Read Aloud: more reliable text capture; friendly note when nothing is selected
• Screenshots save to your folder again, with better permission guidance
• Live meeting titles show up in chat sooner
• Fixed chat freezing on replies with web sources
• Restored menu-bar review prompts
7.42 May 31
• Read Aloud: pick a voice per provider (Gemini, OpenAI, Grok) in Settings
• Chat: /think sets reasoning depth per conversation (all providers)
• Read Aloud rewrite sounds more natural with an improved default prompt
7.39 May 30
Read Aloud: Read selected text aloud, optional AI rephrasing for non-readable content, variable speed.
Setup: Record shortcuts by keypress, permissions onboarding, save screenshots and attach in chat.
Chat & Meetings: Sidebar with search and meetings area, faster live transcription, more stable meeting titles.
7.35 May 28
- Paste screenshots and images into Chat; new ⌘3 shortcut captures a region straight to your clipboard.
- Updated chat defaults (GPT-5.5, Grok 4.3); Gemini Pro models use deeper reasoning for stronger answers.
- Smart Improvement runs more reliably in the background with fewer popups; sharper dictation and chat prompts.
7.26 May 25
Dictation: Fixed silent drop to idle (especially offline Whisper); clearer “no speech detected” feedback; more forgiving silence threshold.
Quality & support: Show logs in Settings; better transcription for large non-WAV files; fewer wasted API calls after Dictate Prompt.
Settings: Offline Whisper download confirmation no longer blocks the window; toast stays visible for 10 seconds.
7.20 May 20
This update improves Dictate Prompt with a stronger default model and stricter language and minimal-edit behavior, shows richer attachment details in chat, and refines slash-command recognition (including /copy).
7.16 May 18
- Trello (OAuth + tools)
- OpenAI for transcription, chat, and Dictate Prompt
- Smarter custom transcription API defaults
- Richer chat UX (copy as Markdown, /copy, command awareness, more screenshots, clearer streaming behavior)
- Google Tasks/calendar link quality
- Grok speed and error clarity
- Smart Improvement audio checks and stricter patterns
- Reliability and shutdown improvements
7.15 May 13
- Code quality and reliability improvements across core systems
7.10 May 6
• Reliability: Fixed a rare issue where recording could stay stuck on if you started a new recording immediately after the previous one ended.
• Live Meeting: Starting a meeting from a new chat after a previous one ended now attaches the recording to the current session.
• Google tools: Calendar and task actions show clearer confirmation details in chat when they succeed.
• Gmail: Improved stability when opening large search results.
• Google sign-in & Calendar: More robust handling of special characters in tokens, codes, and event IDs.
• Settings: “Reset all to defaults” now also clears chat sessions, meeting transcripts, and saved system prompts (API keys and Google tokens remain in Keychain). The confirmation text reflects this accurately.
7.9 Apr 30
- Unified chat + live meeting: The chat window (formerly “Open Gemini”) is the main surface; live meeting lives in the same place with session sidebar and related fixes.
- Deeper AI + Google in chat: Multi-provider (Gemini + Grok), tool/function calling, Google Calendar/Tasks, read-only Gmail, and many stability fixes.
- Streamlined settings: Renamed modes (“Dictate Prompt” etc.), per-mode system prompts, removed old modes (Prompt Read, chat read-aloud), and UI/privacy defaults (e.g. masked API keys, sidebar, pins, chat management).
7.3 Apr 27
Gemini model selection: Fixed the model picker not prefilling correctly on the first open after a cold launch.
6.7.1 Apr 4
- Feature Removal: Read Aloud, Prompt & Read, and Improve-from-voice have been removed to focus the app on Speech-to-Text, Speech-to-Prompt, and Gemini Chat.
- Gemini Chat Enhancements: Added a prompt queue for sequential processing, prefill from selection with visual chips, and a window toggle shortcut.
- Opt-in Controls: Screenshots in Prompt Mode and auto-paste for dictation are now strictly opt-in to improve privacy and reduce permission prompts.
- Reliability: Fixed crashes related to main-thread rules and focus-loss bugs, alongside general internal code cleanup.
6.7 Apr 2
- Gemini Chat: Attachments, paste handling, markdown tables, session memory (/remember, /context), and in recent releases syntax-highlighted code blocks, LaTeX-style math, and inline images, with steadier layout and window behavior.
- Meeting mode: Split window, rolling summary, generated summaries, past meetings in one place, and refinements to controls, titles, and the transcript experience.
- Elsewhere: Screenshot context in prompt mode, Whisper glossary and model tuning for dictation/prompt/improvement, a central Stop control, no automatic app termination, optional close-Settings-on-focus-loss, and a fix for heavy disk writes that could get the app terminated by macOS.
6.6.3 Mar 22
- Gemini Chat offers a dedicated window with a global shortcut and persistent sessions as well as slash commands with autocomplete for efficient command execution.
- Grounding sources and citations are displayed inline in a FlowLayout and supplemented by an integrated screenshot function for context enrichment.
6.4.5 Mar 5
- Smart Improvement & AI: System prompts now improve automatically based on usage, powered by Gemini for more precise analysis and history management.
- User Context & History: Introduction of configurable limits for user context and system prompts, plus tab-specific history for AI generation.
- UI/UX & Settings: Optimized prompt editor and direct access to interaction and transcript folders within the app.
- Live Meeting & Stability: New safeguards for recording duration, consistent timestamps, and improved transcription control, including a cancel shortcut.
6.0.0 Feb 20
- Introduced live meeting recording and transcription, including dedicated settings and direct folder access.
- Implemented a global rate limit with UI notifications for transcription and TTS services.
- Optimized TTS workflow and simplified configuration through consolidation of service logic.
5.3.6 Feb 5
Read Aloud and Prompt & Read now support an adjustable playback speed so you can listen faster or slower.
5.3.4 Feb 2
- Introduced conversation history for prompt modes to enable more coherent, context-aware AI interactions.
- Enhanced application reliability through optimized TTS processing, updated API endpoints, and improved audio validation.
- Added a configurable timeout for transcription history to prevent long-running requests from blocking the app.
5.3.3 Feb 1
- Added configurable recording safeguards to prevent unintended API usage
5.3.2 Jan 25
DATA NOT COLLECTED
The developer does not collect any data from this app.