Gemma 4 support, better GGUF imports, smarter retrieval, live prompt progress, and new memory plus system prompt customization.
Noema brings large language model intelligence to all your devices, fully offline. Download lightweight models directly from Hugging Face, connect supported remote endpoints, and pair models with curated textbooks and your own PDFs or EPUBs. The privacy-first design means your data never leaves your device when running locally, whether you are on iPhone, Mac, or Apple Vision Pro.
- Native macOS app: Run the full Noema experience on your desktop with a rebuilt interface that feels at home on macOS.
- visionOS support: Use Noema in spatial computing environments, with windows you can place around your workspace.
- Noema Relay: Connect your iPhone to your Mac via CloudKit, with no local Wi-Fi required, so one device can host a model while another becomes the client.
- Vision support for models: Attach photographs to your prompts and use multimodal models for on-device image understanding and analysis.
- Open Textbook Library integration: Browse and import entire textbooks through the built-in Explore view; Noema indexes them locally so you can search and retrieve relevant passages on demand.
- Bring your own data: Add personal documents in PDF or EPUB formats, which are embedded and indexed on-device to power retrieval-augmented generation.
- Integrated Hugging Face search: Discover and install quantized models from the Hugging Face Hub with one-tap installation, automatic dependency management, and real-time download progress.
- Remote model support: Connect to supported remote endpoints including OpenRouter and LM Studio, with updated LM Studio REST v1 compatibility and a smoother model download flow through Explore.
- Expanded model runtime support: Run models in GGUF, MLX, ExecuTorch, and CoreML formats, or use Apple Foundation Models, giving you flexible on-device options across Apple hardware.
- RAM check and model size helper: A built-in advisor estimates each model’s memory footprint and shows when it fits your device’s budget; it can also estimate the maximum context length that fits in RAM.
- Advanced settings for power users: Fine-tune context length, quantization, and GPU acceleration; enable tool calling for built-in search and other functions; and customize model parameters for optimal performance.
- Built-in tool calling and Python support: Use integrated tools, including Python, to extend model capabilities for more advanced workflows.
- Built-in search and RAG: Use integrated search tools and retrieval-augmented generation to query your data without hitting context limits.
- Localization upgrades: Experience Noema in 10 languages, so international teams can work in the interface that suits them best.
- Private and offline by default: Local models run entirely on-device, and your conversations and files stay on your device unless you choose to use a connected remote provider.
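As a rough illustration of the arithmetic behind a RAM check like the one described above, the sketch below estimates the largest context length that fits once model weights are loaded. It is not Noema's actual advisor: the function names, the overhead allowance, and the example model shape are all hypothetical, and it assumes a standard transformer KV cache (one key and one value tensor per layer).

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2.0):
    """Approximate KV cache size: keys + values for every layer and token.

    bytes_per_elem is 2.0 for an f16 cache; a quantized cache uses less
    (about 1.0 is used here as a rough stand-in for an 8-bit cache).
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def max_context_that_fits(ram_budget_bytes, weights_bytes, n_layers, n_kv_heads,
                          head_dim, bytes_per_elem=2.0, overhead_bytes=0):
    """Largest token count whose KV cache fits in the RAM left after weights.

    overhead_bytes is an assumed allowance for runtime buffers; real
    runtimes need scratch memory that this sketch does not model.
    """
    free = ram_budget_bytes - weights_bytes - overhead_bytes
    if free <= 0:
        return 0  # the weights alone do not fit in the budget
    per_token = kv_cache_bytes(n_layers, n_kv_heads, head_dim, 1, bytes_per_elem)
    return int(free // per_token)


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Hypothetical small model: 28 layers, 8 KV heads, head dimension 128.
    shape = dict(n_layers=28, n_kv_heads=8, head_dim=128)
    for label, width in (("f16 cache", 2.0), ("~8-bit cache", 1.0)):
        tokens = max_context_that_fits(4 * GIB, int(1.5 * GIB),
                                       bytes_per_elem=width,
                                       overhead_bytes=GIB // 2, **shape)
        print(f"{label}: about {tokens} tokens fit")
```

Under these assumptions, halving the bytes stored per cache element roughly doubles the context that fits in the same budget, which is why memory estimates and context recommendations shift when KV cache quantization changes.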
Update: Hi xxxsman, the issue you were encountering has been fixed. Please let us know if you continue encountering issues. Hi there xxxsman! We're sorry the off-grid functionality is not working correctly. We have a team of early testers and thoroughly test features before we release; this is actually a feature that has been in Noema for a long time. We'd love it if you could send us an email at clientcare@noemaai.com, as this might be a device-specific bug. Please contact us and we'll fix this as soon as possible. Thank you.
Best offline LLM solution on iOS
Ratros
Really, this is exactly the kind of offline LLM experience that I am looking for on iOS. Bravo! There were some minor bugs here and there, but the core experience works great. If iOS could relax the memory limit a bit more, I am sure the app could get much more useful with larger models, but one can only dream at this moment. Still, the app itself really stands out. I am wondering if there’s a way to support the developer…
Developer Response
Hi, thanks for your review! We’re working towards resolving all the bugs and improving the overall user experience so that it is closer to what you could find on desktop. Thanks again for your support! At the current time, there’s no way to do so, but I will look into it for future releases. Your feedback is more than enough for now!
The best so far
Johnny Nimbus
I’ve tried all the apps for local AI and for accessing a remote backend and this is the best so far. It’s professionally designed and implemented, offers free search and RAG (ability to interact with documents), has both recommended local models and search for downloadable models, and at this writing is free. The developer has been very responsive to suggested improvements. Deeply grateful to the developer for the time and effort to create and polish this gem!
Developer Response
Thanks for your review. It means a lot to me. If you need anything else regarding the features Noema offers, don't hesitate to reach out again!
Great App
THXA76
Noema is the most reliable on-device AI I’ve tried. It runs locally on my iPhone, so responses are fast and my data stays on my device. I can bring in my own documents and get answers based on them. Clean UI, and the app is clearly optimized for iOS. Easily a daily tool for work or study.
• Updated llama.cpp for Gemma 4 support, including fixes for previously known Gemma 4 issues
• Improved GGUF import reliability, with better detection for chat templates, JSON configs, and multimodal projector files
• Added a clearer download experience for CML models, including visible progress during downloads
• Improved smart retrieval so large-context models make better use of available context with PDFs and long documents
• Added a new Prompt Processing card in chat with live progress feedback
• Fixed prompt processing progress getting stuck at 0% and corrected its placement after tool calls
• Fixed scrolling issues in Model Settings caused by repeated memory fit checks
• Updated VRAM estimates and maximum context recommendations to reflect KV cache quantization changes
• Added support for Memory and system prompt customization
• Refreshed curated models with Gemma 4 and Qwen 3 1.7B support
Version 2.2
The developer, Alexandru Stamate, indicated that the app’s privacy practices may include handling of data as described below. For more information, see the developer’s privacy policy.
Data Not Collected
The developer does not collect any data from this app.
Privacy practices may vary, for example, based on the features you use or your age.
The developer indicated that this app supports the following accessibility features.