Private LLM - Local AI Chat 13+

Local Offline Private AI Chat

Numen Technologies Limited

    • CHF 5.00

Description

Meet Private LLM: Your Secure, Offline AI Assistant for iPhone, iPad, and Mac

Private LLM brings advanced AI capabilities directly to your iPhone, iPad, and Mac—all while keeping your data private and offline. With a one-time purchase and no subscriptions, you get a personal AI assistant that works entirely on your device.

Key Features:

- Local AI Functionality: Interact with a sophisticated AI chatbot without needing an internet connection. Your conversations stay on your device, ensuring complete privacy.

- Wide Range of AI Models: Choose from various open-source LLM models like Llama 3.2, Llama 3.1, Google Gemma 2, Microsoft Phi-3, Mistral 7B, and StableLM 3B. Each model is optimized for iOS and macOS hardware using advanced OmniQuant quantization, which offers superior performance compared to traditional RTN quantization methods.

- Siri and Shortcuts Integration: Create AI-driven workflows without writing code. Use Siri commands and Apple Shortcuts to enhance productivity in tasks like text parsing and generation.

- No Subscriptions or Logins: Enjoy full access with a single purchase. No need for subscriptions, accounts, or API keys. Plus, with Family Sharing, up to six family members can use the app.

- AI Language Services on macOS: Utilize AI-powered tools for grammar correction, summarization, and more across various macOS applications in multiple languages.

- Superior Performance with OmniQuant: Benefit from the advanced OmniQuant quantization process, which preserves the model's weight distribution for faster and more accurate responses, outperforming apps that use standard quantization techniques.
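To make the quantization claim concrete, here is a minimal sketch of the baseline "round-to-nearest" (RTN) technique that the description contrasts OmniQuant against. This is purely illustrative and not the app's actual code; OmniQuant instead learns quantization parameters from calibration data, which helps precisely in the situation shown below, where one outlier weight stretches the quantization range:

```python
def rtn_quantize(weights, bits=4):
    """Baseline round-to-nearest (RTN) quantization: snap each weight to
    the nearest of 2**bits evenly spaced levels spanning the weight range."""
    lo, hi = min(weights), max(weights)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi != lo else 1.0
    # Store small integers; dequantize back to floats for inference.
    q = [round((w - lo) / scale) for w in weights]
    return [lo + qi * scale for qi in q]

# A single outlier stretches the range, so RTN wastes precision on the
# dense cluster of small weights -- the failure mode that calibration-based
# schemes like OmniQuant are designed to mitigate.
weights = [-0.62, -0.05, 0.01, 0.03, 0.80]
dequantized = rtn_quantize(weights, bits=4)
worst_error = max(abs(w - d) for w, d in zip(weights, dequantized))
```

With uniform RTN, every weight can be off by up to half a quantization step, and that step size is dictated by the extreme values rather than by where most weights actually lie.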

Supported Model Families:
- DeepSeek R1 Distill based models
- Phi-4 14B model
- Llama 3.3 70B based models
- Llama 3.2 based models
- Llama 3.1 based models
- Llama 3.0 based models
- Google Gemma 2 based models
- Qwen 2.5 based models (0.5B to 32B)
- Qwen 2.5 Coder based models (0.5B to 32B)
- Google Gemma 3 1B based models
- Solar 10.7B based models
- Yi 34B based models

For a full list of supported models, including detailed specifications, please visit privatellm.app/models.

Private LLM is a better alternative to generic llama.cpp and MLX wrapper apps like Enchanted, Ollama, LLM Farm, LM Studio, RecurseChat, etc., on three fronts:
1. Private LLM uses a significantly faster mlc-llm based inference engine.
2. All models in Private LLM are quantized using the state-of-the-art OmniQuant quantization algorithm, while competing apps use naive round-to-nearest quantization.
3. Private LLM is a fully native app built with C++, Metal, and Swift, while many of the competing apps are bloated, non-native Electron-based apps.

Please note that Private LLM only supports inference with text-based LLMs.

Private LLM has been specifically optimized for Apple Silicon Macs and delivers the best performance on Macs equipped with an Apple M1 or newer chip. Users on older Intel Macs without eGPUs may experience reduced performance. Although the app nominally works on Intel Macs, we've stopped adding support for new models on Intel Macs due to performance limitations of Intel hardware.

What’s New

Version 1.9.12

Minor compatibility fixes with macOS 26

Ratings and Reviews

4.1 out of 5
8 Ratings

krypton85

Lacks some features

I like using the App, but I wish it could do more. Particularly:
1. It would be great if I could have multiple chats. That way I could continue a thread with all its history whenever I want, instead of always having to start with a blank slate.
2. The iOS Shortcut could provide an option to select a preferred model, and could also include a system instructions field.
3. Why does the app ship with an LLM model? This makes the whole app extra large to download. Why don't I have the option to delete this built-in model?

Developer Response

Thanks for the thoughtful review!

We’re actively working on Private LLM v2 (free upgrade for all v1 users). Along with big performance gains, v2 adds multi-chat with full chat history so you can resume any thread - coming soon.

Shortcuts: we’d love to let you pick a model. The blocker has been that loading a model into memory takes time, and Apple Shortcuts actions have strict timeouts - so “switch model then run” often fails. That’s why we skipped it initially. We’ll revisit this to see if there’s room to improve.

Built-in model: we currently ship a small starter model to satisfy App Review. We’re exploring making it optional (download on demand) and removable to keep the app size down.

If you’d like to follow v2 progress or share more feedback, please join our Discord from the app.

Dreamora

Works great but

there are improvements that would add massive value:

1. It would be even better if it offered the ability to have multiple chats. Having a single chat stream is confusing. On my Mac I am using LM Studio, and I would love for a similar setup to be present here.

2. I’m not sure it offers the maximum quantization that would be possible on the M4 16GB iPad Pro, given the Q3 limits on Mistral Instruct and 4-bit Phi-3. I understand why those make a lot of sense for broad support in a one-size-fits-all situation, but I would naturally also like to use the device to its limits :)

Developer Response

Thanks for the review! We're working on enabling support for larger 4-bit OmniQuant quantized models on high-end iPad Pros with 16GB of RAM (M1, M2, and M4). This has lately taken a bit of a back seat because we're currently working on adding newer models that were released recently (like Llama 3.1). We'll get back to it as soon as we're through our model backlog.

Ist_einfach_so

Easy Access to GenAI Chat

Works perfectly, and good models can be downloaded easily. Besides adding more models, the ability to manage multiple chats would be essential. Without this functionality, the tool is really only interesting for sporadic use.

App Privacy

The developer, Numen Technologies Limited, indicated that the app’s privacy practices may include handling of data as described below. For more information, see the developer’s privacy policy.

Data Not Collected

The developer does not collect any data from this app.

Privacy practices may vary based on, for example, the features you use or your age.

Supports

  • Family Sharing

    Up to six family members can use this app with Family Sharing enabled.

You Might Also Like

MLC Chat
Productivity
Pal Chat - AI Chat Client
Productivity
PocketPal AI
Productivity
Local Brain
Productivity
Enclave - Local AI Assistant
Productivity
YourChat
Productivity