Hal Universal
Private AI with Memory
Free · Designed for iPad. Not verified for macOS.
On-device AI with persistent memory, multiple model voices, and full transparency about how it works. Nothing leaves your device when running local models.
Private. Powerful. Personal.
Hal Universal is an on-device AI assistant for iPhone, iPad, and Mac. A thoughtful, private companion for conversation, reflection, and creativity — and a window into how AI actually works.
Multiple AI Voices
Hal ships with Apple Intelligence built in — always available, no download required. For users who want a fully private on-device experience, Hal offers a curated library of tested local models: Gemma 4 E2B, Llama 3.2 3B, Qwen 3.5 2B, and Dolphin 3.0. Each has a distinct voice. Download any or all — they run entirely on your device, with nothing sent to any server.
Genuinely Private
When you use a local MLX model, your conversations never leave your device. No network calls. No server. When using Apple Intelligence, inference is on-device when offline; when connected, Apple's Private Cloud Compute may be used (encrypted in transit, processed in non-persistent memory on Apple-controlled servers).
Memory That Persists
Hal remembers across conversations. A semantic memory system weighted by recency and relevance retrieves what's useful when it's useful. The optional Nomic retrieval upgrade dramatically improves recall on specific facts (opt-in in the Model Library, 522 MB).
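As an illustration only, here is a minimal sketch of how a memory score could blend semantic relevance with a half-life recency decay. It is not Hal's actual implementation: the function names, the 30-day half-life, and the 0.3 recency mix are all assumptions chosen for the example.

```python
import math

def recency_weight(age_days, half_life_days):
    """Exponential decay: the weight halves every `half_life_days`."""
    return 0.5 ** (age_days / half_life_days)

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def score_memory(query_vec, memory_vec, age_days,
                 half_life_days=30.0, recency_mix=0.3):
    """Blend relevance and recency. Both defaults are hypothetical values."""
    relevance = cosine_similarity(query_vec, memory_vec)
    recency = recency_weight(age_days, half_life_days)
    return (1.0 - recency_mix) * relevance + recency_mix * recency

def retrieve(query_vec, memories, top_k=3):
    """`memories` is a list of (text, embedding, age_days) tuples."""
    scored = [(score_memory(query_vec, vec, age), text)
              for text, vec, age in memories]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]
```

With these assumed weights, a fresh memory that matches the query outranks an old one that matches equally well, and both outrank a fresh but unrelated memory.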
When a turn's context would exceed the active model's window, Hal posts a brief explanation in the conversation rather than silently dropping content. When self-knowledge sections are compressed to fit, a small "condensed" badge appears in the message footer.
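The pre-flight idea can be sketched roughly as follows. This is a hypothetical illustration, not Hal's code: the four-characters-per-token estimate, the reply reserve, the truncation-based "condensing", and every name here are assumptions.

```python
from dataclasses import dataclass

def estimate_tokens(text):
    """Crude assumption: about 4 characters per token (rounded up)."""
    return (len(text) + 3) // 4

@dataclass
class PreflightResult:
    fits: bool        # True if the turn can be sent to the model
    condensed: bool   # True if sections were shortened to make it fit
    notice: str       # explanation to post in chat; empty when it fits

def preflight(user_turn, sections, context_window,
              reply_reserve=256, condensed_chars=200):
    """Check a turn against the model's window before running inference."""
    budget = context_window - reply_reserve

    def total(parts):
        return estimate_tokens(user_turn) + sum(estimate_tokens(p) for p in parts)

    # Case 1: everything fits as-is.
    if total(sections) <= budget:
        return PreflightResult(True, False, "")

    # Case 2: shorten the sections (a stand-in for real compression).
    shortened = [s[:condensed_chars] for s in sections]
    if total(shortened) <= budget:
        return PreflightResult(True, True, "")

    # Case 3: still too large, so explain instead of dropping content.
    needed = total(shortened)
    notice = (f"This turn needs about {needed} tokens, but only "
              f"{budget} are available in the active model's window.")
    return PreflightResult(False, False, notice)
```

In this sketch, a `condensed` result of `True` is what would drive a "condensed" badge, and a non-empty `notice` is what would be posted in the conversation.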
Self Model
Hal builds a structured self-model over time from patterns it notices in your conversations — values, preferences, themes. You can browse the Self Model viewer to see what Hal has crystallized about you, and toggle each entry between private (visible only to you) and shareable (available for export). Every entry is editable. Nothing leaves your device without your action.
Salon Mode
Put up to four AI voices in conversation. Choose independent mode (each voice isolated) or context-aware mode (each voice builds on what came before). Salon Mode is accessible in Settings → Power User Mode.
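The difference between the two modes can be shown with a small sketch. It assumes each voice is simply a function from prompt text to reply text; `run_salon`, `echo_voice`, and the "Earlier replies:" prompt layout are hypothetical and do not reflect Hal's internals.

```python
MAX_VOICES = 4  # the listing states a limit of four voices

def run_salon(prompt, voices, context_aware):
    """Run each voice in order and return a list of (name, reply) pairs.

    `voices` is a list of (name, respond) pairs, where respond(text) -> str.
    """
    if len(voices) > MAX_VOICES:
        raise ValueError(f"Salon supports at most {MAX_VOICES} voices")

    transcript = []
    for name, respond in voices:
        if context_aware and transcript:
            # Context-aware: later voices also see the earlier replies.
            prior = "\n".join(f"{n}: {r}" for n, r in transcript)
            visible = f"{prompt}\n\nEarlier replies:\n{prior}"
        else:
            # Independent: every voice sees only the original prompt.
            visible = prompt
        transcript.append((name, respond(visible)))
    return transcript

def echo_voice(text):
    """Stand-in for a model: reports whether it was shown earlier replies."""
    return "saw-prior" if "Earlier replies:" in text else "isolated"
```

Calling `run_salon("Hello", [("A", echo_voice), ("B", echo_voice)], context_aware=True)` gives the second voice the first voice's reply, while `context_aware=False` keeps both isolated.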
Transparency as Architecture
Hal shows you the work. Token counts, memory retrieval, model identity, inference timing, compression events — all visible. When Hal remembers something, it tells you how. When it doesn't know something, it says so.
Power User Controls
Tune memory depth, recency weighting, semantic similarity, half-life decay, temperature, and RAG settings. Per-model settings: each AI voice remembers your preferences independently.
For Users Who Want to Understand AI
Hal is an educational window into how large language models actually work. Not a black box. A transparent system you can inspect, tune, and learn from.
Hardware note: Local MLX models are validated on the iPhone 16 and iPhone 17 families. iPhone 15 Pro should work. Older devices may run slowly or not at all. A hardware disclosure appears on the first download attempt.
Version 2.0 — significant update.
- Four curated local AI models now available: Gemma 4 E2B, Llama 3.2 3B, Qwen 3.5 2B, and Dolphin 3.0. Each runs entirely on your device.
- Self Model: Hal now builds a structured self-model of you over time from patterns in conversation. Browse and edit it in Settings → Self Model. Each entry is private by default.
- Salon Mode: put up to four AI voices in conversation. Independent and context-aware modes.
- Per-model settings: temperature, memory depth, RAG configuration saved independently for each model.
- Upgraded retrieval (optional): Nomic Embed Text v1.5 available as an opt-in via Model Library — significantly better recall on specific facts than the default. 522 MB.
- Per-turn memory pre-flight: when a turn would exceed the active model's working memory, Hal explains why in chat rather than crashing.
- Real token streaming: responses stream as they generate.
- Background downloads: model downloads continue while the app is backgrounded or the phone is locked.
- Unified model status dots: one dot, one meaning across all screens.
- Significant performance and stability improvements.
The developer, Mark Friedlander, indicated that the app’s privacy practices may include handling of data as described below. For more information, see the developer’s privacy policy.
Data Not Collected
The developer does not collect any data from this app.
Accessibility
The developer has not yet indicated which accessibility features this app supports.
Information
- Provider: Mark Friedlander
- Size: 28.9 MB
- Category: Productivity
- Compatibility:
  - iPhone: Requires iOS 26.0 or later.
  - iPad: Requires iPadOS 26.0 or later.
  - Mac: Requires macOS 26.0 or later and a Mac with Apple M1 chip or later.
  - Apple Vision: Requires visionOS 26.0 or later.
- Languages: English
- Age Rating: 4+
- Copyright: © 2025-2026 Mark Friedlander
