Inferencer - Private AI Studio

Advanced Local AI Assistant

Free · In‑App Purchases

Inferencer lets you run, host, and deeply control the latest SOTA AI models (OSS, DeepSeek, Qwen, Kimi, GLM, MiniMax, and more) from your own computer. No data is sent to the cloud for processing, maintaining your complete privacy. Advanced inferencing controls give you fine-grained control over model accuracy and output.

Models
Start in the Models section, where you can select the location of existing models or download new ones directly from Hugging Face. Use the distributed compute feature to load a model across two Macs, or use the model streaming feature to inference larger models partially from storage.

Chats
Select the model to interact with on the top menu bar and write a prompt to begin. At any point you can switch between models and continue the chat to see what else they can uncover. You can also selectively delete past messages to keep the model focused and on track.

Chat Controls
Control the inferencing parameters, including batching to inference multiple chats at the same time, intensity of processing, context quantization to further reduce memory usage, and model streaming to load models larger than available memory.

Token Entropy and Inspection
Select the inspectors to peek into the inner workings of each word outputted and see the model's confidence levels and alternative choices.

Response Control
Utilise the framing feature, which lets you control the output the model generates. For example, you can skip the preamble or direct the model to output structured HTML.

Tools
The tools editor lets you enable built-in tools such as get_webpage_content or add your own, so that models can use them when needed. For example, if you'd like a webpage or search result inferenced, simply enable the tool in the Tools section and allow tool calls in the chat settings panel.

Server
If enabled, the server feature allows you to serve to and connect from your own or trusted devices. No data is sent elsewhere.
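As a sketch of talking to the local server from client code: the listing mentions compatible APIs, so this assumes an OpenAI-style chat completions endpoint. The host, port, path, and model name below are placeholders, not documented Inferencer values.

```python
import json
import urllib.request

# Placeholder base URL -- substitute the address shown in Inferencer's server settings.
BASE_URL = "http://localhost:8080"

def chat_request(prompt: str, model: str = "qwen") -> urllib.request.Request:
    """Build an OpenAI-style chat completions request for a local server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# With the server running, send it like this:
# response = urllib.request.urlopen(chat_request("Summarise this page"))
```

Because the request never leaves localhost, the same privacy guarantee applies: nothing is sent to a third party.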
Also includes compatible APIs for application development.

Prompt Caching
Significantly speeds up prompt processing by prefix-matching previously processed prompts. Uses a user-configurable cache pool with optional external storage and automatic LRU (least recently used) eviction (enabled in Settings).

Distributed Inference
With distributed compute you can link two Macs together, sharing their memory to inference larger models. To use it, make sure it is enabled in both the app and server settings. Once a connection to your server is made, if both computers have the same model, a distributed compute icon will appear. Simply tap it to load the model for distributed compute.

Coding Tools
Built-in support for Xcode Intelligence and Visual Studio Code. Use the server feature with Compatibility APIs enabled and SSL disabled to allow Xcode or Visual Studio Code to use Inferencer as a service provider.

Shortcuts
Use the Shortcuts app to automate inferencing workflows (e.g., copy text from clipboard > inference > speak result).

Settings
Includes parental controls, an automatic deletion policy, and more.

Privacy
For maximum privacy, all AI processing happens offline and on your device by default.

Subscriptions
Basic (Free): Most features unlocked for free, including unlimited chats.
Professional: Upgrade for more advanced token inspection, prompt framing, and model streaming.

Terms & Support
Terms of Use: inferencer.com/terms
Privacy Policy: inferencer.com/privacy

Disclaimer
Inferenced models may not always be accurate or contextually appropriate. You are responsible for verifying information before making important decisions.
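The prompt-caching idea described above, reusing work for any previously processed prefix and evicting the least recently used entries, can be illustrated with a toy sketch. All names and the cache structure here are illustrative assumptions, not Inferencer's internals.

```python
from collections import OrderedDict

class PromptCache:
    """Toy sketch of prefix-matched prompt caching with LRU eviction.

    Maps token prefixes to their processed state; a new prompt reuses the
    longest cached prefix, so only the remaining tokens need processing.
    """

    def __init__(self, capacity: int = 4):
        self.capacity = capacity
        self._cache: "OrderedDict[tuple, object]" = OrderedDict()

    def longest_prefix(self, tokens: list) -> int:
        """Return the length of the longest cached prefix of `tokens`."""
        for i in range(len(tokens), 0, -1):
            key = tuple(tokens[:i])
            if key in self._cache:
                self._cache.move_to_end(key)  # mark as recently used
                return i
        return 0

    def store(self, tokens: list, state: object) -> None:
        """Cache the processed state for a prompt, evicting LRU entries."""
        self._cache[tuple(tokens)] = state
        self._cache.move_to_end(tuple(tokens))
        while len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # drop least recently used

cache = PromptCache()
cache.store(["You", "are", "helpful"], state="processed-prefix-state")
# A follow-up prompt sharing the prefix only needs its new tokens processed:
reused = cache.longest_prefix(["You", "are", "helpful", "Translate", "this"])
```

Here `reused` covers the three shared tokens, which is why repeated system prompts or ongoing chats process so much faster once cached.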

  • 4.1
    out of 5
    17 Ratings

+ Support for M5 Neural Accelerators (~4x prompt processing)
+ HTML block support
+ ModelScope downloader
+ Patches for DeepSeekv32, Step35 and Distributed Compute
+ Improved support including crash fixes for macOS 26.4
+ More bug fixes and performance improvements

Also in case you missed the last update:

+ Support for Gemma4, Qwen 3.5, Nemotron Super, Mistral Small 4, Bonsai, Sarvam, Ring-2.5-1T
+ Improvements to prompt caching (reduced cache misses, 99x faster when cached)
+ Support for TurboQuant (Unbatched) in Model Context Precision
+ Auto-launch on login, hide Dock and Menu Bar options in Settings
+ Prompt caching and Tool call support for Distributed Compute
+ Download models filter (including filter by system memory)
+ Pro Yearly plan (2 months free)

The developer, Ashraf Samy, indicated that the app’s privacy practices may include handling of data as described below. For more information, see the developer’s privacy policy.

  • Data Not Collected

    The developer does not collect any data from this app.

    Privacy practices may vary, for example, based on the features you use or your age.

    The developer has not yet indicated which accessibility features this app supports.

    Seller
    • Ashraf Samy
    Size
    • 162.7 MB
    Category
    • Productivity
    Compatibility
    • iPhone
      Requires iOS 18.0 or later.
    • iPad
      Requires iPadOS 18.0 or later.
    • Mac
      Requires macOS 15.0 or later and a Mac with Apple M1 chip or later.
    • Apple Vision
      Requires visionOS 2.0 or later.
    Languages
    • English
    Age Rating
    • 9+
    • This app has an age rating of 9+ with content restrictions. Some content may be rated higher, but access is managed by the developer through in-app controls.
    • In-App Controls
      Parental Controls

      Infrequent
      Cartoon or Fantasy Violence
      Profanity or Crude Humor
      Mature or Suggestive Themes
      Horror/Fear Themes
      Guns or Other Weapons

      Contains
      User-Generated Content
    In-App Purchases
    Yes
    • Professional $9.99
    • Professional (Yearly) $99.99
    Copyright
    • © 2026 Inferencer