Inferencer - Private AI Studio

Advanced Local AI Assistant

Only for Mac

Free · In‑App Purchases


Inferencer lets you run, host, and deeply control the latest SOTA AI models (OSS, DeepSeek, Qwen, Kimi, GLM and more) from your own computer. No data is sent to the cloud for processing, maintaining your complete privacy. Advanced inferencing controls give you complete control over model accuracy and outputs.

Models
Start in the Models section, where you can select the location of existing models or download new ones directly from Hugging Face. Use the distributed compute feature to load a model across two Macs, or use the model streaming feature to inference larger models partially from storage.

Chats
Select the model to interact with on the top menu bar and write a prompt to begin. At any point you can switch between models and continue the chat to see what else they can uncover. You can also selectively delete past messages to keep the model focused and less scattered.

Chat Controls
Control the inferencing parameters, including batching to inference multiple chats at the same time, intensity of processing, and model streaming to load models larger than available memory.

Token Entropy and Inspection
Select the inspectors to peek into the inner workings of each word the model outputs and see its confidence levels and alternative choices.

Prompt Framing
Expand the prompt section to use the framing feature, which allows you to control the output the model generates.

Tools
The tools editor allows you to enable built-in tools such as get_webpage_content or add your own, so that models can use them when needed. For example, if you'd like a webpage or search result inferenced, simply enable the tool in the Tools section and allow tool calls in the chat settings panel.

Server
If enabled, the server feature allows you to serve and connect to your own or trusted devices. No data is sent elsewhere. Also includes compatible APIs for application development.

Distributed Inference
With distributed compute you can link together two Macs, sharing their memory to inference larger models. To use it, make sure it's enabled in both the app and server settings. Once a connection to your server is made, and if both computers have the same model, a distributed compute icon will appear. Simply tap it to load the model for distributed compute.

Coding Tools
Built-in support for Xcode Intelligence and Visual Studio Code. Use the server feature with Compatibility APIs enabled and SSL disabled to allow Xcode or Visual Studio Code to use Inferencer as a service provider (see the example sketch after this description).

Shortcuts
Use the Shortcuts app to automate inferencing workflows (e.g., copy text from clipboard > inference > speak result).

Settings
Includes parental controls, an automatic deletion policy and more.

Privacy
For maximum privacy, all AI processing happens offline and on your device by default.

Subscriptions
Basic (Free): Most features unlocked for free, including unlimited chats.
Professional: Upgrade for more advanced token inspection, prompt framing and model streaming.

Terms & Support
Terms of Use: inferencer.com/terms
Privacy Policy: inferencer.com/privacy

Disclaimer
Inferenced models may not always be accurate or contextually appropriate. You are responsible for verifying the information before making important decisions.
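Example (Compatibility APIs)
The sketch below illustrates how a client could talk to the local server when the Compatibility APIs are enabled. It is an assumption-based example only: the port (localhost:8080), the /v1/chat/completions route, and the JSON payload follow the common OpenAI-style shape that compatibility servers typically expose; check Inferencer's server settings for the actual address and routes, and treat "local-model" as a placeholder for whichever model you have loaded.

    import Foundation

    // Hypothetical endpoint: port and path are assumptions, not documented
    // Inferencer values. Adjust to match your server settings.
    let url = URL(string: "http://localhost:8080/v1/chat/completions")!
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")

    // Minimal chat-completion style payload; "model" names a model
    // already loaded in Inferencer.
    let payload: [String: Any] = [
        "model": "local-model",
        "messages": [
            ["role": "user", "content": "Summarise the benefits of on-device inference."]
        ]
    ]
    request.httpBody = try? JSONSerialization.data(withJSONObject: payload)

    let task = URLSession.shared.dataTask(with: request) { data, _, error in
        if let error = error {
            print("Request failed: \(error)")
        } else if let data = data, let body = String(data: data, encoding: .utf8) {
            print(body)   // raw JSON response from the local server
        }
    }
    task.resume()

    // Keep the script alive long enough for the asynchronous request to finish.
    RunLoop.main.run(until: Date().addingTimeInterval(15))

Because the request never leaves your machine, this kind of integration keeps prompts and completions on-device, consistent with the app's privacy model.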

  • This app hasn’t received enough ratings or reviews to display an overview.

+ Support for GLM-5, MiniMax 2.5 and LongCat-Flash-Lite
+ Prompt caching - 99x faster prompt processing when cached (enable in Settings)
+ Distributed compute support for GLM-5
+ MLA support for DeepSeek-3.2, GLM-5 and Kimi-K2.5 (33x reduction in context memory use)
+ Custom MLA context size for smarter long context attention (Model Settings)
+ Faster prompt processing t/s speeds
+ Custom prompt processing chunk sizes (Model Settings)
+ Tool call support for GLM-4.7-Flash
+ Server API and tool call improvements
+ Pro Yearly plan (2 months free)
+ Fixed intermittent crash when loading last model on startup
+ More bug fixes and performance improvements

The developer, Ashraf Samy, indicated that the app’s privacy practices may include handling of data as described below. For more information, see the developer’s privacy policy.

  • Data Not Collected

    The developer does not collect any data from this app.

    Privacy practices may vary, for example, based on the features you use or your age.

    The developer has not yet indicated which accessibility features this app supports.

    • Seller
      • Ashraf Samy
    • Size
      • 627.4 MB
    • Category
      • Productivity
    • Compatibility
      • Mac
        Requires macOS 15.0 or later and a Mac with Apple M1 chip or later.
    • Languages
      • English
    • Age Rating
      • 13+
      • This app has an age rating of 13+ with content restrictions. Some content may be rated higher, but access is managed by the developer through in-app controls.
      • In-App Controls
        Parental Controls

        Infrequent:
        Cartoon or Fantasy Violence
        Profanity or Crude Humor
        Mature or Suggestive Themes
        Horror/Fear Themes
        Medical Treatment Information
        Alcohol, Tobacco, Drug Use or References
        Guns or Other Weapons

        Contains:
        User-Generated Content
    • In-App Purchases
      Yes
      • Professional $9.99
      • Professional (Yearly) $99.99
    • Copyright
      • © 2026 Inferencer