Apple Foundation Models
Posted by MehrdadKhnzd 2 days ago
Comments
Comment by harrouet 2 days ago
They are a hardware company and will keep selling the best machine for AI use. Well done.
Comment by tedggh 2 days ago
Comment by CuriouslyC 1 day ago
Comment by hedora 1 day ago
Comment by ls612 1 day ago
Comment by yandie 1 day ago
Comment by CuriouslyC 1 day ago
Comment by drewda 1 day ago
OpenAI and Anthropic may have gone silent on how they build their models, but other companies have different incentives.
Comment by sealeck 1 day ago
Isn’t this the problem inference (training) a model is designed to solve :)))
Comment by jmalicki 1 day ago
And it's a hard problem.
An easier form of training is being able to see the intermediate results and train to imitate them.
Comment by lumost 1 day ago
China will spend all of the money required to catch up, Google and OpenAI will both spend money to catch up as well. NVidia and others will not allow a frontier lab to become the AI bottleneck.
Comment by wahnfrieden 1 day ago
Comment by throw10920 1 day ago
Comment by wahnfrieden 1 day ago
Comment by nyrikki 1 day ago
Comment by mingqiz 1 day ago
Comment by lacy_tinpot 1 day ago
Comment by throwaway85825 1 day ago
Comment by naravara 1 day ago
Comment by greenavocado 1 day ago
Comment by alecco 2 days ago
I think Evans is completely wrong. There are only 2 truly frontier models (at least for now). And Anthropic seems to be leaving OpenAI behind, so there might be only 1 in the near future (which is scary/dangerous).
Comment by ksec 1 day ago
I wish there was a case where I find Evans is wrong. As far as my memory serves me, I have failed to record a single one.
I disagree that Amazon, Meta, Microsoft, and Google are "well" behind. If anything, the frontier model advantage seems to be at best 6 - 9 months, and the Chinese models are all doing well.
As one of Steve Jobs's lines goes, "It is a feature, not a product." Even if Apple were a generation or a year behind the frontier models, the advantage of being the default is enough to hold on to a lot of its users.
To put it simply, even if OpenAI or Anthropic were better, there is zero chance they would topple Apple in hardware sales, users or ecosystem. On the other hand, even if Apple's AI were 6 - 9 months or a generation behind, most users would settle for it, and that would damage OpenAI / Anthropic.
Comment by ak_111 1 day ago
Comment by overfeed 1 day ago
Do you mean Google's AI with Apple wrappers? Apple's in-house AI is further behind Google, and very far from the frontier according to your ranking. IMO, Google is on the frontier - I recall Altman calling for an OpenAI all-hands-on-deck when Gemini was released because of how good it was compared to ChatGPT. I also suspect Google has the lowest operating expenses due to scale, experience and luck/planning (TPUs). There will come a time when AI investments slow down and the cost of revenue becomes more important.
Comment by alecco 1 day ago
Comment by geodel 1 day ago
If anything, Apple should notice that Anthropic has a really good marketing team, and it would be no shame if they picked up a trick or two from them.
Comment by throwaway98797 1 day ago
employees will always suffer.
Comment by hedora 1 day ago
Anthropic and OpenAI are far behind state of the art for the entire curve except the “extremely expensive for barely measurable improvements” part.
GLM is probably the third most expensive frontier model (benchmarks and reviews will say for sure), and is apparently ~Opus 4.6 for 10% the inference cost.
The last I checked, qwen was still owning the 24-32GiB RAM range (it runs reasonably without a GPU!) and somewhere around 3.5-4 generation models.
Also, even anthropic says Mythos ~= ChatGPT 5.5, so it’s unlikely either one is leaving the other behind. The big problem they both have is they asked for the government to gate keep model releases and use cases, and their wish was granted.
That’s knocked them back 6 months already. Anthropic’s only frontier offering has been taken down.
Comment by tedggh 1 day ago
Comment by joenot443 1 day ago
Comment by thrill 1 day ago
It's like the difference between talking to the two smartest kids in a class, where one really belongs a grade higher and the other hasn't yet learned to ask the questions that encourage it to dig in that little bit more for the additional multi-order effects.
Comment by yfontana 1 day ago
Comment by tedggh 1 day ago
Comment by hedora 1 day ago
I didn’t use it on big enough tasks to notice any improvement.
I had been hitting plan limits pretty regularly, but fixed it by changing my workflow. That also increased the success rate of claude by an order of magnitude.
Comment by embedding-shape 2 days ago
Truly fascinating ecosystem and community in general, as experiences differ so wildly. Anthropic's models seem far behind OpenAI to me, especially when you get into "Pro" territory, and there doesn't seem to be any worthy competition to Pro Mode available at all.
And this is said by someone who uses both platforms and spends a lot of their day interacting with agents and LLMs in various ways. The interesting part is that you probably do too, and what you share probably lines up with what you experience! Yet we come away with basically opposite takeaways :) I don't think either of us is wrong either, somehow.
Comment by haellsigh 1 day ago
I've noticed that depending on how you talk to it, you get wildly different outputs. This seems to happen less with Opus: it mostly understands what I want. GPT is often a bit too literal.
Just my two cents.
Comment by embedding-shape 1 day ago
Yeah, exact prompting matters a lot, seemingly more than people think. There are definitely tradeoffs in how literally the models take the prompts. On one hand, it's useful for a model to ignore its own instinct when you know better, so it doesn't go chasing geese randomly. On the other hand, it's sometimes useful when it self-directs, such as when you misworded something and it's obvious from the context that you meant something different. They're basically good at different things.
Really agree that the models aren't all equal, and that they aren't as interchangeable as people seem to think unless you adjust how you prompt them.
Comment by WarmWash 1 day ago
Comment by JumpCrisscross 1 day ago
At which point it’s fair to reject the commoditization label.
Also missing from these discussions are e.g. Qwen, which is at least as good as the models one step back from OpenAI's or Anthropic's frontier.
Comment by embedding-shape 1 day ago
They're missing from the discussion because the ones you can run locally aren't actually "one step away from other closed-source labs" in practice when you use them. They might benchmark as such, but they're sadly far from measuring up to those scores except for very specific use cases, even when you have, say, 96GB of VRAM available to run the bigger models that most (at home) consumers won't be able to run.
Comment by JumpCrisscross 1 day ago
And they probably won’t be for at least another decade. Comparing like with like, flagship model running on the best hardware it can run on, Qwen is close.
Comment by embedding-shape 1 day ago
I wish so badly this was true, but sadly today it just isn't.
Comment by JumpCrisscross 1 day ago
Comment by computerex 1 day ago
Comment by embedding-shape 1 day ago
Comment by alecco 2 days ago
Comment by embedding-shape 1 day ago
Comment by awongh 1 day ago
Spend for compute seems like it needs to increase to get the next iterations of models, and even if they IPO the money might run out before they can solidify their revenue streams.
All while Google just needs to survive long enough with their good-enough models and do it without really putting themselves in any existential financial risk.
And ideally the chinese models are also still there keeping everyone honest.
The true dystopic worst case is a Google monopoly on cutting edge AI.
Comment by jimbokun 1 day ago
Comment by wolttam 1 day ago
But what I think a lot of people miss is that the market for the truly bleeding edge (developing bio-tech, building the most sophisticated software stacks (probably with a tilt towards simulation, GPU kernel optimization, etc)) is not the whole market.
There's a plethora of use-cases for models that are not on the bleeding edge. If I can solve my relatively simple problems with an off-the-shelf model for a minuscule fraction of the cost of the frontier, I'm going to.
Comment by thewebguyd 1 day ago
It's somewhat of a myth that you need the most advanced, expensive model for software development.
Comment by johsole 1 day ago
Comment by nxobject 1 day ago
Well, in domains like SWE where Anthropic's putting in the effort. I don't think they'll make the claims that OpenAI makes about how their models are pushing the life sciences forward, for example.
Comment by afavour 1 day ago
Fable might well be a better model but it’s too expensive for everyday AI use. Definitely if we’re talking about the kind of stuff you’re going to want to do on your phone. Even for coding, I’m not going to reach for Fable (well, when I can…) for 95% of the work I do.
I don’t believe a mature AI industry is going to have a one size fits all, single winner.
Comment by tedggh 1 day ago
Comment by bushbaba 1 day ago
Some of the harnesses even let you run a local model for most things, and only pay for the latest frontier models when needed, which cuts down cost drastically.
Comment by zitterbewegung 1 day ago
Comment by HPsquared 1 day ago
Comment by axus 1 day ago
Comment by tedggh 1 day ago
Comment by hylaride 1 day ago
Most of the ones that survived did so due to being able to pick up distressed assets and at values that could then be profitably monetized - a move that it would not surprise me to see repeat itself in the LLM space (we'll see).
Comment by colechristensen 1 day ago
The fact that telcos couldn't charge rent was a primary reason the Internet was so successful.
Remember $0.10 per text message? You bet in some alternate timeline AT&T charges $0.10 per webpage visit and we're stuck on 100kbps connections because the monopoly doesn't want to innovate.
Comment by enos_feedler 1 day ago
Comment by paulsutter 1 day ago
Comment by post-it 1 day ago
Extremely tangential, but this is my favourite upshot of AI. For decades, companies have been walling off their services and forcing us into their fuckass UIs. Now over the course of the last twelve months, suddenly everything has an MCP and I can use it through my command line chat interface.
Any company that doesn't adapt gets so hammered by people's AI-DIY web scrapers that they have no choice but to cave.
Comment by swingboy 2 days ago
Comment by embedding-shape 2 days ago
Comment by ABS 1 day ago
But we can imagine that the balance of what's on-device vs what's remote will move continuously towards the former as time, improved HW and improved local models keep progressing
Comment by brookst 1 day ago
From a user’s perspective, it doesn’t matter.
Comment by sqquima 1 day ago
Comment by WorldMaker 1 day ago
Comment by 5701652400 1 day ago
Anthropic literally says "Requests go directly from your app to the Claude API; Apple is not in the request path and does not see prompts or responses." — Apple straight up lied
Comment by Tagbert 1 day ago
The Swift package for Claude for Foundation Models is about sending calls to Claude. That has nothing to do with Apple's models, which do use local models and models on Private Cloud Compute.
Your accusation that "Apple straight up lied" is based on misunderstanding TFA.
Comment by halJordan 1 day ago
Comment by amelius 1 day ago
Comment by hedora 1 day ago
They’re typically a bit better on high TDP stuff, and a bit worse on low TDP. They mostly match in the middle. I have a $500 AMD NUC and a slightly older $2000 MBP. Inference throughput is within 2x.
The comparison is a little messy: AMD currently maxes out at 128GB of RAM vs Apple’s discontinued 512. Apple has nothing to rival the Steam Deck.
Comment by jimbokun 1 day ago
Android succeeded at this to an extent with phones, but Apple has been able to keep its products differentiated enough in the minds of consumers to maintain their premium pricing. So far.
Comment by Danox 1 day ago
Comment by klausa 2 days ago
Comment by matwood 2 days ago
Comment by mr_toad 1 day ago
Comment by klausa 2 days ago
Comment by embedding-shape 2 days ago
Comment by klausa 2 days ago
That API has no user-facing components, and has no influence over UX of what the end-users are interacting with.
The users won't know if you used Foundation Models API or integrated with OpenAI/Anthropic/Gemini SDK directly.
Comment by embedding-shape 2 days ago
That's the point! That's the whole "white-labeling" part, and what the commentator earlier is talking about. You're very close in understanding the context here!
Comment by klausa 2 days ago
Comment by embedding-shape 1 day ago
Comment by klausa 1 day ago
I'd genuinely like to understand where you're coming from more.
I think we're all in agreement that this framework is very much about letting developers swap the models easily, and treat them as commodities. That seems pretty obvious.
I still don't see, however, how this has anything to do with controlling the UX (or the new Siri for that matter! The new Siri doesn't use Anthropic models, and there are no extension points for it to do so; that's pretty much the whole reason why it won't be available in the EU).
Help me see your point of view!
Comment by embedding-shape 1 day ago
The way I see it, it isn't about what is there right now, but what intent it signals, or what path Apple is planning. Yes, today it's ClaudeForFoundationModels, but the FoundationModels stuff will be used to allow switching between models, probably without users noticing. Who knows what Apple will ultimately surface to users, but it tends to go in the direction of less user control.
But there are a lot of assumptions, guesses and extrapolation in that. I think you're right if you focus only on what's there right now, rather than trying to "see into the future", which harrouet basically started doing with their root comment.
Comment by geodel 1 day ago
The same is happening to the Claude software package, as it would stand behind the branded Apple foundation models. From a pure software developer's point of view, this is exactly what Claude offered here, so where is the issue? The issue is in the larger space, where Apple could take steps to block Claude out of their ecosystem if they so wish at some point, and there is little Claude / Anthropic could do if Apple Foundation is the only thing that Apple consumers know about.
Comment by klausa 1 day ago
But this is very much _not_ what this is.
Apple showed a bunch of new APIs at WWDC last week. One of these is a way for developers to interact with LLMs in a way that lets you easily swap out models (with a bunch of other niceties around it), including swapping between on-device and remote models.
This is _Anthropic_ (not Apple!) shipping their support for that framework, so you can also switch between different Anthropic models using the same APIs you'd use to swap between a local or PCC model.
I expect OpenAI will probably ship their shims in the next couple of weeks too? (You can probably vibe-code one in half an hour if you point Codex at the Anthropic one, tbh).
(Apple also doesn't use "Apple Foundation Model" anywhere in the user-facing marketing materials AFAICT, this is strictly developer facing terminology, but I could be wrong?)
My impression is that people are _wildly_ misunderstanding what this _actually_ is, and running wild with speculation/interpretation.
Comment by butlike 1 day ago
Comment by klausa 1 day ago
Are you thinking about Intents? That lets Siri interact with data (and perform some actions in them) from your apps, but it is something completely different.
You can definitely expose things from your app via Intents that will end up calling an external arbitrary LLM somewhere, but it does not require using Foundation Models API whatsoever.
Comment by kcb 1 day ago
Comment by wuliwong 1 day ago
Comment by dlev_pika 1 day ago
Now if they can further reinforce their angle on Privacy, they might continue to be what they are (or more)
Comment by post-it 1 day ago
Ahh I was hoping for the opposite: all of the existing features of Claude Code but somehow running locally on my laptop's neural engine. A pipe dream on an M2 with 8 GB of RAM, but I had a flicker of hope there.
Comment by inickt 1 day ago
https://developer.apple.com/videos/play/wwdc2026/232/ https://www.youtube.com/watch?v=wykPErJ8M-8
Comment by satvikpendem 1 day ago
Comment by FuriouslyAdrift 1 day ago
Comment by godzillabrennus 1 day ago
In 10 years, I hope my MacBook Pro can run today's frontier models and has 1TB of unified Memory.
Comment by shadowpho 1 day ago
Comment by tempoponet 1 day ago
Comment by shadowpho 13 hours ago
It’s like saying “well if Subaru launches a nice hybrid suv for $1k it’ll sell like pancakes” and yeah.. but it costs more in steel/ram to build that lol
Comment by connicpu 1 day ago
Comment by FuriouslyAdrift 1 day ago
GIGABYTE G383-R80-AAP1 for example
Comment by jayd16 1 day ago
Comment by Danox 1 day ago
Comment by manoDev 1 day ago
Comment by dboreham 1 day ago
Comment by bigyabai 1 day ago
I don't think you understand why people buy Nvidia hardware if you're beating the "just add more dual channel DDR, bro" drum. Apple wouldn't even be able to extinguish AMD with a product like that, it's all slow memory being fed into a raster-first GPU architecture.
Comment by jubilanti 1 day ago
You can use environment variables to have claude code query literally any endpoint you choose as long as it has a compatible API.
Comment by 5701652400 1 day ago
..but instead we get Claude, hosted who-knows-where. maybe in X-AI datacenters? maybe in Amazon somewhere? who knows..
Comment by willy_k 1 day ago
Comment by rock_artist 2 days ago
I'd love to use Gemma4 as an example, but think of the user: if 10 apps each use the same model and each downloads it, the phone will be bloated.
I still don't understand whether Apple provides a way for multiple apps to use the same on-device model (without tricky namespaces and permissions).
I didn't see anything suggesting that's the case.
Comment by scosman 1 day ago
They were wrong when their on-device model was way behind. They still might be right in the long term.
While multiple apps I use might need Gemma 4 E4B, I use dozens of apps and app devs can choose from hundreds of models. A shared cache might reduce size a little when there's overlap, but the core problem still exists. If each app chooses its own model, disk use and memory swapping explode.
It's probably better for device manufacturers to bake in a default. I'm not proposing they stop you from using others, but one shared default might be the best developer/user experience for 99% of apps.
- Being warm in memory is the single biggest perf speedup you can get, and a default is much more likely to be warm.
- "Best model" is usually "best model for this device" given both RAM and compute. A developer can't test every device but Apple can/will.
- Each model needs to be optimized for the hardware (what's running on ANE, what's running on Metal, what's running on CPU). The default gets optimized.
- If you need a custom model, a LoRA is probably best (30MB, benefits from all of the above)
You could say the default should be swappable, but that's more a Linux ideal than an Apple one, so I doubt we'll ever see that. Plus there are real downsides: intentional or not, prompts end up optimized for the model they are developed against, so swapping the default system model would degrade every app.
Comment by scotty79 1 day ago
Comment by scosman 1 day ago
Comment by jtfrench 2 days ago
Comment by alwillis 2 days ago
Comment by rock_artist 2 days ago
- An application can ask for a specific model and use it if available. If not, it asks to download it (or tries some fallback / alternative).
- Users can manage models, so as a user I can clean out unused models (and non-techies get something similar to offloading apps that are unused for some period of time).
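A rough sketch of what such a shared store could look like, in plain Swift. Everything here (`ModelStore`, `ModelLookup`, the model and app identifiers) is hypothetical and only illustrates the two bullets above; nothing in the thread says Apple ships an API like this.

```swift
// Hypothetical system-wide model store; no such API is described in the thread.
enum ModelLookup: Equatable {
    case ready(String)          // requested model is installed
    case fallback(String)       // requested model missing, an alternative is installed
    case needsDownload(String)  // nothing usable; the caller should ask the user
}

struct ModelStore {
    // Model identifier -> apps using it. Each model is stored once and shared.
    private(set) var installed: [String: Set<String>] = [:]

    mutating func install(_ model: String, for app: String) {
        installed[model, default: []].insert(app)
    }

    mutating func request(_ model: String, for app: String, fallbacks: [String] = []) -> ModelLookup {
        if installed[model] != nil {
            installed[model]?.insert(app)
            return .ready(model)
        }
        if let alt = fallbacks.first(where: { installed[$0] != nil }) {
            installed[alt]?.insert(app)
            return .fallback(alt)
        }
        return .needsDownload(model)
    }

    // Removing an app frees every model that no other app still uses.
    @discardableResult
    mutating func removeApp(_ app: String) -> [String] {
        var freed: [String] = []
        for model in Array(installed.keys) {
            installed[model]?.remove(app)
            if installed[model]?.isEmpty == true {
                installed[model] = nil
                freed.append(model)
            }
        }
        return freed.sorted()
    }
}
```

Tracking which apps use which model is what would let the system offload a model automatically once its last user is gone, much like offloading unused apps.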
Comment by klausa 2 days ago
Comment by satvikpendem 1 day ago
Comment by trvz 2 days ago
Comment by rock_artist 2 days ago
Comment by mft_ 2 days ago
Comment by DrScientist 2 days ago
And now given everybody now does this I guess the incentive to stop breaking stuff reduces even further.
Might as well have static binaries.
Comment by simondotau 2 days ago
It’s a nice language though.
Comment by kstrauser 1 day ago
The main difference is that Python used to make you know that the virtualenv existed. Now `uv run` and `poetry run` abstract that away so you don’t have to interact with it if you don’t want to.
Comment by DrScientist 23 hours ago
I'm just speculating that it's a self-reinforcing pattern - compatibility problems lead to isolated builds, which reduces people's concern for backwards compatibility, which makes isolated builds ever more important.
Maybe it's fine - a trade off that allows greater velocity of development, it just seems attention to backwards compatibility is becoming a thing of the past.
Comment by whstl 2 days ago
The original plan was to ship Python. However I found out I can migrate them to CoreML, and now it's a model file + Swift code. I got some massive performance improvements as well.
Of course, this doesn't work at all for non-Mac environments, but it was nice to be able to do it. (Also doesn't solve the duplicate large models problem)
Comment by hedora 1 day ago
Python heaviness is a more fundamental problem.
Comment by ac29 1 day ago
Comment by taneq 2 days ago
Comment by mohamedkoubaa 1 day ago
Comment by GeekyBear 1 day ago
> At WWDC, Apple announced that it's opening its Foundation Models framework to third-party cloud model providers. Starting with iOS 27, macOS 27, iPadOS 27, visionOS 27 and watchOS 27, model providers can implement the new public LanguageModel protocol to provide a common interface for model inference. We've made Gemini models available to the Foundation Models framework through the Firebase Apple SDK.
This provides a fully native development experience — cloud-hosted Gemini models can plug directly into the Foundation Models framework using the same API. That means the on-device Apple model and cloud-hosted Gemini models sit behind a shared API surface, so you can easily swap between local and cloud inference to fit your use case.
https://blog.google/innovation-and-ai/technology/developers-...
Comment by jdgoesmarching 1 day ago
Comment by klausa 1 day ago
Protocol in this context means a Swift language feature, like interface in some other languages: https://docs.swift.org/swift-book/documentation/the-swift-pr...
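To make that concrete, here is a minimal sketch of a Swift protocol with two conforming types. `TextModel`, `EchoModel` and `ShoutModel` are made-up names for illustration; this is not Apple's LanguageModel protocol, whose actual requirements aren't shown in this thread.

```swift
// A protocol declares requirements; any type that satisfies them conforms.
// TextModel is a hypothetical stand-in, not Apple's LanguageModel protocol.
protocol TextModel {
    var name: String { get }
    func respond(to prompt: String) -> String
}

struct EchoModel: TextModel {
    let name = "echo"
    func respond(to prompt: String) -> String { prompt }
}

struct ShoutModel: TextModel {
    let name = "shout"
    func respond(to prompt: String) -> String { prompt.uppercased() }
}

// Callers depend only on the protocol, so conforming types are interchangeable.
func ask(_ model: any TextModel, _ prompt: String) -> String {
    "[\(model.name)] \(model.respond(to: prompt))"
}
```

A model provider shipping "support" for a framework like this roughly means shipping a type that conforms to the framework's protocol.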
Comment by daniel_iversen 2 days ago
Comment by tarcon 2 days ago
Comment by HDThoreaun 1 day ago
Comment by drivebyhooting 1 day ago
Comment by HDThoreaun 1 day ago
Comment by willis936 2 days ago
Comment by klausa 2 days ago
The framework's whole deal is that it lets you use the same API to target either the device built-in models, the Apple-hosted online models (Private Cloud Compute), or write your own shims to call out to arbitrarily hosted online models.
You can then dynamically route your calls to a different kind of model/provider, using system APIs, without having to write your own abstraction layer over "I want to use local model for this, but I want to use Claude for that", or having to integrate your own API integration with Anthropic/OpenAI APIs.
It abstracts things like tool calling in one place; and has a bunch of other niceties/oddities (it keeps the same "transcript" going, even if you dynamically switch providers/models during a session) and some other things.
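As a toy illustration of the idea (not the real API), the sketch below keeps one transcript in a session while each turn is routed to a different model. `TextModel`, `Session`, `route` and the 40-character routing rule are all assumptions made up for this example.

```swift
// Hypothetical stand-ins; the real framework's types and signatures differ.
protocol TextModel {
    var name: String { get }
    func respond(to prompt: String) -> String
}

struct LocalModel: TextModel {
    let name = "local"
    func respond(to prompt: String) -> String { "local answer to: \(prompt)" }
}

struct RemoteModel: TextModel {
    let name = "remote"
    func respond(to prompt: String) -> String { "remote answer to: \(prompt)" }
}

// One session owns the transcript, whichever model handles a given turn.
final class Session {
    private(set) var transcript: [String] = []

    func ask(_ prompt: String, using model: any TextModel) -> String {
        let reply = model.respond(to: prompt)
        transcript.append("user: \(prompt)")
        transcript.append("\(model.name): \(reply)")
        return reply
    }
}

// Toy routing rule (an assumption): long prompts go to the remote model.
func route(_ prompt: String, local: any TextModel, remote: any TextModel) -> any TextModel {
    prompt.count > 40 ? remote : local
}
```

The point is only that the routing decision and the transcript live in one place, so app code doesn't need its own abstraction layer per provider.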
Comment by claud_ia 2 days ago
Comment by pprotas 2 days ago
Comment by _the_inflator 2 days ago
Comment by coldtea 2 days ago
Comment by saagarjha 2 days ago
Comment by Danox 1 day ago
Comment by NorwegianDude 2 days ago
Comment by oefrha 2 days ago
Comment by cush 1 day ago
Comment by aesthesia 1 day ago
> Requests go directly from your app to the Claude API; Apple is not in the request path and does not see prompts or responses. Usage is billed to your Anthropic account at standard API pricing. Your app decides when to use Claude and when to use Apple's on-device model: pass whichever model you want to each session.
Comment by thombles 2 days ago
Comment by FinnKuhn 2 days ago
Comment by mathisfun123 2 days ago
Lol bro this is literally it this is the model they've been training (was Apple Foundation model not a big enough hint?)
Comment by mcintyre1994 2 days ago
Comment by embedding-shape 2 days ago
In other words, it's unlikely to happen as there is no money in it. Better for Apple to create some new subscription "AI" and "AI-lite" plans people can subscribe to, and since Apple is a company and we all know what those care about, it's unlikely to become a utopia of local models running on your phone.
Comment by criddell 2 days ago
Comment by Danox 1 day ago
Comment by VadimPR 2 days ago
Comment by hajile 1 day ago
"You pay an indeterminate amount of money to ask a question and you might not even get the response you want without spending even more money" doesn't appeal to most people who aren't gamblers, and explaining how "thank you" at the end of a long exchange can be expensive due to context is an even harder thing for an average person to swallow.
Token cost going up/down like a yo-yo also doesn't help. Normal users NEED fixed costs and don't want to expend energy constantly keeping up with the AI meta. "My subscription lasted much longer last month" isn't a winning problem either.
I think Apple is correct that Local LLM for most things is the future.
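The "thank you" point can be shown with some back-of-the-envelope arithmetic. This sketch assumes a stateless chat API that bills input tokens for the whole conversation on every turn and ignores any prompt caching; the prices and token counts are invented for illustration.

```swift
// Illustrative numbers only; real prices and token counts vary by provider.
struct Pricing {
    let inputPerMillion: Double   // dollars per 1M input tokens (assumed)
    let outputPerMillion: Double  // dollars per 1M output tokens (assumed)
}

// Each turn resends the prior conversation, so input = context + new message.
func turnCost(contextTokens: Int, messageTokens: Int, replyTokens: Int, pricing: Pricing) -> Double {
    let input = Double(contextTokens + messageTokens) * pricing.inputPerMillion / 1_000_000
    let output = Double(replyTokens) * pricing.outputPerMillion / 1_000_000
    return input + output
}

let pricing = Pricing(inputPerMillion: 3.0, outputPerMillion: 15.0)
// The same three-token "thank you" at the start of a chat and after 150k tokens of context.
let fresh = turnCost(contextTokens: 0, messageTokens: 3, replyTokens: 10, pricing: pricing)
let late = turnCost(contextTokens: 150_000, messageTokens: 3, replyTokens: 10, pricing: pricing)
print(fresh, late)
```

Under these assumptions the late message costs thousands of times more than the fresh one, even though the user typed the same thing.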
Comment by nate 1 day ago
Right now for allihat.com I just let people use the Apple model locally if you don't feel like using the claude key. And my conversions to paying user shot up like 3x! But it really isn't a replacement obviously to claude. I was hoping Apple would make proxying to Claude some kind of thing they do for me so I also don't have to proxy to my own server just to try and manage API to Claude usage.
Comment by daralthus 1 day ago
Comment by Maxious 2 days ago
Apple is offering developers with less than 2 million downloads free AI models via their servers https://techcrunch.com/2026/06/08/apple-bets-cheaper-ai-will...
Comment by klausa 2 days ago
Comment by cush 1 day ago
Comment by otter0 1 day ago
Then Apple quietly refuses to participate by not investing tens or hundreds of billions in creating a competing LLM. Sure, they resell Claude for the marks or utilize Gemini to placate the gullible fools but they know what's up.
https://www.microsoft.com/en-us/microsoft-copilot/for-indivi...
Comment by adithyassekhar 2 days ago
I know this is from a developer perspective. But as a consumer this is just funny.
Comment by saretup 2 days ago
Comment by zkmon 2 days ago
Layers are luxury and remove control and transparency.
Comment by _pdp_ 2 days ago
Comment by nl 2 days ago
Proxy (production)
For production, route requests through your own back end with .proxied. The relay at baseURL adds the Claude API credential server-side, so the app ships no key. The headers you provide are sent on every request so your proxy can authorize the caller.
https://platform.claude.com/docs/en/cli-sdks-libraries/libra...
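A sketch of the relay pattern the quoted docs describe, modelled with plain structs rather than real networking. `OutgoingRequest`, `clientRequest`, `relay`, the URLs and the header names are assumptions for illustration and are not the package's actual API.

```swift
// Hypothetical request model; header names and URLs are placeholders.
struct OutgoingRequest {
    var url: String
    var headers: [String: String]
}

// Client side: the app attaches only its own caller token, never the API key.
func clientRequest(relayURL: String, callerToken: String) -> OutgoingRequest {
    OutgoingRequest(url: relayURL, headers: ["Authorization": "Bearer \(callerToken)"])
}

// Relay side: authorize the caller, then add the provider credential server-side.
func relay(_ request: OutgoingRequest, validTokens: Set<String>,
           apiKey: String, upstreamURL: String) -> OutgoingRequest? {
    let prefix = "Bearer "
    guard let auth = request.headers["Authorization"],
          auth.hasPrefix(prefix),
          validTokens.contains(String(auth.dropFirst(prefix.count))) else {
        return nil  // unauthorized caller: nothing is forwarded
    }
    return OutgoingRequest(url: upstreamURL, headers: ["x-api-key": apiKey])
}
```

The trade-off is that the relay operator sees the traffic, but the shipped app binary contains no provider key to extract.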
Comment by r0fl 17 hours ago
Seems that the UX will be enough to win over users and investors
Comment by _josh_meyer_ 2 days ago
Comment by Traster 2 days ago
It's also smart for them to make sure the billing is going direct from Anthropic to the developer. The initial thought is "That means Apple's not taking a cut", but from the other side of it, developers who use this API are going to have to expose that cost to customers somehow, and that translates to subscription/InAppPurchase etc. on top of which Apple will get its 30%.
Comment by mark_l_watson 1 day ago
What confuses me about this article is: The code examples (Python, Ruby, etc.) look to me like the original Anthropic APIs, not Apple’s abstraction. Did I miss something?
Comment by pgt 2 days ago
Comment by klausa 2 days ago
Special emphasis on the "isn't compiled in yet" and "or construct one" bit.
Comment by 21-DOT-DEV 2 days ago
While expected, it’s still a bummer.
Comment by isoprophlex 2 days ago
Comment by theopsimist 1 day ago
Comment by gregman1 2 days ago
Comment by HelloUsername 2 days ago
Comment by ryanshrott 1 day ago
Comment by cush 1 day ago
Comment by hmokiguess 1 day ago
Comment by londons_explore 2 days ago
I don't like this model. Then all the user data is visible to the proxy.
Far better would be some kind of micropayment architecture where a wallet is on the user's device and coins are attached to each request.
We just need to live in the alternate universe where micro payments succeeded.
Comment by me551ah 2 days ago
Comment by laxmansharma 2 days ago
Comment by jedisct1 2 days ago
Comment by bentt 2 days ago
Comment by klausa 1 day ago
They are.
Comment by neuropacabra 2 days ago
Comment by 5701652400 1 day ago
Comment by ChrisArchitect 1 day ago
Comment by simianwords 1 day ago
Comment by hedora 1 day ago
Comment by xducn1 1 day ago
Comment by 64lamei 1 day ago
Comment by mlpicker 2 days ago
Comment by brookst 1 day ago
Comment by ABS 1 day ago
so Claude via FM dies offline while Apple's on-device SystemLanguageModel (the ~3B one) keeps working. It isn't a hybrid really: the framework just has both implement the same LanguageModel protocol so "local 3B" and "remote frontier model" become a one-argument swap.
IMHO what's worth internalising is that the two share an API but nothing else: the on-device path runs on Apple's Neural Engine and costs battery (you can watch ANE power ramp while it works) while the cloud path costs API credits/tokens and does zero local compute. Same code, opposite cost model.
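Since the framework itself doesn't make it a hybrid, an app that wants offline behaviour would have to handle the fallback itself. A minimal sketch, with hypothetical types (`TextModel`, `CloudModel`, `OnDeviceModel`, `answer`) that are not the Foundation Models API:

```swift
// Hypothetical types; not the Foundation Models API.
enum ModelError: Error { case offline }

protocol TextModel {
    func respond(to prompt: String) throws -> String
}

struct OnDeviceModel: TextModel {
    func respond(to prompt: String) throws -> String { "on-device: \(prompt)" }
}

struct CloudModel: TextModel {
    let isOnline: Bool
    func respond(to prompt: String) throws -> String {
        guard isOnline else { throw ModelError.offline }
        return "cloud: \(prompt)"
    }
}

// Prefer one model, but fall back to the other if the preferred call fails.
func answer(_ prompt: String, preferred: any TextModel, fallback: any TextModel) -> String? {
    if let reply = try? preferred.respond(to: prompt) { return reply }
    return try? fallback.respond(to: prompt)
}
```

Because both models sit behind the same interface, the fallback is a few lines of app code rather than a second integration.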
Comment by tonyoconnell 2 days ago
Apple's Foundation Models framework (shipping in iOS 27 / macOS 27 this fall) is the standard Swift API for on-device AI — the same API Apple uses for their own small model. This package makes Claude plug into that same API as a drop-in swap.
// Apple's on-device model
let localSession = LanguageModelSession(model: SystemLanguageModel.default)
// Claude: same API, just a different model constructor
let claudeSession = LanguageModelSession(model: ClaudeLanguageModel(name: .sonnet4_6, auth: auth))
One API, two tiers. You write your app once against the Foundation Models protocol. The on-device model handles fast/free/private tasks; Claude handles heavy reasoning, long context, or capability gaps. You swap the model, not your code. You don't call the Anthropic API directly. Apple's framework handles streaming, tool calling, and structured output (@Generable), and you just get Claude's capability through it.
Comment by swordlucky666 1 day ago
Comment by stackedinserter 1 day ago
Comment by hedora 1 day ago
Enough is enough. I’m seriously evaluating open models this week.
Comment by hit8run 2 days ago
Comment by insumanth 2 days ago