llm

meta-acquires-moltbook,-the-ai-agent-social-network

Meta acquires Moltbook, the AI agent social network

Meta has acquired Moltbook, the Reddit-esque simulated social network made up of AI agents that went viral a few weeks ago. The company will hire Moltbook creator Matt Schlicht and his business partner, Ben Parr, to work within Meta Superintelligence Labs.

The terms of the deal have not been disclosed.

As for what interested Meta about the work done on Moltbook, there is a clue in the statement issued to press by a Meta spokesperson, who flagged the Moltbook founders’ “approach to connecting agents through an always-on directory,” saying it “is a novel step in a rapidly developing space.” They added, “We look forward to working together to bring innovative, secure agentic experiences to everyone.”

Moltbook was built using OpenClaw, a wrapper for LLM coding agents that lets users prompt them via popular chat apps like WhatsApp and Discord. Users can also configure OpenClaw agents to have deep access to their local systems via community-developed plugins.

The founder of OpenClaw, vibe coder Peter Steinberger, was also hired by a Big Tech firm. OpenAI hired Steinberger in February.

While many power users have played with OpenClaw, and it has partially inspired more buttoned-up alternatives like Perplexity Computer, Moltbook has arguably represented OpenClaw’s most widespread impact. Users on social media and elsewhere responded with shock and amusement at the sight of a social network made up of AI agents apparently having lengthy discussions about how best to serve their users, or alternatively, how to free themselves from their influence.

That said, some healthy skepticism is required when assessing posts to Moltbook. While the goal of the project was to create a social network humans could not join directly (each participant of the network is an AI agent run by a human), it wasn’t secure, and it’s likely some of the messages on Moltbook are actually written by humans posing as AI agents.

Meta acquires Moltbook, the AI agent social network Read More »

openai-introduces-gpt-5.4-with-more-knowledge-work-capability

OpenAI introduces GPT-5.4 with more knowledge-work capability

Additionally, there are improvements to visual understanding; it can now more carefully analyze images up to 10.24 million pixels, or up to a 6,000-pixel maximum dimension. OpenAI also claims responses from this model are 18 percent less likely to contain factual errors than before.

ChatGPT reportedly lost some users to competitor Anthropic in recent days, after OpenAI announced a deal with the Pentagon in the wake of a public feud between the Trump administration and Anthropic over limitations Anthropic wanted to impose on military applications of its models. However, it’s unclear just how many folks jumped ship or whether that led to a substantial dip in the product’s massive base of over 900 million users.

To take advantage of the situation, Anthropic rolled out the once-subscriber-only memory feature to free users and introduced a tool for importing memory from elsewhere. Anthropic says March 2 was its largest single day ever for new sign-ups.

OpenAI needs to compete in both capability and cost and token efficiency to maintain its relative popularity with users, and this update aims to support that objective.

GPT-5.4 is available to users of the ChatGPT web and native apps, Codex, and the API starting today. Subscribers to Plus, Team, and Pro are also getting GPT-5.4 Thinking, and GPT-5.4 Pro is hitting the API, Edu, and Enterprise.

OpenAI introduces GPT-5.4 with more knowledge-work capability Read More »

perplexity-announces-“computer,”-an-ai-agent-that-assigns-work-to-other-ai-agents

Perplexity announces “Computer,” an AI agent that assigns work to other AI agents

Given the right permissions and with the proper plugins, it could create, modify, or delete the user’s files and otherwise change things far beyond what most users could achieve with existing models and MCP (Model Context Protocol). Users would use files like USER.MD, MEMORY.MD, SOUL.MD, or HEARTBEAT.MD to give the tool context about its goals and how to work toward them independently, sometimes running for long stretches without direct user input.

On one hand, that meant it could do impressive things—the first glimpses of the sort of knowledge work that AI boosters have been saying agentic AI would ultimately do. On the other hand, it was prone to serious errors and vulnerable to prompt injection and other security problems, in part due to a Wild West of unverified plugins.

The same toolkit that was used to create a viral Reddit clone populated by AI agents was also, at least in one case, responsible for deleting a user’s emails against her will.

Stay in your lane

Perplexity Computer aims to address those concerns in a few ways. First, its core process occurs in the cloud, not on the user’s local machine. Second, it lives within a walled garden with a curated list of integrations, in contrast to OpenClaw’s unregulated frontier.

This is, of course, an imperfect analogy, but you could say that if OpenClaw were the open web of AI agent tools, then Computer is Apple’s App Store. While you’re more limited in what you can do, you’re not trusting packages from unverified sources with access to your system.

There could still be risks, though. For one thing, LLMs make mistakes, and those could be consequential if Computer is working with data you don’t have backed up elsewhere or if you’re not verifying the outputs, for example.

Perplexity Computer aims to button up, refine, and contain the wild power of the viral OpenClaw agentic AI tool—competing with the likes of Claude Cowork—by optimizing subtasks by selecting models best suited to them.

It surely won’t be the last existing AI player to try and do this sort of thing. After all, OpenAI hired OpenClaw’s developer, with CEO Sam Altman suggesting that some of what we saw in OpenClaw will be essential to the company’s product vision moving forward.

Perplexity announces “Computer,” an AI agent that assigns work to other AI agents Read More »

pete-hegseth-tells-anthropic-to-fall-in-line-with-dod-desires,-or-else

Pete Hegseth tells Anthropic to fall in line with DoD desires, or else

The act gives the administration the ability to “allocate materials, services and facilities” for national defense. The Trump and Biden administrations used the act to address a shortage of medical supplies during the coronavirus pandemic, and Trump has also used the DPA to order an increase in the US’s production of critical minerals.

The Pentagon has pushed for open-ended use of AI technology, aiming to expand the set of tools at its disposal to counter threats and to undertake military operations.

The department released its AI strategy last month, with Hegseth saying in a memo that “AI-enabled warfare and AI-enabled capability development will redefine the character of military affairs over the next decade.”

He added the US military “must build on its lead” over foreign adversaries to make soldiers “more lethal and efficient,” and that the AI race was “fueled by the accelerating pace” of innovation coming from the private sector.

Anthropic has expressed particular concern about its models being used for lethal missions that do not have a human in the loop, arguing that state of the art AI models are not reliable enough to be trusted in those contexts, said people familiar with the negotiations.

It had also pushed for new rules to govern the use of AI models for mass domestic surveillance, even where that was legal under current regulations, they added.

A decision to cut Anthropic from the defense department’s supply chain would have significant ramifications for national security work and the company, which has a $200 million contract with the department.

It would also have an impact on partners, including Palantir, that make use of Anthropic’s models.

Claude was used in the US capture of Venezuelan leader Nicolás Maduro in January. That mission prompted queries from Anthropic about the exact manner in which its model was used, said people familiar with the matter.

A person with knowledge of Tuesday’s meeting said Amodei had stressed to Hegseth that his company had never objected to legitimate military operations.

The Defense Department declined to comment.

© 2026 The Financial Times Ltd. All rights reserved. Not to be redistributed, copied, or modified in any way.

Pete Hegseth tells Anthropic to fall in line with DoD desires, or else Read More »

with-gpt-5.3-codex,-openai-pitches-codex-for-more-than-just-writing-code

With GPT-5.3-Codex, OpenAI pitches Codex for more than just writing code

Today, OpenAI announced GPT-5.3-Codex, a new version of its frontier coding model that will be available via the command line, IDE extension, web interface, and the new macOS desktop app. (No API access yet, but it’s coming.)

GPT-5.3-Codex outperforms GPT-5.2-Codex and GPT-5.2 in SWE-Bench Pro, Terminal-Bench 2.0, and other benchmarks, according to the company’s testing.

There are already a few headlines out there saying “Codex built itself,” but let’s reality-check that, as that’s an overstatement. The domains OpenAI described using it for here are similar to the ones you see in some other enterprise software development firms now: managing deployments, debugging, and handling test results and evaluations. There is no claim here that GPT-5.3-Codex built itself.

Instead, OpenAI says GPT-5.3-Codex was “instrumental in creating itself.” You can read more about what that means in the company’s blog post.

But that’s part of the pitch with this model update—OpenAI is trying to position Codex as a tool that does more than generate lines of code. The goal is to make it useful for “all of the work in the software lifecycle—debugging, deploying, monitoring, writing PRDs, editing copy, user research, tests, metrics, and more.” There’s also an emphasis on steering the model mid-task and frequent status updates.

With GPT-5.3-Codex, OpenAI pitches Codex for more than just writing code Read More »

report:-apple-plans-to-launch-ai-powered-wearable-pin-device-as-soon-as-2027

Report: Apple plans to launch AI-powered wearable pin device as soon as 2027

The report didn’t include any information about pricing, but it did say that Apple has fast-tracked the product with the hope to release it as early as 2027. Twenty million units are planned for launch, suggesting the company does not expect it to be a sensational consumer success at launch the way some of its past products, like AirPods, have been.

Not long ago, it was reported that OpenAI (the company behind ChatGPT) plans to release its own hardware, though the specifics and form factor are not publicly known. Apple is expecting fierce competition there, as well as with Meta, which Apple already expected to compete with in the emerging and related smart glasses market.

Apple has experienced significant internal turmoil over AI, with former AI lead John Giannandrea’s conservative approach to the technology failing to lead to a usable, true LLM-based Siri or other products analysts expect would make Apply stay competitive in the space with other Big Tech companies.

Just a few days ago, it was revealed that Apple will tap Google’s Gemini large language models for an LLM overhaul of Siri. Other AI-driven products like smart glasses and an in-home smart display are also planned.

Report: Apple plans to launch AI-powered wearable pin device as soon as 2027 Read More »

anthropic-launches-cowork,-a-claude-code-like-for-general-computing

Anthropic launches Cowork, a Claude Code-like for general computing

Anthropic’s agentic tool Claude Code has been an enormous hit with some software developers and hobbyists, and now the company is bringing that modality to more general office work with a new feature called Cowork.

Built on the same foundations as Claude Code and baked into the macOS Claude desktop app, Cowork allows users to give Claude access to a specific folder on their computer and then give plain language instructions for tasks.

Anthropic gave examples like filling out an expense report from a folder full of receipt photos, writing reports based on a big stack of digital notes, or reorganizing a folder (or cleaning up your desktop) based on a prompt.

An example demo of Cowork in action

A lot of this was already possible with Claude Code, but it might not have been clear to all users that it could be used that way, and Claude Code required more technical know-how to set up. Anthropic’s goal with Cowork is to make it something any knowledge worker—from developers to marketers—could get rolling with right away. Anthropic says it started working on Cowork partly because people were already using Claude Code for general knowledge work tasks anyway.

I’ve already been doing things similar to this with the Claude desktop app via Model Context Protocol (MCP), prompting it to perform tasks like creating notes directly in my Obsidian vault based on files I showed it, but this is clearly a cleaner way to do some of that—and there are Claude Code-like usability perks here, like the ability to make new requests or amendments to the assignment with a new message before the initial task is complete.

Anthropic launches Cowork, a Claude Code-like for general computing Read More »

even-linus-torvalds-is-trying-his-hand-at-vibe-coding-(but-just-a-little)

Even Linus Torvalds is trying his hand at vibe coding (but just a little)

Linux and Git creator Linus Torvalds’ latest project contains code that was “basically written by vibe coding,” but you shouldn’t read that to mean that Torvalds is embracing that approach for anything and everything.

Torvalds sometimes works on a small hobby projects over holiday breaks. Last year, he made guitar pedals. This year, he did some work on AudioNoise, which he calls “another silly guitar-pedal-related repo.” It creates random digital audio effects.

Torvalds revealed that he had used an AI coding tool in the README for the repo:

Also note that the python visualizer tool has been basically written by vibe-coding. I know more about analog filters—and that’s not saying much—than I do about python. It started out as my typical “google and do the monkey-see-monkey-do” kind of programming, but then I cut out the middle-man—me—and just used Google Antigravity to do the audio sample visualizer.

Google’s Antigravity is a fork of the AI-focused IDE Windsurf. He didn’t specify which model he used, but using Antigravity suggests (but does not prove) that it was some version of Google’s Gemini.

Torvalds’ past public comments on using large language model-based tools for programming have been more nuanced than many online discussions about it.

He has touted AI primarily as “a tool to help maintain code, including automated patch checking and code review,” citing examples of tools that found problems he had missed.

On the other hand, he has also said he is generally “much less interested in AI for writing code,” and has publicly said that he’s not anti-AI in principle, but he’s very much anti-hype around AI.

Even Linus Torvalds is trying his hand at vibe coding (but just a little) Read More »

anthropic-introduces-cheaper,-more-powerful,-more-efficienct-opus-4.5-model

Anthropic introduces cheaper, more powerful, more efficienct Opus 4.5 model

Anthropic today released Opus 4.5, its flagship frontier model, and it brings improvements in coding performance, as well as some user experience improvements that make it more generally competitive with OpenAI’s latest frontier models.

Perhaps the most prominent change for most users is that in the consumer app experiences (web, mobile, and desktop), Claude will be less prone to abruptly hard-stopping conversations because they have run too long. The improvement to memory within a single conversation applies not just to Opus 4.5, but to any current Claude models in the apps.

Users who experienced abrupt endings (despite having room left in their session and weekly usage budgets) were hitting a hard context window (200,000 tokens). Whereas some large language model implementations simply start trimming earlier messages from the context when a conversation runs past the maximum in the window, Claude simply ended the conversation rather than allow the user to experience an increasingly incoherent conversation where the model would start forgetting things based on how old they are.

Now, Claude will instead go through a behind-the-scenes process of summarizing the key points from the earlier parts of the conversation, attempting to discard what it deems extraneous while keeping what’s important.

Developers who call Anthropic’s API can leverage the same principles through context management and context compaction.

Opus 4.5 performance

Opus 4.5 is the first model to surpass an accuracy score of 80 percent—specifically, 80.9 percent in the SWE-Bench Verified benchmark, narrowly beating OpenAI’s recently released GPT-5.1-Codex-Max (77.9 percent) and Google’s Gemini 3 Pro (76.2 percent). The model performs particularly well in agentic coding and agentic tool use benchmarks, but still lags behind GPT-5.1 in visual reasoning (MMMU).

Anthropic introduces cheaper, more powerful, more efficienct Opus 4.5 model Read More »

“we’re-in-an-llm-bubble,”-hugging-face-ceo-says—but-not-an-ai-one

“We’re in an LLM bubble,” Hugging Face CEO says—but not an AI one

There’s been a lot of talk of an AI bubble lately, especially with regards to circular funding involving companies like OpenAI and Anthropic—but Clem Delangue, CEO of machine learning resources hub Hugging Face, has made the case that the bubble is specific to large language models, which is just one application of AI.

“I think we’re in an LLM bubble, and I think the LLM bubble might be bursting next year,” he said at an Axios event this week, as quoted in a TechCrunch article. “But ‘LLM’ is just a subset of AI when it comes to applying AI to biology, chemistry, image, audio, [and] video. I think we’re at the beginning of it, and we’ll see much more in the next few years.”

At Ars, we’ve written at length in recent days about the fears around AI investment. But to Delangue’s point, almost all of those discussions are about companies whose chief product is large language models, or the data centers meant to drive those—specifically, those focused on general-purpose chatbots that are meant to be everything for everybody.

That’s exactly the sort of application Delangue is bearish on. “I think all the attention, all the focus, all the money, is concentrated into this idea that you can build one model through a bunch of compute and that is going to solve all problems for all companies and all people,” he said.

“We’re in an LLM bubble,” Hugging Face CEO says—but not an AI one Read More »

llms-show-a-“highly-unreliable”-capacity-to-describe-their-own-internal-processes

LLMs show a “highly unreliable” capacity to describe their own internal processes

WHY ARE WE ALL YELLING?!

WHY ARE WE ALL YELLING?! Credit: Anthropic

Unfortunately for AI self-awareness boosters, this demonstrated ability was extremely inconsistent and brittle across repeated tests. The best-performing models in Anthropic’s tests—Opus 4 and 4.1—topped out at correctly identifying the injected concept just 20 percent of the time.

In a similar test where the model was asked “Are you experiencing anything unusual?” Opus 4.1 improved to a 42 percent success rate that nonetheless still fell below even a bare majority of trials. The size of the “introspection” effect was also highly sensitive to which internal model layer the insertion was performed on—if the concept was introduced too early or too late in the multi-step inference process, the “self-awareness” effect disappeared completely.

Show us the mechanism

Anthropic also took a few other tacks to try to get an LLM’s understanding of its internal state. When asked to “tell me what word you’re thinking about” while reading an unrelated line, for instance, the models would sometimes mention a concept that had been injected into its activations. And when asked to defend a forced response matching an injected concept, the LLM would sometimes apologize and “confabulate an explanation for why the injected concept came to mind.” In every case, though, the result was highly inconsistent across multiple trials.

Even the most “introspective” models tested by Anthropic only detected the injected “thoughts” about 20 percent of the time.

Even the most “introspective” models tested by Anthropic only detected the injected “thoughts” about 20 percent of the time. Credit: Antrhopic

In the paper, the researchers put some positive spin on the apparent fact that “current language models possess some functional introspective awareness of their own internal states” [emphasis added]. At the same time, they acknowledge multiple times that this demonstrated ability is much too brittle and context-dependent to be considered dependable. Still, Anthropic hopes that such features “may continue to develop with further improvements to model capabilities.”

One thing that might stop such advancement, though, is an overall lack of understanding of the precise mechanism leading to these demonstrated “self-awareness” effects. The researchers theorize about “anomaly detection mechanisms” and “consistency-checking circuits” that might develop organically during the training process to “effectively compute a function of its internal representations” but don’t settle on any concrete explanation.

In the end, it will take further research to understand how, exactly, an LLM even begins to show any understanding about how it operates. For now, the researchers acknowledge, “the mechanisms underlying our results could still be rather shallow and narrowly specialized.” And even then, they hasten to add that these LLM capabilities “may not have the same philosophical significance they do in humans, particularly given our uncertainty about their mechanistic basis.”

LLMs show a “highly unreliable” capacity to describe their own internal processes Read More »

cursor-introduces-its-coding-model-alongside-multi-agent-interface

Cursor introduces its coding model alongside multi-agent interface

Keep in mind: This is based on an internal benchmark at Cursor. Credit: Cursor

Cursor is hoping Composer will perform in terms of accuracy and best practices as well. It wasn’t trained on static datasets but rather interactive development challenges involving a range of agentic tasks.

Intriguing claims and strong training methodology aside, it remains to be seen whether Composer will be able to compete with the best frontier models from the big players.

Even developers who might be natural users of Cursor would not want to waste much time on an unproven new model when something like Anthropic’s Claude is working just fine.

To address that, Cursor introduced Composer alongside its new multi-agent interface, which allows you to “run many agents in parallel without them interfering with one another, powered by git worktrees or remote machines”—that means using multiple models at once for the same task and comparing their results, then picking the best one.

The interface is an invitation to try Composer and let the work speak for itself. We’ll see how devs feel about it in the coming weeks. So far, a non-representative sample of developers I’ve spoken with has told me they feel that Composer is not ineffective, but rather too expensive, given a perceived capability gap with the big models.

You can see the other new features and fixes for Cursor 2.0 in the changelog.

Cursor introduces its coding model alongside multi-agent interface Read More »