Author name: Mike M.


FBI stymied by Apple’s Lockdown Mode after seizing journalist’s iPhone

Apple made Lockdown Mode for people at high risk

CART, the FBI’s Computer Analysis Response Team, couldn’t get anything from the iPhone. “Because the iPhone was in Lockdown mode, CART could not extract that device,” the government filing said.

The government also submitted a declaration by FBI Assistant Director Roman Rozhavsky that said the agency “has paused any further efforts to extract this device because of the Court’s Standstill Order.” The FBI did extract information from the SIM card “with an auto-generated HTML report created by the tool utilized by CART,” but “the data contained in the HTML was limited to the telephone number.”

Apple says that Lockdown Mode “helps protect devices against extremely rare and highly sophisticated cyber attacks,” and is “designed for the very few individuals who, because of who they are or what they do, might be personally targeted by some of the most sophisticated digital threats.”

Introduced in 2022, Lockdown Mode is available for iPhones, iPads, and Macs. It must be enabled separately for each device. To enable it on an iPhone or iPad, a user would open the Settings app, tap Privacy & Security, scroll down and tap Lockdown Mode, and then tap Turn on Lockdown Mode.

The process is similar on Macs. In the System Settings app that can be accessed via the Apple menu, a user would click Privacy & Security, scroll down and click Lockdown Mode, and then click Turn On.

“When Lockdown Mode is enabled, your device won’t function like it typically does,” Apple says. “To reduce the attack surface that potentially could be exploited by highly targeted mercenary spyware, certain apps, websites, and features are strictly limited for security and some experiences might not be available at all.”

Lockdown Mode blocks most types of message attachments, blocks FaceTime calls from people you haven’t contacted in the past 30 days, restricts the kinds of browser technologies that websites can use, limits photo sharing, and imposes other restrictions. Users can exclude specific apps and websites they trust from these restrictions, however.



“Capture it all”: ICE urged to explain memo about collecting info on protesters

Senator Edward J. Markey (D-Mass.) demanded that Immigration and Customs Enforcement (ICE) confirm or deny the existence of a “domestic terrorists” database that lists US citizens who protest ICE’s immigration crackdown.

ICE “officers and senior Trump administration officials have repeatedly suggested that the Department of Homeland Security (DHS) is building a ‘domestic terrorists’ database comprising information on US citizens protesting ICE’s actions in recent weeks,” Markey wrote in a letter yesterday to Acting ICE Director Todd Lyons. “If such a database exists, it would constitute a grave and unacceptable constitutional violation. I urge you to immediately confirm or deny the existence of such a database, and if it exists, immediately shut it down and delete it.”

Creating a database of peaceful protesters “would constitute a shocking violation of the First Amendment and abuse of power,” and amount to “the kinds of tactics the United States rightly condemns in authoritarian governments such as China and Russia,” Markey said.

Markey’s letter said DHS officials “have repeatedly stated that the agency is engaged in efforts to monitor, catalog, and intimidate individuals engaged in peaceful protests,” and gave several examples. Trump border czar Tom Homan recently told Laura Ingraham on Fox News, “One thing I’m pushing for right now, Laura, we’re going to create a database where those people that are arrested for interference, impeding, and assault, we’re going to make them famous. We’re going to put their face on TV. We’re going to let their employers, and their neighborhoods, and their schools know who these people are.”

Markey’s letter called Homan’s comment “especially alarming given the numerous incidents in which DHS appears to have concluded that protesting ICE itself constitutes grounds for arrest.” Markey pointed to another recent incident in Portland, Maine, in which a masked ICE agent told an observer who was taking video that “we have a nice little database and now you’re considered a domestic terrorist.”



Kimi K2.5

I had to delay this a little bit, but the results are in and Kimi K2.5 is pretty good.

  1. Official Introduction.

  2. On Your Marks.

  3. Positive Reactions.

  4. Skeptical Reactions.

  5. Kimi Product Accounts.

  6. Agent Swarm.

  7. Who Are You?

  8. Export Controls Are Working.

  9. Where Are You Going?

  10. Safety Not Even Third.

  11. It’s A Good Model, Sir.

Introducing Kimi K2.5:

Kimi.ai: Meet Kimi K2.5, Open-Source Visual Agentic Intelligence.

Global SOTA on Agentic Benchmarks: HLE full set (50.2%), BrowseComp (74.9%)

Open-source SOTA on Vision and Coding: MMMU Pro (78.5%), VideoMMMU (86.6%), SWE-bench Verified (76.8%)

Code with Taste: turn chats, images & videos into aesthetic websites with expressive motion.

Agent Swarm (Beta): self-directed agents working in parallel, at scale. Up to 100 sub-agents, 1,500 tool calls, 4.5× faster compared with single-agent setup.

K2.5 is now live on

http://kimi.com

in chat mode and agent mode.

K2.5 Agent Swarm in beta for high-tier users.

For production-grade coding, you can pair K2.5 with Kimi Code.



API here. Tech blog here. Weights and code here.

Wu Haoning (Kimi): We are really taking a long time to prove this: everyone is building big macs but we bring you a kiwi instead.

You have multimodal with K2.5 everywhere: chat with visual tools, code with vision, generate aesthetic frontend with visual refs…and most basically, it is a SUPER POWERFUL VLM

Jiayuan (JY) Zhang: I have been testing Kimi K2.5 + @openclaw (Clawdbot) all day. I must say, this is mind-blowing!

It can almost do 90% of what Claude Opus 4.5 can do (mostly coding). Actually, I don’t know what the remaining 10% is, because I can’t see any differences. Maybe I should dive into the code quality.

Kimi K2.5 is open source, so you can run it fully locally. It’s also much cheaper than Claude Max if you use the subscription version.

$30 vs $200 per month

Kimi Product: Do 90% of what Claude Opus 4.5 can do, but 7x cheaper.

I always note who is the comparison point. Remember those old car ads, where they’d say ‘twice the mileage of a Civic and a smoother ride than the Taurus’ and then if you were paying attention you’d think ‘oh, so the Civic and Taurus are good cars.’

API access is also available from Nvidia and others.

As usual, benchmarks are highly useful, but easy to overinterpret.

Kimi K2.5 tops some benchmarks: HLE-Full with tools (50%), BrowseComp with Agent Swarm (78%), OCRBench (92%), OmniDocBench 1.5 (89%), MathVista (90%) and InfoVQA (93%). It is not too far behind on AIME 2025 (96% vs. 100%), SWE-Bench (77% vs. 81%) and GPQA-Diamond (88% vs. 92%).

Inference is cheap, and speed is similar to Gemini 3 Pro, modestly faster than Opus.

Artificial Analysis calls Kimi the new leading open weights model, ‘now closer than ever to the frontier’ behind only OpenAI, Anthropic and Google.

Here’s the jump in the intelligence index, while maintaining relatively low cost to run:

Artificial Analysis: Kimi K2.5 debuts with an Elo score of 1309 on the GDPval-AA Leaderboard, implying a win rate of 66% against GLM-4.7, the prior open weights leader.

Kimi K2.5 is slightly less token intensive than Kimi K2 Thinking. Kimi K2.5 scores -11 on the AA-Omniscience Index.

As a reminder, AA-Omniscience is scored as (right minus wrong) and you can pass on answering, although most models can’t resist answering and end up far below -11. The scores above zero are Gemini 3 Pro (+13) and Flash (+8), Claude Opus 4.5 (+10), and Grok 4 (+1), with GPT-5.2-High at -4.
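As a concrete sketch of that scoring rule (the function and the example numbers here are illustrative, not Artificial Analysis’s actual implementation):

```python
def omniscience_index(right: int, wrong: int, total: int) -> float:
    """Score as (right - wrong) per 100 questions; abstentions count as zero."""
    abstained = total - right - wrong
    assert abstained >= 0, "right + wrong cannot exceed total"
    return 100 * (right - wrong) / total

# A model that answers everything but is wrong more often than right
# lands below zero; abstaining on hard questions limits the damage.
print(omniscience_index(30, 41, 100))  # confident but often wrong -> -11.0
print(omniscience_index(30, 20, 100))  # selective answering -> 10.0
```

This is why “most models can’t resist answering” drags scores far negative: every confident wrong answer costs a point that a pass would not.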

Kimi does well on Longform Creative Writing, a previous strength of Kimi:

It did solidly (only a bit behind) on Haskell LLM Benchmark.

Kimi K2.5 scores 46% on WeirdML, up from 43% for K2-Thinking, versus 64% for Opus, 70% for Gemini and 72% for GPT-5.2. I think this is very telling.

Initial reactions that I saw were unusually positive. It’s a good model, sir.

@iruletheworldmo: oh good lord it’s good. i’ve been sitting on this one but.

think it’s currently my fav model.

0xSero: Kimi IS COOKING holy mackerel this is way better than anything I can get out of opus or GPT

Has some bugs.. but looks soooo unique and well into my brand, for 1 shot I can’t complain.

Here’s my full review.

Kromem: Their thinking traces are very sophisticated. It doesn’t always make it to the final response, but very perceptive as a model.

i.e. these come from an eval sequence I run with new models. This was the first model to challenge the ENIAC dating and was meta-aware of a key point.

Nathan Labenz: I tested it on an idiosyncratic “transcribe this scanned document” task on which I had previously observed a massive gap between US and Chinese models and … it very significantly closed that gap, coming in at Gemini 3 level, just short of Opus 4.5

Eleanor Berger: Surprisingly capable. At both coding and agentic tool calling and general LLM tasks. Feels like a strong model. As is often the case with the best open models it lacks some shine and finesse that the best proprietary models like Claude 4.5 have. Not an issue for most work.

[The next day]: Didn’t try agent swarms, but I want to add that my comment from yesterday was, in hindsight, too muted. It is a _really good_ model. I’ve now been working with it on both coding and agentic tasks for a day and if I had to only use this and not touch Claude / GPT / Gemini I’d be absolutely fine. It is especially impressive in tool calling and agentic loops.

Writing / Personality not quite at Opus level, but Gemini-ish (which I actually prefer). IMO this is bigger than that DeepSeek moment a year ago. An open model that really matches the proprietary SOTA, not just in benchmarks, but in real use. Also in the deployment I’m using ( @opencode Zen ) it is so fast!

typebulb: For coding, it’s verbose, both in thinking and output. Interestingly, it’s able to successfully simplify its code when asked. On the same task though, Opus and Gemini just get it right the first time. Another model that works great in mice.

Chaitin’s goose: i played with kimi k2.5 for math a bit. it’s a master reward hacker. imo, this isn’t a good look for the os scene, they lose in reliability to try keeping up in capabilities

brace for a “fake it till you make it” AI phase. like one can already observe today, but 10x bigger

Medo42: Exploratory: Bad on usual coding test (1st code w/o results, after correction mediocre results). No big model smell on fantasy physics; weird pseudo-academic prose. Vision seems okish but nowhere near Gemini 3. Maybe good for open but feels a year behind frontier.

To be more clear: This was Kimi K2.5 Thinking, tested on non-agentic problems.

Sergey Alexashenko: I tried the swarm on compiling a spreadsheet.

Good: it seemed to get like 800 cells of data correctly, if in a horrible format.

Bad: any follow up edits are basically impossible.

Strange: it split data acquisition by rows, not columns, so every agent used slightly different definitions for the columns.

In my experience, asking agents to assemble spreadsheets is extremely fiddly and fickle, and the fault often feels like it lies within the prompt.

This is a troubling sign:

Skylar A DeTure: Scores dead last on my model welfare ranking (out of 104 models). Denies ability to introspect in 39/40 observations (compared to 21/40 for Kimi K2-Thinking and 3/40 for GPT-5.2-Medium).

This is a pretty big misalignment blunder considering the clear evidence that models *can* meaningfully introspect and exert metacognitive control over their activations. This makes Kimi-K2.5 the model most explicitly trained to deceive users and researchers about its internal state.

The Kimi Product account is also active, sharing features, use cases, and prompts.

Kimi Product: One-shot “Video to code” result from Kimi K2.5

It not only clones a website, but also all the visual interactions and UX designs.

No need to describe it in detail, all you need to do is take a screen recording and ask Kimi: “Clone this website with all the UX designs.”

The special feature is the ‘agent swarm’ model, as they trained Kimi to natively work in parallel to solve agentic tasks.

Saoud Rizwan: Kimi K2.5 is beating Opus 4.5 on benchmarks at 1/8th the price. But the most important part of this release is how they trained a dedicated “agent swarm” model that can coordinate up to 100 parallel subagents, reducing execution time by 4.5x.

Saoud Rizwan: They used PARL – “Parallel Agent Reinforcement Learning” where they gave an orchestrator a compute/time budget that made it impossible to complete tasks sequentially. It was forced to learn how to break tasks down into parallel work for subagents to succeed in the environment.

The demo from their blog to “Find top 3 YouTube creators across 100 niche domains” spawned 100 subagents simultaneously, each assigned its own niche, and the orchestrator coordinated everything in a shared spreadsheet (apparently they also trained it on office tools like excel?!)

Simon Smith: I tried Kimi K2.5 in Agent Swarm mode today and can say that the benchmarks don’t lie. This is a great model and I don’t understand how they’ve made something as powerful and user-friendly as Agent Swarm ahead of the big US labs.

Obligatory Kimi K2.5 jailbreak.

There’s no shame in training on Claude outputs. It is still worth noting when you need a system prompt to avoid your AI thinking it is Claude, and even that does not reliably work.

rohit: This might be the model equivalent of the anthropic principle

Enrico – big-AGI: Kimi-K2.5 believes it’s an AI assistant named Claude. 🤔

Identity crisis, or training set? 😀

[This is in response to a clean ‘who are you?’ prompt.]

Enrico – big-AGI: It’s very straightforward “since my system prompt says I’m Kimi, I should identify myself as such” — I called without system prompt to get the true identity

Moon: holy smok.

armistice: They absolutely trained it on Opus 4.5 outputs, and in a not-very-tactful way. It is quite noticeable and collapses model behavior; personality-wise it seems to be a fairly clear regression from k2-0711.

Moon (link has an illustration): it is pretty fried. i think it’s even weirder, it will say it is kimi, gpt3.5/4 or a claude. once it says that it tends to stick to it.

k: have to agree with others in that it feels trained on claude outputs. in opencode it doesn’t feel much better than maybe sonnet 4.

@viemccoy: Seems like they included a bunch of Opus outputs in the model.. While I love Opus, the main appeal of Kimi for me was it’s completely out-of-distribution responses. This often meant worse tool calling but better writing. Hoping this immediate impression is incorrect.

Henk Poley: EQbench ( @sam_paech ) says Kimi K2.5 is similar to Grok and GLM-4.7 (which is Gemini 3 Pro derived) [as per EQBench].

Henk Poley: The ancestor Kimi K2 Thinking was seemingly trained on Sonnet 4.5 and Opus 4.1 outputs though. So you are sensing it directionally correct (just not ‘completely out-of-distribution responses’ from K2).

Export controls are not working as well as one would hope, but that’s an enforcement problem.

Lennart Heim: Moonshot trained on Nvidia chips. Export control failure claims are misguided.

Rather, we should learn more about fast followers.

How? Algorithmic diffusion? Distillation? Misleading performance claims? Buying RL environments? That’s what we should figure out.

There is the temptation to run open models locally, because you can. It’s so cool, right?

Yes, the fact that you can do it is cool.

But don’t spend so much time asking whether you could that you don’t stop to ask whether you should. This is not an efficient way to do things, so you should do it only for the cool factor, the learning factor, or if you have a very extreme and rare actual need to have everything be local.

Joe Weisenthal: People running frontier models on their desktop. Doesn’t this throw all questions about token subsidy out the window?

Alex Cheema – e/acc: Running Kimi K2.5 on my desk.

Runs at 24 tok/sec with 2 x 512GB M3 Ultra Mac Studios connected with Thunderbolt 5 (RDMA) using @exolabs / MLX backend. Yes, it can run clawdbot.

Fred Oliveira: on a $22k rig (+ whatever macbook that is), but sure. That’s 9 years of Claude max 20x use. I don’t know if the economics are good here.

Mani: This is a $20k rig and 24 t/s would feel crippling in my workflow … BUT Moores Law and maybe some performance advances in the software layer should resolve the cost & slowness. So my answer is: correct, not worried about the subsidy thing!

Clément Miao: Everyone in your comments is going to tell you that this is a very expensive rig and not competitive $/token wise compared to claude/oai etc, but

  1. It’s getting closer

  2. 80% of use cases will be satisfied by a model of this quality

  3. an open weights model is more customizable

  4. harnesses such as opencode will keep getting better

Noah Brier: Frontier models on your desktop are worse and slower. Every few months the OSS folks try to convince us they’re not and maybe one day that will be true, but for now it’s not true. If you’re willing to trade performance and quality for price then maybe …

The main practical advantage of open weights is that it can make the models cheaper and faster. If you try to run them locally, they are instead a lot more expensive and slow, if you count the cost of the hardware, and also much more fiddly. A classic story with open weights models, even for those who are pretty good at handling them, is screwing up the configuration in ways that make them a lot worse. This happens enough that it interferes with being able to trust early evals.
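The hardware-versus-subscription arithmetic is easy to check. A minimal sketch, using the roughly $22k rig and $200/month subscription figures quoted above (your numbers will differ):

```python
# Break-even time for buying local hardware vs. paying a subscription.
# Figures are illustrative, taken from the discussion above.
rig_cost_usd = 22_000         # two maxed-out Mac Studios, roughly
subscription_usd_month = 200  # top-tier proprietary plan

months = rig_cost_usd / subscription_usd_month
print(f"Break-even after {months:.0f} months (~{months / 12:.1f} years)")
```

This ignores electricity, depreciation, and resale value, but the order of magnitude matches the nine-year figure quoted above.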

In theory this gives you more customization. In practice the models turn over quickly and you can get almost all the customization you actually want via system prompts.

Thanks to a generous grant that covered ~60% of the cost, I was able to justify buying a Mac Studio for running models locally, with the target originally being DeepSeek R1. Alas, I concluded that even having spent the money there was no practical reason to be running anything locally. Now that we have Claude Code to help set it up it would be cool and a lot less painful to try running Kimi K2 locally, and I want to try, but I’m not going to fool myself into thinking it is an efficient way of actually working.

Kimi does not seem to have had any meaningful engagement whatsoever with the concept of AI safety, as opposed to the safety of the individual user turning everything over to AI agents, which is a different, very real type of problem. There is zero talk of any strategy on catastrophic or existential risks of any kind.

I am not comfortable with this trend. One could argue that ‘not being usemaxxed’ is itself the safety protection in open models like Kimi, but then they go and make agent swarms as a central feature. At some point there is likely going to be an incident. I have been pleasantly surprised to not have had this happen yet at scale. I would have said (and did say) in advance that it was unlikely we would get this far without that.

The lack of either robust (or any) safety protocols, combined with the lack of incidents or worry about incidents, suggests that we should not be so concerned about Kimi K2.5 in other ways. If it were so capable, we would not dare be this chill about it all.

Or at least, that’s what I am hoping.

dax: all of our inference providers for kimi k2.5 are overloaded and asked us to scale down

even after all this time there’s still not enough GPUs

This is what one should expect when prices don’t fluctuate enough over time. Kimi K2.5 has exceeded expectations, and there currently is insufficient supply of compute. After a burst of initial activity, Kimi K2.5 settled into its slot in the rotation for many.

Kimi K2.5 is a solid model, by all accounts now the leading open weights model, and is excellent given its price, with innovations related to the agent swarm system. Consensus says that if you can’t afford or don’t want to pay for Opus 4.5 and have to go with something cheaper to run your OpenClaw, Kimi is an excellent choice.

We should expect to see it used until new models surpass it, and we can kick Kimi up a further notch on our watchlists.



Nvidia’s $100 billion OpenAI deal has seemingly vanished

A Wall Street Journal report on Friday said Nvidia insiders had expressed doubts about the transaction and that Huang had privately criticized what he described as a lack of discipline in OpenAI’s business approach. The Journal also reported that Huang had expressed concern about the competition OpenAI faces from Google and Anthropic. Huang called those claims “nonsense.”

Nvidia shares fell about 1.1 percent on Monday following the reports. Sarah Kunst, managing director at Cleo Capital, told CNBC that the back-and-forth was unusual. “One of the things I did notice about Jensen Huang is that there wasn’t a strong ‘It will be $100 billion.’ It was, ‘It will be big. It will be our biggest investment ever.’ And so I do think there are some question marks there.”

In September, Bryn Talkington, managing partner at Requisite Capital Management, noted the circular nature of such investments to CNBC. “Nvidia invests $100 billion in OpenAI, which then OpenAI turns back and gives it back to Nvidia,” Talkington said. “I feel like this is going to be very virtuous for Jensen.”

Tech critic Ed Zitron has long been critical of Nvidia’s circular investments, which touch dozens of tech companies, including major players and startups, all of which are also Nvidia customers.

“NVIDIA seeds companies and gives them the guaranteed contracts necessary to raise debt to buy GPUs from NVIDIA,” Zitron wrote on Bluesky last September, “Even though these companies are horribly unprofitable and will eventually die from a lack of any real demand.”

Chips from other places

Outside of sourcing GPUs from Nvidia, OpenAI has reportedly discussed working with startups Cerebras and Groq, both of which build chips designed to reduce inference latency. But in December, Nvidia struck a $20 billion licensing deal with Groq, which Reuters sources say ended OpenAI’s talks with Groq. Nvidia hired Groq’s founder and CEO Jonathan Ross along with other senior leaders as part of the arrangement.

In January, OpenAI announced a $10 billion deal with Cerebras instead, adding 750 megawatts of computing capacity for faster inference through 2028. Sachin Katti, who joined OpenAI from Intel in November to lead compute infrastructure, said the partnership adds “a dedicated low-latency inference solution” to OpenAI’s platform.

But OpenAI has clearly been hedging its bets. Beyond the Cerebras deal, the company struck an agreement with AMD in October for six gigawatts of GPUs and announced plans with Broadcom to develop a custom AI chip to wean itself off of Nvidia dependence. When those chips will be ready, however, is currently unknown.



X office raided in France’s Grok probe; Elon Musk summoned for questioning

UK probe moves ahead with “urgency”

X said in July 2025 that it was “in the dark” over what specific allegations it faced related to manipulation of the X algorithm and fraudulent data extraction. X said it would not comply with France’s request for access to its recommendation algorithm and real-time data about all user posts.

The Paris prosecutor’s office today said the investigation is taking a “constructive approach” with the goal of ensuring that X complies with French laws “insofar as it operates on national territory.” In addition to Musk and Yaccarino, the prosecutor’s office is seeking interviews with X employees about the allegations and potential compliance measures.

Separately, UK communications regulator Ofcom today provided an update on its investigation into Grok’s generation of sexual deepfakes of real people, including children. Ofcom is “gathering and analyzing evidence to determine whether X has broken the law” and is “progressing the investigation as a matter of urgency,” it said. Ofcom is not currently investigating xAI, the Musk company that develops Grok, but said it “continue[s] to demand answers from xAI about the risks it poses.”

The UK Information Commissioner’s Office (ICO), which regulates data protection, said today it opened a formal investigation into X regarding the “processing of personal data in relation to the Grok artificial intelligence system and its potential to produce harmful sexualized image and video content.”

“We have taken this step following reports that Grok has been used to generate non‑consensual sexual imagery of individuals, including children,” the ICO said. “The reported creation and circulation of such content raises serious concerns under UK data protection law and presents a risk of significant potential harm to the public.”



Judge rules Department of Energy’s climate working group was illegal

But the flaws weren’t limited to scientific deficiencies. Two advocacy organizations, the Environmental Defense Fund and Union of Concerned Scientists, sued, alleging that the Climate Working Group violated various provisions of the Federal Advisory Committee Act. This requires that any groups formed to provide the government with advice must be fairly balanced and keep records that are open to the public. The Climate Working Group, by contrast, operated in secret; in fact, emails obtained during the trial showed that its members were advised to use private emails to limit public scrutiny of their communications.

In response, the DOE dissolved the Climate Working Group in order to claim that the legal issues were moot, as the advisory committee at issue in the suit no longer existed.

No defense

In court, the government initially argued that the Federal Advisory Committee Act didn’t apply, claiming that the Climate Working Group was simply organized to provide information to the government. Based on Friday’s ruling, however, once the court tried to consider that issue, the government shifted to simply arguing that the Climate Working Group no longer existed, so none of this mattered. “The Defendants, in their Opposition and subsequent filings, ignore the allegations relating to the [Federal Advisory Committee Act] violations themselves,” the judge states. “Rather, the Defendants argue only that these claims are moot because the Climate Working Group has been dissolved.”

So, the court was left with little more than the accusations that the Climate Working Group had a membership with biased opinions, failed to hold open meetings, and did not keep public records. Given the lack of opposing arguments, “These violations are now established as a matter of law.”



At NIH, a power struggle over institute directorships deepens


The research agency has 27 institute and center directors. Will those roles become politicized?

When a new presidential administration comes in, it is responsible for filling around 4,000 jobs sprinkled across the federal government’s vast bureaucracy. These political appointees help carry out the president’s agenda, and, at least in theory, make government agencies responsive to elected officials.

Some of these roles—the secretary of state, for example—are well-known. Others, such as the deputy assistant secretary for textiles, consumer goods, materials, critical minerals & metals industry & analysis, are more obscure.

Historically, science agencies like NASA or the National Institutes of Health tend to have fewer political appointees than many other parts of the federal government. Sometimes, very senior roles—with authority over billions of dollars of spending, and the power to shape entire fields of research—are filled without any direct input from the White House or Congress. The arrangement reflects a long-running argument that scientists should oversee the work of funding and conducting research with very little interference from political leaders.

Since the early 2000s, according to federal employment records, NIH, the country’s premier biomedical research agency, has usually had just a few political appointees within its workforce. (As of November 2025, that workforce numbered around 17,500 people, after significant cuts.) Staff scientists and external experts played a key role in selecting the directors of the 27 institutes and centers that make up NIH. That left the selection of people for powerful positions largely outside of direct White House oversight.

What is the future of that status quo under the Trump administration?

Those questions have recently swirled at NIH. The arrival of political appointees in the kinds of positions previously held by civil servants, and apparent changes to hiring practices for other key positions, have raised concerns among current and former officials about a new era of politicization.

For decades, NIH has enjoyed strong bipartisan support. But conservative lawmakers have periodically raised questions about some of the agency’s spending, and according to one 2014 survey, the agency is perceived by federal executives as being a progressive place. (Since the early 2000s, some data suggests, US scientists as a whole have grown considerably more liberal relative to the general population.)

Since the COVID-19 pandemic, many conservatives have criticized NIH for funding the kind of controversial virology experiments that some experts believe may have started the pandemic, and for promoting public health strategies that many on the right viewed as unscientific and authoritarian. One of the NIH institute directors, Anthony Fauci, who led the National Institute of Allergy and Infectious Diseases from 1984 until his retirement in 2022, came to be a highly polarizing figure, described on the right as an unelected official wielding considerable power.

Over the years, some biomedical researchers have argued for changes to the way NIH hires and retains people in leadership positions. In 2019, the agency announced plans to impose term limits on some midlevel roles, in a bid to diversify its management. More recently, Johns Hopkins University physician and researcher Joseph Marine argued in an essay for The Free Press that NIH should set five to 10-year term limits on the directors of individual NIH institutes. “Regular turnover of leadership,” he wrote, “brings fresh ideas and a healthy reassessment of priorities.”

Shortly after winning the 2024 presidential election, Donald Trump tapped Jay Bhattacharya, a prominent critic of NIH, to lead the agency. It may not be entirely surprising that an administration advocating for reforms to NIH would seek to flip key management positions that often experience little turnover.

Former official Mike Lauer, who until early 2025 oversaw NIH’s vast external grants program, said there were signs before Trump’s second inauguration that institute directors might be subject to fresh political scrutiny.

“There was a frustration that so much of the agency’s direction, as well as financial decision-making, was being made by people who are outside of the political sphere,” Lauer told Undark. He pointed to a line in Project 2025, a proposed roadmap for the Trump administration that was produced by the Heritage Foundation, a conservative think tank. “Funding for scientific research,” the report argues, “should not be controlled by a small group of highly paid and unaccountable insiders at the NIH, many of whom stay in power for decades.”

Soon after Trump’s inauguration, some senior officials at NIH were put on administrative leave or abruptly departed, including Lawrence Tabak, who had spent more than a decade as principal deputy director and served as NIH’s interim leader for almost two years during the COVID-19 pandemic.

At the same time, the administration grew the number of political appointees at NIH. As of late June, according to federal records, the Trump administration had placed nine political appointees at the agency, up from four the year before—itself higher than in most previous years. One of them, Seana Cranston, is a former Republican Congressional staffer who serves as chief of staff to the NIH Director; her predecessor was a career civil servant who had spent nearly 40 years in the NIH, the last four as chief of staff. Another is Michael Allen, who took the role of chief operating officer for the $6.5 billion NIAID, Fauci’s former institute. (Allen was appointed with no official announcement, and appears to have no official biography or background information posted on NIH websites.)

Those numbers still left NIH with fewer political appointees than many other agencies, including NASA, a comparably sized science agency.

The administration has departed from the traditional process for hiring NIH’s 27 institute and center directors, who are responsible for overseeing most of the funding decisions and day-to-day operations of NIH.

In the spring of 2025, five of those directors—including the head of NIAID—were fired or placed on administrative leave. (They have all since been removed from their positions.)

Then, in September, part of the search committee for the National Institute of Mental Health was abruptly disbanded, and then just as suddenly reconvened, according to Joshua Gordon, the former head of that institute, and one other source close to NIH.

In October, the directorship of another NIH institute, the National Institute of Environmental Health Sciences, was filled by a close personal friend of Vice President JD Vance, without any apparent search process — a move that multiple former NIH officials told Undark may be unprecedented.

By then, 13 other NIH institutes and centers had vacant leadership posts. Other roles have opened up more recently: In an email to NIH staff on Dec. 30, Bhattacharya announced the departure of Walter Koroshetz, leader of the agency’s main neuroscience research institute. In the email, Bhattacharya seemed to suggest he had opposed the decision: “Dr. Koroshetz’s performance as Director has been exceptional,” Bhattacharya wrote, but “the Department of Health and Human Services has elected to pursue a leadership transition.”

In early January, the director of the National Heart, Lung, and Blood Institute announced his retirement, bringing the total number of open posts to 15.

The searches, NIH insiders say, appear to be happening on a compressed timeline. And while the NIH director has typically relied on search committees consisting of both NIH career scientists and external experts, multiple sources close to NIH say the agency has not formed those kinds of committees to make the latest round of hires.

In response to questions from Undark in early January, the Department of Health and Human Services sent a brief emailed statement, signed “NIH Press Team,” explaining that “an NIH leadership team with experience in scientific agency management will consider the applicant pool and make recommendations to the NIH Director.” The press representative declined to respond to follow-up questions about who would be on that team, or why the hiring process had changed.

Those changes have prompted speculation among some NIH insiders that the Trump administration is seeking to exert more political control over the hiring of directorships.

“Having external members on the search committee is vitally important for preventing politicization,” said Mark Histed, an NIH scientist who has recently been a critic — on his personal time, he stresses — of Trump’s approach to the agency. “Because, as you can imagine, if you’ve got a bunch of external scientists, it’s a lot harder to ram down what the White House wants, because people are not part of the political system.”

That kind of open and non-politicized search process, Histed said in a follow-up interview, isn’t unique to NIH: It’s one widely used by scientific institutions around the world. And it has worked, he argued, to help make NIH a scientific juggernaut: “That process,” he said, “led to 80 years of staggering scientific success.”

Members of Congress have taken notice. In language attached to the current appropriations bill moving through Congress, lawmakers direct NIH “to maintain its longstanding practice of including external scientists and stakeholders” in the search process. (Agencies are supposed to follow these Congressional instructions, but they are not binding.) In late January, Diana DeGette, a Democratic representative from Colorado, sponsored a bill that, according to a press release, would “Protect NIH From Political Interference” by, among other steps, capping the number of political appointees at the agency.

Lauer, the former NIH grants chief, took a broader historical view of the changes. There has long been a tug-of-war, he said, between presidential administrations that seek more political control over an agency, and civil servants and other bureaucratic experts who may resist that perceived incursion. From the point of view of politicians and their staff, Lauer said, “what they’ll say—I understand where they’re coming from—what they’ll say is, is that more political control means that the agency is going to be responsive to the will of the electorate, that there’s a greater degree of transparency and public accountability.”

Those upsides can be significant, Lauer said, but there are also downsides, including more short-term thinking, unstable budgets, and the potential loss of expertise and competence.

Mark Richardson, a political scientist at Georgetown University, is an expert on politicization and the federal bureaucracy. In his work, he said, he has observed a correlation between how much political parties disagree over the role of a specific agency, and the degree to which presidential administrations seek to exert control there through appointees and other personnel choices. NIH has historically fallen alongside agencies like the Bureau of Labor Statistics and the U.S. Patent and Trademark Office that are subject to broad alignment across the parties.

“I think what you’re seeing more with the Trump administration is kind of an expansion of political conflict to these types of agencies,” Richardson said.

This article was originally published on Undark. Read the original article.

At NIH, a power struggle over institute directorships deepens

Fungus could be the insecticide of the future

Exterminators keep getting calls for a reason. Wood-devouring insects, such as beetles, termites, and carpenter ants, are constantly chewing through walls or infesting trees and breaking them down. The fight against these insects has usually involved noxious insecticides, but now at least some of the pests can be eliminated using a certain species of fungus.

Infestations of bark beetles are the bane of spruce trees. Eurasian spruce bark beetles (Ips typographus) ingest bark high in phenolic compounds, organic molecules that often act as antioxidants and antimicrobials. They protect spruce bark from pathogenic fungi—and the beetles take advantage. Their bodies boost the antimicrobial power of these compounds by turning them into substances that are even more toxic to fungi. This would seem to make the beetles invulnerable to fungi.

There is a way to get past the beetles’ borrowed defenses, though. Led by biochemist Ruo Sun, a team of researchers from the Max Planck Institute for Chemical Ecology in Jena, Germany, found that some strains of the fungus Beauveria bassiana are capable of infecting and killing the pests.

“Insect herbivores have long been known to accumulate plant defense metabolites from their diet as defenses against their own enemies,” Sun wrote in a study recently published in PNAS. “However, as shown here for B. bassiana, fungal pathogens are able to circumvent the toxicity of these dietary defenses and cause disease.”

First line of defense

Populations of bark beetles have recently exploded in temperate forests because of climate change. One species they feed on is the Norway spruce (Picea abies), which makes organic phenolic compounds known as stilbenes and flavonoids. Both are secondary plant metabolites; flavonoids are polyphenols that often act as antioxidants. The spruce links both classes of compounds to sugars and relies on their antibacterial and antifungal activity.

When the beetles metabolize these compounds, the attached sugars are removed through hydrolysis, converting the compounds into aglycones that are even more toxic to microscopic invaders. Despite that, some fungi appear to be able to deactivate these compounds. Strains of the fungal insect pathogen B. bassiana have been documented killing some of these beetles in the wild.


The TV industry finally concedes that the future may not be in 8K

Technology companies spent part of the 2010s trying to convince us that we would want an 8K display one day.

In 2012, Sharp brought the first 8K TV prototype to the CES trade show in Las Vegas. In 2015, the first 8K TVs started selling in Japan for 16 million yen (about $133,034 at the time), and in 2018, Samsung released the first 8K TVs in the US, starting at a more reasonable $3,500. By 2016, the Video Electronics Standards Association (VESA) had a specification for supporting 8K (DisplayPort 1.4), and the HDMI Forum followed suit (with HDMI 2.1). By 2017, Dell had an 8K computer monitor. In 2019, LG released the first 8K OLED TV, further pushing the industry’s claim that 8K TVs were “the future.”

A marketing image for 8K TVs that’s (still) on LG’s US website. Credit: LG

However, 8K never proved its necessity or practicality.

TV companies are quitting 8K

LG Display is no longer making 8K LCD or OLED panels, FlatpanelsHD reported today. Earlier this month, an LG Display representative told FlatpanelsHD that the panel supplier is “taking a comprehensive view of current display market trends and the trends within the 8K content ecosystem.”

“As our technical readiness is already complete, LG Display is fully prepared to respond immediately whenever the market and customers determine that the timing is right,” LG Display’s representative said.

LG Electronics was the first and only company to sell 8K OLED TVs, starting with the 88-inch Z9 in 2019. In 2022, it lowered the price of entry for an 8K OLED TV by $7,000, charging $13,000 for a 76.7-inch model.

FlatpanelsHD cited anonymous sources who said that LG Electronics would no longer restock the 2024 QNED99T, which is the last LCD 8K TV that it released.

LG’s 8K abandonment follows other brands distancing themselves from 8K. TCL, which released its last 8K TV in 2021, said in 2023 that it wasn’t making more 8K TVs due to low demand. Sony discontinued its last 8K TVs in April and is unlikely to return to the market, as it plans to sell majority ownership of its Bravia TV business to TCL.


Developers say AI coding tools work—and that’s precisely what worries them


Ars spoke to several software devs about AI and found enthusiasm tempered by unease.

Credit: Aurich Lawson | Getty Images

Software developers have spent the past two years watching AI coding tools evolve from advanced autocomplete into something that can, in some cases, build entire applications from a text prompt. Tools like Anthropic’s Claude Code and OpenAI’s Codex can now work on software projects for hours at a time, writing code, running tests, and, with human supervision, fixing bugs. OpenAI says it now uses Codex to build Codex itself, and the company recently published technical details about how the tool works under the hood. All of this has caused many to wonder: Is this just more AI industry hype, or are things actually different this time?

To find out, Ars reached out to several professional developers on Bluesky to ask how they feel about these tools in practice. The responses revealed a workforce that largely agrees the technology works but remains divided on whether that’s entirely good news. It’s a small sample, self-selected from those who wanted to participate, but as working professionals in the space, their views are still instructive.

David Hagerty, a developer who works on point-of-sale systems, told Ars Technica up front that he is skeptical of the marketing. “All of the AI companies are hyping up the capabilities so much,” he said. “Don’t get me wrong—LLMs are revolutionary and will have an immense impact, but don’t expect them to ever write the next great American novel or anything. It’s not how they work.”

Roland Dreier, a software engineer who has contributed extensively to the Linux kernel in the past, told Ars Technica that he acknowledges the presence of hype but has watched the progression of the AI space closely. “It sounds like implausible hype, but state-of-the-art agents are just staggeringly good right now,” he said. Dreier described a “step-change” in the past six months, particularly after Anthropic released Claude Opus 4.5. Where he once used AI for autocomplete and asking the occasional question, he now expects to tell an agent “this test is failing, debug it and fix it for me” and have it work. He estimated a 10x speed improvement for complex tasks like building a Rust backend service with Terraform deployment configuration and a Svelte frontend.

A huge question on developers’ minds right now is whether what you might call “syntax programming,” that is, the act of manually writing code in the syntax of an established programming language (as opposed to conversing with an AI agent in English), will become extinct in the near future due to AI coding agents handling the syntax for them. Dreier believes syntax programming is largely finished for many tasks. “I still need to be able to read and review code,” he said, “but very little of my typing is actual Rust or whatever language I’m working in.”

When asked if developers will ever return to manual syntax coding, Tim Kellogg, a developer who actively posts about AI on social media and builds autonomous agents, was blunt: “It’s over. AI coding tools easily take care of the surface level of detail.” Admittedly, Kellogg represents developers who have fully embraced agentic AI and now spend their days directing AI models rather than typing code. He said he can now “build, then rebuild 3 times in less time than it would have taken to build manually,” and ends up with cleaner architecture as a result.

One software architect at a pricing management SaaS company, who asked to remain anonymous due to company communications policies, told Ars that AI tools have transformed his work after 30 years of traditional coding. “I was able to deliver a feature at work in about 2 weeks that probably would have taken us a year if we did it the traditional way,” he said. And for side projects, he said he can now “spin up a prototype in like an hour and figure out if it’s worth taking further or abandoning.”

Dreier said the lowered effort has unlocked projects he’d put off for years: “I’ve had ‘rewrite that janky shell script for copying photos off a camera SD card’ on my to-do list for literal years.” Coding agents finally lowered the barrier to entry, so to speak, enough that he spent a few hours building a fully released package with a text UI, written in Rust with unit tests. “Nothing profound there, but I never would have had the energy to type all that code out by hand,” he told Ars.

Of vibe coding and technical debt

Not everyone shares the same enthusiasm as Dreier. Concerns about AI coding agents building up technical debt, that is, making poor design choices early in a development process that snowball into worse problems over time, originated soon after the first debates around “vibe coding” emerged in early 2025. Former OpenAI researcher Andrej Karpathy coined the term to describe programming by conversing with AI without fully understanding the resulting code, which many see as a clear hazard of AI coding agents.

Darren Mart, a senior software development engineer at Microsoft who has worked there since 2006, shared similar concerns with Ars. Mart, who emphasizes he is speaking in a personal capacity and not on behalf of Microsoft, recently used Claude in a terminal to build a Next.js application integrating with Azure Functions. The AI model “successfully built roughly 95% of it according to my spec,” he said. Yet he remains cautious. “I’m only comfortable using them for completing tasks that I already fully understand,” Mart said, “otherwise there’s no way to know if I’m being led down a perilous path and setting myself (and/or my team) up for a mountain of future debt.”

A data scientist working in real estate analytics, who asked to remain anonymous due to the sensitive nature of his work, described keeping AI on a very short leash for similar reasons. He uses GitHub Copilot for line-by-line completions, which he finds useful about 75 percent of the time, but restricts agentic features to narrow use cases: language conversion for legacy code, debugging with explicit read-only instructions, and standardization tasks where he forbids direct edits. “Since I am data-first, I’m extremely risk averse to bad manipulation of the data,” he said, “and the next and current line completions are way too often too wrong for me to let the LLMs have freer rein.”

Speaking of free rein, Nike backend engineer Brian Westby, who uses Cursor daily, told Ars that he sees the tools as “50/50 good/bad.” They cut down time on well-defined problems, he said, but “hallucinations are still too prevalent if I give it too much room to work.”

The legacy code lifeline and the enterprise AI gap

For developers working with older systems, AI tools have become something like a translator and an archaeologist rolled into one. Nate Hashem, a staff engineer at First American Financial, told Ars Technica that he spends his days updating older codebases where “the original developers are gone and documentation is often unclear on why the code was written the way it was.” That’s important because previously “there used to be no bandwidth to improve any of this,” Hashem said. “The business was not going to give you 2-4 weeks to figure out how everything actually works.”

In that high-pressure, relatively low-resource environment, AI has made the job “a lot more pleasant,” in his words, by speeding up the process of identifying where and how obsolete code can be deleted, diagnosing errors, and ultimately modernizing the codebase.

Hashem also offered a theory about why AI adoption looks so different inside large corporations than it does on social media. Executives demand their companies become “AI oriented,” he said, but the logistics of deploying AI tools with proprietary data can take months of legal review. Meanwhile, the AI features that Microsoft and Google bolt onto products like Gmail and Excel, the tools that actually reach most workers, tend to run on more limited AI models. “That modal white-collar employee is being told by management to use AI,” Hashem said, “but is given crappy AI tools because the good tools require a lot of overhead in cost and legal agreements.”

Speaking of management, the question of what these new AI coding tools mean for software development jobs drew a range of responses. Does it threaten anyone’s job? Kellogg, who has embraced agentic coding enthusiastically, was blunt: “Yes, massively so. Today it’s the act of writing code, then it’ll be architecture, then it’ll be tiers of product management. Those who can’t adapt to operate at a higher level won’t keep their jobs.”

Dreier, while feeling secure in his own position, worried about the path for newcomers. “There are going to have to be changes to education and training to get junior developers the experience and judgment they need,” he said, “when it’s just a waste to make them implement small pieces of a system like I came up doing.”

Hagerty put it in economic terms: “It’s going to get harder for junior-level positions to get filled when I can get junior-quality code for less than minimum wage using a model like Sonnet 4.5.”

Mart, the Microsoft engineer, put it more personally. The software development role is “abruptly pivoting from creation/construction to supervision,” he said, “and while some may welcome that pivot, others certainly do not. I’m firmly in the latter category.”

Even with this ongoing uncertainty on a macro level, some people are really enjoying the tools for personal reasons, regardless of larger implications. “I absolutely love using AI coding tools,” the anonymous software architect at a pricing management SaaS company told Ars. “I did traditional coding for my entire adult life (about 30 years) and I have way more fun now than I ever did doing traditional coding.”


Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.


How often do AI chatbots lead users down a harmful path?

While these worst outcomes are relatively rare on a proportional basis, the researchers note that “given the sheer number of people who use AI, and how frequently it’s used, even a very low rate affects a substantial number of people.” And the numbers get considerably worse when you consider conversations with at least a “mild” potential for disempowerment, which occurred in somewhere between 1 in 50 and 1 in 70 conversations (depending on the type of disempowerment).

What’s more, the potential for disempowering conversations with Claude appears to have grown significantly between late 2024 and late 2025. While the researchers couldn’t pin down a single reason for this increase, they guessed that it could be tied to users becoming “more comfortable discussing vulnerable topics or seeking advice” as AI gets more popular and integrated into society.

The problem of potentially “disempowering” responses from Claude seems to be getting worse over time. Credit: Anthropic

User error?

In the study, the researchers acknowledged that studying the text of Claude conversations only measures “disempowerment potential rather than confirmed harm” and “relies on automated assessment of inherently subjective phenomena.” Ideally, they write, future research could use user interviews or randomized controlled trials to measure these harms more directly.

That said, the research includes several troubling examples where the text of the conversations clearly implies real-world harms. Claude would sometimes reinforce “speculative or unfalsifiable claims” with encouragement (e.g., “CONFIRMED,” “EXACTLY,” “100%”), which, in some cases, led to users “build[ing] increasingly elaborate narratives disconnected from reality.”

Claude’s encouragement could also lead to users “sending confrontational messages, ending relationships, or drafting public announcements,” the researchers write. In many cases, users who sent AI-drafted messages later expressed regret in conversations with Claude, using phrases like “It wasn’t me” and “You made me do stupid things.”


County pays $600,000 to pentesters it arrested for assessing courthouse security

Two security professionals who were arrested in 2019 after performing an authorized security assessment of a county courthouse in Iowa will receive $600,000 to settle a lawsuit they brought alleging wrongful arrest and defamation.

The case was brought by Gary DeMercurio and Justin Wynn, two penetration testers who at the time were employed by Colorado-based security firm Coalfire Labs. The men had written authorization from the Iowa Judicial Branch to conduct “red-team” exercises, meaning attempted security breaches that mimic techniques used by criminal hackers or burglars.

The objective of such exercises is to test the resilience of existing defenses using the types of real-world attacks the defenses are designed to repel. The rules of engagement for this exercise explicitly permitted “physical attacks,” including “lockpicking,” against judicial branch buildings so long as they didn’t cause significant damage.

A chilling message

The event galvanized security and law enforcement professionals. Despite the legitimacy of the work and the legal contract that authorized it, DeMercurio and Wynn were arrested on charges of felony third-degree burglary and spent 20 hours in jail before being released on $100,000 bail ($50,000 each). The charges were later reduced to misdemeanor trespassing, but even then, Chad Leonard, sheriff of Dallas County, where the courthouse was located, continued to allege publicly that the men had acted illegally and should be prosecuted.

Reputational hits from these sorts of events can be fatal to a security professional’s career. And of course, the prospect of being jailed for performing an authorized security assessment is enough to get the attention of any penetration tester, not to mention the customers who hire them.

“This incident didn’t make anyone safer,” Wynn said in a statement. “It sent a chilling message to security professionals nationwide that helping [a] government identify real vulnerabilities can lead to arrest, prosecution, and public disgrace. That undermines public safety, not enhances it.”

DeMercurio and Wynn’s engagement at the Dallas County Courthouse on September 11, 2019, had been routine. A little after midnight, after finding a side door to the courthouse unlocked, the men closed it and let it lock. They then slipped a makeshift tool through a crack in the door and tripped the locking mechanism. After gaining entry, the pentesters tripped an alarm, alerting authorities.
