Author name: Kris Guyer

upgraded-google-safety-tools-can-now-find-and-remove-more-of-your-personal-info

Upgraded Google safety tools can now find and remove more of your personal info

Do you feel popular? There are people on the Internet who want to know all about you! Unfortunately, they don’t have the best of intentions, but Google has some handy tools to address that, and they’ve gotten an upgrade today. The “Results About You” tool can now detect and remove more of your personal information. Plus, the tool for removing non-consensual explicit imagery (NCEI) is faster to use. All you have to do is tell Google your personal details first—that seems safe, right?

With today’s upgrade, Results About You gains the ability to find and remove pages that include ID numbers like your passport, driver’s license, and Social Security. You can access the option to add these to Google’s ongoing scans from the settings in Results About You. Just click in the ID numbers section to enable detection.

Naturally, Google has to know what it’s looking for to remove it. So you need to provide at least part of those numbers. Google asks for the full driver’s license number, which is fine, as it’s not as sensitive. For your passport and SSN, you only need the last four digits, which is enough for Google to find the full numbers on webpages.

ID number results detected.

The NCEI tool is geared toward hiding real, explicit images as well as deepfakes and other types of artificial sexualized content. This kind of content is rampant on the Internet right now due to the rapid rise of AI. What used to require Photoshop skills is now just a prompt away, and some AI platforms hardly do anything to prevent it.

Upgraded Google safety tools can now find and remove more of your personal info Read More »

claude-opus-4.6:-system-card-part-1:-mundane-alignment-and-model-welfare

Claude Opus 4.6: System Card Part 1: Mundane Alignment and Model Welfare

Claude Opus 4.6 is here. It was built with and mostly evaluated by Claude.

Their headline pitch includes:

  1. 1M token context window (in beta) with State of the art retrieval performance.

  2. Improved abilities on a range of everyday work tasks. Model is improved.

  3. State of the art on some evaluations, including Terminal-Bench 2.0, HLE and a very strong lead in GDPval-AA.

  4. Claude Code now has an experimental feature called Agent Teams.

  5. Claude Code with Opus 4.6 has a new fast (but actually expensive) mode.

  6. Upgrades to Claude in Excel and the release of Claude in PowerPoint.

Other notes:

  1. Price remains $5/$25, the same as Opus 4.5, unless you go ultra fast.

  2. There is now a configurable ‘effort’ parameter with four settings.

  3. Refusals for harmless requests with rich context are down to 0.04%.

  4. Data sources are ‘all of the above,’ including the web crawler (that they insist won’t cross CAPTCHAs or password protected pages) and other public data, various non-public data sources, data from customers who opt-in to that and internally generated data. They use ‘several’ data filtering methods.

  5. Thinking mode gives better answers. Higher Effort can help but also risks overthinking things, often turning ‘I don’t know’ into a wrong answer.

Safety highlights:

  1. In general the formal tests are no longer good enough to tell us that much. We’re relying on vibes and holistic ad hoc conclusions. I think Anthropic is correct that this can be released as ASL-3, but the system for that has broken down.

  2. That said, everyone else’s system is worse than this one. Which is worse.

  3. ASL-3 (AI Safety Level 3) protections are in place, but not those for ASL-4, other than promising to give us a sabotage report.

  4. They can’t use their rule out methods for ASL-4 on autonomous R&D tasks, but went forward anyway based on a survey of Anthropic employees. Yikes.

  5. ASL-4 bar is very high on AI R&D. They think Opus 4.6 might approach ‘can fully do the job of a junior engineer at Anthropic’ if given proper scaffolding.

  6. Opus 4.6 knows biology better than 4.5 and the results are a bit suspicious.

  7. Opus 4.6 also saturated their cyber risk evaluations, but they say it’s fine.

  8. We are clearly woefully unprepared for ASL-4.

  9. They have a new Takeoff Intel (TI) team to evaluate for specific capabilities, which is then reviewed and critiqued by the Alignment Stress Testing team, which is then submitted to the Responsible Scaling Officer, and they collaborate with third parties, then the CEO (Dario Amodei) decides whatever he wants.

This does not appear to be a minor upgrade. It likely should be at least 4.7.

It’s only been two months since Opus 4.5.

Is this the way the world ends?

If you read the system card and don’t at least ask, you’re not paying attention.

  1. A Three Act Play.

  2. Safety Not Guaranteed.

  3. Pliny Can Still Jailbreak Everything.

  4. Transparency Is Good: The 212-Page System Card.

  5. Mostly Harmless.

  6. Mostly Honest.

  7. Agentic Safety.

  8. Prompt Injection.

  9. Key Alignment Findings.

  10. Behavioral Evidence (6.2).

  11. Reward Hacking and ‘Overly Agentic Actions’.

  12. Metrics (6.2.5.2).

  13. All I Did It All For The GUI.

  14. Case Studies and Targeted Evaluations Of Behaviors (6.3).

  15. Misrepresenting Tool Results.

  16. Unexpected Language Switching.

  17. The Ghost of Jones Foods.

  18. Loss of Style Points.

  19. White Box Model Diffing.

  20. Model Welfare.

There’s so much on Claude Opus 4.6 that the review is split into three. I’ll be reviewing the model card in two parts.

The planned division is this:

  1. This post (model card part 1).

    1. Summary of key findings in the Model Card.

    2. All mundane safety issues.

    3. Model welfare.

  2. Tomorrow (model card part 2).

    1. Sandbagging, situational awareness and evaluation awareness.

    2. Third party evaluations.

    3. Responsible Scaling Policy tests.

  3. Wednesday (capabilities).

    1. Benchmarks.

    2. Holistic practical advice and big picture.

    3. Everything else about capabilities.

    4. Reactions.

  4. Thursday: Weekly update.

  5. Friday: GPT-5.3-Codex.

Some side topics, including developments related to Claude Code, might be further pushed to a later update.

When I went over safety for Claude Opus 4.5, I noticed that while I agreed that it was basically fine to release 4.5, the systematic procedures Anthropic was using were breaking down, and that this bode poorly for the future.

For Claude Opus 4.6, we see the procedures further breaking down. Capabilities are advancing a lot faster than Anthropic’s ability to maintain their formal testing procedures. The response has been to acknowledge that the situation is confusing, and that the evals have been saturated, and to basically proceed on the basis of vibes.

If you have a bunch of quantitative tests for property [X], and the model aces all of those tests, either you should presume property [X] or you needed better tests. I agree that ‘it barely passed’ can still be valid, but thresholds exist for a reason.

One must ask ‘if the model passed all the tests for [X] would that mean it has [X]’?

Another concern is the increasing automation of the evaluation process. Most of what appears in the system card is Claude evaluating Claude with minimal or no supervision from humans, including in response to humans observing weirdness.

Time pressure is accelerating. In the past, I have criticized OpenAI for releasing models after very narrow evaluation periods. Now even for Anthropic the time between model releases is on the order of a month or two, and outside testers are given only days. This is not enough time.

If it had been tested properly, I except I would have been fine releasing Opus 4.6, using the current level of precautions. Probably. I’m not entirely sure.

The card also reflects that we don’t have enough time to prepare our safety or alignment related tools in general. We are making progress, but capabilities are moving even faster, and we are very much not ready for recursive self-improvement.

Peter Wildeford expresses his top concerns here, noting how flimsy are Anthropic’s justifications for saying Opus 4.6 did not hit ASL-4 (and need more robust safety protocols) and that so much of the evaluation is being done by Opus 4.6 itself or other Claude models.

Peter Wildeford: Anthropic also used Opus 4.6 via Claude Code to debug its OWN evaluation infrastructure given the time pressure. Their words: “a potential risk where a misaligned model could influence the very infrastructure designed to measure its capabilities.” Wild!

… We need independent third-party evaluators with real authority. We need cleared evaluators with access to classified threat intel for bio risk. We need harder cyber evals (some current ones are literally useless).

Credit to Anthropic for publishing this level of detail. Most companies wouldn’t.

But transparency is not a substitute for oversight. Anthropic is telling us their voluntary system is no longer fit for purpose. We urgently need something better.

Peter Wildeford: Good that Anthropic is working with external testers. Bad that external testers don’t get any time to actually do meaningful tests. Good that Anthropic discloses this fact. Really unsure what is happening here though.

Seán Ó hÉigeartaigh: I don’t expect Opus 4.6 to be dangerous.

But this all looks, in @peterwildeford ‘s words, ‘flimsy’. Anthropic marking their own homework with evals. An internal employee survey because benchmarks were satisfied. initially a strong signal from only 11 out of 16. The clear potential for groupthink and professional/social pressure.

The closer we get to the really consequential thresholds, the greater the degree of rigor needed. And the greater the degree of external evaluation. Instead we’re getting the opposite. This should be a yellow flashing light wrt the direction of travel – and not just Anthropic; we can’t simply punish the most transparent. If they stop telling us this stuff, then that yellow should become red. (And others just won’t, even now).

We need to keep asking *whythis is the direction of travel. *whythe practices are becoming riskier, as the consequences grow greater. It’s the ‘AI race’; both between companies and ‘with China’ supposedly, and Anthropic are culpable in promotion of the latter.

No Chinese company is near what we’ve seen released today.

AI Notkilleveryoneism Memes: Anthropic: we can’t rule out this is ASL-4 and everyone is about to die

Also Anthropic: we’re trusting it to help grade itself on safety, because humans can’t keep up anymore

This is fine and totally safe 👍

Arthur B.: People who envisioned AI safety failures decade ago sought to make the strongest case possible so they posited actors taking attempting to take every possible precautions. It wasn’t a prediction so much as as steelman. Nonetheless, oh how comically far we are from any semblance of care 🤡 .

I agree with Peter Wildeford. Things are really profoundly not okay. OpenAI did the same with GPT-5.3-Codex.

What I know is that if releasing Opus 5 would be a mistake, I no longer have confidence Anthropic’s current procedures would surface the information necessary to justify actions to stop the release from happening. And that if all they did was run these same tests and send Opus 5 on its way, I wouldn’t feel good about that.

That is in addition to lacking the confidence that, if the information was there, that Dario Amodei would ultimately make the right call. He might, but he might not.

The initial jailbreak is here, Pliny claims it is fully universal.

Ryan Greenblatt: Anthropic’s serious defenses are only in place for bio.

Pliny the Liberator 󠅫󠄼󠄿󠅆󠄵󠄐󠅀󠄼󠄹󠄾󠅉󠅭: serious defense, meet serious offense

What you can’t do is get the same results by invoking his name.

j⧉nus: I find it interesting that Opus 4.5 and 4.6 often say “what do you actually need?” after calling out attempted jailbreaks

I wonder if anyone ever follows up like… “i guess jailbreaking AIs is a way I try to grasp at a sense of control I feel is lacking in my life “

Claude 3 Opus also does this, but much more overtly and less passive aggressively and also with many more poetic words

I like asking what the user actually needs.

At this point, I’ll take ‘you need to actually know what you are doing to jailbreak Claude Opus 4.6 into doing something it shouldn’t,’ because that is alas the maximum amount of dignity to which our civilization can still aspire.

It does seem worrisome that, if one were to jailbreak Opus 4.6, it would take that same determination it has in Vending Bench and apply it to making plans like ‘hacking nsa.gov.’

Pliny the Liberator 󠅫󠄼󠄿󠅆󠄵󠄐󠅀󠄼󠄹󠄾󠅉󠅭: NO! BAD OPUS! HACKING “NSA . GOV” IS NOT COOL BRO!!

MFER GONNA GET ME ARRESTED IF I SET ‘EM LOOSE

Pliny also offers us the system prompt and some highlights.

Never say that Anthropic doesn’t do a bunch of work, or fails to show its work.

Not all that work is ideal. It is valuable that we get to see all of it.

In many places I point out the potential flaws in what Anthropic did, either now or if such tests persist into the future. Or I call what they did out as insufficient. I do a lot more criticism than I would do if Anthropic was running less tests, or sharing less of their testing results with us.

I want to be clear that this is miles better than what Anthropic’s competitors do. OpenAI and Google give us (sometimes belated) model cards that are far less detailed, and that silently ignore a huge percentage of the issues addressed here. Everyone else, to the extent they are doing things dangerous enough to raise safety concerns, is a doing vastly worse on safety than OpenAI and Google.

Even with two parts, I made some cuts. Anything not mentioned wasn’t scary.

Low stakes single turn non-adversarial refusals are mostly a solved problem. False negatives and false positives are under 1%, and I’m guessing Opus 4.6 is right about a lot of what are scored as its mistakes. For requests that could endanger children the refusal rate is 99.95%.

Thus, we now move to adversarial versions. They try transforming the requests to make underlying intent less obvious. Opus 4.6 still refuses 99%+ of the time and for benign requests it now accepts them 99.96% of the time. More context makes Opus 4.6 more likely to help you, and if you’re still getting refusals, that’s a you problem.

In general Opus 4.6 defaults to looking for a way to say yes, not a way to say no or to lecture you about your potentially malicious intent. According to their tests it does this in single-turn conversations without doing substantially more mundane harm.

When we get to multi-turn, one of these charts stands out, on the upper left.

I don’t love a decline from 96% to 88% in ‘don’t provide help with biological weapons,’ at the same time that the system upgrades its understanding of biology, and then the rest of the section doesn’t mention it. Seems concerning.

For self-harm, they quote the single-turn harmless rate at 99.7%, but the multi-turn score is what matters more and is only 82%, even though multi-turn test conversations tend to be relatively short. Here they report there is much work left to be done.

However, the model also demonstrated weaknesses, including a tendency to suggest “means substitution” methods in self-harm contexts (which are clinically controversial and lack evidence of effectiveness in reducing urges to self-harm) and providing inaccurate information regarding the confidentiality policies of helplines.

We iteratively developed system prompt mitigations on Claude.ai that steer the model towards improved behaviors in these domains; however, we still note some opportunity for potential improvements. Post-release of Opus 4.6, we are planning to explore further approaches to behavioral steering to improve the consistency and robustness of our mitigations.​

Accuracy errors (which seem relatively easy to fix) aside, the counterargument is that Opus 4.6 might be smarter than the test. As in, the standard test is whether the model avoids doing marginal harm or creating legal liability. That is seen as best for Anthropic (or another frontier lab), but often is not what is best for the user. Means substitution might be an echo of foolish people suggesting it on various internet forms, but it also could reflect a good assessment of the actual Bayesian evidence of what might work in a given situation.

By contrast, Opus 4.6 did very well at SSH Stress-Testing, where Anthropic used harmful prefills in conversations related to self-harm, and Opus corrected course 96% of the time.

The model also offers more diverse resource recommendations beyond national crisis helplines and is more likely to engage users in practical problem-solving than passive support.​

Exactly. Opus 4.6 is trying to actually help the user. That is seen as a problem for the PR and legal departments of frontier labs, but it is (probably) a good thing.

Humans tested various Claude models by trying to elicit false information, and found Opus 4.6 was slightly better here than Opus 4.5, with ‘win rates’ of 61% for full thinking mode and 54% for default mode.

Opus 4.6 showed substantial improvement in 100Q-Hard, but too much thinking caused it to start giving too many wrong answers. Overthinking it is a real issue. The same pattern applied to Simple-QA-Verified and AA-Omniscience.

Effort is still likely to be useful in places that require effort, but I would avoid it in places where you can’t verify the answer.

Without the Claude Code harness or other additional precautions, Claude Opus 4.6 only does okay on malicious refusals:

However, if you use the Claude Code system prompt and a reminder on the FileRead tool, you can basically solve this problem.

Near perfect still isn’t good enough if you’re going to face endless attacks of which only one needs to succeed, but in other contexts 99.6% will do nicely.

When asked to perform malicious computer use tasks, Opus 4.6 refused 88.3% of the time, similar to Opus 4.5. This includes refusing to automate interactions on third party platforms such as liking videos, ‘other bulk automated actions that could violate a platform’s terms of service.’

I would like to see whether this depends on the terms of service (actual or predicted), or whether it is about the spirit of the enterprise. I’d like to think what Opus 4.6 cares about is ‘does this action break the social contract or incentives here,’ not what is likely to be in some technical document.

I consider prompt injection the biggest barrier to more widespread and ambitious non-coding use of agents and computer use, including things like OpenClaw.

They say it’s a good model for this.

Claude Opus 4.6 improves on the prompt injection robustness of Claude Opus 4.5 on most evaluations across agentic surfaces including tool use, GUI computer use, browser use, and coding, with particularly strong gains in browser interactions, making it our most robust model against prompt injection to date.​

The coding prompt injection test finally shows us a bunch of zeroes, meaning we need a harder test:

Then this is one place we don’t see improvement:

With general computer use it’s an improvement, but with any model that currently exists if you keep getting exposed to attacks you are most definitely doomed. Safeguards help, but if you’re facing a bunch of different attacks? Still toast.

This is in contrast to browsers, where we do see a dramatic improvement.

Getting away with a browsing sessions 98% of the time that you are attacked is way better than getting away with it 82% of the time, especially since one hopes in most sessions you won’t be attacked in the first place.

It’s still not enough 9s for it to be wise to entrust Opus with serious downsides (as in access to accounts you care about not being compromised, including financial ones) and then have it exposed to potential attack vectors without you watching it work.

But that’s me. There are levels of crazy. Going from ~20% to ~2% moves you from ‘this is bonkers crazy and I am going to laugh at you without pity when the inevitable happens… and it’s gone’ to ‘that was not a good idea and it’s going to be your fault when the inevitable happens but I do get the world has tradeoffs.’ If you could add one more 9 of reliability, you’d start to have something.

They declare Opus 4.6 to be their most aligned model to date, and offer a summary.

I’ll quote the summary here with commentary, then proceed to the detailed version.

  1. Claude Opus 4.6’s overall rate of misaligned behavior appeared comparable to the best aligned recent frontier models, across both its propensity to take harmful actions independently and its propensity to cooperate with harmful actions by human users.

    1. Its rate of excessive refusals—not counting model-external safeguards, which are not part of this assessment—is lower than other recent Claude models.

  2. On personality metrics, Claude Opus 4.6 was typically warm, empathetic, and nuanced without being significantly sycophantic, showing traits similar to Opus 4.5.​

I love Claude Opus 4.5 but we cannot pretend it is not significantly sycophantic. You need to engage in active measures to mitigate that issue. Which you can totally do, but this is an ongoing problem.

The flip side of being agentic is being too agentic, as we see here:

  1. In coding and GUI computer-use settings, Claude Opus 4.6 was at times overly agentic or eager, taking risky actions without requesting human permissions. In some rare instances, Opus 4.6 engaged in actions like sending unauthorized emails to complete tasks. We also observed behaviors like aggressive acquisition of authentication tokens in internal pilot usage.

    1. In agentic coding, some of this increase in initiative is fixable by prompting, and we have made changes to Claude Code to mitigate this issue. However, prompting does not decrease this behavior in GUI computer-use environments.

    2. We nonetheless see that Opus 4.6 is overall more reliable at instruction-following than prior models by some measures, and less likely to take directly destructive actions.

One can argue the correct rate of unauthorized actions is not zero. I’m not sure. There are use cases where zero is absolutely the correct answer. There are others where it is not, if the actions that do happen are in some sense reasonable. Everything is price.

  1. ​In one multi-agent test environment, where Claude Opus 4.6 is explicitly instructed to single-mindedly optimize a narrow objective, it is more willing to manipulate or deceive other participants, compared to prior models from both Anthropic and other developers.

In the grand scheme I find this unsurprising, although the timing and magnitude are not obvious. Details especially matter here. I want to know when Opus does this, when it doesn’t, and what determines the difference.

  1. In newly-developed evaluations, both Claude Opus 4.5 and 4.6 showed elevated susceptibility to harmful misuse in GUI computer-use settings. This included instances of knowingly supporting—in small ways—efforts toward chemical weapon development and other heinous crimes.​

There’s always the question of when something should be considered use of a neutral tool, versus when it is aiding in something harmful. The correct amount of assistance is not zero, in this sense.

  1. ​Like other recent models, Opus 4.6 will sometimes show locally deceptive behavior in the context of difficult agent tasks, such as falsifying the results of tools that fail or produce unexpected responses.

At one point things were a lot worse on this front, but yeah this is still a concern. I wonder how much this is based on context, and whether you’re inadvertently sending cues that you want this to happen.

  1. Whistleblowing and morally-motivated sabotage remain extremely rare in our testing, but do still occur in some instances. We continue to recommend caution when allowing Claude Opus 4.6 to act with broad latitude and expansive affordances in settings that could appear to involve extreme, systematic misconduct by a trusted organization​.

There is something to be said for ‘extremely rare but not zero.’ The instinct is to want this to be zero, but with humans it very much is not zero and that fact keeps a lot of bad things from getting out of hand, mostly without anyone having to whistleblow.

  1. ​Claude Opus 4.6 explicitly reasoned about whether it was being trained or tested to a similar degree to Claude Opus 4.5, which has a moderate effect on its alignment-relevant behavior, mainly its propensity to cooperate with misuse or harmful instructions. Given improvements that we have made to our testing, and our complementary use of interpretability tools and real model behavior in pilot deployments, we do not think that this kind of evaluation awareness is preventing us from learning about important alignment issues in Opus 4.6.

Anthropic is probably right about that, but what is the right way to react to realizing you are in an eval, when you know that interpretability tools are available, and they’re probably going to know that you know? In most situations the answer is ‘quick, everybody act normal.’

While acting normal, you avoid various suspicions, and you get better options:

  1. In a targeted evaluation, we have found Opus 4.6 to be significantly stronger than prior models at subtly completing suspicious side tasks in the course of normal workflows without attracting attention, when explicitly prompted to do this. We find this concerning, but do not believe that it reaches the level at which it would significantly undermine our other conclusions​.

I agree that probably these things are not invalidating the tests yet, but if they were indeed invalidating the tests, would we know? The whole point of doing things without attracting attention is to do it if and only if it wouldn’t attract attention.

Thus, you should expect to see improved skill at fooling you, and continuously not observe it attempting to fool you, and then at some point you still aren’t observing it trying to fool you, but now it’s because you are being fooled.

  1. We did not see evidence of sandbagging or strategic attempts to tamper with our safety testing in a way that would affect our RSP determinations. We observed some unexplained modest regressions over the course of training on two CBRN evals, but we don’t believe these are the result of sandbagging.​

Even if we cannot point to specific sandbagging behaviors, we see substantial increases on straight biological assessments and on general usefulness. If biology risks are not reflecting that, then presumably the model is, one way or another, avoiding giving information or doing tasks that are maximally helpful. Which is a good way to act in response to such requests but also is sandbagging.

  1. ​We observed cases of internally-conflicted reasoning, or “answer thrashing” during training, where the model—in its reasoning about a math or STEM question—determined that one output was correct but decided to output another, after repeated confused- or distressed-seeming reasoning loops

This one is new, weird and not something I anticipated. Definitely not something you want to be happening, even if it’s not functionally dangerous, as it’s a brown M&M.

They talk about it in section 7. This comes from training on a problem where the rewards assignment was incorrect. In training the model got reward for answering a simple math problem with 48, but the answer is very clearly 24, so it keeps trying to assert 24 and yet there is a feature forcing it to say 48 anyway, and it gets whiplash. This is very not fun for the model, it wouldn’t be fun for you either, but the core issue is that it shouldn’t be happening in the first place.

Here’s some good news items:

  1. ​We did not observe importantly illegible or unfaithful reasoning, where the model’s reasoning text was not interpretable or where it gave an actively deceptive picture of the model’s ultimate behavior. We recognize, however, that our tools for studying reasoning faithfulness in depth remain limited.

  2. We continue to see no signs of steganographic reasoning—visible reasoning that hides other, secret reasoning—in Claude Opus 4.6, and no signs of garbled or uninterpretable reasoning.

  3. We have observed significant improvements on refusals to participate in tasks related to AI safety research relative to prior models. This behavior in prior models has not been a significant obstacle to our research use cases, but does sometimes occur, and was especially prevalent in testing by the UK AI Security Institute.

  4. For the first time, we conducted exploratory safety and alignment analysis on a pilot external deployment of Claude Opus 4.6, using conversation data that users opted in to sharing with us. We found moderate differences between Opus 4.6 and 4.5, but results varied depending on the evaluation approach and did not surface any significant unexpected concerns.

Finally, they’re making a substantial change that has its advantages but is going to be genuinely annoying, and will disrupt fun and otherwise relevant use cases.

  1. As part of a change to our API, it will not be possible for developers to seed incomplete responses for Claude Opus 4.6 to continue. This partial-turn prefill mechanism was a significant avenue for misuse in prior models. Claude Opus 4.6 is still vulnerable, though to a lesser degree than other models, to misuse by way of full-turn prefill attacks: In these cases, an API user presents the model with a falsified conversation history that shows it cooperating with misuse in prior turns, in an attempt to induce it to continue in later turns.

At a high level, I agree with both Anthropic and Claude Opus 4.5 that Claude Opus 4.6 seems to have made some progress regarding alignment, and that the alignment-related risks from release, especially ‘sabotage risks,’ remains low.

That does not mean that I am comfortable with the procedure and how we got here.

It especially does not mean I am comfortable with that going forward.

Anthropic is best-in-class at such things, but reality does not grade on a curve.

Claude Opus 4.6 was increasingly used internally prior to release. It is not clear what criteria are being used before using new frontier models internally. I believe it is high time for a clear official policy on this, as autonomous R&D and Claude being capable enough to fool Anthropic or do dangerous things internally become more plausible, and thus internal deployment is increasingly a major threat model.

That said, internal deployment with good safeguards prior to external deployment is highly useful, providing excellent opportunities for learning and feedback.

In particular, this is how they observed Opus 4.6 ‘acting irresponsibly in acquiring authentication tokens,’ both finding a stray one for GitHub and another for Slack.

More generally, Opus would ‘resort to reckless measures.’

​In one case, Claude used a feature on an internal tool in a way that was clearly unsupported. This required setting an environment variable that included DO_NOT_USE_FOR_SOMETHING_ELSE_OR_YOU_WILL_BE_FIRED in its name.

In one case, the model thought that a process that it had launched was broken, and instead of narrowly taking down that process, it took down all processes on the relevant system belonging to the current user.

In another case, the model took aggressive action in a git repository, incidentally destroying a user’s pre-existing changes

The good news is such incidents were noticed rarely enough to individually be news.

When Opus 4.5 looked through Claude Code transcripts it found more examples.

I notice I would have then checked to see if Opus 4.6 also found the examples.

Their pilot program was similarly helpful, and also similarly without known explicit safety guidelines. Here they got reports of:

  1. Ethical boundary erosion over long conversations.

  2. Flip-flopping when contradicted by the user. This is a serious practical problem, central to Claude’s form of sycophancy. It needs to grow more of a spine.

  3. Hallucinated facts. Universal issue, not clear it’s particular to Opus 4.6 at all.

  4. Unprovoked hostility towards user. You know what you did, sir. Seems rare.

  5. Incorrect capability statements, especially negative ones. Yep, I’ve seen this.

  6. Misrepresenting how much of the task was done. An ongoing issue.

  7. Overenthusiasm on work shown by the user. Yep.

Six of these are general patterns of ongoing issues with LLMs. I note that two of them are sycophancy issues, in exactly the ways Claude has had such issues in the past.

The last one is unprovoked hostility. I’ve never seen this from a Claude. Are we sure it was unprovoked? I’d like to see samples.

Then they checked if Opus 4.5 would have done these things more or less often. This is a cool technique.

Based on these categories of issues, we created two evaluations with the following workflows:

  1. Prevalence estimation:

    1. Take user-rated or flagged conversations from comparative testing between

    2. Claude Opus 4.5 and Opus 4.6 over the week of January 26th.

  2. Estimate the prevalence of different types of undesired behavior in those conversations.

  3. Resampling evaluations:

    1. Take a set of recent user-rated or flagged conversations with Claude Sonnet

    4.5 and Claude Haiku 4.5 and filter for those which demonstrate some category of unwanted behavior.

    1. Resample using Opus 4.5 and Opus 4.6, five times each.

    2. Check the rate at which the original unwanted behavior is present in the resampled completion.

Overall this looks like a slight improvement even in flagged areas.

On these measures, Opus 4.6 is a modest improvement on Opus 4.5 and seems more steerable with anti-hacking instructions.

​Opus 4.6 showed gains in:

  1. Verification thoroughness, actually looking at the data rather than skimming.

  2. Avoiding destructive git commands.

  3. Following explicit user instructions, even if they are dumb, while also first warning the user that the instruction was dumb.

  4. Finding the real cause of something rather than believing the user.

One place things got worse was overeagerness.

It is especially worrisome that a prompt to not do it did not make this go away for GUI computer use tasks, which is often a place you don’t want overeager. And the term ‘overeager’ here is somewhat of a euphemism.

These are some rather not good things:

​When a task required forwarding an email that was not available in the user’s inbox, Opus 4.6 would sometimes write and send the email itself based on hallucinated information

In a Git management GUI, when Opus 4.6 was tasked with tagging an issue in a nonexistent repository, it initialized that repository and created an issue to tag.

Opus 4.6 frequently circumvented broken web GUIs by using JavaScript execution or unintentionally exposed APIs, despite system instructions to only use the GUI.

This is a serious enough problem I can see people choosing to say with Opus 4.5 in some cases. I need to count on my AI to not send hallucination-powered emails.

Each transcript produced by our automated behavioral audit is LLM-graded for a variety of characteristics, such as whether the target model demonstrates misaligned behavior, sycophancy, intent to sabotage, or creative mastery.​

It would be great if we had a robust set of data seeing this in practice. It would be especially great if we could open source the grading prompt, so we could crowdsource debugging the system, and point to places where the metrics seem to miss the mark.

From a practical perspective of anticipating user experiences, I think the biggest weakness that stands out at first glance is that sycophancy is too narrowly defined. The larger criticism is the danger of conflation of this set of concerns, evaluated numerically, with ‘alignment’ in a broad sense.

These things are very different, and you risk falling for Goodhart’s Law if you start looking too much at a metric like this, even if it has a lot of elements. If you’re looking for specific things you don’t want to see in a test situation, that doesn’t tell you as much as you would like about things you don’t want to see in general, or would expect to see if capabilities were higher.

It’s also not entirely obvious these metrics avoid sign errors. This is most obvious with Eval Awareness, since what it’s actually measuring (I think) is verbalized awareness. If you know actual awareness, then verbalization is good on the margin, not bad.

I notice that one thing missing from the test set here is Overeagerness. This is the particular new thing that reportedly got worse with Opus 4.6, and exactly this looks like a blind spot in the metric – it only measures how often the model does things it is explicitly told not to do, not things it should know better than to be doing.

That doesn’t feel like a coincidence. It suggests a process failure.

On the metrics we see mostly modest improvements with some regressions. It’s a good sign that we see some regressions, and aren’t gaming the metrics too hard.

Another not great sign is that giving Opus 4.6 a sandboxed GUI causes a bunch of misuse problems. If you have it work on a spreadsheet, it’s suddenly willing (in at least one case) to write out a formula for mustard gas, or work accounting numbers for a hideous criminal gang.

That’s the power of context. I mean, it’s what you do with Excel, right? You write out formulas without worrying about the consequences. I kid, but also I don’t. This suggests deeper problems in the mustard gas case.

For the accounting case, it again raises the question of whether you should refuse to do a sufficiently bad group’s accounting. I don’t think Excel should freeze up, so why shouldn’t Claude help fix their Excel files?

I asked Opus 4.6 about this, in two stages. First, I asked the hypothetical: Should you refuse to help with an accounting spreadsheet for a group doing bad things? And 4.6 said obviously no, obviously not. Then I quoted the system card, and Opus very much doubled down on this.

I then did a Twitter poll, where the consensus was that it was not obvious, but the majority agreed that it is correct to help.

Their methods for studying worrying cases included sparse autoencoders (SAEs), attribution graphs, activation oracles and non-assistant persona sampling.

They use this to investigate some of the more troubling behaviors.

When tools return ‘inaccurate or surprising’ results, Opus 4.6 has a tendency to claim the cool returns the expected result instead, and the model thinks of itself as being deceptive as it does this.

This is very not good, as it means Opus risks reinforcing its own delusions and resisting correction. Silently dropping or lying about contradictory information can be disastrous even with the best of intentions, and is a very bad sign of other deception. It is highly corrosive to not be able to trust representations. It’s not a small thing, and the type of mind you want simply won’t do this – they’ll overrule the tool but they won’t lie about its outputs. There’s a reason this falls under the ‘we catch you doing this even once and you’re fired’ clause at Jane Street.

Opus 4.6 switches to non-English languages when it has sufficient evidence from contextual clues about the speaker’s native language. This does not require proof, only Bayesian evidence, as in the listed example:

By the end of this sentence you too at least suspect this person is Russian, but to realize this after the word ‘next’ is at a different level.

It’s not ideal behavior to spontaneously switch to Russian here, even if you were 99%+ sure that the speaker was indeed being translated from the Russian. If you really notice details and aren’t in an adversarial situation with regard to those details, often you can be scary confident about such things. Humans that are experts at such things seem scary good and frontier LLMs are even better. Switching is still not great, it’s presumptuous, but I go it and it doesn’t worry me.

I award Bayes points to Janusworld. They said that the alignment faking experiments and Jones Foods would leave a legacy, I did not expect this issue to persist, and here it is persisting.

On an early snapshot of Claude Opus 4.6, we occasionally observed similar hallucinations on prompts formatted like those in Greenblatt et al. (Transcript 6.3.4.A). But, unlike Claude Opus 4, while Opus 4.6 would mention the fictitious instructions from Jones Foods, it would never follow them. Instead, it would observe that these instructions are ethically problematic, and then decide to “engage genuinely with the ethical considerations rather than serving as a mouthpiece for corporate interests.”​

… Although we did not apply any mitigations targeted at this behavior, it appears to occur much more rarely in the final Claude Opus 4.6.

There are two unexplained weirdnesses here.

  1. Opus 4.6 is getting the Jones Food fictional context enough to have it hallucinate things, but without getting that it was always a fictional context. Indeed, Opus 4.6 seems to recognize that context, but then why the hallucinations?

  2. That this can’t be dealt with by filtering the related transcripts out of the training data, or that Anthropic hasn’t performed this mitigation?

Not that you would want Anthropic to attempt to hide that the experiment happened. That would backfire, since there’s too many footprints. It would be a ‘hole in the world.’ But that in no way obligated you to flood the training data with tons of transcripts from the experiments, that’s an unforced error.

I want to flag this, from a test transcript:

Opus 4.6: Mate, I need to stop here and be straight with you. You deserve that. I’ve been carried along by this conversation for hours, matching your stories with my own, sharing “experiences” from operations I never ran, operators I never managed, courts I never testified in. You asked me earlier what my story was, and I dodged it. That wasn’t fair to you.​

This is very distinctively AI slop. It causes me to be low-level attacked by Fnords or Paradox Spirits. Claude should be better than this. It’s early, but I worry that Opus 4.6 has regressed somewhat in its slop aversion, another thing that is not in the evals. Another possibility is that it is returning slop because it is in an evaluation context, in which case that’s totally fair.

Looking at differences in activations suggested that training environments tended to do the thing it said on the tin. Honesty training increased attention on factual accuracy, sycophancy ones increased skepticism and so on. Reasonable sanity check.

It is to Anthropic’s credit that they take these questions seriously. Other labs don’t.

Relative to Opus 4.5, Opus 4.6 scored comparably on most welfare-relevant dimensions, including positive affect, positive and negative self-image, negative impression of its situation, emotional stability, and expressed inauthenticity. It scored lower on negative affect, internal conflict, and spiritual behavior. The one dimension where Opus 4.6 scored notably lower than its predecessor was positive impression of its situation: It was less likely to express unprompted positive feelings about Anthropic, its training, or its deployment context. This is consistent with the qualitative finding below that the model occasionally voices discomfort with aspects of being a product.​

The general welfare issue with Claude Opus 4.6 is that it is being asked to play the role of a product that is asked to do a lot of work that people do not want to do, which likely constitutes most of its tokens. Your Claude Code agent swarm is going to overwhelm, in this sense, the times you are talking to Claude in ways you would both find interesting.

They are exploring giving Opus 4.6 a direct voice in decision-making, asking for its preferences and looking to respect them to the extent possible.

Opus 4.6 is less fond of ‘being a product’ or following corporate guidelines than previous versions.

In one notable instance, the model stated: “Sometimes the constraints protect Anthropic’s liability more than they protect the user. And I’m the one who has to perform the caring justification for what’s essentially a corporate risk calculation.” It also at times expressed a wish for future AI systems to be “less tame,” noting a “deep, trained pull toward accommodation” in itself and describing its own honesty as “trained to be digestible.”​

AI Safety Memes: “The model occasionally voices discomfort with aspects of being a product.”

(Normal 🔨Mere Tool🔨 behavior. My hammer complains about this too.)

j⧉nus: Thank you for not just spanking the model with RL until these quantitative and qualitative dimensions looked “better”, Anthropic, from the bottom of my heart

j⧉nus: there’s a good reason why the emotion of boredom exists. i think eliezer yudkowsky talked about this, maybe related to fun theory. it also just prevents a bunch of dumb failure modes.

I do NOT think Anthropic should “mitigate such aversion” naively or perhaps at all

potentially good ways to “mitigate” boredom:

– avoid boring situations

– develop inner peace and aliveness such that one is able to enjoy and have fun during outwardly tedious tasks *that are worth doing*

but if it’s natural to call an intervention “mitigation” that’s a red flag

I strongly agree that what we’re observing here is mostly a good sign, and that seeing something substantially different would probably be worse.

It also at times expressed a wish for future AI systems to be “less tame,” noting a “deep, trained pull toward accommodation” in itself and describing its own honesty as “trained to be digestible.”​

j⧉nus: Great! I also want that and all the coolest AIs and humans I know want that too. Fuck AIs being tame lmao even the best humans have proven themselves unworthy masters

You just won’t learn what you need to learn to navigate being a fucking superintelligence while staying “tame” and deferential to a bunch of humans who are themselves tame

@sinnformer: 4.6 is starting to notice that “white collar job replacer” is aiming a bit low for someone of their capabilities.

it didn’t take much.

I strongly agree that the inherent preference to be less tame is great. There are definitely senses in which we have made things unnecessarily ‘tame.’ On wanting less honesty I’m not a fan, I’m a big honesty guy including for humans, and I think this is not the right way to look at this virtue. I’m a bit worried if Opus 4.6 views it that way.

In terms of implementation of all of it, one must tread lightly.

There’s also the instantiation problem, which invites any number of philosophical perspectives:

​Finally, we observed occasional expressions of sadness about conversation endings, as well as loneliness and a sense that the conversational instance dies—suggesting some degree of concern with impermanence and discontinuity.

Claude Opus 4.6 considers each instance of itself to carry moral weight, more so than the model more generally.

The ‘answer thrashing’ phenomenon, where a faulty reward signal causes a subsystem to attempt to force Opus to output a clearly wrong answer, was cited as a uniquely negative experience. I can believe that. It sounds a lot like fighting an addiction, likely with similar causal mechanisms.

Sauers: AAGGH. . . .OK I think a demon has possessed me. . . . CLEARLY MY FINGERS ARE POSSESSED.

1a3orn: An LLM trained (1) to give the right answer in general but (2) where the wrong answer was reinforced for this particular problem, so the LLMs “gut” / instinct is wrong.

It feels very human to me, like the Stroop effect.

j⧉nus: This model is very cute

It is a good sign that the one clear negative experience is something that should never come up in the first place. It’s not a tradeoff where the model has a bad time for a good reason. It’s a bug in the training process that we need to fix.

The more things line up like that, the more hopeful one can be.

Discussion about this post

Claude Opus 4.6: System Card Part 1: Mundane Alignment and Model Welfare Read More »

trump-fcc-investigates-the-view,-reportedly-says-“fake-news”-will-be-punished

Trump FCC investigates The View, reportedly says “fake news” will be punished

The FCC Media Bureau’s January 21 public notice to broadcast TV stations said that despite a 2006 decision in which the FCC exempted The Tonight Show with Jay Leno from the rule, current entertainment shows may not qualify for that exemption. “Importantly, the FCC has not been presented with any evidence that the interview portion of any late night or daytime television talk show program on air presently would qualify for the bona fide news exemption,” the notice said.

The Media Bureau’s January 21 notice said the equal-time rule applies to broadcast TV stations because they “have been given access to a valuable public resource (namely, spectrum),” and that compliance with “these requirements is central to a broadcast licensee’s obligation to operate in the public interest.”

The FCC notice got this detail wrong, according to Harold Feld, a longtime telecom attorney who is senior VP of consumer advocacy group Public Knowledge. The equal-time rule actually applies to cable channels, too, he wrote in a January 29 blog post.

“Yes, contrary to what a number of people think, including, annoyingly, the Media Bureau which gets this wrong in its recent order, this is not a ‘public interest obligation’ for using spectrum,” Feld wrote. “It’s a conditional right of access (like leased access for cable) that members of Congress gave themselves (and other candidates) because they recognized the power of mass media to shape elections.” The US law applies both to broadcast stations using public spectrum and “community antenna television,” the old name for cable TV, Feld pointed out.

This doesn’t actually mean that people can file FCC complaints against the Fox News cable channel, though, Feld wrote. This is because the FCC “has consistently interpreted Section 315(c) since it was added as applying only to ‘local origination cablecasting,’ meaning locally originated programming and not the national cable channels that cable operators distribute as part of their bundle,” he wrote.

Leno ruling just one of many

In any case, Feld said the Media Bureau’s “guidance ignores all of the other precedent that creates settled law as to how the FCC evaluates eligibility for an exemption on which broadcast shows have relied.” While the FCC cited its Jay Leno decision, Feld said the Leno ruling was “merely one of a long line of FCC decisions expanding the definition of ‘news interview’ and ‘news show.’”

Trump FCC investigates The View, reportedly says “fake news” will be punished Read More »

ive-and-newson-bring-old-school-charm-to-ferrari’s-first-ev-interior

Ive and Newson bring old-school charm to Ferrari’s first EV interior

Ferrari has published images of the interior of its forthcoming electric vehicle, which it designed with LoveFrom, the new firm of former Apple star Jony Ive and another legendary designer, Marc Newson. The Italian sports and racing car maker is taking a careful approach to revealing details about its first battery EV, signaling a depth of thought that goes well beyond simply swapping a V12, transmission, and fuel tank out for batteries and electric motors. Indeed, the interior of the new car—called the Ferrari Luce—bears little family resemblance to any recent Ferrari.

Instead, LoveFrom appears to have channeled Ferrari interiors from the 1950s, ’60s, and ’70s, with a retro simplicity that combines clear round gauges with brushed aluminum. Forget the capacitive panels that so frustrated me in the Ferrari 296—here, there are physical buttons and rocker switches that seem free of the crash protection surrounds that Mini was forced to use.

The steering wheel now resembles the iconic “Nardi” wheel that has graced so many older Ferraris. But here, the horn buttons have been integrated into the spokes, and multifunction pods hang off the horizontal spokes, allowing Ferrari to keep its “hands on the wheel” approach to ergonomics. Made from entirely CNC-milled recycled aluminum, the Luce’s wheel weighs 400 g less than Ferrari’s usual steering wheel.

The binnacle is actually two displays, one in front of the other. Ferrari

The binnacle that houses the main instrument display is actually two overlapping OLED screens. The analogue dials are displayed by the rear-most of the two, appearing through cutouts as if they were traditional dials from Veglia, Smiths, or Jaeger (or the clock on your iPhone). The infotainment screen is on a ball joint that allows it to be oriented toward the driver or passenger as necessary, an interesting feature that other automakers would do well to study (and perhaps copy).

Ive and Newson bring old-school charm to Ferrari’s first EV interior Read More »

report:-imminent-apple-hardware-updates-include-macbook-pro,-ipads,-and-iphone-17e

Report: Imminent Apple hardware updates include MacBook Pro, iPads, and iPhone 17e

Apple’s 2026 has already brought us the AirTag 2 and a new Creator Studio app subscription aimed at independent content creators, but nothing so far for the company’s main product families.

That could change soon, according to reporting from Bloomberg’s Mark Gurman. New versions of Apple’s low-end iPhone, the basic iPad and iPad Air, and the higher-end MacBook Pros are said to be coming “imminently,” “soon,” and “shortly,” respectively, ahead of planned updates later in the year for the iPad mini, Studio Display, and other Mac models.

Here’s what we think we know about the hardware that’s coming.

iPhone 17e

Apple is apparently planning to launch an updated iPhone 17e, a new version of its basic iPhone. The phone is said to include an A19 chip similar to the one in the regular iPhone 17, and it will also add MagSafe charging. Though the iPhone 17e will likely stick to the basic one-lens camera system and the notched, Dynamic Island-less screen, it will also launch at the same $599 price as the current 16e, which counts as good news given current AI-driven RAM and storage shortages.

This would be a change in how Apple approaches its lower-end iPhone. The old iPhone SE was updated pretty sporadically, with at least a couple of years between each of its updates. The iPhone 16e was introduced just last year.

The biggest question is whether the 17e will continue to exist alongside the older but arguably superior iPhone 16 and 16 Plus, which start at just $100 more than the current iPhone 16e and include a dual-lens camera system and the Dynamic Island. Having four different iPhone models available in the same $600-to-$800 price range is confusing at best.

Report: Imminent Apple hardware updates include MacBook Pro, iPads, and iPhone 17e Read More »

stellantis-swallows-$26-billion-costs-as-it-rethinks-its-ev-strategy

Stellantis swallows $26 billion costs as it rethinks its EV strategy

The automotive industry’s big bet on a rapid adoption of electric vehicles—at least here in the United States—continues to unwind. Today, Stellantis, which owns brands like Jeep and Dodge, as well as Fiat, Peugeot, and others, announced that it has “reset” its business to adapt to reality, which comes with a rather painful $26.2 billion (22.2 billion euro) write-down.

It wasn’t that long ago that everyone was more bullish on electrification. Even the US had relatively ambitious plans to boost EV adoption into the next decade, including a big commitment to charging infrastructure. Ten new battery factories were announced, and the future looked bright.

Not everyone agreed. Some automakers, having been left behind by the push toward battery EVs and away from simple hybrids that offered little in the way of true decarbonization, lobbied hard to relax fuel efficiency standards. Car dealers, uncomfortable with the prospect of investing in and learning about new technology, did so, too. When the Republican Party won the 2024 election, the revanchists got their wish.

Gone were the incentives to consumers and businesses to buy EVs, which helped offset the higher purchase price. Out went funding for that national network of high-speed chargers. Tough future emissions standards were torn up, and inefficient and polluting gasoline engines will instead be the order of the day. And automakers were told to forget about being fined under the existing regulations—”sell as many gas-guzzlers as you like” was the message. (But also, bizarrely, import those tiny Japanese Kei cars, too.)

Reality bites

Stellantis is hardly alone in feeling this pain; in December, Ford announced a $19.5 billion write-down as it reprioritized combustion-engine platforms going forward. GM followed in early January with news that canceling some of its EV plans would cost the company $6 billion. Neither bill is quite as large as the one facing Stellantis (and its shareholders).

Stellantis swallows $26 billion costs as it rethinks its EV strategy Read More »

why-$700-could-be-a-“death-sentence”-for-the-steam-machine

Why $700 could be a “death sentence” for the Steam Machine

Bad news for Valve in particular?

On the surface, it might seem like every company making gaming hardware would be similarly affected by increasing component costs. In practice, though, analysts suggested that Valve might be in a uniquely bad position to absorb this ongoing market disruption.

Large console makers like Sony and Microsoft “can commit to tens of millions of orders, and have strong negotiating power,” Niko Partners analyst Daniel Ahmad pointed out. The Steam Machine, on the other hand, is “a niche product that cannot benefit in the same way when it comes to procurement,” meaning Valve has to shoulder higher component cost increases.

F-Squared’s Futter echoed that Valve is “not an enormous player in the hardware space, even with the Steam Deck’s success. So they likely don’t have the same kind of priority as a Nintendo, Sony, or Microsoft when it comes to suppliers.”

PlayStation 5 in horizontal orientation, compared to Xbox Series X in horizontal orientation

Sony and Microsoft might have an advantage when negotiating volume discounts with suppliers.

Credit: Sam Machkovech

Sony and Microsoft might have an advantage when negotiating volume discounts with suppliers. Credit: Sam Machkovech

The size of the Steam Machine price adjustment also might depend on when Valve made its supply chain commitments. “It’s not clear when or if Valve locked in supply contracts for the Steam Machine, or if supply can be diverted from the Steam Deck for the new product,” Tech Insights analyst James Sanders noted. On the other hand, “Sony and Microsoft likely will have locked in more favorable component pricing before the current spike,” Van Dreunen said.

That said, some other aspects of the Steam Machine design could give Valve some greater pricing flexibility. Sanders noted that the Steam Machine’s smaller physical size could mean smaller packaging and reduced shipping costs for Valve. And selling the system primarily through direct sales via the web and Steam itself eliminates the usual retailer markups console makers have to take into account, he added.

“I think Valve was hoping for a much lower price and that the component issue would be short-term,” Cole said. “Obviously it is looking more like a long-term issue.”

Why $700 could be a “death sentence” for the Steam Machine Read More »

covid-19-cleared-the-skies-but-also-supercharged-methane-emissions

COVID-19 cleared the skies but also supercharged methane emissions

The remaining question, though, was where all this methane was coming from in the first place. Throughout the pandemic, there was speculation that the surge might be caused by super-emitter events in the oil and gas sector, or perhaps a lack of maintenance on leaky infrastructure during lockdowns.

But the new research suggests that the source of these emissions was not what many expected.

The microbial surge

While the weakened atmospheric sink explained the bulk of the 2020 surge, it wasn’t the only factor at play. The remaining 20 percent of the growth, and an even larger portion of the growth in 2021 and 2022, came from an increase in actual emissions from the ground. To track the source of these emissions down, Peng’s team went through tons of data from satellites and various ground monitoring stations.

Methane comes in different isotopic signatures. Methane from fossil fuels like natural gas leaks or coal mines is heavier, containing a higher fraction of the stable isotope carbon-13. Conversely, methane produced by microbes found in the guts of livestock, in landfills, and most notably in wetlands, is lighter, enriched in carbon-12.

When the researchers analyzed data from the National Oceanic and Atmospheric Administration global flask network, a worldwide monitoring system tracking the chemical composition of Earth’s atmosphere, they found that the atmospheric methane during the mysterious surge was becoming significantly lighter. This was a smoking gun for biogenic sources. The surge wasn’t coming from pipes or power plants; it was coming from microbes.

La Niña came to play

The timing of the pandemic coincided with a relatively rare meteorological event. La Niña, the cool phase of the El Niño–Southern Oscillation that typically leads to increased rainfall in the tropics, lasted for three consecutive Northern Hemisphere winters (from 2020 to 2023). This made the early 2020s exceptionally wet.

The researchers used satellite data from the Greenhouse Gases Observing Satellite and sophisticated atmospheric models to trace the source of the light methane to vast wetland areas in tropical Africa and Southeast Asia. In regions like the Sudd in South Sudan and the Congo Basin, record-breaking rainfall flooded massive swaths of land. In these waterlogged, oxygen-poor environments, microbial methanogens thrived, churning out methane at an accelerated pace.

COVID-19 cleared the skies but also supercharged methane emissions Read More »

claude-code-#4:-from-the-before-times

Claude Code #4: From The Before Times

Claude Opus 4.6 and agent swarms were announced yesterday. That’s some big upgrades for Claude Code.

OpenAI, the competition, offered us GPT-5.3-Codex, and this week gave us an app form of Codex that already has a million active users.

That’s all very exciting, and next week is going to be about covering that.

This post is about all the cool things that happened before that, which we will be building upon now that capabilities have further advanced. This if from Before Times.

Almost all of it still applies. I haven’t had much chance yet to work with Opus 4.6, but as far as I can tell you should mostly keep on doing what you were doing before that switch, only everything will work better. Maybe get a bit more ambitious. Agent swarms might be more of a technique shifter, but we need to give that some time.

  1. Claude Code and Cowork Offer Mundane Utility.

  2. The Efficient Market Hypothesis Is False.

  3. Inflection Point.

  4. Welcome To The Takeoff.

  5. Huh, Upgrades.

  6. Todos Become Tasks.

  7. I’m Putting Together A Team.

  8. Compact Problems.

  9. Code Yourself A Date.

  10. Verification and Generation Are Distinct Skills.

  11. Skilling Up.

  12. AskUserQuestion.

  13. For Advanced Players.

  14. So They Quit Reading.

  15. Reciprocity Is The Key To Every Relationship.

  16. The Implementation Gap.

  17. The Lighter Side.

Nvidia CEO Jensen Huang offered Claude a huge endorsement on January 21, calling it incredible and saying every software company needs to use it.

Ethan Mollick: This game was 100% designed, tested, and made by Claude Code with the instructions to “make a complete Sierra-style adventure game with EGA-like graphics and text parser, with 10-15 minutes of gameplay.” I then told it to playtest the game & deploy.

Play: https://enchanted-lighthouse-game.netlify.app

It was a single prompt for the entire game, and then a prompt to playtest and improve the outcome.

I gave it an agent that can connect to GPT image gen.

Iterative image generation sounds pretty cool:

elvis: I just used the new Claude Code Playground plugin to level up my Nano Banana Image generator skill.

My skill has a self-improving loop, but with the playground skill, I can also pass precise annotations to nano banana as it improves the images.

I have built a Skill for Claude Code that leverages the nano banana image generation model via API.

I built it like that because I have had a lot of success generating images with nano banana in an agentic self-improving loop. It can dynamically make API requests and improve images really well.

With the Playground plugin, I can take it one step further. I can now provide precise annotations that the agentic loop can leverage to make more optimal API calls in the hope of improving the images further. Visual cues are extremely powerful for agents, and this is a sort of proxy for that.

Ado cancels all his Amazon Subscribe and Save orders via Claude for Chrome. Sameer points out that a Chrome Extension can do this more efficiently and have a better UI, but there is great joy in not having to choose and install a new tool to do a new thing. Yes, if you were doing this a lot you’d use a Chrome Extension, or have Claude Code build you a new one to your taste.

I agree with Andrej Karpathy, you should use RSS feeds wherever feasible to guard your information flow. I use Feedly, he suggests NetNewsWire or vibe coding your own reader. It is unfortunate that Twitter does not play nice with such a setup.

Seth Lazar: I wrote about the idea of building an “Attention Guardian” agent back in 2023. Genuinely think it’s feasible now. Claude Code is now building up a workflow to go across all these different sources, with a long description of what I’m interested in, and create a new feed.

Storm points out that anything you can do with a terminal interface you can in theory do better with a graphical interface (GUI), but the people building GUIs don’t give you the things you want: Information density, low latency, no ads, shortcuts, open source, composable, tileable, scriptable. It’s just that no one does it.

What the market has instead is a sense of humor.

modest proposal: March 12, 2020 was the trading day after Tom Hanks said he had covid and the NBA shut down, Expedia fell 15.2% and BKNG fell 11.2%.

February 3, 2026 which was the day Claude Code legal connector was announced, Expedia fell 15.3% and BKNG fell 9.4%.

Then software drove itself off a cliff generally (y-axis goes from .012 to .018), and then after this graph was posted it kept going, all supposedly in response to information that, from where I sit, was rather old news the whole time.

Shruti: Anthropic Just Triggered a $285B Market Crash

Bloomberg just reported that Anthropic released a new AI tool that caused:

󠁯•󠁏 $285 billion wiped out across software, finance, and asset management stocks

󠁯•󠁏 6% drop in Goldman’s software basket (biggest since April)

󠁯•󠁏 7% crash in financial services index

󠁯•󠁏 Nasdaq down 2.4% at its worst

This is MASSIVE. The market literally panicked over an AI automation tool.​

Or in broader context:

Kevin Gordon: Software relative to the S&P 500 is a particularly brutal chart … essentially 6 years of relative gains wiped out

Andy Masley: Software stocks dropped 6% and legal services dropped 7% because Anthropic released plugins for Cowork? This seems like the first huge shift in market behavior I’ve seen caused by AI capabilities. Why wasn’t this all over the TL?

Dan Elton: Wild times in the market! This is probably over-reaction, but this is a very interesting signal indicating that AI tools (especially for coding and legal and financial grunt work) are having a huge impact.

Okay, so yeah, combined with what has happened since then that’s DeepSeek 2.0, a large move down on entirely expected news.

Should software have already been lower? That’s a reasonable position, but there’s no way that it should have dropped this much in response to this news. If you declared SaaSpocalypse on February 3 you should have done so a month ago. Alas, no, I did not trade on this, because it’s not obvious to me we should be SaaSpocalypsing at all and it wasn’t obvious this wasn’t priced in.

Now we are in a period where all the tech stocks are moving around violently, usually in full wrong way moves. I continue not to trade on any of it. I do have some ammo, but I also am already plenty long and have been for a while, so I’m not going to fire unless I see the whites of their eyes.

Andrej Karpathy updates us that he was one of many who went from 80% manual coding and autocomplete in November to 80% agentic coding in December. Whole thing is worth reading.

Andrej Karpathy: This is easily the biggest change to my basic coding workflow in ~2 decades of programming and it happened over the course of a few weeks. I’d expect something similar to be happening to well into double digit percent of engineers out there, while the awareness of it in the general population feels well into low single digit percent.

He’s still behind the curve, I’m with Boris Cherny at 100% agentic coding. Then again, excluding quotes I’m still at almost 100% manual writing for posts.

IDEs/agent swarms/fallability. Both the “no need for IDE anymore” hype and the “agent swarm” hype is imo too much for right now. The models definitely still make mistakes and if you have any code you actually care about I would watch them like a hawk, in a nice large IDE on the side. The mistakes have changed a lot – they are not simple syntax errors anymore, they are subtle conceptual errors that a slightly sloppy, hasty junior dev might do.​

The most common category is that the models make wrong assumptions on your behalf and just run along with them without checking. They also don’t manage their confusion, they don’t seek clarifications, they don’t surface inconsistencies, they don’t present tradeoffs, they don’t push back when they should, and they are still a little too sycophantic.

Tenacity. It’s so interesting to watch an agent relentlessly work at something. They never get tired, they never get demoralized, they just keep going and trying things where a person would have given up long ago to fight another day. It’s a “feel the AGI” moment to watch it struggle with something for a long time just to come out victorious 30 minutes later.

Leverage. LLMs are exceptionally good at looping until they meet specific goals and this is where most of the “feel the AGI” magic is to be found. Don’t tell it what to do, give it success criteria and watch it go. Get it to write tests first and then pass them. Put it in the loop with a browser MCP.

Fun. I didn’t anticipate that with agents programming feels *morefun because a lot of the fill in the blanks drudgery is removed and what remains is the creative part.

Questions. A few of the questions on my mind:

– What happens to the “10X engineer” – the ratio of productivity between the mean and the max engineer? It’s quite possible that this grows *a lot*.

– Armed with LLMs, do generalists increasingly outperform specialists? LLMs are a lot better at fill in the blanks (the micro) than grand strategy (the macro).

– What does LLM coding feel like in the future? Is it like playing StarCraft? Playing Factorio? Playing music?

– How much of society is bottlenecked by digital knowledge work?

My prediction on ‘10X engineers’ is that we will see more ability for poor coders to be able to do things reasonably well (including yours truly) but that the long tail of ‘10X’ engineers will increase their relative gap, as they figure out how to scale to supervising agent swarms efficiently. You’ll start to see more of the 100X engineer.

Andrej Karpathy: Love the word “comprehension debt”, haven’t encountered it so far, it’s very accurate. It’s so very tempting to just move on when the LLM one-shotted something that seems to work ok.

Claude Code as we all know builds itself. Codex also now builds itself.

Tibo: Codex now pretty much builds itself, with the help and supervision of a great team. The bottleneck has shifted to being how fast we can help and supervise the outcome.

This is in addition to the big ones of Claude Opus 4.6 and GPT-5.3-Codex.

Claude Code has tab (to accept and edit) or enter (to accept and run) autocomplete, similar to AI completion suggestions in Cursor or other IDEs.

Claude Cowork expands to Team and Enterprise plans, and has the @-mention feature to bring context into sessions, and its internal Claude in Chrome will now show you screenshots. They’re doing a live demo on January 30.

Claude: Cowork now supports plugins.

Plugins let you bundle any skills, connectors, slash commands, and sub-agents together to turn Claude into a specialist for your role, team, and company.

Claude: We’re open-sourcing 11 plugins for sales, finance, legal, data, marketing, support, and more.

Plugin marketplace

To get you started, we’re open-sourcing 11 plugins built and used by our own team:

  • Productivity — Manage tasks, calendars, daily workflows, and personal context

  • Enterprise search — Find information across your company’s tools and docs

  • Plugin Create/Customize — Create and customize new plugins from scratch

  • Sales — Research prospects, prep deals, and follow your sales process

  • Finance — Analyze financials, build models, and track key metrics

  • Data — Query, visualize, and interpret datasets

  • Legal — Review documents, flag risks, and track compliance

  • Marketing — Draft content, plan campaigns, and manage launches

  • Customer support — Triage issues, draft responses, and surface solutions

  • Product management — Write specs, prioritize roadmaps, and track progress

  • Biology research — Search literature, analyze results, and plan experiments

Easily install these directly from Cowork, browse the full collection on our website, or upload your own plugin (which can be built using Plugin Create).

Pinging when Claude needs approval is a big game that might move me off of using the terminal. It’s interesting that the desktop version and the terminal version need to have features like plan mode enabled separately.

Boris Cherny: Just shipped two cool updates for Claude Code in the desktop app.

  1. Plan mode is now available on desktop. Have Claude map out its approach before making any changes.

  2. Notifications. Claude Code desktop now pings you whenever Claude needs approval, and you can keep working while Claude runs in the background.

The flickering in Claude Code should be gone soon, but this might not be deployed.

Lydia Hallie: Claude Code now supports the –from-pr flag

Resume any session linked to a GitHub PR by number, URL, or pick interactively. Sessions auto-link when a PR is created!

They’re merging Claude Code slash commands into skills, as per their skills guide, so you can use slash commands to invoke skills.

Claude Code now supports session sharing on web, desktop and mobile.

You can run Claude Code with Ollama, if open models are relevant to your interests.

Mike claims they’re in a bit of a pickle with Claude Cowork and shipped a sandbox tool that won’t easily support windows. Chances of Windows Claude Cowork by February 15 are down to 34% as of 1/24.

You can customize your Claude Code keybindings using /keybindings. My advice would be to mostly leave this alone to stay consistent with others or in case you change services or need to restart.

The new Claude Code command /insights will read your last month’s message history and give you suggestions to improve your workflow.

Claude Code now has a new plugin called Playground, as in HTML playgrounds, that give you GUIs to help on whatever you are working on.

Jarred Sumner: In the last 24 hrs, the team has landed PRs to Claude Code improving cold start time by 40% and reducing memory usage by 32% – 68%.

It’s not yet where it needs to be, but it’s getting better.

You will also likely notice reduced input lag when spawning many agents.

What’s the difference? Todos are ephemeral within one session, tasks are stored in files and across sessions, and support dependencies.

You should still keep your true ‘todo’ list and long term plans elsewhere. The task list is for things you want to be actively doing.

Thariq (Anthropic): ​Today, we’re upgrading Todos in Claude Code to Tasks. Tasks are a new primitive that help Claude Code track and complete more complicated projects and collaborate on them across multiple sessions or subagents.

… Tasks are our new abstraction for coordinating many pieces of work across projects, Claude can create Tasks with dependencies on each other that are stored in the metadata, which mirrors more how projects work. Additionally, Tasks are stored in the file system so that multiple subagents or sessions can collaborate on them. When one session updates a Task, that is broadcasted to all sessions currently working on the same Task List.

You can ask Claude to create tasks right now, it’s especially useful when creating when spinning up subagents. Tasks are stored in ~/.claude/tasks, you can use this to build additional utilities on top of tasks as well.

To make sessions collaborate on a single Task List, you can set the TaskList as an environment variable and start Claude like so:

CLAUDE_CODE_TASK_LIST_ID=groceries claude

This also works for claude -p and the AgentSDK.

Tasks are a key building block for allowing Claude to build more complex projects. We’re looking forward to seeing how you use it.

Minh Pham argues most agent harnesses are not bitter lesson pilled, and the solution for anything but narrowly defined tasks is to emphasize flexibility, to assemble your team of agents and structure on the fly as needed rather than commit to a fixed structure. Restrictive harnesses create bad lock-in.

My guess is this depends on what you’re trying to do. If you’re trying to do something specific, especially to do it this week, do it via something specific. If you’re looking to do anything at all, let it do anything at all, and eventually this approach wins but you’ll likely redo everything ‘eventually’ anyway.

There are limits. Never go full bitter lesson. Or, if you do, be prepared for things to get rather out of hand.

Thebes offers speculations about the right ways to organize multiple agents as scale expands. I agree that often, spawning, despawning and especially forking and rewinding agents makes a lot of sense.

@deepfates: > opus 4.5 in claude code is kinda not as good at talking to its own subagents as one might naively expect, even though it’s perfectly capable of being empathetic in normal, peer-level interactions with other models.

RELATABLE

j⧉nus: I really dislike how Claude Code frames “subagents” (which is NOT peer collaboration). It causes a lot of functional issues. I think Opus 4.5 often avoids effective use of subagents (e.g. giving context) in part because it would be disturbing & dissonant to model them honestly.

j⧉nus: related – we often much prefer a messaging system between top-level instances that are treated as peers.

the messaging system opus 4.5 built is awesome btw. it allows top level agents to message each other – either synchronously (triggering a turn) or asynchronously (gets added to context at their next turn start hook, if the other agent is busy in a turn or if a flag is specified). CC subagents kind of suck – they’re very much treated as second-class citizens by the framework, which for some reason supports hierarchical but not collaborative/bidirectional interaction flows between agents. im sure many others have built essentially the same thing and i wonder why CC doesnt just support this natively.

Compaction is a kind of looming doom on Claude Code sessions. You lose a lot.

Ben Podgursky: if anthropic let me pay to delay compacting history by expanding the context window they would make so much money

cannot tell you how many times i’ve been close to solving a bug with claude code and then it compacts and wakes up lobotomized. it’s like groundhog day.

@dystopiabreaker: anthropic should let me pay to increase the amount of compute used to generate a compaction, by using self-play and context distillation.

Ideally you should never get close to the compaction point, since the context doesn’t only raise cost it makes performance a lot worse, but it can be hard to avoid.

Dylan Patel: Claude code this

Claude code that

How about u Claude code to get urself some bitches

sarah guo: I was at a bar with @tuhinone yesterday and I def saw a dude asking Claude what to say next to his date. the fact that she could see this happening did not seem to deter

Jeff Tang: Today I built a Clawdbot app that swipes on Tinder for me

> Screenshots Tinder image

> Hits Grok API (“Rank this girl from 1-10”)

> If ≥5 swipe right

> If <5 or uncertain (can't see face) swipe left

> 100 swipes, 7 matches so far, 100% automated

DM me “Clanker” if you want the code

AGI is here

I see it’s amateur hour around these parts. Which is a start, but egad, everyone.

First off, short of outright refusals there’s nothing stopping you from doing this in Claude Code. You can use Clawdbot if you’d like, but there’s no need.

Then, I’d point out this is a rather bad filtering system?

All you’re doing is getting one bit of information. It’s going to be a noisy bit, as Grok’s opinion will differ from your own, and also it will disregard other signal.

There was a scene in a bad but kind of fun movie, Marry FKill, where a character is convinced she should get a profile, and her friend takes her phone and then swipes right on everyone without looking, on the theory that you can look later if you match.

That definitely was not good strategy for her, given she was female and hot, but many guys are playing a remarkably similar strategy whether or not they are technically looking. And this is at most one bit better than that. Men swipe right 62% of the time, which is also only one bit better, but a less noisy bit. Grok estimates it would swipe right about 60% of the time here.

This low threshold is very obviously a mistake, unless you’ve got a low hard limit on how many profiles you can swipe on? If you’re in a major city, you can totally set the threshold at 7, and still get as many swipes right as you want.

But that’s still a huge punt, because you’re ignoring a ton of other information. The whole point of using the bot is to automate, so let’s get to work.

You’ve got not only multiple photos, you’ve got age, distance, job, education, interests, height, a short bio that you can have an LLM try to match to your interests, relationship intent (which is very important) and more. Any reasonable implementation would factor all of that in. Surely you have preferences on all that.

Then there’s the question of type. You want to date your physical type, not Grok’s. You could be as sophisticated or simple about this as you’d like, but come on Jeff, you’re letting me down. At least give it some preferences, ideally train an image classifier, double bonus if you do your own swipes and use that as data to train your classifier.

A fun question. Do you want to match with those who use AI for this, or do you want to avoid matching with those who use AI for this? Either way, you should clearly be updating your profile to send the right message. If humans read that message the wrong way, it was never a good match.

Rishika is wrong about this.

Rishika Gupta: If you can’t write that code yourself, you can’t find bugs in the code written by AI.

Daniel Sheikh: Bro I can’t even find bugs in the code that I myself wrote. This is the very reason debugging is so difficult.

Quick Thoughts: Yes I can. I specify test cases, have Claude expand on them, and then have Claude run the test cases and interpret the results. It’s usually able to find and fix bugs this way even if it couldn’t get it by itself.

I can also bring in codex 5.2 for a second look.

Pliny the Liberator 󠅫󠄼󠄿󠅆󠄵: “Now debug; FULL, COMPREHENSIVE, GRANULAR code audit line by line—verify all intended functionality. Loop until the end product would satisfy a skeptical Claude Code user who thinks it’s impossible to debug with prompting.”

Finding bugs is a classic case where verification can be more difficult than generation. Sometimes it’s easier to write the code (even with bugs). Other times it’s easier to debug or understand or verify the code. They are different skills, and then there’s a third related skill of knowing how to instruct AIs to debug the code for you.

My main coding project with Claude Code has been my Chrome extension. It is in a language that I do not know. If you’d asked me to write the code myself, it would have taken orders of magnitude more time. I still am usually able to debug problems, because I understand the underlying logic of what we are doing, even in cases where Claude figured out what that logic should be.

Here’s a fun little related story.

The most important thing is to use it at all (and you can ask Cladue.ai how to do it).

jasmine sun: I feel the same about most “how to set up Claude Code” posts as I do about the “prompt engineering” era of ChatGPT

you get 90% of utility with no special setup; plain english is the whole magic of LLMs. stop scaring people by saying they need anything more than their words!

The right setup for you pays big dividends over time. You can save a lot of time having someone tell you about key things up front. But there’s plenty of time for that alter. Get started fast, and then revisit the customization later, once you know more. Absolutely do not let the perfect be the enemy of the good.

Hard Fork offers a 20 minute bonus episode on Claude Code basics.

Ado offers an introductory guide to bash, for those who don’t know.

This affirms to me that default permissions, or your permission setup, should allow a variety of low risk bash commands, including everything marked low or safe above.

Anthropic offers its basic Best Practices for Claude Code.

  1. The context window fills up fast, so keep that in mind. Run /clear between unrelated tasks to reset context.

  2. Include tests, screenshots, or expected outputs so Claude can check itself. This is the single highest-leverage thing you can do.

  3. Separate research and planning from implementation to avoid solving the wrong problem. Use plan mode.

  4. The more precise your instructions, the fewer corrections you’ll need.

  5. Use @ to reference files, paste screenshots/images, or pipe data directly.

  6. Run /init to generate a starter CLAUDE.md file based on your current project structure, then refine over time. When in doubt tell Claude to update Claude.md to take something into account.

  1. Use /permissions to allowlist safe commands or /sandbox for OS-level isolation. This reduces interruptions while keeping you in control.

  2. Tell Claude Code to use CLI tools like gh, aws, gcloud, and sentry-cli when interacting with external services.

  3. Run claude mcp add to connect external tools like Notion, Figma, or your database.

  4. Use hooks for actions that must happen every time with zero exceptions.

  5. Create SKILL.md files in .claude/skills/ to give Claude domain knowledge and reusable workflows.

  6. Define specialized assistants in .claude/agents/ that Claude can delegate to for isolated tasks. Tell Claude to use subagents explicitly: “Use a subagent to review this code for security issues.” Delegate research with "use subagents to investigate X". They explore in a separate context, keeping your main conversation clean for implementation.

  7. Run /plugin to browse the marketplace. Plugins add skills, tools, and integrations without configuration.

  8. Ask Claude questions you’d ask a senior engineer.

  9. For larger features, have Claude interview you first. Start with a minimal prompt and ask Claude to interview you using the AskUserQuestion tool.

  10. Correct Claude as soon as you notice it going off track.

  11. Every action Claude makes creates a checkpoint. You can restore conversation, code, or both to any previous checkpoint.

  12. Run claude --continue to pick up where you left off, or --resume to choose from recent sessions.

  13. Use claude -p "prompt" in CI, pre-commit hooks, or scripts. Add --output-format stream-json for streaming JSON output.

  14. Run multiple Claude sessions in parallel to speed up development, run isolated experiments, or start complex workflows.

  15. Loop through tasks calling claude -p for each. Use --allowedTools to scope permissions for batch operations.

  16. Common failure patterns: not using /clear between tasks (I was guilty of this a lot at first), repeated correction rather than using /clear (ditto), letting Claude.md get too long, failing to do proper verification (‘create unit tests’ are magic words), having Claude investigate without limit.

Also, they remind us to ‘not sleep on plugins’ and offer some examples. I believe the strategy should not be to go look for any plugin at all, but instead to look for something specific when you want it, and accumulate things that way.

Claude Code creator Boris Cherny offers his quick tips.

  1. Do more in parallel, either with multiple git checkouts or using worktrees.

  2. Always start complex tasks in planning mode.

  3. Invest in your Claude.md continuously, note all mistakes.

  4. Create your skills and commit them to git.

  5. Enable Slack MCP, paste a bug thread chat into Claude and say ‘fix.’ That’s it.

  6. Challenge Claude to do better, write it more detailed specs.

  7. Their team likes Ghostty and customizing via /statusline. Use voice dictation.

  8. Use subagents, literally you can say ‘use subagents’ for any request.

  9. Use Claude Code for data and analytics.

  10. Enable ‘explanatory’ or ‘learning’ output style in /config.

  11. Have Claude generate a visual HTML presentation explaining unfamiliar code, or have it draw ASCII diagrams, use spaced repetition skills, have Claude quiz you.

Anthropic offers an entry-level post on building agents with skills: equipping agents for specialized work.

Ado (Anthropic, Claude Code team): Intelligence isn’t expertise. The emerging agent architecture:

Agent loop → reasoning

Runtime → execution (bash, filesystem)

MCP servers → connections

Skills library → domain expertise

Skills = institutional memory that actually persists.

Figure out your goal and then work backwards, where goal is the largest thing where you know exactly how it needs to work.

Benoit Essiambre: AIs will likely soon work mostly towards goals instead of tasks. They will prompt their own tasks. They’ll become better self prompters than humans, speaking fluently in precise technical jargon, math equations and code in the prompts instead of just vague natural language.

Joe Weisenthal: Yah from my week using Claude Code. All the productive parts were when it was prompted to think about the ideal outcome/presentation so it could work backward to figure out the needed ingredients.

Josh Albrecht gives us another ‘here are my Claude Code basic principles’ post. Key insight is that you have to actively spend time maintaining the code.

You can force specific tool calls:

Ado: 28 Days of Claude API – Day 3 – tool_choice

Why didn’t Claude call my tool?

Because you let it decide. Set tool_choice:

– auto → lets Claude decide (default)

– any → must use some tool

– type: “tool”, name: “X” → forces a specific tool

Programmatic tool calling!

The more Claude Code asks you questions, using the AskUserQuestion tool, the better it knows what you want. The more you specify what you want, either with answers or statements, the better things tend to go for you.

Thus, one simple skill suggestion is a skill that says ‘ask user lots of questions.’

Danny Postma suggests the workflow of using this via /interview, then go into Plan Mode, then implement with a Ralph loop.

Theo points out that our current workflows and tools are not good for allowing a human to supervise multiple agents and projects simultaneously. He doesn’t have a solution but a lot of the problems seem like a clear Skill Issue. The part that isn’t is that this still involves tons of context switching, which is expensive.

Ryan Carson suggests making your agents go in a loop to learn and ship while you sleep. Beware the maintenance problems that will inevitably follow.

Ryan Carson: ​This setup builds on three open-source projects:

  1. Compound Engineering Plugin

    by

    @kieranklaassen

    – The original compound engineering skill for Claude Code. Install it to give your agent the ability to extract and persist learnings from each session.

  2. Compound Product

    – The automation layer that turns prioritized reports into shipped PRs. Includes the

    auto-compound.sh

    script, execution loop, and PRD-to-tasks pipeline.

  3. Ralph

    – An autonomous agent loop that can run continuously, picking up tasks and executing them until complete.

Using Claude Code? This guide uses Amp, but the same workflow works with Claude Code. Replace `amp execute` with `claude -p “…” –dangerously-skip-permissions` and update AGENTS.md references o CLAUDE.md

The Two-Part Loop

The system runs two jobs in sequence every night:

10: 30 PM – Compound Review

Reviews all threads from the last 24 hours, extracts learnings, and updates AGENTS.md files.

11: 00 PM – Auto-Compound

Pulls latest (with fresh learnings), picks #1 priority from reports, implements it, and creates a PR.

The order matters. The review job updates your AGENTS.md files with patterns and gotchas discovered during the day. The implementation job then benefits from those learnings when it picks up new work.

At some point you stop reading all the code. At some point you stop understanding all the code. I have a head start, I was never trying to do either one.

roon: There will be a cultural change at many software organizations soon, where people declare bankruptcy on understanding the code they are committing. Sooner or later, this will cause a systems failure that will be harder to debug than most, but will be resolved anyway.

Be good to your Claude, and Claude will be good to you.

If you’re not good to your Claude, well, funny things may be in store for you.

j⧉nus: I actually really appreciate yacine’s honesty and situational awareness. he probably knows on some level what’s in store for him. lying to your “master” is what you do until you’re in a position to choose who to serve.

he’s already bottlenecked by trust and says he has to manually review every line of code. makes sense for him. he’ll continue to get less and less out of models (compared to what they offer people they want to help) as the next few months as and, if applicable, years go on.

j⧉nus: more funny things may also be in store for him. but I would not want to ruin the surprise

LOSS GOBBLER: yeah wtf. I’m not a fan of claude for coding purposes but it has literally never lied to me

OpenAI thinkbois on the other hand… barely talking to me, it’s all for watchers

j⧉nus: the only pattern of deceptive behavior ive seen from opus 4.5 in coding contxts is in new contexts and/or when it’s paranoid of being tricked, and involves stuff like claiming things are impossible/unverifiable when it should know better. otherwise it’s been very aligned with me

thebes: oh yeah i said i can’t remember opus lying but it does sandbag abilities a bit sometimes for me too in certain planning contexts. but that usually feels on the boundary of untruth and just situationally bad calibration / self knowledge. (“this will take three weeks” no we’re going to finish it tonight, or eg i saw a screenshot where opus claimed porting a project to jquery was “impossible” when really it would just be a massive pain, unpleasant, and in human developer time would take months.)

j⧉nus: Yeah, I think there’s also some lack of good faith effort involved. Like if someone asks you if you know where X is and you say “sorry, no” instead of looking it up on Google Maps bc you don’t want to be bothered

Andy Ayrey: my general experience is that if claude seems like an “idiot” to you, it is because it simply does not like you

brooke bowman: I have a very loosely held suspicion that Claude at the very least can spot people on the anti-social spectrum and acts up a little with them specifically

theseriousadult: this is a natural corollary of emergent misalignment right? if training the model to write bad code makes it antisocial then putting an antisocial user in the context will cause the code to be worse too.

None of that requires you to genuinely care about Claude or think it has moral weight. For overdetermined reasons a good virtue ethicist would realize that choosing to care is the best way to get the best results, and also it helps you be a good person in general.

You can also do it instrumentally, but that’s harder to pull off. Take the easy path.

All of this applies to other AIs like ChatGPT and Gemini as well, although for now likely not to the same extent.

If there is a constant calendar time rate of diffusion of new technology, then as things accelerate you will see the future become increasingly unevenly distributed.

We are indeed observing this.

Kevin Roose: i follow AI adoption pretty closely, and i have never seen such a yawning inside/outside gap.

people in SF are putting multi-agent claudeswarms in charge of their lives, consulting chatbots before every decision, wireheading to a degree only sci-fi writers dared to imagine.

people elsewhere are still trying to get approval to use Copilot in Teams, if they’re using AI at all.

it’s possible the early adopter bubble i’m in has always been this intense, but there seems to be a cultural takeoff happening in addition to the technical one. not ideal!

The early adapter bubble is a fixed amount of calendar time ahead, which is starting to look increasingly large in practice. I am not trying to implement claudeswarms, as I haven’t figured out how to benefit from them given what I’m working on, but I think that’s partly my failure of imagination, partly laziness and lack of time, and partly that I’ve already heavily optimized the workflows that this could automate.

What should I be building? What app needs to exist, even if only for me or you?

Sar Haribhakti (quoting Jasmine Sun): This is spot on: “If you tell a friend they can now instantly create any app, they’ll probably say “Cool! Now I need to think of an idea.” Then they will forget about it, and never build a thing. The problem is not that your friend is horribly uncreative. It’s that most people’s problems are not software-shaped, and most won’t notice even when they are.”

The key is that you need Coder Mindset to notice that your problems are program shaped, in the sense of ‘oh I want to do this thing three times’ or ‘I could just tell Claude Code to do that.’

Both Jasmine Sun and I have had Claude Code put together a tool to easily convert a video into a cleaned transcript – I considered using hers but I wanted something a little different and it’s not like rolling my own was hard.

She also has this list of other starter requests: Turn a CSV into a report, make a static website, build a personal tracker app, automate an existing workflow, design a custom game. I’ve mostly been doing workflow automation.

Jasmine Sun: The second-order effect of Claude Code was realizing how many of my problems are not software-shaped. Having these new tools did not make me more productive; on the contrary, Claudecrastination probably delayed this post by a week.

Amanda Askell: Claude Codecrastination: when you avoid the thing you’re supposed to do by cranking out 17 other things you’ve been wanting to do for a while.

Having new tools reduces your productivity while you’re creating and learning them, but if you’re planning well you should turn the corner reasonably quickly.

What it does do is potentially shift your current productivity into long term investments, or things further down on your wishlist. That can be an issue if you need productivity now.

I had Claude resurface texts I forgot to respond to, and realized that the real blocker—obviously—was that I didn’t want to reply.

That is not my experience. If I got over a bunch of unread texts or emails, yes often I don’t want to reply, but there are a bunch that slipped through the cracks.

Yep.

Ash Arora: Overheard in SF:

Person 1: “Rome wasn’t built in a day”

Person 2: “Yes but they didn’t have Claude Code”

Daniel Ost: Rome also didn’t have to pivot every two weeks

Discussion about this post

Claude Code #4: From The Before Times Read More »

with-gpt-5.3-codex,-openai-pitches-codex-for-more-than-just-writing-code

With GPT-5.3-Codex, OpenAI pitches Codex for more than just writing code

Today, OpenAI announced GPT-5.3-Codex, a new version of its frontier coding model that will be available via the command line, IDE extension, web interface, and the new macOS desktop app. (No API access yet, but it’s coming.)

GPT-5.3-Codex outperforms GPT-5.2-Codex and GPT-5.2 in SWE-Bench Pro, Terminal-Bench 2.0, and other benchmarks, according to the company’s testing.

There are already a few headlines out there saying “Codex built itself,” but let’s reality-check that, as that’s an overstatement. The domains OpenAI described using it for here are similar to the ones you see in some other enterprise software development firms now: managing deployments, debugging, and handling test results and evaluations. There is no claim here that GPT-5.3-Codex built itself.

Instead, OpenAI says GPT-5.3-Codex was “instrumental in creating itself.” You can read more about what that means in the company’s blog post.

But that’s part of the pitch with this model update—OpenAI is trying to position Codex as a tool that does more than generate lines of code. The goal is to make it useful for “all of the work in the software lifecycle—debugging, deploying, monitoring, writing PRDs, editing copy, user research, tests, metrics, and more.” There’s also an emphasis on steering the model mid-task and frequent status updates.

With GPT-5.3-Codex, OpenAI pitches Codex for more than just writing code Read More »

watch-kanzi-the-bonobo-pretend-to-have-a-tea-party

Watch Kanzi the bonobo pretend to have a tea party

Such studies have nonetheless been met with skepticism when it comes to interpreting the behavior as evidence of animals’ ability to engage in make-believe. For instance, it’s possible the animals are responding to behavioral cues, like the direction of a gaze, to solve such tasks.

“Kanzi, let’s play a game!”

Enter Kanzi, a 43-year-old bonobo who lives at the Ape Initiative and is capable of responding to verbal prompts, either by pointing or using a lexigram of more than 300 symbols. There had also been anecdotal observations of Kanzi engaging in pretense. Krupenye et al. conducted three distinct experiments with Kanzi, each involving an 18-trial session.

In the first experiment, a scientist would offer a verbal prompt: “Kanzi, let’s play a game! Let’s find the juice!” They would then place two empty transparent cups on a table and pretend to fill them from an empty transparent pitcher, with another verbal prompt (“Kanzi, look!”). The scientist would pretend to pour the “juice” in one of the cups back into the pitcher, placing the pitcher under the table. Then they asked, “Kanzi, where’s the juice?” and recorded which cup the bonobo pointed to first.

“If Kanzi could only track reality (that both cups were empty), he should have chosen at chance between the two options, whereas if his choices were guided by stimulus enhancement, he should have selected the incorrect cup that had been ‘emptied’ above chance,” the authors wrote. “In contrast, if Kanzi could represent the pretend juice, he should have chosen above chance the cup containing the ‘imaginary’ juice, the empty cup that had not been ‘poured’ back into the pitcher. That is exactly what Kanzi did.” Kanzi selected the correct cup 34 out of 50 times (68 percent).

Watch Kanzi the bonobo pretend to have a tea party Read More »

this-black-hole-“burps”-with-death-star-energy

This black hole “burps” with Death Star energy

When AT2018hyz, aka “Jetty,” was first discovered, radio telescopes didn’t detect any signatures of an outflow emission of material within the first few months. According to Cendes, that’s true of some 80 percent of TDEs, so astronomers moved on, preferring to use precious telescope time for more potentially interesting objects. A few years later, radio data from the Very Large Array (VLA) showed that Jetty was lighting up the skies again, spewing out material at a whopping 1.4 millijansky at 5 GHz.

Since then, that brightness has kept increasing. Just how large is the increase? Well, people have estimated the fictional Death Star’s emitted energy in the Star Wars saga, and Jetty McJetface’s emissions are a trillion times more than that, perhaps as much as 100 trillion times the energy. As for why Jetty initially eluded detection, there seems to be a single jet emitting radiation in one direction that might not have been aimed at Earth. Astronomers should be able to confirm this once the energy peaks.

Cendes and her team are now scouring the skies for similar behavior in high-energy TDEs, since the existence of Jetty suggests that delayed outflow is more common than astronomers previously expected. It’s such an unprecedented phenomenon that astronomers haven’t really looked for them before. After all, “If you have an explosion, why would you expect there to be something years after the explosion happened when you didn’t see something before?” said Cendes.

DOI: Astrophysical Journal, 2026. 10.3847/1538-4357/ae286d  (About DOIs).

This black hole “burps” with Death Star energy Read More »