Author name: Mike M.


AI #67: Brief Strange Trip

I had a great time at LessOnline. It was both a working trip and a trip to an alternate universe, a road not taken, a vision of a different life where you get up and start the day in dialogue with Agnes Callard and Aristotle, and in a strange combination of relaxed and frantic go from conversation to conversation on various topics, every hour passing doors of missed opportunity, gone forever.

Most of all it meant almost no writing done for five days, so I am shall we say a bit behind again. Thus, the following topics are pending at this time, in order of my guess as to priority right now:

  1. Leopold Aschenbrenner wrote a giant thesis, started a fund and went on Dwarkesh Patel for four and a half hours. By all accounts, it was all quite the banger, with many bold claims, strong arguments and also damning revelations.

  2. Partly due to Leopold, partly due to an open letter, partly due to continuing small things, OpenAI fallout continues, yes we are still doing this. This should wait until after Leopold.

  3. DeepMind’s new scaling policy. I have a first draft, still a bunch of work to do.

  4. The OpenAI model spec. As soon as I have the cycles and anyone at OpenAI would have the cycles to read it. I have a first draft, but that was written before a lot happened, so I’d want to see if anything has changed.

  5. The Rand report on securing AI model weights, which deserves more attention than the brief summary I am giving it here.

  6. You’ve Got Seoul. I’ve heard some sources are optimistic about what happened there, but mostly we’ve heard little. It doesn’t seem that time sensitive; diplomacy flows slowly until it suddenly doesn’t.

  7. The Problem of the Post-Apocalyptic Vault still beckons if I ever have time.

Also I haven’t processed anything non-AI in three weeks, the folders keep getting bigger, but that is a (problem? opportunity?) for future me. And there are various secondary RSS feeds I have not checked.

There was another big development this morning: California’s SB 1047 saw extensive changes. While many were helpful clarifications or fixes, one of them severely weakened the impact of the bill, as I cover in the linked post.

The reactions to the SB 1047 changes so far are included here.

  1. Introduction.

  2. Table of Contents.

  3. Language Models Offer Mundane Utility. Three thumbs in various directions.

  4. Language Models Don’t Offer Mundane Utility. Food for lack of thought.

  5. Fun With Image Generation. Video generation services have examples.

  6. Deepfaketown and Botpocalypse Soon. The dog continues not to bark.

  7. They Took Our Jobs. Constant AI switching for maximum efficiency.

  8. Get Involved. Help implement Biden’s executive order.

  9. Someone Explains It All. New possible section. Template fixation.

  10. Introducing. Now available in Canada. Void where prohibited.

  11. In Other AI News. US Safety Institute to get model access, and more.

  12. Covert Influence Operations. Your account has been terminated.

  13. Quiet Speculations. The bear case to this week’s Dwarkesh podcast.

  14. Samuel Hammond on SB 1047. Changes address many but not all concerns.

  15. Reactions to Changes to SB 1047. So far coming in better than expected.

  16. The Quest for Sane Regulation. Your random encounters are corporate lobbyists.

  17. That’s Not a Good Idea. Antitrust investigation of Nvidia, Microsoft and OpenAI.

  18. The Week in Audio. Roman Yampolskiy, also new Dwarkesh Patel is a banger.

  19. Rhetorical Innovation. Innovative does not mean great.

  20. Oh Anthropic. I have seen the other guy, but you are not making this easy.

  21. Securing Model Weights is Difficult. Rand has some suggestions.

  22. Aligning a Dumber Than Human Intelligence is Still Difficult. What to do?

  23. Aligning a Smarter Than Human Intelligence is Difficult. SAE papers continue.

  24. People Are Worried About AI Killing Everyone. Various p(doom)s.

  25. Other People Are Not As Worried About AI Killing Everyone. LeCun fun.

  26. The Lighter Side. Why, yes. Yes I did.

Did AI pass a restaurant review ‘Turing test,’ or did human Yelp reviewers fail it? This is unsurprising, since the reviews seemingly were evaluated in isolation. Writing short bits like this is squarely in the LLM wheelhouse. At minimum, you need to show the context, meaning the other information about the restaurant, including other reviews.

Via David Brin, goblin.tools has a formalizer, to change the tone of your text. You can of course do better with a normal LLM but an easier interface and no startup costs can go a long way.

Start making your Domino’s ‘pizza’ before you are done ordering it. The fun writes itself, also kind of amazingly great. I am hungry now.

It seems McDonald’s does this too with its fries. My guess is this is more ‘we have enough fungible fries orders often enough that we simply make fries continuously’ rather than ‘we know you in particular will want fries’ but I could be wrong.

Would you like an extra thumb? Why yes I would. What’s funny is you can run a mental experiment to confirm that you’re totally capable of learning to use it if the machine can read the impulses. Plausibly super awesome. Mandatory jokes are in the comments if you scroll down.

Have adult level theory of mind, up to 6th order inferences.

Aid in drug development. No idea how much it helps, but all help is great.

Predict out-of-distribution salt crystal formation, with correct structures, while running a simulation. Suggestive of material science work being possible without physical experimentation.

Garry Tan endorses Perplexity for search if you want well-cited answers. I agree with Arun’s reply, Perplexity is great but only shines for narrow purposes. The ‘well-cited’ clause is doing a lot of work.

Use Gemini 1.5 Flash for many purposes, because it is fast, cheap and good enough. Sully has long been a proponent of cheap and fast and good enough.

Not yet, anyway.

Shoshana Weissmann: ugh really wanna jailbreak my nordictrack.

Brian Chen (New York Times) is not impressed by the new GPT-4, failing to see much improvement other than speed, saying he definitely wouldn’t let it tutor his child. This was a standard ‘look for places the AI fails’ rather than looking for where it succeeds. A great illustration is when he notes the translations were good, but that the Chinese accents were slightly off. Yes, okay, let’s improve the accents, but you are missing the point. Any child, or any adult, not using AI to learn is missing out.

Erik Wiffin warns of counterfeit proofs of thought. What happens if all those seemingly useless project plans and self-reports were actually about forcing people to think and plan? What if the plan was worthless, but the planning was essential, and now you can forge the plan without the planning? Whoops. Zizek’s alternative is that your LLM writes the report, mine reads it and now we are free to learn. Which way, modern worker?

  1. For the self-report I lean towards Zizek. This is mostly a test to see how much bullshit you dare write down on a page before you think you’ll be called out on it, a key way that the bullshitters collude to get ahead at the expense of those who don’t know to go along or have qualms about doing so.

    1. The idea that ‘your manager already knows’ might be true in some places, but it sure is not in others.

    2. I can’t remember the last time I knew someone who thought ‘writing this mandatory corporate self-report taught me so many valuable lessons’ because no one I know is that naive.

  2. The project plan seems more plausibly Wiffin’s territory. You should have to form a plan. That does not mean that the time spent turning that plan into a document that looks right is time well spent. So the goal is to get the manager to do the actual planning – Alice makes the Widgets, David buys the Thingamabobs. Then the LLM turns that into a formal document.

Look, I am not a master of war, but if I were Air Force Secretary Frank Kendall then I would presume that the point of an F-16 flying with AI was that I did not have to be inside that F-16 during simulated combat. He made a different decision. I mean, all right, yes, show of confidence, fun as hell, still I suppose that is one of many reasons I am not the Secretary of the Air Force.

The funny alternative theory is this was so the other humans would let the AI win.

Still no worthy successor to AI Dungeon, despite that being a flimsy base model wrapper and a great product until you ran into its context window limits. The ‘put AI into games and interactive worlds’ developers are letting us down. Websim is kind of the modern version, perhaps, and Saerain Trismegistus mentions NovelAI.

Examples from Google’s video generation AI Veo.

Examples from a Chinese video generation service, 2 minutes, 30fps, 1080p.

Indian (and Mexican) elections are the latest to not have serious AI-related issues, despite this wild report?

Ate-a-Pi: 🤩 AI in politics in 🇮🇳

> Politicians are voluntarily deepfaking themselves

> to dub their message into the 22 languages widely spoken in India

> 50 million AI voice clone calls in the last month

> resurrecting deceased party leaders to endorse current candidates (the cult of personality never ends.. when Lee Kuan Yew?)

> super small teams – leading firm has 10 employees, founder dropped out of college after learning to make deepfakes on Reddit during the COVID lockdowns (for every learning loss.. there was a learning gain 🤣)

> authorized by politicians but not disclosed to voters. Many voters believe the calls are real and that the pols actually spoke to them

> typical gap in quality AI “promises a Ferrari but delivers a Fiat”

> fine tuning Mistral to get better results

This too will come to the 🇺🇸

The future of politics is having a parasocial relationship with your favorite politician, and their AI version being part of your brain trust, advisory board.

Kache: Americans don’t realize that in india and pakistan, people watch AI generated shorts of political leaders and believe that they are real.

This is such a wild equilibrium. Everyone gets to clone their own candidates and backers with AI, no one does fakes of others, the voters believe all of it is real.

Not a bad equilibrium. Yes, voters are fooled, but it is a ‘fair fooling’ and every message is intended by the candidate and party it purports to be from. This presumably is the least stable situation of all time and won’t happen again. The people will realize the AIs are fake. Also various actors will start trying to fake others using AI, but perhaps punishment and detection can actually work there?

One might think about the ‘leave your stroller at the playground and not pay attention’ technology. Yes, someone could try to steal it, but there is at least one person in the playground who would get very, very angry with you if they notice you trying to do that. What makes yours yours is not that you can prove it is yours, but that when you try to take it, you know no one will object.

People worry AI will be used to generate misinformation and people won’t be able to tell the difference. It is worth remembering the current state of misinformation generation and spreading technology, which is best summarized as:

Joseph Menn (Washington Post): News site editor’s ties to Iran, Russia show misinformation’s complexity.

Matthew Yglesias: This doesn’t actually seem that complicated.

A metaphorical TaskRabbit for AI hires, potentially something like EquiStamp, could be the efficient way to go. Alex Tabarrok suggests we may continuously evaluate, hire and fire AIs as relative performance, speed and cost fluctuate. Indeed, the power users of AI do this, and I am constantly reassessing which tools to use for which jobs, same as any other tool. A lot of this is that right now uses are mostly generic and non-integrated. It is easy to rotate. When we have more specialized tools, and need more assurance of consistent responses, it will be more enticing to stick with what you know than it is now.

What happens when you figure out how to have AI do your job and no one cares?

Fellowship for $190k/year to help implement Biden’s executive order. Deadline is June 12, or earlier if they get 100 applicants, so act quickly. The median salary in AI is $326k, so this is not that high, but it is highly livable.

Prize of $500k for using AI to converse with animals.

Near: the experts have chimed in and have concerns that talking to animals might be bad. Luckily I am going to ignore them and do it anyway!

Marcello Herreshoff explains the idea of ‘template fixation.’ If your question is close enough to a sufficiently strong cliche, the cliche gets applied even if it does not make sense. Hence the stupid answers to river crossing questions or water pouring tests or other twists on common riddles. If we can’t find a way to avoid this, math is going to remain tough. It is easy to see why this would happen.

OpenAI for nonprofits, essentially a discounted subscription.

Claude now available in Canada. Finally?

Flash Diffusion, for improving the training of diffusion models.

Leading AI companies agree to share their models with the US AI Safety Institute for pre-deployment testing. Link is to this story, which does not list which labs have agreed to do it, although it says there was no pushback.

NewsCorp’s deal with OpenAI, which includes the Wall Street Journal, is no joke, valued at over $250 million ‘in cash and credits’ over five years. The NewsCorp market cap is $15.8 billion after rising on the deal, so this is over 1% of the company over only five years. Seems very hard to turn down that kind of money. One key question I have not seen answered is to what extent these deals are exclusive.

Futurism notes that legacy media is getting these rich deals, whereas non-legacy media, which includes right-wing media, gets none of it so far. Robin Hanson summarizes this as ‘AI will lean left.’ We already have strong evidence AI will lean left in other ways, and this seems like an echo of that, mostly reflective of reality.

AI model for ECGs using vision transformer architecture gets state of the art performance with less training data.

Yes, AI models are mimics of deeply WEIRD data sets, so they act and respond in a Western cultural context. If you would rather create an AI that predicts something else, that most customers would want less but some would want more, that seems easy enough to do instead.

Tesla buying GPUs faster than it has places to plug them in.

Google getting rid of San Francisco office space. Prices for office space are radically down for a reason: work from home reduces needed space, and if the landlords wouldn’t play ball you can relocate to another who will, although I see no signs Google is doing that. Indeed, Google is shrinking headcount, and seems to be firing people semi-randomly in doing so, which is definitely not something I would do. I would presume that doing a Musk-style purge of the bottom half of Google employees would go well. But you can only do that if you have a good idea which half is which.

OpenAI details its protocols for securing research infrastructure for advanced AI, also known as protecting its core algorithms and model weights. I leave it to others to assess how strong these precautions are. No matter what else might be happening at OpenAI, this is one team you do want to root for.

The gender divide in AI.

Kelsey Piper: The Computing Research Association annual survey found that 18% of graduates from AI PhD programs are women.

Women are smarter than men. They avoid academic PhDs and OpenAI.

WiserAI has a community white paper draft. I am not looking further because triage, will revisit when it is final perhaps. If that is a mistake, let me know.

OpenAI terminates five user accounts that were attempting to use OpenAI’s models to support ‘covert influence operations,’ which OpenAI defines as ‘attempts to manipulate public opinion or influence political outcomes without revealing the true identity of the actors behind them.’ Full report here.

Specifically, the five are Russian actors Bad Grammar and Doppelganger, Chinese actor Spamouflage, which was associated with Chinese law enforcement, Iranian actor International Union of Virtual Media, and actions by the Israeli commercial company Stoic.

Note the implications of an arm of China’s law enforcement using ChatGPT for this.

OpenAI believes that the operations in question failed to substantially achieve their objectives. Engagement was not generated, distribution not achieved. These times.

What were these accounts doing?

Largely the same things any other account would do. There are multiple mentions of translation between languages, generating headlines, copy editing, debugging code and managing websites.

As OpenAI put it, ‘productivity gains,’ ‘content generation’ and ‘mixing old and new.’

Except, you know, as a bad thing. For evil, and all that.

They also point to the common theme of faking engagement, and arguably using ChatGPT for unlabeled content generation (as opposed to other productivity gains or copy editing) is inherently not okay as well.

Ordinary refusals seem to have played a key role, as ‘threat actors’ often published the refusals, and the steady streams of refusals allowed OpenAI to notice threat actors. Working together with peers is also reported as helpful.

The full report clarifies that this is sticking to a narrow definition I can fully support. What is not allowed is pretending AI systems are people, or attributing AI content to fake people or without someone’s consent. That was the common theme.

Thus, Sam Altman’s access to OpenAI’s models will continue.

Preventing new accounts from being opened by these threat actors seems difficult, although this at least imposes frictions and added costs.

There are doubtless many other ‘covert influence operations’ that continue to spam AI content while retaining access to OpenAI’s models without disruption.

One obvious commonality is that all five actors listed here had clear international geopolitical goals. It is highly implausible that this is not being done for many other purposes. Until we are finding (for example) the stock manipulators, we have a long way to go.

This is still an excellent place to start. I appreciate this report, and would like to see similar updates (or at least brief updates) from Google and Anthropic.

The bear case on Nvidia. Robin Hanson continues to say ‘sell.’

Claims from Andrew Cote about consciousness, the nature of reality and also of LLMs. I appreciate the model of ‘normiehood’ as a human choosing outputs at very low temperature.

Might we soon adorn buildings more because it is easier to do so via LLMs?

WSJ’s Christopher Mims says ‘The AI Revolution is Already Losing Steam.’ He admits my portfolio would disagree. He says AIs ‘remain ruinously expensive to run’ without noticing the continuing steady drop in costs for a given performance level. He says adoption is slow, which it isn’t compared to almost any other technology even now. Mostly, another example of how a year goes by with ‘only’ a dramatic rise in speed and reduction in cost and multiple players catching up to the leader and the economy not transformed and stocks only way up and everyone loses their minds.

I think that is behind a lot of what is happening now. The narratives in Washington, the dismissal by the mainstream of both existential risks and even the possibility of real economic change. It is all the most extreme ‘what have you done for me lately,’ people assuming AI will never be any better than it is now, or that it will only change at ‘economic normal’ rates from here.

Thus, my prediction is that when GPT-5 or another similar large advance does happen, these people will change their tune for a bit, adjust to the new paradigm, then memory hole and go back to assuming that AI once again will never advance much beyond that. And so on.

He’s joking, right?

Eliezer Yudkowsky: The promise of Microsoft Recall is that extremely early AGIs will have all the info they need to launch vast blackmail campaigns against huge swathes of humanity, at a time when LLMs are still stupid enough to lose the resulting conflict.

Rohit: I’d read this novel!

A lot of users having such a honeypot on their machines for both blackmail and stealing all their access and their stuff certainly does interesting things to incentives. One positive is that you encourage would-be bad actors to reveal themselves, but you also empower them, and you encourage actors to go bad or skill up in badness.

Dan Hendrycks questions algorithmic efficiency improvements, notes that if (GPT-)4-level models were now 10x cheaper to train we would see a lot more of them, and that secondary labs should not be that far behind. I do not think we should assume that many labs are that close to OpenAI in efficiency terms or in ‘having their stuff together’ terms.

Papers I analyze based on the abstract because that’s all the time we have for today: Owen Davis formalizes ways in which AI could weaken ‘worker power’ distinct from any impacts on labor demand, via management use of AI. The obvious flaw is that this does not mention the ability of labor to use AI. Labor can among other applications use AI to know when it is being underpaid or mistreated and to greatly lower switching costs. It also could allow much stronger signals of value, allowing workers more ability to switch jobs. I would not be so quick to assume ‘worker power’ will flow in one direction or the other in a non-transformative AI world.

Samuel Hammond wrote a few days ago in opposition to SB 1047, prior to the recent changes. He supports the core idea, but worries about particular details. Many of his concerns have now been addressed. This is the constructive way to approach the issue.

  1. He objected to the ‘otherwise similar general capability’ clause on vagueness grounds. The clause has been removed.

  2. He warns of a ‘potential chilling effect on open source,’ due to an inability to comply with the bill’s shutdown requirement. The good news is that this was already a misunderstanding of the bill before the changes. The changes make my previous interpretation even clearer, so this concern is now moot as well.

    1. And as I noted, the ‘under penalty of perjury’ clause is effectively pro forma unless you are actively lying, the same as endless other government documents with similar rules.

  3. Also part of Samuel’s stated second objection: He misunderstands the limited duty exemption procedure, saying it must be applied for before training, which is not the case.

    1. You do not apply for it, you flat out invoke it, and you can do this either before or after training.

    2. He warns that you cannot predict in advance what capabilities your model will have, but in that case the developer can invoke it before training and only has to monitor for unexpected capabilities, then take back the exemption (without punishment) if that does happen.

    3. Or they can wait, and invoke after training, if the model qualifies.

  4. Samuel’s third objection is the fully general one, and still applies: that model creators should not be held liable for damages caused by their models, equating it to what happens if a hacker uses a computer. This is a good discussion to have. I think it should be obvious to all reasonable parties both that (a) model creators should not be liable for harms simply because the person doing harm used the model while doing the harm, and (b) that model creators need to be liable if they are sufficiently negligent or irresponsible, and sufficiently large harm results. This is no different than most other product harms. We need to talk procedure and price.

    1. In the case of the SB 1047 proposed procedure, I find it to be an outlier in how tight the requirements for a civil suit are. Indeed, under any conditions where an AI company was actually held liable under SB 1047 for an incident with over $500 million in damages, I would expect that company to already also be liable under existing law.

    2. I strongly disagree with the idea that if someone does minimal-cost (relative to model training costs) fine tuning to Llama-4, this should absolve Meta of any responsibility for the resulting system and any damage that it does. I am happy to have that debate.

    3. There was indeed a problem with the original derivative model clause here, which has been fixed. We can talk price, and whether the 25% threshold now in SB 1047 is too high, but the right price is not 0.001%.

  5. Samuel also objects to the ‘net neutrality’ style pricing requirements on GPUs and cloud services, which he finds net negative but likely redundant. I would be fine with removing those provisions and I agree they are minor.

  6. He affirms the importance of whistleblower provisions, on which we agree.

  7. Samuel’s conclusion was that SB 1047 ‘risks America’s global AI leadership outright.’ I would have been happy to bet very heavily against such impacts even based on the old version of the bill. For the new version, if anyone wants action on that bet, I would be happy to give action. Please suggest terms.

In the realm of less reasonable objections that also pre-dated the recent changes, here is the latest hyperbolic misinformation about SB 1047, here from Joscha Bach and Daniel Jeffreys, noted because of the retweet from Paul Graham. I have updated my ‘ignorables list’ accordingly.

My prediction on Twitter was that most opponents of SB 1047, of which I was thinking especially of its most vocal opponents, would not change their minds.

I also said we would learn a lot, in the coming days and weeks, from how various people react to the changes.

So far we have blissfully heard nothing from most of ‘the usual suspects.’

Dan Hendrycks has a thread explaining some of the key changes.

Charles Foster gets the hat tip for alerting me to the changes via this thread. He was also the first one I noticed who highlighted the bill’s biggest previous flaw, so he has scored major RTFB points.

He does not offer a position on the overall bill, but is very good about noticing the implications of the changes.

Charles Foster: In fact, the effective compute threshold over time will be even higher now than it would’ve been if they had just removed the “similar performance” clause. The conjunction of >10^26 FLOP and >$100M means the threshold rises with FLOP/$ improvements.

It is “Moore’s law adjusted” if by that you mean that the effective compute threshold will adjust upwards in line with falling compute prices over time. And also in line with $ inflation over time.
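To see the mechanics of that conjunction concretely, here is a minimal sketch in Python. The price-per-FLOP figures and helper names are purely illustrative assumptions of mine, not numbers from the bill or from Foster; only the two statutory floors come from the text.

```python
# Illustrative only: a model is covered only if it clears BOTH the FLOP floor
# and the dollar floor, so as compute gets cheaper the dollar condition binds
# and the effective FLOP threshold rises with it.

FLOP_FLOOR = 1e26      # statutory compute threshold
DOLLAR_FLOOR = 100e6   # statutory training-cost threshold ($100M)

def effective_flop_threshold(price_per_flop: float) -> float:
    """Smallest training run (in FLOP) that satisfies both conditions."""
    return max(FLOP_FLOOR, DOLLAR_FLOOR / price_per_flop)

def is_covered(training_flop: float, price_per_flop: float) -> bool:
    return training_flop > FLOP_FLOOR and training_flop * price_per_flop > DOLLAR_FLOOR

# Hypothetical prices: at $1 per 1e18 FLOP the two floors coincide at 1e26 FLOP;
# if prices then halve, the effective threshold doubles to 2e26 FLOP.
for price in (1e-18, 0.5e-18):
    print(f"${price:.1e}/FLOP -> threshold {effective_flop_threshold(price):.1e} FLOP")
```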

He also claims this change, which I can’t locate:

– “Hazardous capability” now determined by marginal risk over existing nonexempt covered models

If true, then that covers my other concern as well, and should make it trivial to provide the necessary reasonable assurance if you are not pushing the frontier.

Finally, he concludes:

Charles Foster: I think the bill is significantly better now. I didn’t take a directly pro- or anti-stance before, and IDK if I will in the future, but it seems like the revised bill is a much better reflection of the drafters’ stated intentions with fewer side effects. That seems quite good.

Andrew Critch, who was skeptical of the bill, approves of the covered model change.

Andrew Critch: These look like good changes to me. The legal definition of “Covered Model” is now clearer and more enforceable, and creates less regulatory uncertainty for small/non-incumbent players in the AI space, hence more economic fairness + freedom + prosperity. Nice work, California!

I think I understand the rationale for the earlier more restrictive language, but I think if a more restrictive definition of “covered model” is needed in the future, lowering numerical threshold(s) will be the best way to achieve that, rather than debating the meaning of the qualitative definition(s). Clear language is crucial for enforcement, and the world definitely needs enforceable AI safety regulations. Progress like this makes me proud to be a resident of California.

Nick Moran points to the remaining issue with the 25% rule for derivative models, which is that if your open weights model is more than 4x over the threshold, then you create a window where training ‘on top of’ your model could make you responsible for a distinct otherwise covered model.

In practice I presume this is fine – both no one is going to do this, and if they did no one is going to hold you accountable for something that clearly is not your fault and if they did the courts would throw it out – but I do recognize the chilling effect, and that in a future panic situation I could be wrong.

The good news is there is an obvious fix, now that the issue is made clear. You change ‘25% of trained compute’ to ‘either 25% of trained compute or sufficient compute to qualify as a covered model.’ That should close the loophole fully, unless I missed something.
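To make the gap and the fix concrete, here is a minimal sketch with made-up numbers. The base model size, fine-tune compute, and function names are hypothetical, and this is a paraphrase of my reading of the rule rather than the statutory text.

```python
COVERED_THRESHOLD_FLOP = 1e26   # covered-model compute threshold
BASE_MODEL_FLOP = 5e26          # open-weights base model, more than 4x the threshold
FINE_TUNE_FLOP = 1.2e26         # covered-scale training done "on top of" the base

def is_distinct_model_current_text(fine_tune_flop: float, base_flop: float) -> bool:
    # Current text: the result stops being a derivative only at >= 25% of the base's compute.
    return fine_tune_flop >= 0.25 * base_flop

def is_distinct_model_proposed_fix(fine_tune_flop: float, base_flop: float) -> bool:
    # Proposed fix: also a distinct model if the added compute alone would qualify as covered.
    return (fine_tune_flop >= 0.25 * base_flop
            or fine_tune_flop >= COVERED_THRESHOLD_FLOP)

print(is_distinct_model_current_text(FINE_TUNE_FLOP, BASE_MODEL_FLOP))  # False: stays the original developer's responsibility
print(is_distinct_model_proposed_fix(FINE_TUNE_FLOP, BASE_MODEL_FLOP))  # True: responsibility shifts to the fine-tuner
```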

I have been heartened by the reactions of those in my internet orbit who were skeptical but not strongly opposed. There are indeed some people who care about bill details and adjust accordingly.

Dean Ball was the first strong opponent of SB 1047 I have seen respond, indeed he did so before I could read the bill changes let alone register any predictions. As opponents go, he has been one of the more reasonable ones, although we still strongly disagree on many aspects.

His reaction admits The Big Flip up front, then goes looking for problems.

Dean Ball: SB 1047 has been amended, as Senator Wiener recently telegraphed. My high-level thoughts:

1. There are some good changes, including narrowing the definition of a covered model

2. The bill is now more complex, and arguably harder for devs to comply with.

Big picture: the things that the developer and academic communities hated about SB 1047 remain: generalized civil and criminal liability for misuse beyond a developer’s control and the Frontier Model Division.

It is strictly easier to comply in the sense that anything that complied before complies now, but if you want to know where the line is? Yeah, that’s currently a mess.

I see this partly as the necessary consequence of everyone loudly yelling about how this hits the ‘little guy,’ which forced an ugly metric to prove it will never, ever hit the little guy, which forces you to use dollars where you shouldn’t.

That is no excuse for not putting in a ‘this is how you figure out what the market price will be so you can tell where the line is’ mechanism. We 100% need some mechanism.

The obvious suggestion is to have a provision requiring the FMD to publish a number once a (week, month or year) that establishes the price used. Then going forward you can use that to do the math, and it can at least act as a safe harbor. I presume this is a case of ‘new provision that no one gamed out fully’ and we can fix it.

Dean next raises a few questions about the 25% threshold for training (which the developer must disclose), around questions like the cost of synthetic data generation. My presumption is that data generation does not count here, but we could clarify that either way.

He warns that there is no dollar floor on the 25%, but given you can pick the most expansive open model available, it seems unlikely this threshold will ever be cheap to reach in practice unless you are using a very old model as your base, in which case I suppose you fill out the limited duty exemption form with ‘of course it is.’

If you want to fix that at the cost of complexity, there are various ways to cover this corner case.

Ball mentions the safety assurances. My non-lawyer read was that the changes clarify that this is the ‘reasonable assurance’ standard they use in other law and not anything like full confidence, exactly to (heh) provide reasonable assurance that the rule would be reasonable. If lawyers or lawmakers think that’s wrong let me know, but there is a particular sentence inserted to clarify exactly that.

He also mentions that Wiener at one point mentioned Trump in the context of the executive order. It was a cheap shot as phrased given no one knows what Trump ultimately thinks about AI, and I wish he hadn’t said that, but Trump has indeed promised to repeal Biden’s executive order on AI, so the actual point – that Congress is unlikely to act and executive action cannot be relied upon to hold – seems solid.

In a follow-up he says despite the changes that SB 1047 is still ‘aimed squarely at future generations of open-source foundation models.’ I rather see open models as having been granted exemptions from several safety provisions exactly because those are forms of safety that open models cannot provide, and their community as engaging in special pleading that they should get even more of a free pass. Requiring that models adhere to even very light safety requirements is seen as ‘aimed squarely at open source’ exactly because open models make safety far more difficult.

Dean also notes here that Senator Wiener is making a good case for federal preemption of state policies. I think Wiener would to a large extent even agree with this, that it would be much better if the federal government acted to enact a similar law. I do not see California trying to override anything, rather it is trying to fill a void.

Danielle Fong here notes the bill is better, but remains opposed, citing the fee structure and general distrust of government.

Here is a good, honest example of ‘the price should be zero’:

Godoglyness: No, because the bill still does too much. Why should 10^26 and 100 million dollars be a cutoff point?

There shouldn’t be any cutoffs enacted until we have actual harms to calibrate on

It’s good the impact of the bill is diminished but it’s bad it still exists at all

Remember how some folks thought GPT2 would be dangerous? Ridiculous in retrospect, but…

We shouldn’t stop big training runs because of speculative harms when the speculation has failed again and again to anticipate the form/impact/nature of AI systems.

If you think ‘deal with the problems post-hoc after they happen’ is a superior policy, then of course you should oppose the bill, and be similarly clear on the logic.

Similarly, if your argument is ‘I want the biggest, most capable open models possible to play with regardless of safety concerns, this might interfere with Meta opening the weights of Llama-N, and I will oppose any bill that does that,’ then yes, that is another valid reason to oppose the bill. Again, please say that.

That is very different from misrepresenting the bill, or claiming it would impact people whom it, even more explicitly than before, does not impact.

On that note, here is Andrew Ng ignoring the changes and reiterating past arguments in ways that did not apply to the original bill and apply even less now that the comparison point for harm has been moved. For your model to create liability, it has to enable the harm in a way that non-covered models, and models eligible for limited duty exemptions, would not. Andrew Ng mentions that all current models can be jailbroken, but I do not see how that should make us intervene less. Ultimately he is going for the ‘only regulate applications’ approach that definitely won’t work. Arvind Narayanan calls it a ‘nice analysis.’

TIDNL, featuring helpfully clear section headlines like “Corporate America Looks to Control AI Policy” and section first sentences such as “Corporate interests are dominating lobbying on AI issues.”

Luke Muehlhauser: No surprise: “85 percent of the lobbyists hired in 2023 to lobby on AI-related issues were hired by corporations or corporate-aligned trade groups”

[thread contains discussion on definition of lobbying, linked to here.]

Public Citizen: Corporations, trade groups and other organizations sent more than 3,400 lobbyists to lobby the federal government on AI-related issues in 2023, a 120 percent leap from 2022.

  • AI is not just an issue of concern for AI and software corporations: While the tech industry was responsible for the most AI-related lobbyists in 2023 – close to 700 – the total amounts to only 20 percent of all the AI-related lobbyists deployed. Lobbyists from a broad distribution of industries outside of tech engaged in AI-related issues, including financial services, healthcare, telecommunications, transportation, and defense.

  • 85 percent of the lobbyists hired in 2023 to lobby on AI-related issues were hired by corporations or corporate-aligned trade groups. The Chamber of Commerce was responsible for the most AI-related lobbyists, 81, followed by Intuit (64), Microsoft (60), the Business Roundtable (42), and Amazon (35).

  • OpenSecrets found that groups that lobbied on AI in 2023 spent a total of $957 million lobbying the federal government on all issues that year. [Note that this is for all purposes, not only for AI]

  • An analysis of the clients revealed that while many clients resided in the tech industry, they still only made up 16% of all clients by industry.

The transportation sector, which ranked sixth for having the most clients lobby on AI-related issues, has engaged heavily on policies regarding autonomous vehicles.

In the defense sector, 30 clients hired a combined total of 168 lobbyists to work on AI issues. Given the U.S. Department of Defense and military’s growing interest in AI, defense companies that are often major government contractors have been increasingly implementing AI for military applications.

…in August 2023 the Pentagon announced a major new program, the Replicator Initiative, that aims to rely heavily on autonomous drones to combat Chinese missile strength in a theoretical conflict over Taiwan or at China’s eastern coast.

Look. Guys. If you are ever tempted to call something the Replicator Initiative, there are three things to know.

  1. Do not do the Replicator Initiative.

  2. Do not do the Replicator Initiative.

  3. Do not do the Replicator Initiative.

Also, as a bonus, at a bare minimum, do not call it the Replicator Initiative.

As federal agencies move forward with developing guardrails for AI technologies, stakeholders will likely rely even more on their lobbyists to shape how AI policy is formed.

You know one way to know your guardrails are lacking?

You called a program the Replicator Initiative.

Yes, expect tons of lobbying, mostly corporate lobbying.

Where will they lobby? It seems the White House is the place for the cool kids.

So who is involved?

Even in cases where at first glance a lobbying entity may not appear to be representing corporate interests, digging deeper into partnerships and collaborations revealed that non-profit interests are often deeply intertwined with corporate ones as well.

Only five of the top 50 lobbying entities responsible for the most AI-related lobbyists in 2023 were not representing corporate interests. Two of the five were large hospitals – the Mayo Clinic and The New York and Presbyterian Hospital – while the other three were the AFL-CIO, AARP, and the National Fair Housing Alliance. None of the five were in the top ten.

Did you notice any names not on that list?

Most of that lobbying is highly orthogonal to the things generally discussed here. Hospitals are presumably concerned primarily with health care applications and electronic medical records. That was enough for multiple hospital groups to each outspend all lobbying efforts towards mitigating existential risk.

Adam Thierer implores us to just think of the potential, reminds us to beat China, urges ‘pro-innovation’ AI policy vision. It’s a Greatest Hits on so many levels. The core proposal is that ‘the time is now’ to… put a moratorium on any new rules on AI, and preempt any potential state actions. Do nothing, only more so.

Gavin Newsom warns about the burdens of overregulation of AI and the threat it would pose to California’s leadership on that, but says the state has ‘an obligation to lead’ because AI was invented there.

To be completely fair to Newsom this is not the first time he has warned about overregulation – he did it in 2004 regarding the San Francisco business permitting process, which is a canonical insane example of overregulation, and he has indeed taken some ‘concrete steps’ as governor to streamline some regulatory burdens, including an executive order and signing AB 1817. But also:

As usual in politics, this is both-sides applause light talk that does not tell you the price. The price is not going to be zero, nor would that be wise even if there was no existential risk, any more than we should have no laws about humans. The price is also a cost, and setting it too high would be bad.

The world as it is: FEC fighting FCC’s attempt to require political ads to disclose that they used AI, saying FCC lacks jurisdiction, and finding it ‘deeply troubling’ that they want this in place before the election with it happening so soon. How is ‘political ads that use AI tell us they are using AI’ not one of the things we can all agree upon?

You know what is a really, really bad idea?

Going after AI companies with antitrust enforcement.

Josh Sisco (Politico): The Justice Department and Federal Trade Commission are nearing an agreement to divvy up investigations of potential anticompetitive conduct by some of the world’s largest technology companies in the artificial intelligence industry, according to three people with knowledge of the negotiations.

As part of the arrangement, the DOJ is poised to investigate Nvidia and its leading position in supplying the high-end semiconductors underpinning AI computing, while the FTC is set to probe whether Microsoft, and its partner OpenAI, have unfair advantages with the rapidly evolving technology, particularly around the technology used for large language models.

The deal has been negotiated for nearly a year. And while leaders of both agencies have expressed urgency in ensuring that the rapidly growing artificial intelligence technology is not dominated by existing tech giants, until an agreement is finalized, there was very little investigative work they could do.

Fredipus Rex: Also, how in the world is OpenAI, which loses money on a mostly free product that has existed for two years and which is in a constant game of monthly technological leapfrog with a bunch of competitors in any possible way a “monopoly”?

Ian Spencer: Microsoft and OpenAI have nothing even remotely resembling monopolies in the AI space. Nvidia is facing competition everywhere, despite being clear market leaders.

It’s ASML and TSMC who have truly dominant market positions thanks to their R&D, and neither of them is based in the US.

Shoshana Weissmann: Every day is pain.

I do not know whether to laugh or cry.

A year ago they wanted to start an antitrust investigation, but it took that long to negotiate between agencies?

The antitrust concern was based on the idea that some companies in technological races had currently superior technologies and were thus commanding large market shares while rapidly improving their products and what you could get at a given price, and producing as fast as they could acquire input components?

Perhaps the best part is that during that year, during which OpenAI has been highly unprofitable in order to fight for market share and develop better products, two distinct competitors caught up to OpenAI and are now offering comparable products, although OpenAI likely will get to the next generation level first.

Or is the best part that Microsoft so little trusts OpenAI that they are spending unholy amounts of money to engage in direct competition with them?

Meanwhile Nvidia faces direct competition on a variety of fronts and is both maximizing supply and rapidly improving its products while not charging anything like the market clearing price.

This from the people who brought you ‘Google monopolized search,’ ‘Amazon prices are too high,’ ‘Amazon prices are too low’ and ‘Amazon prices are suspiciously similar.’

As Ian notes, in theory one could consider ASML or TSMC as more plausible monopolies, but neither is exploiting its position, and also neither is American so we can’t go after them. If anything I find the continued failure of both to raise prices to be a confusing aspect of the world.

It is vital not only that we not prosecute companies like OpenAI under antitrust law. They vitally need limited exemptions from antitrust, so that if they get together to collaborate on safety, they need not worry the government will prosecute them for it.

I have yet to see a free market type who wants to accelerate AI and place absolutely no restrictions on its development call for this particular exemption.

Lex Fridman talks to Roman Yampolskiy, as played by Jeff Goldblum, and Lex does not miss the central point.

Lex Fridman: Here’s my conversation with Roman Yampolskiy, AI safety researcher who believes that the chance of AGI eventually destroying human civilization is 99.9999%. I will continue to chat with many AI researchers & engineers, most of whom put p(doom) at <20%, but it's important to balance those technical conversations by understanding the long-term existential risks of AI. This was a terrifying and fascinating discussion.

Others, not so much.

Elon Musk: 😁

If you are interested in communication of and debate about existential risk, this is a podcast worth listening to. I could feel some of Roman’s attempts working because they were executed well, others working by playing to Lex’s instincts in strange ways, others leading into traps or bouncing off before the reactions even happened. I saw Lex ask some very good questions and make some leaps, while being of all the Lex Fridmans the Lex Fridmanest in others. It is amazing how much he harps on the zoo concept as a desperate hope target, or how he does not realize that out of all the possible futures, most of the ones we can imagine and find interesting involve humans because we are human, but most of the configurations of atoms don’t involve us. And so on.

Also it is unfortunate (for many purposes) that Roman has so many additional funky views such as his perspective on the simulation hypothesis, but he is no doubt saying what he actually believes.

Of course, there is also Dwarkesh Patel talking to Leopold Aschenbrenner for 4.5 hours. I have been assured this is an absolute banger and will get to it Real Soon Now.

Request for best philosophical critique against AI existential risk. I am dismayed how many people exactly failed to follow the directions. We need to do better at that. I think the best practical critique is to doubt that we will create AGI any time soon, which may or may not be philosophical depending on details. It is good to periodically survey the answers out there.

Your periodic reminder that there are plenty of people out there on any high stakes topic who are ‘having a normal one,’ and indeed that a lot of people’s views are kind of crazy. And also that in-depth discussions of potential transformationally different future worlds are going to sound weird at times if you go looking for weirdness. As one commenter notes, if people keep retweeting the crazytown statements but not the people saying sanetown statements, you know what you will see. For other examples, see: Every political discussion, ever, my lord, actually please don’t, I like you.

For those trying to communicate nuance instead, it remains rough out there.

Helen Toner: Trying to communicate nuance in AI rn be like

Me: people think xrisk=skynet, but there are lots of ways AI could cause civilization-scale problems, and lots of throughlines w/today’s harms, so we shouldn’t always have those conversations separately

Headline writer: skynet dumb

Helen Toner: If you want to hear my full answer, it starts about 33:15 here.

(and the article itself is fine/good, no shade to Scott. it’s just always those headlines…)

In this case the article is reported to be fine, but no, in my experience it is usually not only the headlines that are at issue.

An illustration.

Liron Shapira: Will humanity be able to determine which ASI behavior is safe & desirable by having it output explanations and arguments that we can judge?

Some argue yes. Some argue no. It’s tough to judge.

SO YOU SEE WHY THE ANSWER IS OBVIOUSLY NO.

That does not rule out all possible outs, but it is a vital thing to understand.

I am confident LLMs are not sentient or conscious, but your periodic reminder that the argument that they don’t have various biological or embodied characteristics is a terrible one, and Asimov’s prediction of this reaction was on point.

A few things going on here.

Jeffrey Ladish: I’m a bit sad about the state of AI discourse and governance right now. Lot of discussions about innovation vs. safety, what can / should the government actually do… but I feel like there is an elephant in the room

We’re rushing towards intelligent AI agents that vastly outstrip human abilities. A new non-biological species that will possess powers wonderful and terrible to behold. And we have no plan for dealing with that, no ability to coordinate as a species to avoid a catastrophic outcome

We don’t know exactly when we’ll get AI systems with superhuman capabilities… systems that can strategize, persuade, invent new technologies, etc. far better than we can. But it sure seems like these capabilities are in our sights. It sure seems like the huge investments in compute and scale will pay off, and people will build the kinds of systems AI risk researchers are most afraid of

If decision makers around the world can’t see this elephant in the room, I worry anything they try to do will fall far short of adequate.

Ashley Darkstone: Maybe if you and people like you stopped using biological/animist terms like “species” to refer to AI, you’d be taken more seriously.

Jeffrey Ladish: It’s hard to talk about something that is very different than anything that’s happened before. We don’t have good language for it. Do you have language you’d use to describe a whole other class of intelligent agent?

Ashley Darkstone: Only language specific to my work. We’ll all have to develop the language over time, along with the legalism, etc.

Species has specific implications to people. Life/Slavery/Evolution.. Biological/Human things that need not apply. It’s fearmongering.

AI should be a selfless tool.

Jeffrey Ladish: Maybe AI should be a selfless tool, but I think people train powerful agents

I studied evolutionary biology in college and thought a fair bit about different species concepts, all imperfect 🤷

“fearmongering” seems pretty dismissive of the risks at hand

Is Darkstone objecting to the metaphorical use of a biological term because it is more confusing than helpful, more heat than light? Because it is technically incorrect, the worst kind of incorrect? Because it is tone policing?

Or is it exactly because of her belief that ‘AI should be a selfless tool’?

That’s a nice aspiration, but Ladish’s point is exactly that this won’t remain true.

More and more I view objections to AI risk as being rooted in not believing in the underlying technologies, rather than an actual functioning disagreement. And objections to the terminology and metaphors used seem rooted in the same thing: the terminology and metaphors imply that AGI and agents worthy of those names are coming, whereas objectors only believe in ATI (Artificial Tool Intelligence).

Thus I attempt to coin the term ATI: Artificial Tool Intelligence.

Definition: Artificial Tool Intelligence. An intelligent system incapable of functioning as the core of a de facto autonomous agent.

If we were to only ever build ATIs, then that would solve most of our bigger worries.

That is a lot easier said than done.

Keegan McBride makes the case that open source AI is vital for national security, because ‘Whoever builds, maintains, or controls the global open source AI ecosystem will have a powerful influence on our shared digital future.’

Toad: But our rivals can copy the open source models and modify them.

Frog: That is true. But that will ensure our cultural dominance, somehow?

Toad then noticed he was confused.

The post is filled with claims about China’s pending AI ascendancy, and to defend against that McBride says we need to open source our AIs.

I do give Keegan full credit for rhetorical innovation on that one.

It would be really great if we could know Anthropic was worthy of our trust.

  1. We know that Anthropic has cultivated a culture of caring deeply about safety, especially existential safety, among its employees. I know a number of its employees who have sent costly signals that they deeply care.

  2. We know that Anthropic is taking the problems far more seriously than its competitors, and investing more heavily in safety work.

  3. We know that Anthropic at least thinks somewhat about whether its actions will raise or lower the probability that AI kills everyone when it makes its decisions.

  4. We know they have the long term benefit trust and are a public benefit corporation.

  5. No, seriously, have you seen the other guy?

I have. It isn’t pretty.

Alas, the failure of your main rival to live up to ‘ordinary corporation’ standards does not change the bar of success. If Anthropic is also not up to the task, or not worthy of trust, then that is that.

I have said, for a while now, that I am confused about Anthropic. I expect to continue to be confused, because they are not making this easy.

Anthropic has a principle of mostly not communicating much, including on safety, and being extremely careful when it does communicate.

This is understandable. As their employees have said, there is a large tendency for people to read into statements, to think they are stronger or different than they are, that they make commitments the statement does not make. The situation is changing rapidly, so what seemed wise before might not be wise now. People and companies can and should change their minds. Stepping into such discussions often inflames them, making the problem worse, people want endless follow-ups, it is not a discussion you want to focus on. Talking about the thing you are doing can endanger your ability to do the thing. Again, I get it.

Still? They are not making this easy. The plan might be wise, but the price must be paid. You go to update with the evidence you have. Failure to send costly signals is evidence, even if your actions plausibly make sense in a lot of different worlds.

What exactly did Anthropic promise or imply around not improving the state of the art? What exactly did they say to Dustin Moskovitz on this? Anthropic passed on releasing the initial Claude, but then did ship Claude Opus, and before that the first 100k context window.

To what extent is Anthropic the kind of actor who will work to give you an impression that suits its needs without that impacting its ultimate decisions? What should we make of their recent investor deck?

What public commitments has Anthropic actually made going forward? How could we hold them accountable? They have committed to their RSP, but most of it can be changed via procedure. Beyond that, not clear there is much. Will the benefit trust in practice have much effect especially in light of recent board changes?

What is up with Anthropic’s public communications?

Once again this week, we saw Anthropic’s public communications lead come out warning about overregulation, in ways I expect to help move the Overton window away from the things that are likely going to become necessary.

Simeon: Anthropic policy lead now advocating against AI regulation. What a surprise for an AGI lab 🤯

If you work at Anthropic for safety reasons, consider leaving.

That is Simeon’s reaction to a highly interesting retrospective by Jack Clark.

The lookback at GPT-2 and decisions around its release seems insightful. They correctly foresaw problems, and correctly saw the need to move off the track of free academic release of models. Of course, GPT-2 was entirely harmless because it lacked sufficient capabilities, and in hindsight that seems very obvious, but part of the point is that it is hard to tell in advance. Here they ‘missed high,’ but one could as easily ‘miss low.’

Then comes the part about policy. Here is the part being quoted, in context, plus key other passages.

Jack Clark: I’ve come to believe that in policy “a little goes a long way” – it’s far better to have a couple of ideas you think are robustly good in all futures and advocate for those than make a confident bet on ideas custom-designed for one specific future – especially if it’s based on a very confident risk model that sits at some unknowable point in front of you.

Additionally, the more risk-oriented you make your policy proposal, the more you tend to assign a huge amount of power to some regulatory entity – and history shows that once we assign power to governments, they’re loathe to subsequently give that power back to the people. Policy is a ratchet and things tend to accrete over time. That means whatever power we assign governments today represents the floor of their power in the future – so we should be extremely cautious in assigning them power because I guarantee we will not be able to take it back. 

For this reason, I’ve found myself increasingly at odds with some of the ideas being thrown around in AI policy circles, like those relating to needing a license to develop AI systems; ones that seek to make it harder and more expensive for people to deploy large-scale open source AI models; shutting down AI development worldwide for some period of time; the creation of net-new government or state-level bureaucracies to create compliance barriers to deployment.

Yes, you think the future is on the line and you want to create an army to save the future. But have you considered that your actions naturally create and equip an army from the present that seeks to fight for its rights?

Is there anything I’m still confident about? Yes. I hate to seem like a single-issue voter, but I had forgotten that in the GPT-2 post we wrote “we also think governments should consider expanding or commencing initiatives to more systematically monitor the societal impact and diffusion of AI technologies, and to measure the progression in the capabilities of such systems.” I remain confident this is a good idea!

This is at core not that different from my underlying perspective. Certainly it is thoughtful. Right now what we need most is to create broader visibility into what these systems are capable of, and to create the institutional capacity such that if we need to intervene in the future, we can do that.

Indeed, I have spoken about how I feel proposals such as those in the Gladstone Report go too far, and would indeed carry exactly these risks. I draw a sharp contrast between that and something like SB 1047. I dive into the details to try and punch them up.

It still seems hard not to notice the vibes. This is written in a way that comes across as a warning against regulation. Coming across is what such communications are about. If this were an isolated example it would not bother me so much, but I see this consistently from Anthropic. If you are going to warn against overreach without laying out the stakes or pushing for proper reach, repeatedly, one notices.

Anthropic’s private lobbying and other private actions clearly happen, and hopefully sing a very different tune, but we have no way of knowing.

Also, Anthropic failed to publicly share Claude Opus with the UK in advance, while Google did publicly share Gemini updates in advance. No commitments were broken, but this seems like a key place where it is important to set a good example. A key part of Anthropic’s thesis is that they will create a ‘race to safety’ so let’s race.

I consider Simeon’s reaction far too extreme. If you are internal, or considering becoming internal, you have more information. You should form your own opinion.

A nice positive detail: Anthropic has an anonymous hotline for reporting RSP compliance concerns. Of course, that only matters if they then act.

The Rand report on securing model weights is out.

Ideally this will become its own post in the future. It is super important that we secure the model weights of future more capable systems from a wide variety of potential threats.

As the value at stake goes up, the attacks get stronger, and so too must defenses.

The core message is that there is no silver bullet, no cheap and simple solution. There are instead many strategies to improve security via defense in depth, which will require real investment over the coming years.

Companies should want to do this on their own. Not investing enough in security makes you a target, and your extremely expensive model gets stolen. Even if there are no national security concerns or existential risks, that is not good for business.

That still makes it the kind of threat companies systematically underinvest in. It looks like a big expense until it looks cheap in hindsight. Failure is bad for business, but potentially far far worse for the world.

Thus, this is a place where government needs to step in, both to require and to assist. It is an unacceptable national security situation, if nothing else, for OpenAI, Google or Anthropic (or in the future certain others) not to secure their model weights. Mostly government ‘help’ is not something an AI lab will want, but cybersecurity is a potential exception.

For most people, all you need to take away is the simple ‘we need to do expensive defense in depth to protect model weights, we are not currently doing enough, and we should take collective action as needed to ensure this happens.’

There are highly valid reasons to oppose many other safety measures. There are even arguments that we should openly release the weights of various systems, now or in the future, once the developers are ready to do that.

There are not valid reasons to let bad actors exclusively get their hands on frontier closed model weights by using cyberattacks.

At minimum, you need to agree on what that means.

Will Depue: Alignment people have forgotten that the main goal of ai safety is to build systems that are aligned to the intent of the user, not the intent of the creators. this is a far easier problem.

I have noticed others calling this ‘user alignment,’ and so far that has gone well. I worry people will think this means aligning the user, but ‘alignment to the user’ is clunky.

For current models, ‘user alignment’ is indeed somewhat easier, although still not all that easy. And no, you cannot actually provide a commercial product that does exactly what the user wants. So you need to do a dance of both and do so increasingly over time.

The ‘alignment people’ are looking forward to future more capable systems, where user alignment will be increasingly insufficient.

Looking at Will’s further statements, this is very clearly a case of ‘mere tool.’ Will Depue does not expect AGI, rather he expects AI to remain a tool.

It was interesting to see Ted Sanders and Joshua Achiam, both at OpenAI, push back.

In addition to knowing what you want, you need to be able to know if you found it.

Daniel Kang claims that the GPT-4 system card was wrong, and that AI agent teams based on GPT-4 can now find and exploit zero-day vulnerabilities, with his new version scoring 50% on his test versus 20% for previous agents and 0% for open-source vulnerability scanners. They haven’t tested Claude Opus or Gemini 1.5 yet.

I won’t read the details because triage, but the key facts to understand are that the agent frameworks will improve over time even if your system does not, and that it is extremely difficult to prove a negative. I can prove that your system can exploit zero-day vulnerabilities by showing it exploiting one. You cannot prove that your system cannot do that simply by saying ‘I tried and it didn’t work,’ even if you gave it your best effort with the best agents you know about. You can of course often say that a given task is far outside of anything a model could plausibly do, but this was not one of those cases.
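To make the ‘frameworks improve even if the model does not’ point concrete, here is a minimal sketch (my own illustration, not anything from Kang’s paper) of how measured capability climbs purely from giving a fixed model more attempts via a better scaffold, assuming each attempt succeeds independently with some small probability:

```python
# Illustration only: a fixed model with per-attempt success probability p,
# wrapped in a scaffold that retries (or runs several cooperating agents).
# Measured capability is the chance of at least one success, 1 - (1 - p)^k,
# which rises with scaffold effort even though the model never changes.
def success_rate(p: float, attempts: int) -> float:
    return 1 - (1 - p) ** attempts

for attempts in (1, 5, 20):
    print(attempts, round(success_rate(0.10, attempts), 2))
# prints: 1 0.1, then 5 0.41, then 20 0.88
```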

I do not think we have a practical problem in this particular case. Not yet. But agent system designs are improving behind the scenes, and some odd things are going to happen once GPT-5 drops.

Also, here we have DeepMind’s Nicholas Carlini once again breaking proposed AI defense techniques, in this case Sabre, first by changing one line of buggy code, and then, when the authors responded with a new strategy, by modifying one more line of code. This thread has more context.

Analysis and notes of caution on Anthropic’s Scaling Monosemanticity (the Golden Gate Bridge) paper. We can be both super happy the paper happened, while also noticing that a lot of people are overreacting to it.

OpenAI gives us its early version of the SAE paper (its counterpart to the Golden Gate Bridge work), searching for 16 million features in GPT-4, and claims its method scales better than previous work. Paper is here; Leo Gao is lead and coauthors include Sutskever and Leike. Not looking further because triage, so someone else please evaluate how we should update on this in light of Anthropic’s work.
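For readers who want the one-paragraph version of what these SAE papers do: a sparse autoencoder is trained to reconstruct a model’s internal activations through a much wider, mostly inactive hidden layer, so that individual hidden units (‘features’) hopefully correspond to human-interpretable concepts. Below is a minimal sketch of the core idea in PyTorch; this is my own simplification for illustration, not the architecture or training setup from either OpenAI’s or Anthropic’s paper.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Toy SAE: reconstruct activations through a wide, sparsely active bottleneck."""
    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def forward(self, acts: torch.Tensor):
        features = torch.relu(self.encoder(acts))  # feature activations, mostly zero after training
        recon = self.decoder(features)             # reconstructed model activations
        return recon, features

# Stand-in sizes; the real papers use far larger widths (e.g. 16 million features).
sae = SparseAutoencoder(d_model=512, n_features=4096)
acts = torch.randn(8, 512)                         # placeholder for captured activations
recon, features = sae(acts)
# Training minimizes reconstruction error plus a sparsity penalty on the features:
loss = ((recon - acts) ** 2).mean() + 1e-3 * features.abs().mean()
```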

Handy lists of various p(doom) numbers (pause AI, from the superforecasters and general surveys).

CAIS statement gets either ‘strongly agree’ or ‘agree’ from over 40% of Harvard students. Taking an AI class correlated with this being modestly higher, although I would guess causation mostly runs the other way.

Gabriel Wu: Students who have taken a class on AI were more likely to be worried about extinction risks from AI and had shorter “AGI timelines”: around half of all Harvard students who have studied artificial intelligence believe AI will be as capable as humans within 30 years.

Over half of Harvard students say that AI is changing the way they think about their careers, and almost half of them are worried that their careers will be negatively affected by AI.

How do automation concerns differ by industry? There isn’t much variation: around 40-50% of students are worried about AI automation no matter what sector they plan on working in (tech, education, finance, politics, research, consulting), with the exception of public health.

Full report is here.

Yann LeCun having strange beliefs department, in this case that ‘it is much easier to investigate what goes on in a deep learning system than in a turbojet, whether theoretically or experimentally.’ Judea Pearl explains it is the other way, whereas I would have simply said: What?

We also have the Yann LeCun providing unfortunate supporting links department.

Elon Musk is not always right, but when he’s right, he’s right.

Eliezer Yudkowsky: Very online people repeating each other: Eliezer Yudkowsky is a cult leader with a legion of brainwashed followers who obey his every word.

Real life: I wore this to LessOnline and Ozy Frantz stole my hat.

I do not have any such NDAs.

The last line is actually ‘member of an implicit coalition whose members coordinate to reward those who reward those who act to aid power and to prevent the creation of clarity around any and all topics including who may or may not have any form of NDA.’

Eternal September means the freshman philosophy beatings will continue.

I do note, however, that morale has slightly improved.

Say whatever else you want about e/acc. They will help you dunk.

Last week I had dinner with a group that included Emmett Shear; he made various claims of this type, and… well, he did not convince me of anything and I don’t think I convinced him of much either, but it was an interesting night. I was perhaps too sober.

Truth and reconciliation.

Indeed, someone is highly underpaid.

It is a no-good very-bad chart in so many other ways, but yeah, wow.

Updating a classic.

Narrator: They did not learn.



US agencies to probe AI dominance of Nvidia, Microsoft, and OpenAI

AI Antitrust —

DOJ to probe Nvidia while FTC takes lead in investigating Microsoft and OpenAI.

Image: Nvidia logo at the Impact 2024 event in Poznan, Poland, on May 16, 2024. (Credit: Getty Images | NurPhoto)

The US Justice Department and Federal Trade Commission reportedly plan investigations into whether Nvidia, Microsoft, and OpenAI are snuffing out competition in artificial intelligence technology.

The agencies struck a deal on how to divide up the investigations, The New York Times reported yesterday. Under this deal, the Justice Department will take the lead role in investigating Nvidia’s behavior while the FTC will take the lead in investigating Microsoft and OpenAI.

The agencies’ agreement “allows them to proceed with antitrust investigations into the dominant roles that Microsoft, OpenAI, and Nvidia play in the artificial intelligence industry, in the strongest sign of how regulatory scrutiny into the powerful technology has escalated,” the NYT wrote.

One potential area of investigation is Nvidia’s chip dominance, “including how the company’s software locks customers into using its chips, as well as how Nvidia distributes those chips to customers,” the report said. An Nvidia spokesperson declined to comment when contacted by Ars today.

High-end GPUs are “scarce,” antitrust chief says

Jonathan Kanter, the assistant attorney general in charge of the DOJ’s antitrust division, discussed the agency’s plans in an interview with the Financial Times this week. Kanter said the DOJ is examining “monopoly choke points and the competitive landscape” in AI.

The DOJ’s examination of the sector encompasses “everything from computing power and the data used to train large language models, to cloud service providers, engineering talent and access to essential hardware such as graphics processing unit chips,” the FT wrote.

Kanter said regulators are worried that AI is “at the high-water mark of competition, not the floor” and want to take action before smaller competitors are shut out of the market. The GPUs needed to train large language models are a “scarce resource,” he was quoted as saying.

“Sometimes the most meaningful intervention is when the intervention is in real time,” Kanter told the Financial Times. “The beauty of that is you can be less invasive.”

Microsoft deal scrutinized

The FTC is scrutinizing Microsoft over a March 2024 move in which it hired the CEO of artificial intelligence startup Inflection and most of the company’s staff and paid Inflection $650 million as part of a licensing deal to resell its technology. The FTC is investigating whether Microsoft structured the deal “to avoid a government antitrust review of the transaction,” The Wall Street Journal reported today.

“Companies are required to report acquisitions valued at more than $119 million to federal antitrust-enforcement agencies, which have the option to investigate a deal’s impact on competition,” the WSJ wrote. The FTC reportedly sent subpoenas to Microsoft and Inflection in an attempt “to determine whether Microsoft crafted a deal that would give it control of Inflection but also dodge FTC review of the transaction.”

Inflection built a large language model and a chatbot called Pi. Former Inflection employees are now working on Microsoft’s Copilot chatbot.

“If the agency finds that Microsoft should have reported and sought government review of its deal with Inflection, the FTC could bring an enforcement action against Microsoft,” the WSJ report said. “Officials could ask a court to fine Microsoft and suspend the transaction while the FTC conducts a full-scale investigation of the deal’s impact on competition.”

Microsoft told the WSJ that it complied with antitrust laws, that Inflection continues to operate independently, and that the deals gave Microsoft “the opportunity to recruit individuals at Inflection AI and build a team capable of accelerating Microsoft Copilot.”

OpenAI

Microsoft’s investment in OpenAI has also faced regulatory scrutiny, particularly in Europe. Microsoft has a profit-sharing agreement with OpenAI.

Microsoft President Brad Smith defended the partnership in comments to the Financial Times this week. “The partnerships that we’re pursuing have demonstrably added competition to the marketplace,” Smith was quoted as saying. “I might argue that Microsoft’s partnership with OpenAI has created this new AI market,” and that OpenAI “would not have been able to train or deploy its models” without Microsoft’s help, he said.

We contacted OpenAI today and will update this article if it provides any comment.

In January 2024, the FTC launched an inquiry into AI-related investments and partnerships involving Alphabet, Amazon, Anthropic, Microsoft, and OpenAI.

The FTC also started a separate investigation into OpenAI last year. A civil investigative demand sent to OpenAI focused on potentially unfair or deceptive privacy and data security practices, and “risks of harm to consumers, including reputational harm.” The probe focused partly on “generation of harmful or misleading content.”



Sony removes still-unmet “8K” promise from PS5 packaging

8K? We never said 8K! —

Move could presage an expected resolution bump in the rumored PS5 Pro.

  • The new PS5 packaging, as seen on the PlayStation Direct online store, is missing the “8K” label in the corner.

  • The original PS5 packaging with the 8K label, as still seen on the GameStop website.

When we first received our PlayStation 5 review unit from Sony in 2020, we reacted with some bemusement to the “8K” logo on the box and its implied promise of full 7680×4320 resolution output. We then promptly forgot all about it, since native 8K content and 8K-compatible TVs have remained a relative curiosity thus far in the PS5’s lifespan.

But on Wednesday, Digital Foundry’s John Linneman discovered that Sony has quietly removed that longstanding 8K label from the PS5 box. The ultra-high-resolution promise no longer appears on the packaging shown on Sony’s official PlayStation Direct store, a change that appears to have happened between late January and mid-February, according to Internet Archive captures of the store page (the old “8K” box can still be seen at other online retailers, though).

A promise deferred

This packaging change has been a long time coming since the PS5 hasn’t technically been living up to its 8K promise for years now. While Sony’s Mark Cerny mentioned the then-upcoming hardware’s 8K support in a 2019 interview, the system eventually launched with a pretty big “coming soon” caveat for that feature. “PS5 is compatible with 8K displays at launch, and after a future system software update will be able to output resolutions up to 8K when content is available, with supported software,” the company said in an FAQ surrounding the console’s 2020 launch.

Well over three years later, that 8K-enabling software update has yet to appear, meaning the console’s technical ability to push 8K graphics is still a practical impossibility for users. Until Sony’s long-promised software patch hits, even PS5 games that render frames internally at a full 8K resolution are still pushing out a downscaled 4K framebuffer through that HDMI 2.1 cable.

Image: A slide from TV manufacturer TCL guesses at some details for the next micro-generation of high-end game consoles.

At this point, though, there’s some reason to expect that the promised patch may never come to the standard PS5. At the moment, the ever-churning rumor mill is expecting an impending mid-generation PS5 Pro upgrade that could offer true, native 8K resolution support right out of the box. If that comes to pass, removing the outdated “8K” promise from the original PS5 packaging could be a subtle way to highlight the additional power of the upcoming “Pro” upgrade.

Image: In a double-blind study, a slight majority of participants saw no discernible difference between 4K and 8K video clips.

So will console gamers be missing out if they don’t upgrade to an 8K-compatible display? Probably not, as studies show extremely diminishing returns in the perceived quality jump from 4K to 8K visual content for most users and living room setups. Unless you are sitting extremely close to an extremely large display, it’s pretty unlikely you’ll even be able to tell the difference.



DuckDuckGo offers “anonymous” access to AI chatbots through new service

anonymous confabulations —

DDG offers LLMs from OpenAI, Anthropic, Meta, and Mistral for factually-iffy conversations.

Image: DuckDuckGo’s AI Chat promotional image. (Credit: DuckDuckGo)

On Thursday, DuckDuckGo unveiled a new “AI Chat” service that allows users to converse with four mid-range large language models (LLMs) from OpenAI, Anthropic, Meta, and Mistral in an interface similar to ChatGPT while attempting to preserve privacy and anonymity. While the AI models involved can output inaccurate information readily, the site allows users to test different mid-range LLMs without having to install anything or sign up for an account.

DuckDuckGo’s AI Chat currently features access to OpenAI’s GPT-3.5 Turbo, Anthropic’s Claude 3 Haiku, and two open source models, Meta’s Llama 3 and Mistral’s Mixtral 8x7B. The service is currently free to use within daily limits. Users can access AI Chat through the DuckDuckGo search engine, direct links to the site, or by using “!ai” or “!chat” shortcuts in the search field. AI Chat can also be disabled in the site’s settings for users with accounts.

According to DuckDuckGo, chats on the service are anonymized, with metadata and IP address removed to prevent tracing back to individuals. The company states that chats are not used for AI model training, citing its privacy policy and terms of use.

“We have agreements in place with all model providers to ensure that any saved chats are completely deleted by the providers within 30 days,” says DuckDuckGo, “and that none of the chats made on our platform can be used to train or improve the models.”

Image: An example of DuckDuckGo AI Chat with GPT-3.5 answering a silly question in an inaccurate way. (Credit: Benj Edwards)

However, the privacy experience is not bulletproof because, in the case of GPT-3.5 and Claude Haiku, DuckDuckGo is required to send a user’s inputs to remote servers for processing over the Internet. Given certain inputs (e.g., “Hey, GPT, my name is Bob, and I live on Main Street, and I just murdered Bill”), a user could still potentially be identified if such an extreme need arose.
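For what it’s worth, the general shape of this kind of anonymizing proxy is simple: the intermediary, not the end user, holds the connection to the model provider and strips identifying metadata before forwarding the message. The sketch below is a generic illustration of that technique; the endpoint, key, and response shape are hypothetical placeholders, not DuckDuckGo’s or any provider’s actual API.

```python
import requests

# Hypothetical placeholders, not a real provider API.
PROVIDER_URL = "https://api.example-llm-provider.test/v1/chat"
PROVIDER_KEY = "server-side-secret"  # held only by the proxy, never exposed to the user

def forward_chat(user_message: str) -> str:
    # Only the message text is forwarded: no user IP, cookies, or account
    # identifiers reach the provider, which sees the proxy as the client.
    resp = requests.post(
        PROVIDER_URL,
        headers={"Authorization": f"Bearer {PROVIDER_KEY}"},
        json={"messages": [{"role": "user", "content": user_message}]},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["reply"]  # hypothetical response field
```

As the Bob-on-Main-Street example above shows, though, stripping metadata does nothing about identifying information the user types into the message itself.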

While the service appears to work well for us, there’s a question about its utility. For example, while GPT-3.5 initially wowed people when it launched with ChatGPT in 2022, it also confabulated a lot—and it still does. GPT-4 was the first major LLM to get confabulations under control to a point where the bot became more reasonably useful for some tasks (though this itself is a controversial point), but that more capable model isn’t present in DuckDuckGo’s AI Chat. Also missing are similar GPT-4-level models like Claude Opus or Google’s Gemini Ultra, likely because they are far more expensive to run. DuckDuckGo says it may roll out paid plans in the future, and those may include higher daily usage limits or access to “more advanced models.”

It’s true that the other three models generally (and subjectively) surpass GPT-3.5 in capability for coding, with lower hallucination rates, but they can still make things up, too. With DuckDuckGo AI Chat as it stands, the company is left with a chatbot novelty with a decent interface and the promise that your conversations with it will remain private. But what use are fully private AI conversations if they are full of errors?

Image: Mixtral 8x7B on DuckDuckGo AI Chat when asked about the author. Everything in red boxes is sadly incorrect, but it provides an interesting fantasy scenario. It’s a good example of an LLM plausibly filling gaps between concepts that are underrepresented in its training data, called confabulation. For the record, Llama 3 gives a more accurate answer. (Credit: Benj Edwards)

As DuckDuckGo itself states in its privacy policy, “By its very nature, AI Chat generates text with limited information. As such, Outputs that appear complete or accurate because of their detail or specificity may not be. For example, AI Chat cannot dynamically retrieve information and so Outputs may be outdated. You should not rely on any Output without verifying its contents using other sources, especially for professional advice (like medical, financial, or legal advice).”

So, have fun talking to bots, but tread carefully. They’ll easily “lie” to your face because they don’t understand what they are saying and are tuned to output statistically plausible information, not factual references.



Radio telescope finds another mystery long-repeat source

File under W for WTF —

Unlike earlier object, the new source’s pulses of radio waves are erratic.

Image: A slowly rotating neutron star is still our best guess as to the source of the mystery signals.

Roughly a year ago, astronomers announced that they had observed an object that shouldn’t exist. Like a pulsar, it emitted regularly timed bursts of radio emissions. But unlike a pulsar, those bursts were separated by over 20 minutes. If the 22 minute gap between bursts represents the rotation period of the object, then it is rotating too slowly to produce radio emissions by any known mechanism.

Now, some of the same team (along with new collaborators) are back with the discovery of something that, if anything, is acting even more oddly. The new source of radio bursts, ASKAP J193505.1+214841.0, takes nearly an hour between bursts. And it appears to have three different settings, sometimes producing weaker bursts and sometimes skipping them entirely. While the researchers suspect that, like pulsars, this is also powered by a neutron star, it’s not even clear that it’s the same class of object as their earlier discovery.

How pulsars pulse

Contrary to the section heading, pulsars don’t actually pulse. Neutron stars can create the illusion by having magnetic poles that aren’t lined up with their rotational pole. The magnetic poles are a source of constant radio emissions but, as the neutron star rotates, the emissions from the magnetic pole sweep across space in a manner similar to the light from a rotating lighthouse. If Earth happens to be caught up in that sweep, then the neutron star will appear to blink on and off as it rotates.

The star’s rotation is also needed for the generation of radio emissions themselves. If the neutron star rotates too slowly, then its magnetic field won’t be strong enough to produce radio emissions. So, it’s thought that if a pulsar’s rotation slows down enough (causing its pulses to be separated by too much time), it will simply shut down, and we’ll stop observing any radio emissions from the object.

We don’t have a clear idea of how long the time between pulses can get before a pulsar will shut down. But we do know that it’s going to be far less than 22 minutes.
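One common way this shutdown is expressed is the pulsar “death line,” a rough threshold relating the surface magnetic field $B$ to the rotation period $P$; in one standard order-of-magnitude form (the exact coefficient depends on the emission model):

$$ \frac{B}{P^{2}} \;\gtrsim\; 1.7 \times 10^{11}\ \mathrm{G\ s^{-2}} $$

Taking this at face value, a period measured in tens of minutes (roughly a thousand seconds) would demand a field orders of magnitude beyond anything inferred even for magnetars, which is what makes these long-period sources so hard to explain with standard pulsar physics.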

Which is why the 2023 discovery was so strange. The object, GPM J1839–10, not only took a long time between pulses, but archival images showed that it had been pulsing on and off for at least 35 years.

To figure out what is going on, we really have two options. One is more and better observations of the source we know about. The second is to find other examples of similar behavior. There’s a chance we now have a second object like this, although there are enough differences that it’s not entirely clear.

An enigmatic find

The object, ASKAP J193505.1+214841.0, was discovered by accident when the Australian Square Kilometre Array Pathfinder telescope was used to perform observations in the area due to the detection of a gamma-ray burst. It picked up a bright radio burst in the same field of view, but unrelated to the gamma-ray burst. Further radio bursts showed up in later observations, as did a few far weaker bursts. A search of the telescope’s archives also spotted a weaker burst from the same location.

Checking the timing of the radio bursts, the team found that they could be explained by an object that emitted bursts every 54 minutes, with bursts lasting from 10 seconds to just under a minute. Checking additional observations, however, showed that there were often instances where a 54-minute period would not end with a radio burst, suggesting the source sometimes skipped radio emissions entirely.

Odder still, the photons in the strong and weak bursts appeared to have different polarizations. These differences arise from the magnetic fields present where the bursts originate, suggesting that the two types of bursts differ not only in total energy but also in the magnetic field of the object producing them.

So, the researchers suggest that the object has three modes: strong pulses, faint pulses, and an off mode, although they can’t rule out the off mode producing weak radio signals that are below the detection capabilities of the telescopes we’re using. Over about eight months of sporadic observations, there’s no apparent pattern to the bursts.

What is this thing?

Checks at other wavelengths indicate there’s a magnetar and a supernova remnant in the vicinity of the mystery object, but not at the same location. There’s also a nearby brown dwarf at that point in the sky, but they strongly suspect that’s just a chance overlap. So, none of that tells us more about what produces these erratic bursts.

As with the earlier find, there seem to be two possible explanations for the ASKAP source. One is a neutron star that’s still managing to emit radiofrequency radiation from its poles despite rotating extremely slowly. The second is a white dwarf that has a reasonable rotation period but an unreasonably strong magnetic field.

To get at this issue, the researchers estimate the strength of the magnetic field needed to produce the larger bursts and come up with a value that’s significantly higher than any previously observed to originate on a white dwarf. So they strongly argue for the source being a neutron star. Whether that argues for the earlier source being a neutron star will depend on whether you feel that the two objects represent a single phenomenon despite their somewhat different behaviors.

In any case, we now have two of these mystery slow-repeat objects to explain. It’s possible that we’ll be able to learn more about this newer one if we can get some information as to what’s involved in its mode switching. But then we’ll have to figure out if what we learn applies to the one we discovered earlier.

Nature Astronomy, 2024. DOI: 10.1038/s41550-024-02277-w



Toyota tests liquid hydrogen-burning Corolla in another 24-hour race

yep, still at it —

The experience has taught it how to improve thermal efficiency, Toyota says.

Image: The hydrogen-powered GR Corolla race car. “It got more attention than last year, and the development feels steadier, faster, and safer,” said Toyota Chairman Akio Toyoda when asked how the hydrogen-powered Corolla had improved from 2023. (Credit: Toyota)

A couple of weekends ago, when most of the world’s motorsport attention was focused on Monaco and Indianapolis, Toyota Chairman Akio “Morizo” Toyoda was taking part in the Super Taikyu Fuji 24 Hours at Fuji Speedway in Japan. Automotive executives racing their own products is not exactly unheard of, but few instances have been quite as unexpected as competing in endurance races with a hydrogen-burning Corolla.

A hydrogen-powered Toyota has shown up for the past few years, in fact, as the company uses the race track to learn new things about thermal efficiency that it says have benefitted its latest generation of internal-combustion engines, which it debuted to the public at the end of May.

With backing from its government, the Japanese auto industry has continued to explore hydrogen as an alternative vehicle energy source instead of liquid hydrocarbons or batteries. Commercially, that’s been in the form of hydrogen fuel cells, although with very little success among drivers, even in areas that have some hydrogen fueling infrastructure.

But the hydrogen powertrain in the GR Corolla uses an internal combustion engine, not a fuel cell. The project first competed in the 24-hour race at Fuji in 2021, then again with a little more success in 2022.

For 2023, there was a significant change to the car, now fueled by liquid hydrogen, not gaseous. Instead of trying to fill tanks pressurized to 70 MPa (700 bar), now it just has to be cooled to minus-253° C (minus-423° F). Liquid hydrogen has almost twice the energy density—although still only a third as much as gasoline—and the logistics and equipment required to support cryogenic refueling at the racetrack were much less than with pressurized hydrogen.

Image: The new (left) and old (right) liquid hydrogen tanks. (Credit: Toyota)

The liquid hydrogen is stored in a double-walled tank that was much easier to package within the compact interior of the GR Corolla than the four pressurized cylinders it replaced. This year, the tank is 50 percent larger (storing 15 kg of hydrogen) and elliptical, which proved quite an interesting technical challenge for supplier Shinko. The new tank required Toyota to rebuild the car to repackage everything, taking the opportunity to cut 50 kg (110 lbs) of weight in the process.

From the tank, a high-pressure pump injects the fuel into a vaporizer, where it becomes a gas again and then heads to the engine to be burned. Unfortunately, the pump wasn’t so durable in 2023 and had to be replaced twice during the race, costing hours in the process.

For 2024, a revised pump was designed to last the full 24 hours, although during testing, it proved to be the source of a fuel leak, wasting the team’s time while the problem was isolated. Luckily, this was much less severe than when, in 2023, a gaseous hydrogen pipe leak in the engine bay led to a fire at a test.

Sadly, the new fuel pump had intermittent problems actually pumping fuel during the race, most likely due to sloshing in the tank. Later on, an ABS module failure sidelined the car in the garage for five hours, and while the team was able to take the checkered flag, it had completed fewer laps in 2024 than in 2023.

But 24-hour racing is really hard, and the race wasn’t a write-off for Toyota. It achieved its goal of 30-lap stints between refueling, and while the new pump wasn’t problem-free throughout the race (nor did it end up running for the entire 24 hours), it didn’t need to be replaced once, let alone twice.

Image: For 2024, there was an automated system to clean the CO2 filter. (Credit: Toyota)

I’m still scratching my head slightly about the carbon capture device that’s fitted to the car’s air filter. This adsorbs CO2 out of the air as the car drives, storing it in a small tank. It’s a nice gesture, I guess.

Since starting development of the hydrogen combustion engine, Toyota has found real gains in performance and efficiency, and the switch to liquid hydrogen has cut refueling times by 40 percent. All of those make it more viable as a carbon-free fuel, it says. But the chances of seeing production vehicles that get refueled with liquid hydrogen seem remote to me.

Even though Toyota still has optimism that one day it will be able to sell combustion cars that just emit water from their tailpipes, it’s pragmatic enough to know there needs to be some real-world payoff now, beyond the fact that the chairman likes racing and people like to keep him happy.

“Hydrogen engine development has really contributed to our deeper understanding of engine heat efficiency. It was a trigger that brought this technology,” Toyota CTO Hiroki Nakajima told Automotive News at the debut of the automaker’s new 1.5 L and 2.0 L four-cylinder engines, which are designed to meet the European Union’s new Euro 7 emissions regulations that go into effect in 2027.



Canada demands 5% of revenue from Netflix, Spotify, and other streamers

Streaming fees —

Canada says $200M in annual fees will support local news and other content.

Image: Canadian 1-cent coins, with the Canadian flag displayed on a computer screen in the background. (Credit: Getty Images | NurPhoto)

Canada has ordered large online streaming services to pay 5 percent of their Canadian revenue to the government in a program expected to raise $200 million per year to support local news and other home-grown content. The Canadian Radio-television and Telecommunications Commission (CRTC) announced its decision yesterday after a public comment period.

“Based on the public record, the CRTC is requiring online streaming services to contribute 5 percent of their Canadian revenues to support the Canadian broadcasting system. These obligations will start in the 2024–2025 broadcast year and will provide an estimated $200 million per year in new funding,” the regulator said.

The fees apply to both video and music streaming services. The CRTC imposed the rules despite opposition from Amazon, Apple, Disney, Google, Netflix, Paramount, and Spotify.

The new fees are scheduled to take effect in September and apply to online streaming services that make at least $25 million a year in Canada. The regulations exclude revenue from audiobooks, podcasts, video game services, and user-generated content. The exclusion of revenue from user-generated content is a win for Google’s YouTube.

Streaming companies have recently been raising prices charged to consumers, and the CBC notes that streamers might raise prices again to offset the fees charged in Canada.

Fees to support local news, Indigenous content

The CRTC said it is relying on authority from the Online Streaming Act, which was approved by Canada’s parliament in 2023. The new fees are similar to the ones already imposed on licensed broadcasters.

“The funding will be directed to areas of immediate need in the Canadian broadcasting system, such as local news on radio and television, French-language content, Indigenous content, and content created by and for equity-deserving communities, official language minority communities, and Canadians of diverse backgrounds,” the CRTC said.

CRTC Chairperson Vicky Eatrides said the agency’s “decision will help ensure that online streaming services make meaningful contributions to Canadian and Indigenous content.” The agency also said that streaming companies “will have some flexibility to direct parts of their contributions to support Canadian television content directly.”

Industry groups blast CRTC

The Motion Picture Association-Canada criticized the CRTC yesterday, saying the fee ruling “reinforces a decades-old regulatory approach designed for cable companies” and is “discriminatory.” The fees “will make it harder for global streamers to collaborate directly with Canadian creatives and invest in world-class storytelling made in Canada for audiences here and around the world,” the lobby group said.

The MPA-Canada said the CRTC didn’t fully consider “the significant contributions streamers make in working directly with Canada’s creative communities.” The group represents streamers including Netflix, Disney Plus, HAYU, Sony’s Crunchyroll, Paramount Plus, and PlutoTV.

“Global studios and streaming services have spent over $6.7 billion annually producing quality content in Canada for local and international audiences and invested more in the content made by Canadian production companies last year than the CBC, or the Canada Media Fund and Telefilm combined,” the group said.

The fees were also criticized by the Digital Media Association, which represents streaming music providers including Amazon Music, Apple Music, and Spotify. The “discriminatory tax on music streaming services… is effectively a protectionist subsidy for radio” and may worsen “Canada’s affordability crisis,” the group said.

The Canadian Media Producers Association praised the CRTC decision, saying the decision benefits independent producers and “tilts our industry toward a more level playing field.”



China’s plan to dominate EV sales around the world


The resurrection of a car plant in Brazil’s poor northeast stands as a symbol of China’s global advance—and the West’s retreat.

BYD, the Shenzhen-based conglomerate, has taken over an old Ford factory in Camaçari, which was abandoned by the American automaker nearly a century after Henry Ford first set up operations in Brazil.

When Luiz Inácio Lula da Silva, Brazil’s president, visited China last year, he met BYD’s billionaire founder and chair, Wang Chuanfu. After that meeting, BYD picked the country for its first carmaking hub outside of Asia.

Under a $1 billion-plus investment plan, BYD intends to start producing electric and hybrid automobiles this year at the site in Bahia state, which will also manufacture bus and truck chassis and process battery materials.

The new Brazil plant is no outlier—it falls into a wave of corporate Chinese investment in electric vehicle manufacturing supply chains in the world’s most important developing economies.


The inadvertent result of rising protectionism in the US and Europe could be to drive many emerging markets into China’s hands.

Last month, Joe Biden issued a new broadside against Beijing’s deep financial support of Chinese industry as he unveiled sweeping new tariffs on a range of cleantech products—most notably, a 100 percent tariff on electric vehicles. “It’s not competition. It’s cheating. And we’ve seen the damage here in America,” Biden said.

The measures were partly aimed at boosting Biden’s chances in his presidential battle with Donald Trump. But the tariffs, paired with rising restrictions on Chinese investment on American soil, will have an immense impact on the global auto market, in effect shutting China’s world-leading EV makers out of the world’s biggest economy.

The EU’s own anti-subsidy investigation into Chinese electric cars is expected to conclude next week as Brussels tries to protect European carmakers by stemming the flow of low-cost Chinese electric vehicles into the bloc.

Government officials, executives, and experts say that the series of new cleantech tariffs issued by Washington and Brussels are forcing China’s leading players to sharpen their focus on markets in the rest of the world.

This, they argue, will lead to Chinese dominance across the world’s most important emerging markets, including Southeast Asia, Latin America, and the Middle East, as well as the remaining Western economies that are less protectionist than the US and Europe.

“That is the part that seems to be lost in this whole discussion of ‘can we raise some tariffs and slow down the Chinese advance.’ That’s only defending your homeland. That’s leaving everything else open,” says Bill Russo, the former head of Chrysler in Asia and founder of Automobility, a Shanghai consultancy.

“Those markets are in play and China is aggressively going after those markets.”



Microsoft to test “new features and more” for aging, stubbornly popular Windows 10

but the clock is still ticking —

Support ends next year, but Windows 10 remains the most-used version of the OS.


In October 2025, Microsoft will stop supporting Windows 10 for most PC users, which means no more technical support and (crucially) no more security updates unless you decide to pay for them. To encourage adoption, the vast majority of new Windows development is happening in Windows 11, which will get one of its biggest updates since release sometime this fall.

But Windows 10 is casting a long shadow. It remains the most-used version of Windows by all publicly available metrics, including Statcounter (where Windows 11’s growth has been largely stagnant all year) and the Steam Hardware Survey. And last November, Microsoft decided to release a fairly major batch of Windows 10 updates that introduced the Copilot chatbot and other changes to the aging operating system.

That may not be the end of the road. Microsoft has announced that it is reopening a Windows Insider Beta Channel for PCs still running Windows 10, which will be used to test “new features and more improvements to Windows 10 as needed.” Users can opt into the Windows 10 Beta Channel regardless of whether their PC meets the requirements for Windows 11; if your PC is compatible, signing up for the less-stable Dev or Canary channels will still upgrade your PC to Windows 11.

Any new Windows 10 features that are released will be added to Windows 10 22H2, the operating system’s last major yearly update. Per usual for Windows Insider builds, Microsoft may choose not to release all new features that it tests, and new features will be released for the public version of Windows 10 “when they’re ready.”

One thing this new beta program doesn’t change is the end-of-support date for Windows 10, which Microsoft says is still October 14, 2025. Microsoft says that joining the beta program doesn’t extend support. The only way to continue getting Windows 10 security updates past 2025 is to pay for the Extended Security Updates (ESU) program; Microsoft plans to offer these updates to individual users but still hasn’t announced pricing for individuals. Businesses will pay as much as $61 per PC for the first year of updates, while schools will pay as little as $1 per PC.

Beta program or no, we still wouldn’t expect Windows 10 to change dramatically between now and its end-of-support date. We’d guess that most changes will relate to the Copilot assistant, given how aggressively Microsoft has moved to add generative AI to all of its products. For example, the Windows 11 version of Copilot is shedding its “preview” tag and becoming an app that runs in a regular window rather than a persistent sidebar, changes Microsoft could also choose to implement in Windows 10.


Could Network Modeling Replace Observability?

Over the past four years, I’ve consolidated a representative list of network observability vendors, but have not yet considered any modeling-based solutions. That changed when Forward Networks and NetBrain requested inclusion in the network observability report.

These two vendors have built their products on top of network modeling technology, and both of them met the report’s table stakes, which meant they qualified for inclusion. In this, the fourth iteration of the report, including these two modeling-based vendors did not have a huge impact. Vendors have shifted around on the Radar chart, but generally speaking, the report is consistent with the third iteration.

However, these modeling solutions are a fresh take on observability, which is a category that has so far been evolving incrementally. While there have been occasional leaps forward, driven by the likes of ML and eBPF, there hasn’t been an overhaul of the whole solution.

I cannot foresee any future version of network observability that does not include some degree of modeling, so I’ve been thinking about the evolution of these technologies, the current vendor landscape, and whether modeling-based products will overtake non-modeling-based observability products.

Even though it’s still early days for modeling-based observability, I want to explore and validate these two ideas:

  • It’s harder for observability-only tools to pivot into modeling than the other way around.
  • Modeling products offer some distinct advantages.

Pivoting to Modeling

The roots of modeling solutions are based in observability—specifically, collecting information about the configuration and state of the network. With this information, these solutions create a digital twin, which can simulate traffic to understand how the network currently behaves or would behave in hypothetical conditions.
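As a very rough sketch of what “simulate traffic against a model of the network” can mean in practice, think of the digital twin as a graph built from collected configurations and state, over which reachability questions are answered offline. The code below is a generic illustration of that idea with made-up device names; it is not how Forward Networks or NetBrain actually implement their engines.

```python
from collections import defaultdict, deque

class NetworkModel:
    """Toy digital twin: a topology plus modeled policy, queried offline."""
    def __init__(self):
        self.links = defaultdict(set)   # device -> set of connected devices
        self.blocked = set()            # (device, next_hop) pairs a modeled ACL blocks

    def add_link(self, a: str, b: str):
        self.links[a].add(b)
        self.links[b].add(a)

    def block(self, a: str, b: str):
        self.blocked.add((a, b))

    def can_reach(self, src: str, dst: str) -> bool:
        # Breadth-first search over the modeled topology, honoring modeled policy.
        seen, queue = {src}, deque([src])
        while queue:
            node = queue.popleft()
            if node == dst:
                return True
            for nxt in self.links[node]:
                if nxt not in seen and (node, nxt) not in self.blocked:
                    seen.add(nxt)
                    queue.append(nxt)
        return False

# Hypothetical "what if" question: does a proposed ACL change break reachability?
model = NetworkModel()
model.add_link("edge-router", "core-switch")
model.add_link("core-switch", "app-server")
model.block("core-switch", "app-server")
print(model.can_reach("edge-router", "app-server"))  # False under the proposed change
```

Real modeling engines work from parsed vendor configurations and forwarding state rather than a hand-built graph, but the appeal is the same: you can ask hypothetical questions without touching the live network.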

Observability tools do not need to simulate traffic to do their job. They can report on near-real time network performance information to provide network operations center (NOC) analysts with the right information to maintain the level of performance. Observability tools can definitely incorporate modeling features (and some solutions already do), but the point here is that they don’t have to.

My understanding of today’s network modeling tools indicates that these solutions cannot yet deliver the same set of features as network observability tools. This is to be expected, as many network observability tools have more than three decades of continuous development behind them.

However, when looking at future developments, we need to consider that network modeling tools use proprietary algorithms, which have been developed over a number of years and require a highly specific set of skills. I do not expect that developers and engineers equipped with network modeling skills are readily available in the job market, and these use cases are not as trendy as other topics. For example, AI developers are also in demand, but there’s also going to be a continuous increase in supply over the next few years as younger generations choose to specialize in this subject.

In contrast, modeling tools can tap into existing observability knowledge and mimic a very mature set of products to implement comparable features.

Modeling Advantages

In the vendor questionnaires, I’ve been asking these two questions for a few years:

  • Can the tool correlate changes in network performance with configuration changes?
  • Can the tool learn from the administrator’s decisions and remediation actions to autonomously solve similar incidents or propose resolutions?

The majority of network observability vendors don’t focus on these sorts of features. But the modeling solutions do, and they do so very well.

This list is by no means exhaustive; I’m only highlighting it because I’ve been asking myself whether these sorts of features are out of scope for network observability tools. But this is the first time since I started researching this space that the responses to these questions went from “we sort of do that” to “yes, this is our core strength.”

This leads me to think there is an extensive set of features that could benefit NOC analysts and be developed on top of the underlying technology, which may very well be network modeling.

Next Steps

Whether modeling tools can displace today’s observability tools is something that remains to be determined. I expect that the answer to this question will lie with the organizations whose business model heavily relies on network performance. If such an organization deploys both an observability and modeling tool, and increasingly favors modeling for observability tasks to the point where they decommission the observability tool, we’ll have a much clearer indication of the direction of the market.

To learn more, take a look at GigaOm’s network observability Key Criteria and Radar reports. These reports provide a comprehensive overview of the market, outline the criteria you’ll want to consider in a purchase decision, and evaluate how a number of vendors perform against those decision criteria.

If you’re not yet a GigaOm subscriber, sign up here.



GameStop stock influencer Roaring Kitty may lose access to E-Trade, report says

“I like the stock” —

E-Trade fears restricting influencer’s trading may trigger boycott, sources say.

Image: Keith Gill, known on Reddit under the pseudonym DeepFuckingValue and as Roaring Kitty, seen in a fragment of a YouTube video.

E-Trade is apparently struggling to balance the risks and rewards of allowing Keith Gill to continue trading volatile meme stocks on its platform, The Wall Street Journal reported.

The meme-stock influencer known as “Roaring Kitty” and “DeepF—Value” is considered legendary for instantly skyrocketing the price of stocks, notably GameStop, most recently with a single tweet.

E-Trade is concerned, according to The Journal’s insider sources, that on the one hand, Gill’s social media posts are potentially illegally manipulating the market—and possibly putting others’ investments at risk. But on the other, the platform worries that restricting Gill’s trading could incite a boycott fueled by his “meme army” closing their accounts “in solidarity.” That could also sharply impact trading on the platform, sources said.

It’s unclear what gamble E-Trade, which is owned by Morgan Stanley, might be willing to make. The platform could decide to take no action at all, the WSJ reported, but through its client agreement has the right to restrict or close Gill’s account “at any time.”

As of late Monday, Gill’s account was still active, the WSJ reported, apparently showing total gains of $85 million over the past three weeks. After Monday’s close, Gill’s GameStop positions “were valued at more than $289 million,” the WSJ reported.

Trading platforms unprepared for Gill’s comeback

In 2021, Gill’s social media activity on Reddit helped drive GameStop stock to historic highs. At that time, Gill encouraged others to invest in the stock—not based on the fundamentals of GameStop business but on his pure love for GameStop. The craze that he helped spark rapidly triggered temporary restrictions on GameStop trading, as well as a congressional hearing, but ultimately there were few consequences for Gill, who disappeared after making at least $30 million, the WSJ reported.

All remained quiet until a few weeks ago when Roaring Kitty suddenly came back. On X (formerly Twitter), Gill posted a meme of a man sitting up in his chair, then blitzed his feed with memes and movie clips, seemingly sending a continual stream of coded messages to his millions of followers who eagerly posted about their trades and gains on Reddit.

“Welcome back, legend,” one follower responded.

Once again, Gill’s abrupt surge in online activity immediately kicked off a GameStop stock craze fueling prices to a spike of more than 60 percent. And once again, because of the stock’s extreme volatility, Gill’s social posts prompted questions from both trading platforms and officials who continue to fret over whether Gill’s online influencing should be considered market manipulation.

For Gill’s biggest fans, the goal is probably to profit as much as possible before the hammer potentially comes down again and trading gets restricted. That started happening late on Sunday night, when it became harder or impossible to purchase GameStop shares on Robinhood, prompting some traders to complain on X.

The WallStreetBets account shared a warning that Robinhood sent to would-be buyers, which showed that trading was being limited, but not by Robinhood. Instead, the platform that facilitates Robinhood’s overnight trading of the stock, Blue Ocean ATS, set the limit, only accepting “trades 20 percent above or below” that day’s reference price—a move designed for legal or compliance reasons to stop trading once the stock exceeds a certain price.
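For a sense of what a limit of “trades 20 percent above or below” a reference price means in practice, here is the band arithmetic with a made-up reference price (illustration only; not the actual reference price that night):

```python
# Illustration only: +/-20% band around a hypothetical reference price.
reference_price = 25.00
band = 0.20
low, high = reference_price * (1 - band), reference_price * (1 + band)
print(f"Accepted overnight trades: ${low:.2f} to ${high:.2f}")
# Accepted overnight trades: $20.00 to $30.00
```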

These limits are set, the Securities and Exchange Commission (SEC) noted in 2021, partly to prevent fraudsters from spreading misleading tips online and profiting at the expense of investors from illegal price manipulation. A common form of this fraud is a pump-and-dump scheme, where fraudsters “make false and misleading statements to create a buying frenzy, and then sell shares at the pumped-up price.”



Google’s AI Overviews misunderstand why people use Google

Image: A robot hand holding a glue bottle over a pizza and tomatoes. (Credit: Aurich Lawson | Getty Images)

Last month, we looked into some of the most incorrect, dangerous, and downright weird answers generated by Google’s new AI Overviews feature. Since then, Google has offered a partial apology/explanation for generating those kinds of results and has reportedly rolled back the feature’s rollout for at least some types of queries.

But the more I’ve thought about that rollout, the more I’ve begun to question the wisdom of Google’s AI-powered search results in the first place. Even when the system doesn’t give obviously wrong results, condensing search results into a neat, compact, AI-generated summary seems like a fundamental misunderstanding of how people use Google in the first place.

Reliability and relevance

When people type a question into the Google search bar, they only sometimes want the kind of basic reference information that can be found on a Wikipedia page or corporate website (or even a Google information snippet). Often, they’re looking for subjective information where there is no one “right” answer: “What are the best Mexican restaurants in Santa Fe?” or “What should I do with my kids on a rainy day?” or “How can I prevent cheese from sliding off my pizza?”

The value of Google has always been in pointing you to the places it thinks are likely to have good answers to those questions. But it’s still up to you, as a user, to figure out which of those sources is the most reliable and relevant to what you need at that moment.

  • This wasn’t funny when the guys at Pep Boys said it, either.

  • Weird Al recommends “running with scissors” as well!

  • This list of steps actually comes from a forum thread response about doing something completely different.

  • An island that’s part of the mainland?

  • If everything’s cheaper now, why does everything seem so expensive?

  • Pretty sure this Truman was never president…

(Screenshots: Kyle Orland / Google)

For reliability, any savvy Internet user makes use of countless context clues when judging a random Internet search result. Do you recognize the outlet or the author? Is the information from someone with seeming expertise/professional experience or a random forum poster? Is the site well-designed? Has it been around for a while? Does it cite other sources that you trust, etc.?

But Google also doesn’t know ahead of time which specific result will fit the kind of information you’re looking for. When it comes to restaurants in Santa Fe, for instance, are you in the mood for an authoritative list from a respected newspaper critic or for more off-the-wall suggestions from random locals? Or maybe you scroll down a bit and stumble on a loosely related story about the history of Mexican culinary influences in the city.

One of the unseen strengths of Google’s search algorithm is that the user gets to decide which results are the best for them. As long as there’s something reliable and relevant in those first few pages of results, it doesn’t matter if the other links are “wrong” for that particular search or user.

Google’s AI Overviews misunderstand why people use Google Read More »