Q&A on Proposed SB 1047

Previously: On the Proposed California SB 1047.

Text of the bill is here. It focuses on safety requirements for highly capable AI models.

This is written as an FAQ, tackling all questions or points I saw raised.

Safe & Secure AI Innovation Act also has a description page.

There have been many highly vocal and forceful objections to SB 1047 this week, in reaction to a (disputed and seemingly incorrect) claim that the bill has been ‘fast tracked.’ 

The bill continues to have a substantial chance of becoming law according to Manifold, where the market has not moved on recent events. The bill has been referred to two policy committees, one of which put out this 38-page analysis.

The purpose of this post is to gather and analyze all objections that came to my attention in any way, including all responses to my request for them on Twitter, and to suggest concrete changes that address some real concerns that were identified.

  1. Some are helpful critiques pointing to potential problems, or good questions where we should ensure that my current understanding is correct. In several cases, I suggest concrete changes to the bill as a result. Two are important to fix weaknesses, one is a clear improvement, the others are free actions for clarity.

  2. Some are based on what I strongly believe is a failure to understand how the law works, both in theory and in practice, or a failure to carefully read the bill, or both.

  3. Some are pointing out a fundamental conflict. They want people to have the ability to freely train and release the weights of highly capable future models. Then they notice that it will become impossible to do this while adhering to ordinary safety requirements. They seem to therefore propose to not have safety requirements.

  4. Some are alarmist rhetoric that has little tether to what is in the bill, or how any of this works. I am deeply disappointed in some of those using or sharing such rhetoric.

Throughout such objections, there is little or no acknowledgement of the risks that the bill attempts to mitigate, suggestions of alternative ways to do that, or reasons to believe that such risks are insubstantial even absent required mitigation. To be fair to such objectors, many of them have previously stated that they believe that future more capable AI poses little catastrophic risk.

I get making mistakes, indeed it would be surprising if this post contained none of its own. Understanding even a relatively short bill like SB 1047 requires close reading. If you thoughtlessly forward anything that sounds bad (or good) about such a bill, you are going to make mistakes, some of which are going to look dumb.

If you have not previously done so, I recommend reading my previous coverage of the bill when it was proposed, although note the text has been slightly updated since then.

In the first half of that post, I did an RTFB (Read the Bill). I read it again for this post.

The core bill mechanism is that if you want to train a ‘covered model,’ meaning training on 10^26 flops or getting performance similar to or greater than what that much compute would buy you in 2024, then various safety requirements attach. If you fail in your duties you can be fined; if you purposefully lie about it, that is under penalty of perjury.

I concluded this was a good faith effort to put forth a helpful bill. As the bill deals with complex issues, it contains both potential loopholes on the safety side, and potential issues of inadvertent overreach, unexpected consequences or misinterpretation on the restriction side.

In the second half, I responded to Dean Ball’s criticisms of the bill, which he called ‘California’s Effort to Strangle AI.’

  1. In the section What Is a Covered Model, I contend that zero current open models would count as covered models, and most future open models would not count, in contrast to Ball’s claim that this bill would ‘outlaw open models.’

  2. In the section Precautionary Principle and Covered Guidance, I notice that what Ball calls ‘precautionary principle’ is an escape clause to avoid requirements, whereas the default requirement is to secure the model during training and then demonstrate safety after training is complete.

  3. On covered guidance, I notice that I expect the standards there to be an extension of those of NIST, along with applicable ‘industry best practices,’ as indicated in the text.

  4. In the section Non-Derivative, I notice that most open models are derivative models, upon which there are no requirements at all. As in, if you start with Llama-3 400B, the safety question is Meta’s issue and not yours.

  5. In the section So What Would the Law Actually Do, I summarize my practical understanding of the law. I will now reproduce that below, with modifications for the changes to the bill and my updated understandings based on further analysis (the original version is here).

  6. In Crying Wolf, I point out that if critics respond with similar rhetoric regardless of the actual text of the bill offered, as has been the pattern, and do not help improve any bill details, then they are not helping us to choose a better bill. And that the objection to all bills seems motivated by a fundamental inability of their preferred business model to address the underlying risk concerns.

This is an updated version of my previous list.

In particular, this reflects that they have introduced a ‘limited duty exemption,’ which I think mostly mirrors previous functionality but improves clarity.

This is a summary, but I attempted to be expansive on meaningful details.

Let’s say you want to train a model. You follow this flow chart (a code sketch of the same flow follows the list), with ‘hazardous capabilities’ meaning roughly ‘can cause $500 million or more in damage in especially worrisome ways, or pose a similarly worrying threat in other ways,’ though clarification would be appreciated there.

  1. If your model is not projected to be at least 2024 state of the art and it is not over the 10^26 flops limit?

    1. You do not need to do anything at all. As you were.

    2. You are not training a covered model. 

    3. You do not need a limited duty exemption. 

    4. That’s it.

    5. Every other business in America and especially California is jealous.

    6. Where the 10^26 threshold is above the estimated compute cost of GPT-4 or the current versions of Google Gemini, and no open model is anywhere near it other than Meta’s prospective Llama-3 400B, which may or may not hit it.

  2. If your model is a derivative of an existing model?

    1. You do not need to do anything at all. As you were.

    2. All requirements instead fall on the original developer.

    3. You do not need a limited duty exemption. 

    4. That’s it.

    5. Derivative in practice probably means ‘most of the compute was spent elsewhere’ but this would ideally be clarified further as noted below.

    6. Most open models are derivative in this sense, often of e.g. Llama-N.

  3. If your model is projected to have lower benchmarks and not have greater capabilities than an existing non-covered model, or one with a limited duty exemption?

    1. Your model qualifies for a limited duty exemption.

    2. You can choose to accept the limited duty exemption, or proceed to step 4.

    3. To get the exemption, certify why the model qualifies under penalty of perjury.

    4. Your job now is to monitor events in case you were mistaken.

    5. If it turns out you were wrong in good faith about the model’s benchmarks or capabilities, you have 30 days to report this and cease operations until you are in compliance as if you lacked the exemption. Then you are fully in the clear.

    6. If you are judged not in good faith, then it is not going to go well for you.

  4. If none of the above apply, then you are training a covered model. If you do not yet qualify for the limited duty exemption, or you choose not to get one? What do you have to do in order to train the model?

    1. Implement cybersecurity protections to secure access and the weights.

    2. Implement a shutdown capability during training.

    3. Implement all covered guidance.

    4. Implement a written and separate safety and security protocol.

      1. The protocol needs to ensure the model either lacks hazardous capability or has safeguards that prevent exercise of hazardous capabilities.

      2. The protocol must include a testing procedure to identify potential hazardous capabilities, and what you would do if you found them.

      3. The protocol must say what would trigger a shutdown procedure.

  5. Once training is complete: Can you determine a limited duty exemption now applies pursuant to your own previously recorded protocol? If no, proceed to #6. If yes and you want to get such an exemption:

    1. You can choose to file a certification of compliance to get the exemption.

    2. You then have a limited duty exemption.

    3. Once again, judged good faith gives you a free pass on consequences, if something were to go wrong.

    4. For the assessment to be judged unreasonable, it also has to have failed to take into account ‘reasonably foreseeable’ risks, which effectively means that (1) another similar developer, (2) NIST or (3) the Frontier Model Division already visibly foresaw them.

  6. What if you want to release your model without a limited duty exemption?

    1. You must implement ‘reasonable safeguards and requirements’ to prevent:

      1. An individual from being able to use the hazardous capabilities of the model.

      2. An individual from creating a derivative model that was used to cause a critical harm.

      3. This includes a shutdown procedure for all copies within your custody.

    2. You must ensure that anything the model does is attributed to the model to the extent reasonably possible. It does not say that this includes derivative models, but I assume it does.

    3. Implement any other measures that are reasonably necessary to prevent or manage the risks from existing or potential hazardous capabilities.

    4. You can instead not deploy the model, if you can’t or won’t do the above.

  7. After deployment, you need to periodically reevaluate your safety protocols, and file an annual report. If something goes wrong you have 72 hours to file an incident report.
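
For concreteness, here is a minimal sketch of that flow as code. This is my own summary of the bill rather than its language, and every name here (the Model fields, the boolean shortcuts) is an illustrative stand-in; in particular ‘performance at the 2024 frontier’ stands in for the benchmark comparison in 22602(f)(2).

```python
from dataclasses import dataclass

@dataclass
class Model:
    training_flops: float
    performance_at_2024_frontier: bool  # stand-in for the 22602(f)(2) benchmark comparison
    is_derivative: bool
    weaker_than_existing_exempt_model: bool

COVERED_FLOPS = 1e26

def obligations_for(m: Model) -> str:
    # Steps 1-2: non-covered or derivative models have no obligations under the bill.
    if m.training_flops < COVERED_FLOPS and not m.performance_at_2024_frontier:
        return "none"
    if m.is_derivative:
        return "none (requirements fall on the original developer)"
    # Step 3: an optional limited duty exemption, certified under penalty of perjury,
    # followed by monitoring in case the projection was wrong.
    if m.weaker_than_existing_exempt_model:
        return "optional limited duty exemption: certify, then monitor"
    # Steps 4-7: security during training, shutdown capability, a written safety
    # protocol, reasonable assurance or safeguards before release, periodic reporting.
    return "covered-model duties: security, shutdown, safety protocol, reporting"
```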

Also, there are:

  1. Some requirements on computing clusters big enough to train a covered model. Essentially do KYC, record payments and check for covered model training. Also they are required to use transparent pricing.

  2. Some ‘pro-innovation’ stuff of unknown size and importance, like CalCompute. Not clear these will matter and they are not funded.

  3. An open source advisory council is formed, for what that’s worth.

Here are some of the objections and misconceptions I have seen, and my responses:

  1. That this matters to most AI developers.

    1. It doesn’t, and it won’t.

    2. Right now it matters at most to the very biggest handful of labs.

    3. It only matters later if you are developing a non-derivative model using 10^26 or more flops, or one that will likely exhibit 2024-levels of capability for a model trained with that level of compute.

    4. Or, it could matter indirectly if you were planning to use a future open model from a big lab such as Meta, and that big lab is unable to provide the necessary reasonable assurance to enable the release of that model.

  2. That you need a limited duty exemption to train a non-covered or derivative model.

    1. You don’t. 

    2. You have no obligations of any kind whatsoever.

  3. That you need a limited duty exemption to train a covered model.

    1. You don’t. It is optional.

    2. You can choose to seek a limited duty exemption to avoid other requirements.

    3. Or you can follow the other requirements. 

    4. Your call. No one is ever forcing you to do this. 

  4. That this is an existential threat to California’s AI industry.

    1. Again, this has zero or minimal impact on most of California’s AI industry. 

    2. This is unlikely to change for years. Few companies will want covered models that are attempting to compete with Google, Anthropic and OpenAI.

    3. For those who do want covered models short of that, there will be increasing ability to get limited duty exemptions that make the requirements trivial.

  5. That the bill threatens academics or researchers.

    1. This bill very clearly does not. It will not even apply to them. At all.

    2. Those who say this, such as Martin Casado of a16z who was also the most prominent voice saying the bill would threaten California’s AI industry, show that they do not at all understand the contents or implications of the bill.

  6. There are even claims this bill is aimed at destroying the AI industry, or destroying anyone who would ‘challenge OpenAI.’

    1. Seriously, no, stop it.

    2. This bill is designed to address real safety and misuse concerns.

    3. That does not mean the bill is perfect, or even good. It has costs and benefits.

  7. That the requirements here impose huge costs that would sink companies.

    1. The cost of filing the required paperwork is trivial versus training costs. If you can’t do the paperwork, then you can’t afford to train the model either.

    2. The real costs are any actual safety protocols you must implement if you are training a covered non-derivative model and cannot or will not get a limited duty exemption.

    3. Which you should mostly be doing anyway.

    4. The other cost is the inability to release a covered non-derivative model if you cannot get a limited duty exemption and also cannot provide reasonable assurance of lack of hazardous capability.

    5. Especially with the proposed fixes, this should only happen for good reason.

  8. That this bill targets open weights or open source.

    1. It does the opposite in two ways. It excludes shutdown of copies of the model outside your control from the shutdown requirement, and it creates an advisory committee for open source with the explicit goal of helping them.

    2. When people say this will kill open source, what they mostly mean is that open weights are unsafe and nothing can fix this, and they want a free pass on this. So from their perspective, any requirement that the models not be unsafe is functionally a ban on open weight models.

    3. Open model weights advocates want to say that they should only be responsible for the model as they release it, not for what happens if any modifications are made later, even if those modifications are trivial in cost relative to the released model. That’s not on us, they say. That’s unreasonable.

    4. There is one real issue. The derivative model clause is currently worded poorly, without a cost threshold, such that it is possible to try to hold an open weights developer responsible in an unreasonable way. I do not think this ever would happen in practice for multiple reasons, but we should fix the language to ensure that.

    5. Many of the issues raised as targeting ‘open source’ apply to all models.

  9. That developers risk going to jail for making a mistake on a form.

    1. This (almost) never happens.

    2. Seriously, this (almost) never happens.

    3. People almost never get prosecuted for perjury, period. A few hundred a year.

    4. When they do, it is not for mistakes, it is for blatant lying caught red handed.

    5. And mostly that gets ignored too. The prosecutor needs to be really pissed off.

  10. That hazardous capability includes any harms anywhere that add up to $500 million.

    1. That is not what the bill says.

    2. The bill says the $500 million must be due to cyberattacks on critical infrastructure, autonomous illegal-for-a-human activity by an AI, or something else of similar severity.

    3. This very clearly does not apply to ‘$500 million in diffused harms like medical errors or someone using its writing capabilities for phishing emails.’

    4. I suggest changes to make this clearer, but it should be clear already.

  11. That the people advocating for this and similar laws are statists that love regulation.

    1. Seriously, no. It is remarkable the extent to which the opposite is true.

I see two big implementation problems with the bill as written. In both cases I believe a flexible good regulator plus a legal realist response should address the issue, but it would be far better to address them now:

  1. Derivative models can include unlimited additional training, thus allowing you to pass off your liability to any existing open model, in a way clearly not intended. This should be fixed by my first change below.

  2. The comparison rule for hazardous capabilities risks incorporating models that advance mundane utility or are otherwise themselves safe, where the additional general productivity enables harm, or the functionality used would otherwise be available in other models we consider safe, but the criminal happened to choose yours. We should fix this with my second change below.

  3. In addition to those large problems, a relatively small issue is that the catastrophic threshold is not indexed for inflation. It should be.

Then there are problems or downsides that are not due to flaws in the bill’s construction, but rather are inherent in trying to do what the bill is doing or not doing.

First, the danger that this law might impose practical costs.

  1. This imposes costs on those who would train covered models. Most of that cost, I expect in practice, is in forcing them to actually implement and document their security practices that they damn well should have done anyway. But although I do not expect it to be large compared to overall costs, since you need to be training a rather large non-derivative model for this law to apply to you, there will be some amount of regulatory ass covering, and there will be real costs to filing the paperwork properly and hiring lawyers and ensuring compliance and all that.

  2. It is possible that there will be models where we cannot have reasonable assurance of their lacking hazardous capabilities, or even that we knew have such capabilities, but which it would pass a cost-benefit test to make available, either via closed access or release of weights.

  3. Even a closed weights model can currently be jailbroken reliably. If a solution to that and similar issues cannot be found, alignment remains unsolved, and capabilities continue to improve, then once the risks become sufficiently severe and our safety plans seem inadequate, this could impose a de facto cap on the general capabilities of AI models, at some unknown level above GPT-4. If you think that AI development should proceed regardless in that scenario, that there is nothing to worry about, then you should oppose this bill.

  4. Because open weights are unsafe and nothing can fix this, if a solution to that cannot be found and capabilities continue to improve, then holding the open weights developer responsible for the consequences of their actions may in the future impose a de facto cap on the general capabilities of open weight models, at some unknown level above GPT-4, that might not de facto apply to closed models capable of implementing various safety protocols unavailable to open models. If you instead want open weights to be a free legal pass to not consider the possibility of enabling catastrophic harms and to not take safety precautions, you might not like this.

  5. It is possible that there will be increasing regulatory capture, or that the requirements will otherwise be expanded in ways that are unwise.

  6. It is possible that rhetorical hysteria in response to the bill will be harmful. If people alter their behavior in response, that is a real effect.

  7. This bill could preclude a different, better bill.

There are also the risks that this bill will fail to address the safety concerns it targets, by being insufficiently strong, insufficiently enforced or motivating, or by containing loopholes. In particular, the fact that open weights models need not have the (impossible to get) ability to shut down copies not in the developer’s possession is what enables the potential release of such weights at all, but it also renders the potential shutdown not so useful for safety.

Also, the liability can only be invoked by the Attorney General, the damages are relatively bounded unless violations are repeated and flagrant or they are compensatory for actual harm, and good faith is a defense against having violated the provisions here. So it may be very difficult to win a civil judgment. 

It likely will be even harder and rarer to win a criminal one. While perjury is technically involved if you lie on your government forms (same as other government forms) that is almost never prosecuted, so it is mostly meaningless.

Indeed, the liability could work in reverse, effectively granting model developers safe harbor. Industry often welcomes regulations that spell out their obligations to avoid liability for exactly this reason. So that too could be a problem or advantage to this bill. 

There are two important changes.

  1. We should change the definition of derivative model by adding a 22602(i)(3) to make clear that if a sufficiently large amount of compute (I suggest 25% of original training compute or 10^26 flops, whichever is lower) is spent on additional training and fine-tuning of an existing model, then the resulting model is now non-derivative. The new developer has all the responsibilities of a covered model developer, and the old developer is no longer responsible.

  2. We should change the comparison baseline in 22602(n)(1) when evaluating the difficulty of causing catastrophic harm, inserting words to the effect of ‘other than access to other covered models that are known to be safe.’ Instead of comparing to causing the harm without use of any covered model, we should compare to causing the harm without use of any covered model other than safe ones that lack hazardous capability. You then cannot be blamed because a criminal happened to use your model in place of GPT-N, whether as part of a larger package or for otherwise safe dual use actions like making payroll or scheduling meetings. In that case, either GPT-N and your model both have hazardous capability, or neither does.

In addition:

  1. The threshold of $500 million in (n)(1)(B) and (n)(1)(C) should add ‘in 2024 dollars’ or otherwise be indexed for inflation.

  2. I would clear up the language in 22602(f)(2) to make it unambiguous that this refers to what one could reasonably have expected to accomplish with that many flops in 2024, rather than to being merely as good as the weakest model trained on such compute, and if desired that it should also refer to the strongest model available in 2024. We should also clarify what date in 2024; if it is December 31, we should say so. The more I look at the current wording, the clearer the intent becomes, but let’s make it a lot easier to see.

  3. After consulting legal experts to get the best wording, and mostly to reassure people, I would add 22602(n)(3) to clarify that to qualify under (n)(1)(D) requires that the damage caused be acute and concentrated, and that it not be the diffuse downside of a dual use capability that is net beneficial, such as occasional medical mistakes resulting from sharing mostly useful information.

  4. After consulting legal experts to get the best wording, and mostly to reassure people, I would consider adding 22602 (n)(4) to clarify that the use of a generically productivity enhancing dual use capability, where that general increase in productivity is then used to facilitate hazardous activities without directly enabling the hazardous capabilities themselves, such as better managing employee hiring or email management, does not constitute hazardous capabilities. If it tells you how to build a nuclear bomb and this facilitates building one, that is bad. If it manages your payroll taxes better and this lets you hire someone who then makes a nuclear bomb, we should not blame the model. I do not believe we would anyway, but we can clear that up.

  5. It would perhaps be good to waive levies (user fees) for sufficiently small businesses, at least for when they are asking for limited duty exemptions, despite the incentive concerns, since we like small business and this is a talking point that can be cheaply defused.

Do you need a limited duty exemption, or any other government permission, just to train a model? No. Never.

This perception is entirely due to a hallucination of how the bill works. People think you need a limited duty exemption to train any model at all. You don’t. This is nowhere in the bill. 

If you are training a non-covered or derivative model, you have no obligations under this bill. 

If you are training a covered model, you can choose to implement safeguards instead.

There is a loophole that needs to be addressed.

The problem is, what would happen if you were to start with (for example) Llama-3 400B, but then train it using an additional 10^27 flops in compute to create Acme-5, enhancing its capabilities to the GPT-5 level? Or if you otherwise used an existing model as your starting point, but mostly used that as an excuse or small cost savings, and did most of the work yourself?

This is a problem both ways.

The original non-derivative model and developer, here Llama-3 and Meta, should not be responsible for the hazardous capabilities that result.

On the other hand, Acme Corporation, the developers of Acme-5, clearly should be responsible for Acme-5 as if it were a non-derivative model.

Quintin Pope points out this is possible on any open model, no matter how harmless.

Jon Askonas points this out as well.

xlr8harder extends this, saying it is arguable you could not even release untrained weights.

I presume the regulators and courts would not allow such absurdities, but why take that chance or give people that worry?

My proposed new definition extension to fix this issue, for section 3 22602 (i)(3): If training compute to further train another developer’s model is expended or is planned to be expended that is greater than [10% / 25% / 50%] of the training compute used to train a model originally, or involves more than 10^26 flops, then the resulting new model is no longer considered a derivative model. It is now a non-derivative model for all purposes.
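
As a sketch of how that proposed threshold could be operationalized (the 25% figure is just one of the bracketed options above, and all names here are mine, not bill language):

```python
ABSOLUTE_CAP_FLOPS = 1e26       # the 10^26 flop backstop in the proposal
FRACTION_OF_ORIGINAL = 0.25     # one of the bracketed options: 10% / 25% / 50%

def remains_derivative(original_training_flops: float,
                       additional_training_flops: float) -> bool:
    """Under the proposed 22602(i)(3), a further-trained model stays 'derivative'
    only if the additional compute is below both the fractional and absolute caps."""
    threshold = min(FRACTION_OF_ORIGINAL * original_training_flops, ABSOLUTE_CAP_FLOPS)
    return additional_training_flops <= threshold

# Example: spending 10^27 flops on top of an existing model exceeds both limits,
# so the Acme-5 of the earlier example becomes non-derivative and Acme picks up
# the covered-model responsibilities, while the original developer is off the hook.
```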

Nick Moran suggests the derivative model requirement is similar to saying ‘you cannot sell a blank book,’ because the user could introduce new capabilities. He uses the example of not teaching a model any chemistry or weapon information, and then someone fires up a fine-tuning run on a corpus of chemical weapons manuals.

I think that is an excellent example of a situation in which this is ‘a you problem’ for the model creator. Here, it sounds like it took only a very small fine-tune, costing very little, to enable the hazardous capability. You have made the activity of ‘get a model to help you do chemical weapons’ much, much easier to accomplish than it would have been counterfactually. So then the question is, did the ability to use the fine-tuned model help you substantially more than only having access to the manuals?

Whereas most of the cost of a book that describes how to do something is in choosing the words and writing them down, not in creating a blank book to print upon, and there are already lots of ways to get blank books.

If the fine-tune was similar in magnitude of cost to the original training run, then I would say it is similar to a blank book, instead.

Charles Foster finds this inadequate, responding to a similar suggestion from Dan Hendrycks, and pointing out the combination scenario I may not have noticed otherwise.

Charles Foster: I don’t think that alleviates the concern. Developer A shouldn’t be stopped from releasing a safe model just because—for example—Developer B might release an unsafe model that Developer C could cheaply combine with Developer A’s. They are clearly not at fault for that.

This issue is why I also propose modifying the alternative capabilities rule.

See that section for more details. My proposal is to change from comparing to using no covered models to comparing to using no unsafe covered models. Thus, you have to be enabling harm over and above what could have been done with, for example, GPT-N.

If Developer B releases a distinct unsafe covered model, which combined with Developer A’s model is unsafe, then I note that Developer B’s model is in this example non-derivative, so the modification clarifies that the issue is not on A merely because C chose to use A’s model over GPT-N for complementary activities. If necessary, we could add an additional clarifying clause here.

The bottom line, as I see it is:

  1. We should define derivative models such that it requires the original developer to have borne most of the cost and done most of the work, such that it is only derivative if you are severely discounting the cost of creating the new system.

  2. If you are severely discounting the cost of creating an unsafe system, and we can talk price about what the rule should be here, then that does not sound safe to me.

  3. If it is impossible to create a highly capable open model weights system that cannot be made unsafe at nominal time and money cost, then why do you think I should allow you to release such a model?

  4. We should identify cases where our rules would lead to unreasonable assignments of fault, and modify the rules to fix them.

Should the $500 million threshold be indexed for inflation? Yes. This is an easy fix: change Sec. 3 22602(n)(1)(B) and (n)(1)(C) to index to 2024 dollars. There is no reason this threshold should decline in real terms over time.

Here is the current text.

(n) (1) “Hazardous capability” means the capability of a covered model to be used to enable any of the following harms in a way that would be significantly more difficult to cause without access to a covered model:

(A) The creation or use of a chemical, biological, radiological, or nuclear weapon in a manner that results in mass casualties.

(B) At least five hundred million dollars ($500,000,000) of damage through cyberattacks on critical infrastructure via a single incident or multiple related incidents.

(C) At least five hundred million dollars ($500,000,000) of damage by an artificial intelligence model that autonomously engages in conduct that would violate the Penal Code if undertaken by a human.

(D) Other threats to public safety and security that are of comparable severity to the harms described in paragraphs (A) to (C), inclusive.

I will address the harm counterfactual of ‘significantly more difficult to cause without access to a covered model’ in the next section.

I presume that everyone is onboard with (A) counting as hazardous. We could more precisely define ‘mass’ casualties, but it does not seem important.

Notice the construction of (B). The damage must explicitly be damage to critical infrastructure. This is not $500 million from a phishing scam, let alone $500 from each of a million scams. Similarly, notice (C). The violation of the penal code must be autonomous.

Both are important aggravating factors. A core principle of law is that if you specify X+Y as needed to count as Z, then X or Y alone is not a Z.

So when (D) says ‘comparable severity’ this cannot purely mean ‘causes $500 million in damages.’ In that case, there is no need for (B) or (C), one can simply say ‘causes $500 million in cumulative damages in some related category of harms.’

My interpretation of (D) is that the damages need to be sufficiently acute and severe, or sufficiently larger than this, as to be of comparable severity with only a similar level of overall damages. So something like causing a very large riot, perhaps.

You could do it via a lot of smaller incidents with less worrisome details, such as a lot of medical errors or malware emails, but we are then talking at least billions of dollars of counterfactual harm.

This seems like a highly reasonable rule.
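
To make the structure of my reading explicit, here is a sketch of (n)(1) as a predicate. The field names are illustrative stand-ins rather than bill language, and (D) compresses what is really a legal judgment into a single flag.

```python
from dataclasses import dataclass

DAMAGE_THRESHOLD = 500_000_000  # in 2024 dollars, if the indexing fix is adopted

@dataclass
class Harm:
    damage: float = 0.0
    cbrn_weapon_with_mass_casualties: bool = False
    cyberattack_on_critical_infrastructure: bool = False
    autonomous_penal_code_violation: bool = False
    comparable_severity_to_above: bool = False  # the (D) judgment call

def is_hazardous_harm(h: Harm) -> bool:
    # (A) CBRN weapon creation or use resulting in mass casualties.
    if h.cbrn_weapon_with_mass_casualties:
        return True
    # (B) the dollar figure alone is not enough; the damage must come via
    # cyberattacks on critical infrastructure.
    if h.damage >= DAMAGE_THRESHOLD and h.cyberattack_on_critical_infrastructure:
        return True
    # (C) again paired with an aggravating factor: the model acting autonomously
    # in ways that would violate the Penal Code if done by a human.
    if h.damage >= DAMAGE_THRESHOLD and h.autonomous_penal_code_violation:
        return True
    # (D) other threats of comparable severity: acute and concentrated, not the
    # diffuse downside of beneficial use, per the clarification proposed above.
    return h.comparable_severity_to_above
```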

However, people like Quintin Pope here are reasonably worried that it won’t be interpreted that way:

Quintin Pope: Suppose an open model developer releases an innocuous email writing model, and fraudsters then attach malware to the emails written by that model. Are the model developers then potentially liable for the fraudsters’ malfeasance under the derivative model clause?

Please correct me if I’m wrong, but SB 1047 seems to open multiple straightforward paths for de facto banning any open model that improves on the current state of the art. E.g., – The 2023 FBI Internet Crime Report indicates cybercriminals caused ~$12.5 billion in total damages. – Suppose cybercriminals do similar amounts in future years, and that ~5% of cybercriminals use whatever open source model is the most capable at a given time.

Then, any open model better than what’s already available would predictably be used in attacks causing > $500 million and thus be banned, *even if that model wouldn’t increase the damage caused by those attacks at all*.

Cybercrime isn’t the only such issue. “$500 million in damages” sounds like a big number, but it’s absolute peanuts compared to things that actually matter on an economy-wide scale. If open source AI ever becomes integrated enough into the economy that it actually benefits a significant number of people, then the negative side effects of anything so impactful will predictably overshoot this limit.

My suggestion is that the language be expanded for clarity and reassurance, and to guard against potential overreach. So I would move (n)(2) to (n)(3) and add a new (n)(2), or I would add additional language to (D), whichever seems more appropriate.

The additional language would clarify that the harm needs to be acute and not as a downside of beneficial usage, and this would not apply if the model contributed to examples such as Quintin’s. We should be able to find good wording here.

I would also add language clarifying that general ‘dual use’ capabilities that are net beneficial, such as helping people sort their emails, cannot constitute hazardous capability.

This is something a lot of people are getting wrong, so let’s make it airtight.

To count as hazardous capability, this law requires that the harm be ‘significantly more difficult to cause without access to a covered model,’ not without access to this particular model, which we will return to later.

This is considerably stronger than ‘this was used as part of the process’ and considerably weaker than ‘required this particular covered model in particular.’

The obvious problem scenario, why you can’t use a weaker clause, is what if:

  1. Acme issues a model that can help with cyberattacks on critical infrastructure.

  2. Zenith issues a similar model that does all the same things.

  3. Both are used to commit crime that triggers (B), crime that required Acme or Zenith.

  4. Acme says the criminals would have used Zenith.

  5. Zenith says the criminals would have used Acme.

You need to be able to hold at least one of them liable.

The potential flaw in the other direction is, what if covered models simply greatly enhance all forms of productivity? What if it is ‘more difficult without access’ because your company uses covered models to do ordinary business things? Clearly that is not intended to count.

A potential solution might be to say something that is effectively ‘without access to a covered model that itself has hazardous capabilities’?

  1. Acme is a covered model.

  2. Zenith is a covered model.

  3. Zenith is used to substantially enable cyberattacks that trigger (B).

  4. If this could have also been done with Acme with similar difficulty, then either both Zenith and Acme have hazardous capabilities, or neither of them do.

I am open to other suggestions to get the right counterfactual in a robust way.
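
As a sketch of the comparison I have in mind, treating ‘difficulty’ as a numeric stand-in for what would in practice be a qualitative legal judgment, with all names mine:

```python
def significantly_enables_harm(difficulty_with_this_model: float,
                               difficulty_with_only_safe_covered_models: float,
                               significant_margin: float) -> bool:
    """Compare against a baseline where the attacker keeps access to covered models
    already judged safe, rather than a baseline with no covered models at all.
    A model is only implicated if it makes the harm significantly easier than the
    safe alternatives would have; being the model the criminal happened to pick
    is not enough."""
    return (difficulty_with_only_safe_covered_models - difficulty_with_this_model) > significant_margin
```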

None of this has anything to do with open model weights. The problem does not differentiate. If we get this wrong and cumulative damages or other mundane issues constitute hazardous capabilities, it will not be an open weights problem. It will be a problem for all models.

Indeed, in order for open models to be in trouble relative to closed models, we need a reasonably bespoke definition of what counts here, that properly identifies the harms we want to avoid. And then the open models would need to be unable to prevent that harm.

As an example of this and other confusions being widespread: The post was deleted so I won’t name them, but two prominent VCs posted and retweeted that ‘under this bill, open source devs could be held liable for an LLM outputting ‘contraband knowledge’ that you could get access to easily via Google otherwise.’ Which is clearly not the case.

How hard is it to provide reasonable assurance that a safety protocol works? It seems hard. Jessica Taylor notes that it seems very hard. Indeed, she does not see a way for any developer to in good faith provide assurance that their protocol works.

The key term of art here is ‘reasonable assurance.’ That gives you some wiggle room.

Jessica points out that jailbreaks are an unsolved problem. This is very true.

If you are proposing a protocol for a closed model, you should assume that your model can and will be fully jailbroken, unless you can figure out a way to make that not true. Right now, we do not know of a way to do that. This could involve something like ‘probabilistically detect and cut off the jailbreak sufficiently well that the harm ends up not being easier to cause than using another method’ but right now we do not have a method for that, either.

So the solution for now seems obvious. You assume that the user will jailbreak the model, and assess it accordingly.

Similarly, for an open weights model, you should assume the first thing the malicious user does is strip out your safety protocols, either with fine tuning or weights injection or some other method. If your plan was refusals, find a new plan. If your plan was ‘it lacks access to this compact data set’ then again, find a new plan.

As a practical matter, I believe that I could give reasonable assurance, right now, that all of the publicly available models (including GPT-4, Claude 3, and Gemini Advanced 1.0 and Pro 1.5) lack hazardous capability, if we were to lower the covered model threshold to 10^25 and include them.

If I was going to test GPT-5 or Claude-4 or Gemini-2 for this, how would I do that? There’s a METR for that, along with the start of robust internal procedures. I’ve commented extensively on what I think a responsible scaling policy (RSP) or preparedness framework should look like, which would involve many other steps as well.

One key point this emphasizes is that such tests need to give the domain experts jailbroken access, rather than only default access.

Perhaps this will indeed prove impractical in the future for what would otherwise be highly capable models if access is given widely. In that case, we can debate whether that should be sufficient to justify not deploying, or deploying in more controlled fashion.

I do think that is part of the point. At some point, this will no longer be possible. At that point, you should actually adjust what you do.

Does ‘reasonable assurance’ mean a guarantee or absolute certainty? No.

Reasonable assurance is a term used in auditing. 

Here is Claude Opus’s response, which matches my understanding:

In legal terminology, “reasonable assurance” is a level of confidence or certainty that is considered appropriate or sufficient given the circumstances. It is often used in the context of auditing, financial reporting, and contracts.

Key points about reasonable assurance:

  1. It is a high, but not absolute, level of assurance. Reasonable assurance is less than a guarantee or absolute certainty.

  2. It is based on the accumulation of sufficient, appropriate evidence to support a conclusion.

  3. The level of assurance needed depends on the context, such as the risk involved and the importance of the matter.

  4. It involves exercising professional judgment to assess the evidence and reach a conclusion.

  5. In auditing, reasonable assurance is the level of confidence an auditor aims to achieve to express an opinion on financial statements. The auditor seeks to obtain sufficient appropriate audit evidence to reduce the risk of expressing an inappropriate opinion.

  6. In contracts, reasonable assurance may be required from one party to another about their ability to fulfill obligations or meet certain conditions.

The concept of reasonable assurance acknowledges that there are inherent limitations in any system of control or evidence gathering, and absolute certainty is rarely possible or cost-effective to achieve.

Jeremy Howard made four central objections, and raised several other warnings below, that together seemed to effectively call for no rules on AI at all.

One objection, echoed by many others, is that the definition here is overly broad.

Right now, and for the next few years, the answer is clearly no. Eventually, I still do not think so, but it becomes a reasonable concern.

Howard says this sentence, which I very much appreciate: “This could inadvertently criminalize the activities of well-intentioned developers working on beneficial AI projects.”

Being ‘well-intentioned’ is irrelevant. The road to hell is paved with good intentions. Who decides what is ‘beneficial?’ I do not see a way to take your word for it.

We don’t ask ‘did you mean well?’ We ask whether you meet the requirements.

I do agree it would be good to allow for cost-benefit testing, as I will discuss later under Pressman’s suggestion.

You must do mechanism design on the rule level, not on the individual act level.

The definition can still be overly broad, and this is central, so let’s break it down.

Here is (Sec. 3 22602):

(f) “Covered model” means an artificial intelligence model that meets either of the following criteria:

(1) The artificial intelligence model was trained using a quantity of computing power greater than 10^26 integer or floating-point operations.

(2) The artificial intelligence model was trained using a quantity of computing power sufficiently large that it could reasonably be expected to have similar or greater performance as an artificial intelligence model trained using a quantity of computing power greater than 10^26 integer or floating-point operations in 2024 as assessed using benchmarks commonly used to quantify the general performance of state-of-the-art foundation models.

This probably covers zero currently available models, open or closed. It definitely covers zero available open weights models.

It is possible this would apply to Llama-3 400B, and it would presumably apply to Llama-4. The barrier is somewhere in the GPT-4 (4-level) to GPT-5 (5-level) range.

This does not criminalize such models. It says such models have to follow certain rules. If you think that open models cannot abide by any such rules, then ask why. If you object that this would impose a cost, well, yes.

You would be able to get an automatic limited duty exemption, if your model was below the capabilities of a model that had an existing limited duty exemption, which in this future could be a model that was highly capable.

I do get that there is a danger here that in 2027 we could have GPT-5-level performance in smaller models and this starts applying to a lot more companies, and perhaps no one at 5-level can get a limited duty exemption in good faith.

That would mean that those models would be on the level of GPT-5, and no one could demonstrate their safety when used without precautions. What should our default regime be in that world? Would this then be overly broad?

My answer is no. The fact that they are in (for example) the 10^25 range does not change what they can do.

Neil Chilson says the clause is anti-competitive, with its purpose being to ensure that if someone creates a smaller model with similar performance to the big boys, it would not have cheaper compliance costs.

In this model, the point of regulating large models is to impose high regulatory compliance costs on big companies and their models, so that those companies benefit from the resulting moat. And thus, the costs must be imposed on other capable models, or else the moat would collapse.

No.

The point is to ensure the safety of models with advanced capabilities.

The reason we use a 10^26 flops threshold is that this is the best approximation we have for ‘likely will have sufficiently advanced capabilities.’

Are regulatory requirements capable of contributing to moats? Yes, of course. And it is possible this will happen here to a non-trivial degree, among those training frontier foundation models in particular. But I expect the costs involved to be a small fraction of the compute costs of training such models, or the cost of actual necessary safety checks, as I note elsewhere.

The better question is, is this the right clause to accomplish that?

If the clause said that performance on any one benchmark triggered becoming a covered model, the same way that in order to get a limited duty exemption you need to be inferior on all benchmarks, then I would say that was overly broad. A model happening to be good at one thing does not mean it is generally dangerous.

That is not what the clause says. It says ‘as assessed using benchmarks commonly used to quantify the general performance of state-of-the-art foundation models.’ So this is an overall gestalt. That seems like a highly reasonable rule.

In my reading the text clearly refers to what one would expect as the result of a state of the art training run of size 10^26 in 2024, rather than the capabilities of any given model. For example, it obviously would not be a null provision if no model over the threshold was released in 2024, which is unlikely but not known to be impossible. And obviously no one thinks that if Falcon produced a terrible 10^26 flops model that was GPT-3.5 level, that this would be intended to lower the bar to that.

So for example this claim by Brian Chau is at best confused, if you ignore the ludicrous and inflammatory framing. But I see an argument that this is technically ambiguous if you are being sufficiently dense, so I suggest clarification.
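
For concreteness, here is a sketch of the reading I am suggesting the text be amended to make explicit. The benchmark score inputs are illustrative stand-ins for ‘benchmarks commonly used to quantify the general performance of state-of-the-art foundation models,’ which the bill does not pin down.

```python
COVERED_FLOPS = 1e26

def is_covered_model(training_flops: float,
                     expected_general_benchmark_score: float,
                     score_of_sota_2024_run_at_1e26_flops: float) -> bool:
    # 22602(f)(1): trained on more than 10^26 integer or floating-point operations.
    if training_flops > COVERED_FLOPS:
        return True
    # 22602(f)(2), as I read it: expected to match or exceed what a state-of-the-art
    # 2024 training run at that compute level would produce, not merely to match the
    # weakest model that happened to be trained with that much compute.
    return expected_general_benchmark_score >= score_of_sota_2024_run_at_1e26_flops
```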

Then there is this by Perry Metzger, included for completeness, accusing Dan Hendrycks, all of LessWrong and all safety advocates of being in beyond bad faith. He also claims that ‘the [AI] industry will be shut down in California if this passes’ and for reasons I explain throughout I consider that absurd and would happily bet against that.

Does the bill create unreasonable liability? No, and it perhaps could do the opposite by creating safe harbor.

Several people have claimed this bill creates unreasonable liability, including Howard as part of his second objection. I think that is essentially a hallucination.

 There have been other bills that propose strict liability for harms. This bill does not.

The only way you are liable under this bill is if the attorney general finds you in violation of the statute and brings a civil action, seeking a civil penalty proportional to the model’s training cost. That is it.

What would it mean to be violating this statute? It roughly means you failed to take reasonable precautions, did not follow the requirements, failed to act in good faith, and the courts agreed.

Even if your model is used to inflict catastrophic harm, a good faith attempt at reasonable precautions is a complete defense.

If a model were to enable $500 million in damages in any fashion, or mass casualties, even if it does not qualify as hazardous capability under this act, people are very much getting sued under current law. By spelling out what model creators must do via providing reasonable assurance, this lets labs claim that this should shield them from ordinary civil liability. I don’t know how effective that would be, but similar arguments have worked elsewhere.

The broader context of Howard’s second objection is that the models are ‘dual use,’ general purpose tools, and can be used for a variety of things. As I noted above, clarification would be good to rule out ‘the criminals used this to process their emails faster and this helped them do the crime’ but I am not worried this would happen either way, nor do I see how ‘well funded legal teams’ matter here.

Howard tries to make this issue about open weights, but it is orthogonal to that. The actual issue he is pointing towards here, I will deal with later.

Could developers go to jail over this? Not unless they are willfully defying the rules and outright lying in their paperwork.

Here is California’s perjury statute.

Even then, mostly no. It is extremely unlikely that perjury charges will ever be pursued unless there was clear bad faith and lying. Even then, and even if this resulted in actual catastrophic harm, not merely potential harm, it still seems unlikely.

Lying on your tax return or benefit forms or a wide variety of government documents is perjury. Lying on your loan application is perjury. Lying in signed affidavits or court testimony is perjury.

Really an awful lot of people are committing perjury all the time. Also this is a very standard penalty for lying on pretty much any form, ever, even at trivial stakes.

This results in about 300-400 federal prosecutions for perjury per year, total, out of over 80,000 annual criminal cases.

In California for 2022, combining perjury, contempt and intimidation, there were a total of 9 convictions, none in the Northern District that includes San Francisco.

Unlike several other proposed bills, companies are tasked with their own compliance. 

You can be sued civilly by the Attorney General if you violate the statute, with good faith as a complete defense. In theory, if you lie sufficiently brazenly on your government forms, like in other such cases, you can be charged with perjury, see the previous question. That’s it. 

If you are not training a covered non-derivative model, there is no enforcement. The law does not apply to you.

If you are training a covered non-derivative model, then you decide whether to seek a limited duty exemption. You secure the model weights and otherwise provide cybersecurity during training. You decide how to implement covered guidance. You do any necessary mitigations. You decide what if any additional procedures are necessary before you can verify the requirements for the limited duty exemption or provide reasonable assurance. You do have to file paperwork saying what procedures you will follow in doing so.

There is no procedure where you need to seek advance government approval for any action.

Does this create a powerful new regulatory agency? No. It creates the Frontier Model Division within the Department of Technology. See section 4, 11547.6(c). The new division will issue guidance, allow coordination on safety procedures, appoint an advisory committee on (and to assist) open source, publish incident reports and process certifications.

Does the Frontier Model Division need to review or approve your model before release? No.

This has been in other proposals. It is not in this bill. The model developer provides the attestation, and does not need to await its review or approval.

Are the reporting requirements a substantial burden on small developers? Right now rather obviously not, since they do not apply to small developers.

The substantial burdens only apply if you train a covered model, from scratch, that can’t get a limited duty exemption. A derivative model never counts.

That will not happen to a small developer for years.

At that point, yes, if you make a GPT-5-level model from scratch, I think you can owe us some reports.

The burden of the reports seems to pale in comparison to (and on top of) the burden of actually taking the precautions, or the burden of the compute cost of the model being trained. This is not a substantial cost addition once the models get that large.

The good objection here is that ‘covered guidance’ is open ended and could change. I see good reasons to be wary of that, and to want the mechanisms picked carefully. But also any reasonable regime is going to have a way to issue new guidance as models improve.

Is the full shutdown requirement impossible for open weight models to meet? It would be if it fully applied to such models.

The good news for open weights models is that this (somehow) does not apply to them. Read the bill, bold is mine.

(m) “Full shutdown” means the cessation of operation of a covered model, including all copies and derivative models, on all computers and storage devices within custody, control, or possession of a person, including any computer or storage device remotely provided by agreement.

If they had meant ‘full shutdown’ to mean ‘no copies of the model are now running’ then this would not be talking about custody, control or possession at all. Instead, if the model is now fully autonomous and out of your control, or is open weights and has been downloaded by others, you are off the hook here.

Which is good for open model weights, because ‘ability to take back a mistake’ or ‘shut down’ is not an ability they possess.

This seems like a real problem for the actual safety intent here, as I noted last time.

Rather than a clause that is impossible for an open model to meet, this is a clause where open models are granted extremely important special treatment, in a way that seems damaging to the core needs of the bill.

The other shutdown requirement is the one during training of a covered model without a limited duty exemption.

That one says, while training the model, you must keep the weights on lockdown. You cannot open them up until after you are done, and you run your tests. So, yes, there is that. But that seems quite sensible to me? Also a rule that every advanced open model developer has followed in practice up until now, to the best of my knowledge.

Thus I believe objections like Kevin Lacker’s here are incorrect with respect to the shutdown provision. For his other more valid concern, see the derivative model definition section. 

On Howard’s final top point, what here disincentivizes openness?

Openness and disclosing information on your safety protocols and training plans are fully compatible. Everyone faces the same potential legal repercussions. These are costs imposed on everyone equally.

To the extent they are imposed more on open models, it is because those models are incapable of guarding against the presence of hazardous capabilities.

Ask why.

Howard raised the possibility that this would threaten researchers, as does Martin Casado of a16z, who calls the bill a ‘fing disaster’ and an attack on innovation generally.

I don’t see how this ever happens. It seems like a failure to understand the contents of the bill, or to think through the details.

The only people liable or who have responsibilities under SB 1047 are those that train covered models. That’s it. What exactly is your research, sir?

It is standard at this point to include ‘business pays the government fees to cover administrative costs’ in such bills, in this case with Section 11547.6 (c)(11). This aligns incentives.

It is also standard to object, as Howard does, that this is an undue burden on small business.

My response is, all right, fine. Let’s waive the fees for sufficiently small businesses, so we don’t have to worry about this. It is at worst a small mistake.

Howard also warned that the bill would create a barrier to entry.

Again, the barrier to entry can only apply if the rules apply to you. So this would only apply in the future, and only to companies that seek to train their own covered models, and only to the extent that this is burdensome.

This could actively work the other way. Part of this law will be that NIST and other companies and the Frontier Model Division will be publishing their safety protocols for you to copy. That seems super helpful.

I am not sure if this is on net a barrier to entry. I expect a small impact.

Did they, as also claimed by Brian Chau, ‘literally specify that they want to regulate models capable of competing with OpenAI?’

No, of course not, that is all ludicrous hyperbole, as per usual.

Brian Chau also goes on to say, among other things that include ‘making developers pay for their own oppression’:

Brian Chau: The bill would make it a felony to make a paperwork mistake for this agency, opening the door to selective weaponization and harassment.

Um, no. Again, see the section on perjury, and also the very explicit text of the bill. That is not what the bill says. That is not what perjury means. If he does not know this, it is because he is willfully ignorant of this and is saying it anyway.

And then the thread in question was linked to by several prominent others, all of whom should know better, but have shown a consistent pattern of not knowing better.

To those people: You can do better. You need to do better.

There are legitimate reasons one could think this bill would be a net negative even if its particular detailed issues are fixed. There are also particular details that need (or at least would benefit from) fixing. Healthy debate is good.

This kind of hyperbole, and a willingness to repeatedly signal boost it, is not.

Brian does then also make the important point about the definition of derivative model currently being potentially overly broad, allowing unlimited additional training, and thus effectively the classification of a non-derivative model as derivative of an arbitrary other model (or at least one with enough parameters). See the section on the definition of derivative models, where I suggest a fix.

Several people raised the specter of people or companies leaving the state.

It is interesting that people think you can avoid the requirements by leaving California. I presume that is not the intent of the law, and under other circumstances such advocates would point out the extraterritoriality issues.

If it is indeed true that the requirements here only apply to models trained in California, will people leave?

In the short term, no. No one who this applies to would care enough to move. As I said last time, have you met California? Or San Francisco? You think this is going to be the thing that triggers the exodus? Compared to (for example) the state tax rate, this is nothing.

If and when, a few years down the line, the requirements start hitting smaller companies who want to train and release non-derivative covered models where they would be unable to reasonably adhere to the laws, and they can indeed avoid jurisdiction by leaving, then maybe those particular people will do it.

But that will at most be a tiny fraction of people doing software development. Most companies will not have covered models at all, because they will use derivative models or someone else’s models. So the network effects are not going anywhere.

This is possible.

This would be the result of Meta being unwilling or unable to provide reasonable assurance that Llama-4-1T lacked hazardous capabilities.

Ask why this would happen.

Again, it would come down to the fundamental conflict that open weights are unsafe and nothing can fix this, indeed this would happen because Meta cannot fix this.

If that is likely to happen because the definitions here (e.g. for hazardous capability or reasonable assurance or derivative model) are flawed, the definitions should be fixed. I suggest some such changes here. If that seems insufficient, I (and I believe the bill’s author and sponsors as well) are open to further suggestions.

If you think Meta should, if unable to provide reasonable assurance, release the weights of such a future highly capable model anyway, because open weights are more important, then we have a strong values disagreement. I also notice that you oppose the entire purpose of the bill. You should oppose this bill, and be clear as to why.

John Pressman gets constructive, proposes the best kind of test: A cost-benefit test.

John Pressman: Since I know you [Scott Weiner] are unlikely to abandon this bill, I do have a suggested improvement: For a general technology like foundation models, the benefits will accrue to a broad section of society including criminals.

My understanding is that the Federal Trade Commission decides whether to sanction a product or technology based on a utilitarian standard: Is it on the whole better for this thing to exist than not exist, and to what extent does it create unavoidable harms and externalities that potentially outweigh the benefits?

In the case of AI and e.g. open weights we want to further consider marginal risk. How much *extra benefit* and how much *extra harm* is created by the release of open weights, broadly construed?

This is of course a matter of societal debate, but an absolute threshold of harm for a general technology mostly acts to constrain the impact rather than the harm, since *any* form of impact once it becomes big enough will come with some percentage of absolute harm from benefits accruing to adversaries and criminals.

I share others concerns that any standard will have a chilling effect on open releases, but I’m also a pragmatic person who understands the hunger for AI regulation is very strong and some kind of standards will have to exist. I think it would be much easier for developers to weigh whether their model provides utilitarian benefit in expectation, and the overall downstream debate in courts and agency actions will be healthier with this frame.

[In response to being asked how he’d do it]: Since the FTC already does this thing I would look there for a model. The FTC was doing some fairly strong saber rattling a few years ago as part of a bid to become The AI Regulator but seems to have backed down.

Zvi: It looks from that description like the FTC’s model is ‘no prior restraint but when we don’t like what you did and decide to care then we mess you up real good’?

John Pressman: Something like that. This can be Fine Actually if your regulator is sensible, but I know that everyone is currently nervous about the quality of regulators in this space and trust is at an all time low.

Much of the point is to have a reasonable standard in the law which can be argued about in court. e.g. some thinkers like yourself and Jeffrey Laddish are honest enough to say open weights are very bad because AI progress is bad.

The bill here is clearly addressing only direct harms. It excludes ‘accelerates AI progress in general’ as well as ‘hurts America in its competition with China’ and ‘can be used for defensive purposes’ and ‘you took our jobs’ and many other things. Those impacts are ignored, whatever sign you think they deserve, the same way various other costs and benefits are ignored.

Pressman is correct that the natural tendency of a ‘you cannot do major harm’ policy is a ‘you cannot do major activities at all’ policy. A lot of people are treating the rule here as far more general than it is with a much lower threshold than it has, I believe including Pressman. See the discussion on the $500 million and what counts as a hazardous capability. But the foundational problem is there either way.

Could we do a cost-benefit test instead? It is impossible to fully ‘get it right,’ but that is true of any standard here. The question is, can we make this practical?

I do not like the FTC model. The FTC model seems to be:

  1. You do what you want.

  2. One day I decide something is unfair or doesn’t ‘pass cost-benefit.’

  3. Retroactively I invalidate your entire business model and your contracts.

  4. Also, you do not want to see me angry. You would not like me when I’m angry.

There are reasons Lina Khan is considered a top public enemy by much of Silicon Valley.

This has a lot of the problems people warn about, in spades.

  1. If it turns out you should not have released the model weights, and I decide you messed up, what happens now? You can’t take it back. And I don’t think any of us want to punish you enough to make you regret releasing models that might be mistakes to release.

  2. Even if you could take it back, such as with a closed model, are you going to have to shut down the moment the FTC questions you? That could break you, easily. If not, then how fast can a court move? By the time it rules, the world will have moved on to better models, you made your killing or everyone is dead, or what not.

  3. It is capricious and arbitrary. Yes, you can get court arguments once the FTC (or other body) decides to have it out with you, but it is going to get ugly for you even if you are right. They can and do threaten you in arbitrary ways. They can and do play favorites and go after enemies while ignoring friends who break rules.

  4. I think these problems are made much worse by this structure.

So I think if you want cost-benefit, you need to do a cost-benefit in advance of the project. This would clearly be a major upgrade on, for example, NEPA (where I want to do exactly this), or on asking to build housing, and other similar matters.

Could we make this reliable enough and fast enough that this made sense? I think you would still have to do all the safety testing.

Presumably there would be a ‘safe harbor’ provision. Essentially, you would want to offer a choice:

  1. You can follow the hazardous capabilities procedure. If your model lacks hazardous capabilities in the sense defined here, then we assume the cost-benefit test is now positive, and you can skip it. Or at least, you can release pending it.

  2. You can follow the cost-benefit procedure. You still have to document what hazardous capabilities could be present, or we can’t model the marginal costs. Then we can also model the marginal benefits.

    1. We would want to consider the class of model as a group as well, at least somewhat, so we don’t have the Acme-Zenith issue where the other already accounts for the downside and both look beneficial.

Doomslide suggests that using the concept of ‘weights’ at all anchors us too much on existing technology, because regulation will be too slow to adjust, and we should use only input tokens, output tokens and compute used in forward passes. I agree that we should strive to keep the requirements as simple and abstract as possible, for this and other reasons, and that ideally we would word things such that we captured the functionality of weights rather than speaking directly about weights. I unfortunately find this impractical.

I do notice the danger of people trying to do things that technically do not qualify as ‘weights,’ but that is where ‘it costs a lot of money to build a model that is good’ comes in: you would be going to a lot of trouble and expense for something that is not so difficult to patch out.

That also points to the necessity of having a non-zero amount of human discretion in the system. A safety plan that works if someone follows the letter but not the spirit, and that allows rules lawyers and munchkining and cannot adjust when circumstances change, is going to need to be vastly more restrictive to get the same amount of safety.

Jessica Taylor goes one step further, saying that these requirements are so strict that you would be better off either abandoning the bill or banning covered model training entirely.

I think this is mostly a pure legal formalism interpretation of the requirements, based on a wish that our laws be interpreted strictly and maximally broadly as written, fully enforced in all cases and written with that in mind, and seeing our actual legal system as it functions today as in bad faith and corrupt. So anyone who participated here would have to also be in bad faith and corrupt, and otherwise she sees this as a blanket ban.

I find a lot appealing about this alternative vision of a formalist legal system and would support moving towards it in general. It is very different from our own. In our legal system, I believe that the standard of ‘reasonable assurance’ will in practice be something one can satisfy, in actual good faith, with confidence that the good faith defense is available.

In general, I see a lot of people who interpret all proposed new laws through the lens of ‘assume this will be maximally enforced as written whenever that would be harmful but not when it would be helpful, no matter how little sense that interpretation would make, by a group using all allowed discretion as destructively as possible in maximally bad faith, and that is composed of a cabal of my enemies, and assume the courts will do nothing to interfere.’

I do think this is an excellent exercise to go through when considering a new law or regulation. What would happen if the state was fully rooted, and was out to do no good? This helps identify ways we can limit abuse potential and close loopholes and mistakes. And some amount of regulatory capture and not getting what you intended is always part of the deal and must be factored into your calculus. But not a fully maximal amount.

In defense of the bill, also see Dan Hendrycks’s comments, and also he quotes Hinton and Bengio:

Geoffrey Hinton: SB 1047 takes a very sensible approach… I am still passionate about the potential for AI to save lives through improvements in science and medicine, but it’s critical that we have legislation with real teeth to address the risks.

Yoshua Bengio: AI systems beyond a certain level of capability can pose meaningful risks to democracies and public safety. Therefore, they should be properly tested and subject to appropriate safety measures. This bill offers a practical approach to accomplishing this, and is a major step toward the requirements that I’ve recommended to legislators.

Howard has a section on this. It is my question to all those who object.

If you want to modify the bill, how would you change it?

If you want to scrap the bill, what would you do instead?

Usually? Their offer is nothing.

Here are Howard’s suggestions, which do not address the issues the bill targets:

  1. The first suggestion is to ‘support open-source development,’ which is the opposite of helping solve these issues.

  2. ‘Focus on usage, not development’ does not work. Period. We have been over this.

  3. ‘Promote transparency and collaboration’ is in some ways a good idea, but also this bill requires a lot of transparency and he is having none of that.

  4. ‘Invest in AI expertise’ for government? I notice that this is also objected to in other contexts by most of the people making the other arguments here. On this point, we fully agree, except that I say this is a complement not a substitute.

The first, third and fourth answers here are entirely non-responsive.

The second answer, the common refrain, is an inherently unworkable proposal. If you put the hazardous capabilities up on the internet, you will then (at least) need to prevent misuse of those capabilities. How are you going to do that? Punishment after the fact? A global dystopian surveillance state? What is the third option?

The flip side is that Guido Reichstadter proposes that we instead shut down all corporate efforts at the frontier. I appreciate people who believe in that saying so. And here are Akash Wasil and Holly Elmore, who are of similar mind, noting that the current bill does not actually have much in the way of teeth.

This is a worry I heard raised previously. Would California’s congressional delegation then want to keep the regulatory power and glory for themselves?

Senator Scott Weiner, who introduced this bill, answered me directly that he would still strongly support federal preemption via a good bill, and that this outcome is ideal. He cannot however speak to other lawmakers.

I am not overly worried about this, but I remain nonzero worried, and do see this as a mark against the bill. Whereas perhaps others might see it as a mark for the bill, instead.

Hopefully this has cleared up a lot of misconceptions about SB 1047, and we have a much better understanding of what the bill actually says and does. As always, if you want to go deep and get involved, all analysis is a complementary good to your own reading, there is no substitute for RTFB (Read the Bill). So you should also do that.

This bill is about future more capable models, and would have had zero impact on every model currently available outside the three big labs of Anthropic, OpenAI and Google Deepmind, and at most one other model known to be in training, Llama-3 400B. If you build a ‘derivative’ model, meaning you are working off of someone else’s foundation model, you have to do almost nothing.

This alone wildly contradicts most alarmist claims.

In addition, if in the future you are rolling your own and build something that is substantially above GPT-4 level, matching the best anyone will do in 2024, then so long as you are behind existing state of the art your requirements are again minimal.

Many other objections are built on misunderstanding the threshold of harm, or the nature of the requirements, or the penalties and liabilities imposed and how they would be enforced. A lot of them are essentially hallucinations of provisions of a very different bill, confusing this with other proposals that would go farther. A lot of descriptions of the requirements imposed greatly exaggerate the burden this would impose even on future covered models.

If this law poses problems for open weights, it would not be because anything here targets or disfavors open weights, other than calling for weights to be protected during the training process until the model can be tested, as all large labs already do in practice. Indeed, the law explicitly favors open weights in multiple places, rather than the other way around. One of those is the tolerance of a major security problem inherent in open weight systems, the inability to shut down copies outside one’s control.

The problems would arise because those open weights open up a greater ability to instill or use hazardous capabilities to create catastrophic harm, and you cannot reasonably assure that this is not the case.

That does not mean that this bill has only upside or is in ideal condition.

In addition to a few other minor tweaks, I was able to identify two key changes that should be made to the bill to avoid the possibility of unintentional overreach and reassure everyone. To reiterate from earlier:

  1. We should change the definition of derivative model by adding a 22606(i)(3) to make clear that if a sufficiently large amount of compute (I suggest 25% of original training compute or 10^26 flops, whichever is lower) is spent on additional training and fine-tuning of an existing model, then the resulting model is now non-derivative. The new developer has all the responsibilities of a covered model, and the old developer is no longer responsible. (A sketch of this compute test follows the list.)

  2. We should change the comparison baseline in 22602(n)(1) when evaluating difficulty of causing catastrophic harm, inserting words to the effect of ‘other than access to other covered models that are known to be safe.’ Instead of comparing to causing the harm without use of any covered model, we should compare to causing the harm without use of any covered model other than safe covered models that lack hazardous capability. You then cannot be blamed because a criminal happened to use your model in place of GPT-N, as part of a larger package or for otherwise safe dual use actions like making payroll or scheduling meetings, and other issues like that. In that case, either GPT-N and your model both have hazardous capability, or neither does.
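
To make the first suggested change concrete, here is a minimal sketch of the compute test I have in mind. The constants are the ones suggested above; the function name and framing are mine rather than bill text, and any real definition would still need to specify exactly how ‘additional training and fine-tuning’ compute is measured.

```python
# Illustrative only: my own sketch of the suggested 22606(i)(3) test, not bill text.

DERIVATIVE_CAP_FLOPS = 1e26      # suggested absolute cap on additional compute
DERIVATIVE_FRACTION = 0.25       # suggested fraction of the original training compute


def remains_derivative(original_training_flops: float,
                       additional_training_flops: float) -> bool:
    """A model stays 'derivative' only while the additional training and
    fine-tuning compute is at or below 25% of the original training compute
    or 10^26 FLOPs, whichever is lower. Past that point it becomes a new
    (potentially covered) model and responsibility shifts to the new developer."""
    threshold = min(DERIVATIVE_FRACTION * original_training_flops, DERIVATIVE_CAP_FLOPS)
    return additional_training_flops <= threshold


# Example: 10^25 FLOPs of fine-tuning on a 10^26 FLOP base model (10% of the
# original) stays derivative; 5 * 10^25 FLOPs (50% of the original) does not.
assert remains_derivative(1e26, 1e25)
assert not remains_derivative(1e26, 5e25)
```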

With those changes, and minor other changes like indexing the $500 million threshold to inflation, this bill seems to be a mostly excellent version of the bill it is attempting to be. That does not mean it could not be improved further, and I welcome and encourage additional attempts at refinement.

It certainly does not mean we will not want to make changes over time as the world rapidly changes, or that this bill seems sufficient even if passed in identical form at the Federal level. For all the talk of how this bill would supposedly destroy the entire AI industry in California (without subjecting most of that industry’s participants to any non-trivial new rules, mind you), it is easy to see the ways this could prove inadequate to our future safety needs. What this does seem to be is a good baseline from which to gain visibility and encourage basic precautions, which puts us in better position to assess future unpredictable situations.



RTFB: On the New Proposed CAIP AI Bill

Center for AI Policy proposes a concrete actual model bill for us to look at.

Here was their announcement:

WASHINGTON – April 9, 2024 – To ensure a future where artificial intelligence (AI) is safe for society, the Center for AI Policy (CAIP) today announced its proposal for the “Responsible Advanced Artificial Intelligence Act of 2024.” This sweeping model legislation establishes a comprehensive framework for regulating advanced AI systems, championing public safety, and fostering technological innovation with a strong sense of ethical responsibility.

“This model legislation is creating a safety net for the digital age,” said Jason Green-Lowe, Executive Director of CAIP, “to ensure that exciting advancements in AI are not overwhelmed by the risks they pose.”

The “Responsible Advanced Artificial Intelligence Act of 2024” is model legislation that contains provisions for requiring that AI be developed safely, as well as requirements on permitting, hardware monitoring, civil liability reform, the formation of a dedicated federal government office, and instructions for emergency powers.

The key provisions of the model legislation include:

1. Establishment of the Frontier Artificial Intelligence Systems Administration to regulate AI systems posing potential risks.

2. Definitions of critical terms such as “frontier AI system,” “general-purpose AI,” and risk classification levels.

3. Provisions for hardware monitoring, analysis, and reporting of AI systems.

4. Civil + criminal liability measures for non-compliance or misuse of AI systems.

5. Emergency powers for the administration to address imminent AI threats.

6. Whistleblower protection measures for reporting concerns or violations.

The model legislation intends to provide a regulatory framework for the responsible development and deployment of advanced AI systems, mitigating potential risks to public safety, national security, and ethical considerations.

“As leading AI developers have acknowledged, private AI companies lack the right incentives to address this risk fully,” said Jason Green-Lowe, Executive Director of CAIP. “Therefore, for advanced AI development to be safe, federal legislation must be passed to monitor and regulate the use of the modern capabilities of frontier AI and, where necessary, the government must be prepared to intervene rapidly in an AI-related emergency.”

Green-Lowe envisions a world where “AI is safe enough that we can enjoy its benefits without undermining humanity’s future.” The model legislation will mitigate potential risks while fostering an environment where technological innovation can flourish without compromising national security, public safety, or ethical standards. “CAIP is committed to collaborating with responsible stakeholders to develop effective legislation that governs the development and deployment of advanced AI systems. Our door is open.”

I discovered this via Cato’s Will Duffield, whose statement was:

Will Duffield: I know these AI folks are pretty new to policy, but this proposal is an outlandish, unprecedented, and abjectly unconstitutional system of prior restraint.

To which my response was essentially:

  1. I bet he’s from Cato or Reason.

  2. Yep, Cato.

  3. Sir, this is a Wendy’s.

  4. Wolf.

We need people who will warn us when bills are unconstitutional, unworkable, unreasonable or simply deeply unwise, and who are well calibrated in their judgment and their speech on these questions. I want someone who will tell me ‘Bill 1001 is unconstitutional and would get laughed out of court, Bill 1002 is of questionable constitutionality in practice and unconstitutional in theory, we would throw out Bill 1003 but it will stand up these days because SCOTUS thinks the commerce clause is super broad, Bill 1004 is legal as written but the implementation won’t work,’ and so on. Bonus points for probabilities, and double bonus points if they tell you how likely each bill is to pass so you know when to care.

Unfortunately, we do not have that. We only have people who cry wolf all the time. I love that for them, and thank them for their service, which is very helpful. Someone needs to be in that role, if no one is going to be the calibrated version. Much better than nothing. Often their critiques point to very real issues, as people are indeed constantly proposing terrible laws.

The lack of something better calibrated is still super frustrating.

So what does this particular bill actually do if enacted?

There is no substitute for reading the bill.

I am going to skip over a bunch of what I presume is standard issue boilerplate you use when creating this kind of apparatus, like the rulemaking authority procedures.

There is the risk that I have, by doing this, overlooked things that are indeed non-standard or otherwise worthy of note, but I am not sufficiently versed in what is standard to know from reading. Readers can alert me to what I may have missed.

Each bullet point has a (bill section) for reference.

The core idea is to create the new agency FAISA to deal with future AI systems.

There is a four-tier system of concern levels for those systems, in practice:

  1. Low-concern systems have no restrictions.

  2. Medium-concern systems must be checked monthly for capability gains.

  3. High-concern systems require permits and various countermeasures.

  4. Very high-concern systems will require even more countermeasures.

As described later, the permit process is a holistic judgment based on a set of rubrics, rather than a fixed set of requirements. A lot of it could do with better specification. There is a fast track option when that is appropriate to the use case.

Going point by point:

  1. (4a) Creates the Frontier Artificial Intelligence Systems Administration, whose head is a presidential appointment confirmed by the Senate.

  2. (4b) No one senior in FAISA can have a conflict of interest on AI, including owning any related stocks, or having worked at a frontier lab within three years, and after leaving they cannot lobby for three years and can only take ‘reasonable compensation.’ I worry about revolving doors, but I also worry this is too harsh.

  3. (3u1): Definition: LOW-CONCERN AI SYSTEM (TIER 1).—The terms “low-concern AI system” and “Tier 1” mean AI systems that do not have any capabilities that are likely to pose major security risks. Initially, an AI system shall be deemed low-concern if it used less than 10^24 FLOP during its final training run.

  4. (3u2): Definition: MEDIUM-CONCERN AI SYSTEM (TIER 2). The terms “medium-concern AI system” and “Tier 2” mean AI systems that have a small chance of acquiring at least one capability that could pose major security risks. For example, if they are somewhat more powerful or somewhat less well-controlled than expected, such systems might substantially accelerate the development of threats such as bioweapons, cyberattacks, and fully autonomous artificial agents. Initially, an AI system shall be deemed medium-concern if it used at least 10^24 FLOP during its final training run and it does not meet the criteria for any higher tier. I note, again, that this threshold shows up in such drafts when I think it should have been higher. (For how these initial tier assignments fit together, see the sketch after this list.)

  5. (3u3): Definition: HIGH-CONCERN AI SYSTEM (TIER 3).—The terms “high-concern AI system” and “Tier 3” mean AI systems that have at least one capability that could pose major security risks, or that have capabilities that are at or very near the frontier of AI development, and as such pose important threats that are not yet fully understood.

  6. Gemini believes that sections 5-6 grant unusually flexible rulemaking authority, and initially I otherwise skipped those sections. It says “The Act grants the Administrator significant flexibility in rulemaking, including the ability to update technical definitions and expedite certain rules. However, there are also provisions for Congressional review and potential disapproval of rules, ensuring a balance of power.” As we will see later, there are those who have a different interpretation. They can also hire faster and pay 150% of base pay in many spots, which will be necessary to staff well.

  7. If you are ‘low-concern’ you presumably do not have to do anything.

  8. (7) Each person who trains a ‘medium-concern AI’ shall pre-register their training plan, meaning lay out who is doing it, the maximum compute to be spent, the purpose of the AI, the final scores of the AI system on the benchmarks selected by the DAIS, and the location of the training (including cloud services used if any). Then they have to do continuous testing each month, and report in and cease training if they hit 80% on any of the benchmarks in 3(v)(3)(a)(ii), as you are now high concern. I notice that asking for benchmark scores before starting is weird? And also defining a ‘purpose’ of an AI is kind of weird?
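
To make the initial thresholds above concrete, here is a minimal sketch of how I read the tier assignment, using only the 10^24 FLOP floor and the 80% benchmark trigger. The benchmark list is chosen by the regulator, the real definitions also involve capability judgments that do not reduce to a formula, and the names and numbers here are my own framing, so treat this as illustration only.

```python
# Illustrative only: my reading of the initial tier assignment, not bill text.

TIER_2_FLOP_FLOOR = 1e24     # (3u2) initial compute threshold for medium-concern
BENCHMARK_TRIGGER = 0.80     # the 80% benchmark score that forces a halt and report


def initial_tier(final_training_run_flops: float, benchmark_scores: list[float]) -> int:
    """Return 1 (low), 2 (medium), or 3 (high) concern, based only on the
    initial compute floor and the benchmark trigger described above."""
    if final_training_run_flops < TIER_2_FLOP_FLOOR:
        return 1   # low-concern: no restrictions
    if any(score >= BENCHMARK_TRIGGER for score in benchmark_scores):
        return 3   # high-concern: cease training, report in, permits required
    return 2       # medium-concern: pre-registration and monthly capability checks


# Example: a 5 * 10^24 FLOP training run scoring 0.6 and 0.72 on the selected
# benchmarks would start out as Tier 2 under this sketch.
assert initial_tier(5e24, [0.6, 0.72]) == 2
```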

The core idea is to divide AI into four use cases: Hardware, Training, Model Weights and Deployment. You need a distinct permit for each one, and a distinct permit for each model or substantial model change for each one, and you must reapply each time, again with a fast track option when the situation abides that.
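
As a way of seeing how much bookkeeping this implies, here is a minimal sketch of the permit structure as I read it. The class and method names are mine, and in reality ‘substantial change’ would be a judgment call rather than a method call.

```python
from enum import Enum, auto

# Illustrative only: my restatement of the four-permit structure, not bill text.

class PermitType(Enum):
    HARDWARE = auto()
    TRAINING = auto()
    MODEL_WEIGHTS = auto()
    DEPLOYMENT = auto()


class HighConcernSystem:
    """Tracks which permits a single high-concern system currently holds."""

    def __init__(self, name):
        self.name = name
        self.permits = set()  # PermitType values currently held for this system

    def grant(self, permit):
        self.permits.add(permit)

    def substantial_change(self, permit):
        # A substantial change to the hardware, training plan, weights, or
        # deployed user base invalidates that one permit; the developer must
        # reapply, with a fast-track option where the use case qualifies.
        self.permits.discard(permit)

    def fully_permitted(self):
        return self.permits == set(PermitType)
```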

Each application is to be evaluated and ‘scored,’ then a decision made, with the criteria updated at least yearly. We are given considerations for the selection process, but mostly the actual criteria are left fully unspecified even initially. The evaluation process is further described in later sections.

There are three core issues raised, which are mostly discussed in later sections.

  1. Practicality. How much delay and cost and unreliability will ensue?

  2. Specificity. There is the common complaint that we do not yet know what the proper requirements will be and they will be difficult to change. The solution here is to give the new department the authority to determine and update the requirements as they go. The failure modes of this are obvious, with potential ramp-ups, regulatory capture, outright nonsense and more. The upside of flexibility and ability to correct and update is also obvious, but can we get that in practice from a government agency, even a new one?

  3. Objectivity. Will favored insiders get easy permits, while outsiders or those the current administration dislikes get denied or delayed? How to prevent this?

As always, we have a dilemma of spirit of the rules versus technical rule of law.

To the extent the system works via technical rules, that is fantastic, protecting us in numerous ways. If it works. However, every time I look at a set of technical proposals, my conclusion is at best ‘this could work if they abide by the spirit of the rules here.’ Gaming any technical set of requirements would be too easy if we allowed rules lawyering (including via actual lawyering) to rule the day. Any rules that must hold up against adversarial labs determined to work around them, and against labs that seem incapable of acting wisely, without being allowed to ask whether a given lab is being adversarial or unwise, will have to be much more restrictive overall to get the same upside, and some bases will be impossible to cover in any reasonable way.

To the extent we enforce the spirit of the rules, and allow for human judgment and flexibility, or allow trusted people to adjust the rules on the fly, we can do a lot better on many fronts. But we open ourselves up to those who would not follow the spirit, and force there to be those charged with choosing who can be trusted to what extent, and we risk insider favoritism and capture. Either you can ‘pick winners and losers’ in any given sense or level of flexibility, or you can’t, and we go to regulate with the government we have, not the one we wish we had.

The conclusion of this section has some notes on these dangers, and we will return to those questions in later sections as well.

Again, going point by point:

  1. (8a) What about ‘high-concern AI’? You will need permits for that. Hardware, Training, Model Weights and Deployment are each their own permit. It makes sense that each of these steps is distinct. Each comes with its own risks and responsibilities. That does not speak to whether the burdens imposed here are appropriate.

  2. (8b1) The hardware permit only applies to a specific collection of hardware. If you want to substantially change, add to or supplement that hardware, you need to apply again. It is not a general ‘own whatever hardware you want’ permit. This makes sense if the process is reasonably fast and cheap when no issues are present, but we do need to be careful about that.

  3. (8b2) Similarly the training permit is for a particular system, and it includes this: ‘If the person wishes to add additional features to the AI system that were not included in the original training permit’ then they need to apply for a new permit, meaning that they need to declare in advance what (capabilities and) features will be present, or they need to renew the permit. I also want to know what counts as a feature? What constitutes part of the model, versus things outside the model? Gemini’s interpretation is that for example GPTs would count even if they are achieved purely via scaffolding, and it speculates this goes as far as a new UI button to clear your chat history. Whereas it thinks improving model efficiency or speed, which is of course more safety relevant, would not. This seems like a place we need refinement and clarity, and it was confusing enough that Gemini was having trouble keeping the issues straight.

  4. (8b3) A deployment permit is for the final model version, to a specific set of users. If you ‘substantially change’ the user base or model, you need to reapply. That makes sense for the model, that is the whole point, but I wonder how this would apply to a user base. This would make sense if you have the option to either ask for a fully broad deployment permit or a narrow one, where the narrow one (such as ‘for research’) would hold you to at least some looser standards in exchange.

  5. (8b4) Similarly, your right to possess one set of weights is for only those weights.

  6. In principle, I understand exactly why you would want all of this, once the details are cleaned up a bit. However it also means applying for a bunch of permits in the course of doing ordinary business. How annoying will it be to get them? Will the government do a good job of rubber stamping the application when the changes are trivial, but actually paying attention and making real (but fast) decisions when there is real new risk in the room? Or, rather, exactly how bad at this will we be? And how tightly will these requirements be enforced in practice, and how much will that vary based on whether something real is at stake?

  7. (8c1) There is a grandfather clause for existing systems.

  8. (8c) (there is some confusion here with the section names) Each year by September 1 the Administrator shall review each of the thresholds for high concern in (8a) for adequacy, and fix them if they are not adequate. I notice this should be symmetrical – it should say something like ‘adequate and necessary.’ If a threshold used to be needed and now does not make sense, we should fix that.

  9. (8d1) There will be a ‘fast-track’ form of less than two pages. They list examples of who should qualify: Self-driving cars, navigational systems, recommendation engines, fraud detection, weather forecasting, tools for locating mineral deposits, economic forecasting, search engines and image generators. That list starts strong, then by the end I get less confident, an image generator can absolutely do scary stuff with ‘typically no more than thirty words of text.’ So the principle is, specialized systems for particular purposes are exempt, but then we have to ask whether that makes them safe to train? And how we know they only get used in the way you expect or claim? The decision on who gets to fast track, to me, is not mostly about what you use the system for but the underlying capabilities of the system. There should definitely be easy waivers to get of the form ‘what I am doing cannot create any new dangers.’ Or perhaps the point is that if I am fine-tuning GPT-N for my recommendation engine, you should not bother me, and I can see that argument, but I notice I would want to dig more into details here before I feel good. In practice this might mostly be intended for small fine-tuning jobs, which ideally would indeed be fine, but we should think hard about how to make this highly smooth and also ensure no one abuses the loophole. Tricky.

  10. (8d6) Ah, application fees, including exemptions for research, fast track and open source, and ‘support for small business.’ No numbers are specified in terms of what the fee shall be. I am going to go ahead and say that if the fee is large enough that it matters, it is an outrageous fee.

  11. (8e) There need to be rules for ‘how to score each application’ and what it takes to get approved. I notice I worry about the use of ‘score’ at all. I do not want government saying ‘this checked off 7 boxes out of 10 so it gets a +7, and thus deserves a permit.’ I don’t think that works, and it is ripe for abuse and mission creep. I also worry about many other places this threatens to be rather arbitrary. I want a well-defined set of safety and security requirements, whereas as worded we have no idea what we will get in practice.

  12. (8e2) If there is anything on the list that is not required, they have to explain why.

  13. (8e3) Precautions can be required, such as (A) third-party evaluations and audits, (B) penetration testing, (C) compute usage limits, (D) watermarks and (E) other.

  14. (8e4) Mandatory insurance can be required. Yes, please.

  15. (8e5) These rubrics should be updated as needed.

  16. (8f) Now we see how this ‘scoring’ thing works. You get a ‘scoring factor’ for things like the plan for securing liability insurance or otherwise mitigating risks, for your incident detection and reporting plan, your ‘demonstrated ability to forecast capabilities’ (I see it and you do too), and the applicant’s ‘resources, abilities, reputation and willingness to successfully execute the plans described in subsections (g) through (j).’

  17. And we all hear that one even louder. I am not saying that there are not advantages to considering someone’s reputation and established abilities when deciding whether to issue a permit, but this is making it clear that the intention is that you are not entitled to this permit merely for following the rules. The government has discretion, and if they don’t feel comfortable with you, or you piss them off, or they have any other reason, then it is no good, no permit for you. And yes, this could absolutely be a prelude to denying Elon Musk a permit, or generally locking out newcomers.

  18. There is an obvious dilemma here. If you have to give everyone who technically qualifies the right to do whatever they want, then you need a system safe to people who ignore the spirit of the rules, who would not follow rules unless you can catch violations and enforce those rules at each step, and who have not proven themselves responsible in any way. But, if you allow this type of judgment, then you are not a system of laws, and we all know what could happen next. So yes, I will absolutely say that the approach taken by implication here makes me uncomfortable. I do not trust the process, and I think as written this calls for too much trust to avoid picking winners and losers.

What are the considerations when evaluating a safety plan? There are some details here that confuse me, but also this is thought out well enough that we can talk details on that level at all.

The broader concern is the idea of this being organized into a scoring system, and how one should holistically evaluate an application. I do think the rubrics themselves are a great start.

  1. (8g) Rubrics for hardware in particular are the plan for KYC and customer controls, for cybersecurity of systems, and guarding against physical theft. Those are good rubrics if implemented objectively with thresholds.

  2. (8h) Rubrics for model weights are awareness of real identities of customers, preventing theft of weights, limiting access to those with proper permits, and the danger level of the weights in question. The middle two make sense. The last one implies a sliding scale for how dangerous the weights are, which implies there should be more than one category under high-risk? It makes sense that there would be multiple categories here, but we should spell it out then. Then the first one feels like a deployment issue? Your ‘customer’ is whoever has the deployment permit, here, so if you don’t need to KYC the ultimate user that is a distinct issue? I sure hope so, if not we need to clean this up.

  3. (8i) Rubrics for training are the extent of

    1. Specifications of maximum intended capabilities.

    2. Extent to which you have explained why that is safe.

    3. Extent to which they have ‘a theory predicting’ how capabilities will develop

    4. The plan for identifying and dealing with any discrepancies from those predictions, including a potential full halt in training, and communication of the anomaly to the government.

    5. A clear maximum compute budget and timeframe and schedule.

    6. Protection against the model escaping during training (!).

    7. A plan for who, both internally and in the government, will get what information when and about what as training proceeds.

    8. A plan for detecting unauthorized access attempts.

  4. I get why one would include each of these things. What I worry about is, again, the whole thing where I gather together tons of expensive resources so I can train and deploy a system, I try to get as ‘high a score’ on everything as I can, and then hope that I get authorized to proceed, without knowing what might put me over the edge. I also worry that many of these things should not be left up to the lab in question to the extent this implies. In any case, I am impressed they went there in many senses, but it feels off. More of these should be clear rules and hard requirements, not sources of points, and we should specify more of them.

  5. Also, okay, we are definitely drawing an implied distinction between high-concern and other levels, while being short of what Section 9 deems ‘extremely high-concern’ AI systems. I don’t love the attempt at a continuum.

  6. (8j) Rubrics for deployment

    1. Evidence the system is ‘robustly aligned’ under plausible conditions.

    2. Plan for traceability.

    3. Plan for preventing use, access and reverse engineering in places that lack adequate AI safety legislation.

    4. Plan for avoiding future changes increasing the risks from the systems, such as from fine-tuning, plug-ins, utilities or other modifications.

    5. Plan to monitor the system and if needed shut it down.

    6. Danger that the AI could itself advance AI capabilities, or autonomously survive, replicate or spread.

    7. Direct catastrophic risks such as bioweapons, hacking, and so on.

  7. That is quite the mixed bag. I notice that it is very unclear what it would mean to have a ‘points system’ on these to decide who gets to deploy and who does not, and this carries a lot of risk for the company if they might develop an expensive system and then not be allowed to deploy in a way that is hard to predict.

  8. I do very much appreciate that (e) and (f) are here explicitly, not only (g).

  9. I notice that (d) confuses me, since fine-tuning should require a permit anyway, so what is it doing there? And also similar for plug-ins and other modifications, what is the intent here? And how do you really stop this, exactly? And (c) worries me, are we going to not let people be users in countries with ‘inadequate legislation’? If you have adequate precautions in place your users should be able to be wherever they want. There are so many battles this sets us up for down the line.

What about open source models?

Well, how exactly do you propose they fit into the rubrics we need?

  1. (8k) Considerations for open source frontier models. So there is an obvious problem here, for open source systems. Look at the rubrics for deployment. You are going to get a big fat zero for (b), (c), (d) and (e), and also (a) since people can fine-tune away the alignment. These are impossible things to do with open model weights. In the original there were eight considerations (f combines two of them), so this means you utterly fail five out of eight. If we are taking this seriously, then a ‘high-risk model’ with open model weights must be illegal, period, or what the hell are we even doing.

  2. The response ‘but that’s not fair, make an exception, we said the magic words, we are special, the rules do not apply to us’ is not how life or law works, open model weights past the high-risk threshold are simply a blatant ‘f*** you’ to everything this law is trying to do.

  3. So what to do? (8k) instead offers ‘considerations,’ and calls for ‘fairly considering both the risks and benefits associated with open source frontier AI systems, including both the risk that an open source frontier AI system might be difficult or impossible to remove from the market if it is later discovered to be dangerous, and the benefits that voluntary, collaborative, and transparent development of AI offers society.’

  4. I mean, lol. The rest of the section essentially says ‘but what if this so-called ‘open source’ system was not actually open source, would it be okay then?’ Maybe.

  5. It says (8k3) ‘no automatic determinations.’ You should evaluate the system according to all the rubrics, not make a snap judgment. But have you seen the rubrics? I do not see how a system can be ‘high-risk’ under this structure, and for us to be fine sharing its model weights. Perhaps we could still share its source code, or its data, or even both, depends on details, but not the weights.

  6. That is not because these are bad rubrics. This is the logical consequence of thinking these models are high-concern and then picking any reasonable set of rubrics. They could use improvement of course, but overall they are bad rubrics if and only if you think there is no importantly large risk in the room.

  7. Will open weights advocates scream and yell and oppose this law no matter what? I mean, oh hell yes, there is no compromise that will get a Marc Andreessen or Richard Sutton or Yann LeCun on board and also do the job at hand.

  8. That is because this is a fundamental incompatibility. Some of us want to require that sufficiently capable future AI systems follow basic safety requirements. The majority of those requirements are not things open weights models are capable of implementing, on a deep philosophical level, in a way that open weights advocates see as a feature rather than a bug. The whole point is that anyone can do whatever they want with the software, and the whole point of this bill is to put restrictions on what software you can create and what can be done with that software.

  9. If you think this is untrue, prove me wrong, kids. If open model weights advocates have a plan, even a bad start of a plan, for how to achieve the aims and motivations behind these proposals without imposing such restrictions, none of them have deigned to tell me about them. It seems impossible even in theory, as explained above.

  10. Open weights advocates have arguments for why we should not care about those aims and motivations, why everything will be wonderful anyway and there is no risk in the room. Huge if true, but I find those deeply uncompelling. If you believe there is little underlying catastrophic or existential risk for future frontier AI systems, then you should oppose any version of this bill.

What about those ‘extremely’ high concern systems? What to do then? What even are they? Can the people writing these documents please actually specify at least a for-now suggested definition, even if no one is that close to hitting it yet?

  1. (9) There will be specifications offered for what is an ‘extremely high-concern AI system,’ the definition of which should be created within 12 months of passage, and the deployment requirements for such systems within 30 months. Both are not spelled out here, similarly to how OpenAI and Anthropic both largely have an IOU or TBD where the definitions should be in their respective frameworks.

  2. They do say something about the framework, that it should take into account:

    1. Whether the architecture is fundamentally safe.

    2. Whether they have mathematical proofs the AI system is robustly aligned.

    3. Whether it is ‘inherently unable’ to assist with WMDs.

    4. Whether it is specifically found to be inherently unable to autonomously replicate.

    5. Whether it is specifically found to be inherently unable to accelerate scientific or engineering progress sufficiently to pose national security risks.

  3. I know, I know! Pick me, pick me! The answers are:

    1. No*,

    2. No*,

    3. No,

    4. No,

    5. and no.

  4. The asterisk is that perhaps Davidad’s schema will allow a proof in a way I do not expect, or we will find a new better architecture. And of course it is possible that your system simply is not that capable and (c), (d) and (e) are not issues, in which case we presumably misclassified your model.

  5. But mostly, no, if yours is an ‘extremely high-concern’ system then it is not safe for deployment. I am, instead, extremely concerned. That is the whole point of the name.

  6. Will that change in the future, when we get better techniques for dealing with such systems? I sure hope so, but until that time, not so much.

This is a formal way of saying exactly that. There is a set of thresholds, to be defined later, beyond which no, you are simply not going to be allowed to create or deploy an AI system any time soon.

The problem is that this is a place one must talk price, and they put a ‘TBD’ by the price. So we need to worry the price could be either way too high, or way too low, or both in different ways.

The actual decision process is worth highlighting. It introduces random judge selection into the application process, then offers an appeal, followed by anticipating lawsuits. I worry this introduces randomness that is bad for both business and risk, and also that the iterated process is focused on the wrong type of error. You want this type of structure when you worry about the innocent getting punished, whereas here our primary concern about error type is flipped.

  1. (10a) Saying ‘The Administrator shall appoint AI Judges (AIJs)’ is an amusing turn of phrase, for clarity I would change the name, these are supposed to be humans. I indeed worry that we will put AIs in charge of such judgments rather soon.

  2. (10c) Applications are reviewed by randomly selected 3-judge panels using private technical evaluators for help. The application is evaluated within 60 days, but they outright consider the possibility they will lack the capacity to do this? I get that government has this failure mode (see: our immigration system, oh no) but presumably we should be a little less blasé about the possibility. I notice that essentially you apply, then a random group grades you mostly pass/fail (they can also impose conditions or request revisions), and this does not seem like the way you would design a collaborative in-the-spirit process. Can we improve on this? Also I worry about what we would do about resubmissions, where there are no easy answers under a random system.

  3. (11) Yes, you can appeal, and the appeal board is fixed and considers issues de novo when it sees fit. And then, if necessary, the company can appeal to the courts. I worry that this is backwards. In our criminal justice system, we rightfully apply the rule of double jeopardy and provide appeals and other rules to protect defendants, since our top priority is to protect the innocent and the rights of defendants. Here, our top priority should be to never let a model be trained or released in error, yet the companies are the ones with multiple bites at the apple. It seems structurally backwards, we should give them less stringent hurdles but not multiple apple bites, I would think?

There is some very important stuff here. Any time anyone says ‘emergency powers’ or ‘criminal penalties’ you should snap to attention. The emergency powers issues will get discussed in more depth when I handle objections.

  1. (12) Hardware monitoring. Tracking of ‘high performance’ AI hardware. I like specificity. Can we say what counts here?

  2. (13) You shall report to Congress each year and provide statistics.

  3. (14a) AI developers are assigned a duty of care for civil liability, with joint and several liability, private right of action and public right of action, and strict liability, with exceptions for bona fide error, and potential punitive damages. Open source is explicitly not a defense, nor is unforeseeability of misalignment (although also, if you don’t foresee it, let me stop you right there). All the liability bingo cards should be full, this seems very complete and aggressive as written, although that could be wise.

  4. (15b) Criminal felony liability if you ignore an emergency order and fail to take steps within your power to comply, get your permit rejected and train anyway, get approved with conditions and knowingly violate the conditions, knowingly submit false statements on your application, or fraudulently claim intention to do safety precautions.

    1. I note a lot of this involves intent and knowledge. You only go to jail (for 10-25 years no less) if you knowingly break the rules, or outright defy them, and the government will need to prove that. The stakes here will be very high, so you do need to be able to have enforcement teeth. Do they need to be this sharp? Is this too much? Will it scare people off? My guess is this is fine, and no one will actually fear going to jail unless they actually deserve it. You can say ‘oh the engineer who disregarded the conditional approval rules does not deserve a decade in prison’ and in many cases I would agree with you and hopefully they move up the chain as per normal instead, but also if you are actually training an existentially risky model in defiance of the rules? Yeah, I do think that is a pretty big freaking deal.

  5. (15c) Misdemeanor liability here is 6 months to a year (plus fines). I notice this gets weird. (1) is using or improving a frontier model without a permit. So not asking for a permit is a misdemeanor, going through a rejection is a felony? I do not love the incentives there. If you know you are ‘improving’ a frontier model without a permit, then I do not see why you should get off light, although mere use does seem different. Trigger (2) is recklessness with requirements that results in failure; I don’t love any options on this type of rule. (3) is submitting a knowingly incomplete or misleading application, rather than false, and I am not sure how that line is or should be drawn. (4) is intentionally sabotaging a benchmark score in order to get less regulatory scrutiny, and I think that has to be a felony here. This is lying on an application, full stop, maybe worse.

  6. (There are more enforcement rules and crime specifications, they seem standard.)

  7. (16) Emergency powers. The President can declare an emergency for up to a year due to an AI-related national security risk, more than that requires Congress. That allows the President to: Suspend permits, stop actions related to frontier AI, require safety precautions, seize model weights, limit access to hardware, issue a general moratorium or ‘take any other actions consistent with this statutory scheme that the Administrator deems necessary to protect against an imminent major security risk.’

  8. So, basically, full emergency powers related to inhibiting AI, as necessary. I continue to be confused about what emergency powers do and do not exist in practice. Also I do not see a way to deal with a potential actual emergency AI situation that may arise in the future, without the use of emergency powers like this, to stop systems that must be stopped? What is the alternative? I would love a good alternative. More discussion later.

  9. (17) Whistleblower protections, yes, yes, obviously.

  10. (18-20) Standard boiler-plate, I think.

  11. (21) There is some very strong language in the severability clause that makes me somewhat nervous, although I see why they did it.

I think it is very good that they took the time to write a full detailed bill, so now we can have discussions like this, and talk both price and concrete specific proposals.

What are the core ideas here?

  1. We should monitor computer hardware suitable for frontier model training, frontier model training runs, the stewardship of resulting model weights and how such models get deployed.

  2. We should do this when capability thresholds are reached, and ramp up the amount of monitoring as those thresholds get crossed.

  3. At some point, models get dangerous enough we should require various precautions. You will need to describe what you will do to ensure all this is a safe and wise thing to be doing, and apply for a permit.

  4. As potential capabilities grow, so do the safety requirements and your responsibilities. At some further point, we do not know a way to do this safely, so stop.

  5. Those rules should be adjusted periodically to account for technological developments, and be flexible and holistic, so they do not become impossible to change.

  6. There should be criminal penalties for openly and knowingly defying all this.

  7. Given our options and the need to respond quickly to events, we should delegate these decisions, with broad discretion, to an agency that can act fast, with its head appointed by the President with the advice and consent of the Senate.

  8. The President should be able to invoke emergency powers to stop AI activity, if he believes there is an actual such emergency.

  9. Strict civil liability in all senses for AI if harm ensues.

  10. Strong whistleblower protections.

  11. We should do this via a new agency, rather than doing it inside an existing one.

I strongly agree with #1, #2, #3, #4, #5, #6 and #10. As far as I can tell, these are the core of any sane regulatory regime. I believe #9 is correct if we find the right price. I am less confident in #7 and #8, but do not know what a superior alternative would be.

The key, as always, is talking price, and designing the best possible mechanisms and getting the details right. Doing this badly can absolutely backfire, especially if we push too hard and set unreasonable thresholds.

I do think we should be aware of and prepared for the fact that, at some point in the future, there is a good chance the thresholds and requirements will need to be expensive, and impose real costs, if they are to work. But that point is not now, and we need to avoid imposing any more costs than we need to; going too far too fast will only backfire.

The problem is both that the price intended here seems perhaps too high too fast, and also that it dodges much talking of price by kicking that can to the new agency. There are several points in this draft (such as the 10^24 threshold for medium-concern) where I feel that the prices here are too high, in addition to places where I believe implementation details need work.

There is also #9, civil liability, which I also support as a principle, where one can fully talk price now, and the price here seems set maximally high, at least within the range of sanity. I am not a legal expert here but I sense that this likely goes too far, and compromise would be wise. But also that is the way of model bills.

That leaves the hard questions, #7, #8 and #11.

On #7, I would like to offer more guidance and specification for the new agency than is offered here. I do think the agency needs broad discretion to put up temporary barriers quickly, set new thresholds periodically, and otherwise assess the current technological state of play in a timely fashion. We do still have great need for Congressional and democratic oversight, to allow for adjustments and fixing of overreach or insider capture if mistakes get made. Getting the balance right here is going to be tricky.

On #8, as I discuss under objections, what is the alternative? Concretely, if the President decides that an AI system poses an existential risk (or other dire threat to national security), and that threat is imminent, what do you want the President to do about that? What do you think or hope the President would do now? Ask for Congress to pass a law?

We absolutely need, and I would argue already have de facto, the ability to in an emergency shut down an AI system or project that is deemed sufficiently dangerous. The democratic control for that is periodic elections. I see very clear precedent and logic for this.

And yes, I hate the idea of states of emergency, and yes I have seen Lisa Simpson’s TED Talk, I am aware that if you let the government break the law in an emergency they will create an emergency in order to break the law. But I hate this more, not less, when you do it anyway and call it something else. Either the President has the ability to tell any frontier AI project to shut down for now in an actual emergency, or they don’t, and I think ‘they don’t’ is rather insane as an option. If you have a better idea how to square this circle I am all ears.

On #11, this was the one big objection made when I asked someone who knows about bills and the inner workings of government and politics to read the bill, as I note later. They think that the administrative, managerial, expertise and enforcement burdens would be better served by placing this inside an existing agency. This certainly seems plausible, although I would weigh it against the need for a new distinctive culture and the ability to move fast, and the ability to attract top talent. I definitely see this as an open question.

In response to my request on Twitter, Jules Robins was the only other person to take up reading the bill.

Jules Robins: Overall: hugely positive update if this looks like something congress would meaningfully consider as a starting point. I’m not confident that’s the case, but hopefully it at least moves the Overton Window. Not quite a silver bullet (I’ll elaborate below), but would be a huge win.

Biggest failings to my eyes are:

1. Heavily reliant on top officials very much embracing the spirit of the assignment. I mean, that was probably always going to be true, but much of the philosophical bent requires lots of further research and rule-making to become effective.

2. Doesn’t really grapple with the reality we may be living in (per the recent Google paper) where you can train a frontier model without amassing a stock of specialized compute (say, SETI style). Ofc that’s only newly on most radars, and this was in development long before that.

Other odds and ends: Structure with contention favoring non-permitting is great here. As is a second person in the organization with legal standing to contest an agency head not being cautious enough.

Some checks on power I’d rather not have given this already only works with aligned officials (e.g. Deputy Administrator for Public Interest getting stopped by confidentiality, relatively light punishments for some violations that could well be pivotal)

Model tiering leaves a potentially huge hole: powerful models intended for a narrow task that may actually result in broad capabilities to train on. (e.g. predicting supply & demand is best done by forecasting Earth-system wide futures).

So all-in-all, I’d be thrilled if we came out with something like this, but it’d require a lot more work put in by the (hopefully very adept people) put in charge.

Were ~this implemented, there would be potential for overreach. There are likely better mitigations than the proposal has, but I doubt you can make a framework that adapts to the huge unknowns of what’s necessary for AGI safety without broad enough powers to run overreach risk.

This was mostly measured, but otherwise the opposite of the expected responses from the usual objectors. Jules saw that this bill is making a serious attempt to accomplish its mission, but that there are still many ways it could fail to work, and did not focus on the potential places there could be collateral damage or overreach of various kinds.

Indeed, the concerns here are instead that the checks on power that do exist could interfere, rather than that the checks on power are insufficient. The right proposal should raise concerns in both directions.

But yes, Jules does notice that if this exact text got implemented, there are some potential overreaches.

The spirit of the rules point is key. Any effort is going to have a bad time unless the spirit of actually creating safety is driving actions, or you planned to route around its absence, and this law does not attempt to route around that.

I did notice the Google paper referenced here, and I am indeed worried that we could in time lose our ability to monitor compute in this way. If that happens, we are in even deeper trouble, and all our options get worse. However, I anticipate that the distributed solution will be highly inefficient, and difficult to scale to the level of actually dangerous models for some time. I think for now we proceed anyway, and that this is not yet our reality.

I definitely thought about the model purpose loophole. It is not clear that this would actually get you much of a requirement discount given my reading, but it is definitely something we will need to watch. The EU’s framework is much worse here.

The bill did give its critics some soft rhetorical targets, such as the severability clause, which I didn’t bother reading, assuming it was standard, until Matt Mittelsteadt pointed it out. The provision definitely didn’t look good when I first read it, either:

Matt Mittelsteadt: This is no joke. They even wrote the severability clause to almost literally say ‘AI is too scary to follow the constitution and therefore this law can’t be struck by the courts.’

Here is the clause itself, in full:

The primary purpose of this Act is to reduce major security risks from frontier AI systems. Moreover, even a short interruption in the enforcement of this Act could allow for catastrophic harm.

Therefore, if any portion or application of this Act is found to be unconstitutional, the remainder of the Act shall continue in effect except in so far as this would be counterproductive for the goal of reducing major security risks.

Rather than strike a portion of the Act in such a way as to leave the Act ineffective, the Courts should amend that portion of the Act so as to reduce major security risks to the maximum extent permitted by the Constitution.

Then I actually looked at the clause and thought about it, and it made a lot more sense.

The first clause is a statement of intent and an observation of fact. The usual suspects will of course treat it as scaremongering but in the world where this Act is doing good work this could be very true.

The second clause is actually weaker than a standard severability clause, in a strategic fashion. It is saying, sever, but only sever if that would help reduce major security risks. If severing would happen in a way that would make things worse than striking down more of the law, strike down more on that basis. That seems good.

The third clause is saying that if a clause is found unconstitutional, then rather than strike even that clause, they are authorized to modify that clause to align with the rest of the law as best they can, given constitutional restrictions. Isn’t that just… good? Isn’t that what all laws should say?

So, for example, there was a challenge to the ACA’s individual mandate in 2012 in NFIB v. Sebelius. The mandate was upheld on the basis that it was a tax. Suppose that SCOTUS had decided that it was not a tax, even though it was functionally identical to a tax. In terms of good governance, the right thing to do is to say ‘all right, we are going to turn it into a tax now, and write new law, because Congress has explicitly authorized us to do that in this situation in the severability provision of the ACA.’ And then, if Congress thinks that is terrible, they can change the law again. But I am a big fan of ‘intent wins’ and trying to get the best result. Our system of laws does not permit this by default, but if legal I love the idea of delegating this power to the courts, presumably SCOTUS. Maybe I am misunderstanding this?

So yeah, I am going to bite the bullet and say this is actually good law, even if its wording may need a little reworking.

Next we have what appears to me to be an attempted inception from Jeremiah Johnson, saying the bill is terrible and abject incompetence that will only hurt the cause of enacting regulations, in the hopes people will believe this and make it true.

I evaluated this claim by asking someone I know who works on political causes not related to AI, with a record of quietly getting behind the scenes stuff done, to read the bill without giving my thoughts, to get a distinct opinion.

The answer came back that this was indeed a very professionally drafted, well-thought-out bill. Their biggest objection was that they thought it was a serious mistake to make this a new agency, rather than put it inside an existing one, due to the practical considerations of logistics, enforcement and ramping up involved. Overall, they said that this was ‘a very good v1.’

Not that this ever stops anyone.

Claiming the other side is incompetent and failing and they have been ‘destroyed’ or ‘debunked’ and everyone hates them now is often a highly effective strategy. Even I pause and worry there has been a huge mistake, until I do what almost no one ever does, and think carefully about the exact claims involved and read the bill. And that’s despite having seen this playbook in action many times.

Notice that Democrats say this about Republicans constantly.

Notice that Republicans say this about Democrats constantly.

So I do not expect them to stop trying it, especially as people calibrate based on past reactions. I expect to hear this every time, with every bill, of any quality.

Then we have this, where Neil Chilson says:

Neil Chilson (Head of AI Policy at Abundance Institute): There is a new AI proposal from @aipolicyus. It should SLAM the Overton window shut.

It’s the most authoritarian piece of tech legislation I’ve read in my entire policy career (and I’ve read some doozies).

Everything in the bill is aimed at creating a democratically unaccountable government jobs program for doomers who want to regulate math.

I mean, just check out this section, which in a mere six paragraphs attempts to route around any potential checks from Congress or the courts.

You know you need better critics when they pull out ‘regulate math’ and ‘government jobs program’ at the drop of a hat. Also, this is not how the Overton Window works.

But I give him kudos for both making a comparative claim, and for highlighting the actual text of the bill that he objects to most, in a section I otherwise skipped. He links to section 6, which I had previously offloaded to Gemini.

Here is what he quotes, let’s check it in detail, that is only fair, again RTFB:

f) CONGRESSIONAL REVIEW ACT.

(1) The Administrator may make a determination pursuant to 5 U.S.C. §801(c) that a rule issued by the Administrator should take effect without further delay because avoidance of such delay is necessary to reduce or contain a major security risk. If the Administrator makes such a determination and submits written notice of such determination to the Congress, then a rule that would not take effect by reason of 5 U.S.C. §801(a)(3) shall nevertheless take effect. The exercise of this authority shall have no effect on the procedures of 5 U.S.C. § 802 or on the effect of a joint Congressional resolution of disapproval.

So as I understand it, normally any new rule requires a 60 day waiting period before being implemented under 5 U.S.C. §801(a)(3), to allow for review or challenge. This is saying that, if deemed necessary, rules can take effect without this waiting period, while still being subject to review and potentially being pared back.

Also my understanding is that the determination here of ‘major security risk’ is subject to judicial review. So this does not prevent legal challenges or Congressional challenges to the new rule. What it does do is allow stopping activity by default. That seems like a reasonable thing to be able to do in context?

(2) Because of the rapidly changing and highly sensitive technical landscape, a rule that appears superficially similar to a rule that has been disapproved by Congress may nevertheless be a substantially different rule. Therefore, a rule issued under this section that varies at least one material threshold or material consequence by at least 20% from a previously disapproved rule is not “substantially the same” under 5 U.S.C. § 802(b)(2).

This is very much pushing it. I don’t like it. I think here Neil has a strong point.

I do agree that rules that appear similar can indeed not be substantially similar, and that the same rule rejected before might be very different now.

But changing a ‘penalty’ by 20% and saying you changed the rule substantially? That’s clearly shenanigans, especially when combined with (1) above.

The parties involved should not need such a principle. They should be able to decide for themselves what ‘substantially similar’ means. Alas, it sounds like the existing law does not specify how any of this works; there is no procedure?

So there is a complex interplay involved, and everything is case-by-case and courts sometimes intervene and sometimes won’t, which is not ideal.

I think this provision should be removed outright. If the procedure for evaluating this is so terrible it does not work, then we should update 5 U.S.C. § 802(b)(2) with a new procedure. Which it sounds like we definitely should do anyway.

If an agency proposes a ‘substantially similar’ rule to Congress, here or elsewhere, my proposed new remedy is that the new rule should have to note that it may be substantially similar to a previous proposal that was rejected. Congress can then stamp it ‘we already rejected this’ and send it back. Or, if they changed their minds for any reason, because an election moved the majority or a minor tweak fixes their concerns, they can say yes the second time. The law should spell this out.

(g) MAJOR QUESTIONS DOCTRINE. It is the intent of Congress to delegate to the Administration the authority to mitigate the major security risks of advanced, general-purpose artificial intelligence using any and all of the methods described in this Act. The Administration is expected and encouraged to rapidly develop comparative expertise in the evaluation of such risks and in the evaluation of the adequacy of measures intended to mitigate these risks. The Administration is expressly authorized to make policy judgments regarding which safety measures are necessary in this regard. This Act shall be interpreted broadly, with the goal of ensuring that the Administration has the flexibility to adequately discharge its important responsibilities.

If you think we have the option to go back to Congress as the situation develops to make detailed decisions on how to deal with future general-purpose AI security threats, then either you do not think we will face such threats, you think Congress will be able to keep up, you are fine not derisking, or you have not met Congress.

That does not mean we should throw out rule of law or the constitution, and give the President and whoever he appoints unlimited powers to do what they want until Congress manages to pass a law to change that (which presumably will never happen). Also that is not what this provision would do, although it pushes in that direction.

Does this language rub us all the wrong way? I hope so, that is the correct response to the choices made here. It seems expressly designed to give the agency as free a hand as possible until such time as Congress steps in with a new law.

The question is whether that is appropriate.

(h) NO EFFECT ON EMERGENCY POWERS. Nothing in this section shall be construed to limit the emergency powers granted by Section 11.

Yes, yes, ignore.

Finally we have this:

(i) STANDARD FOR REVIEW. In reviewing a rule promulgated under this Act that increases the strictness of any definition or scoring criterion related to frontier AI, a court may not weaken or set aside that rule unless there is clear and convincing evidence of at least one of the following

(1) doing so will not pose major security risks, or

(2) the rule exceeded the Administrator’s authority.

That doesn’t sound awesome. Gemini thinks that courts would actually respect this clause, which initially surprised me. My instinct was that a judge would laugh in its face.

I do notice that this is constructed narrowly. This is specifically about changes that make definitions or scoring criteria more strict. I am not loving it, but the two grounds here that still allow review seem reasonable to me, and if they go too far I would assume the court strikes the rule down anyway.

The more I look at the detailed provisions here, the more I see very thoughtful people who have thought hard about the situation, and are choosing very carefully to do a specific thing. The people objecting to the law are objecting exactly because the bill is well written, and is designed to do the job it sets out to do. Because that is a job that they do not want to see be done, and they aim to stop it from happening.

There are also legitimate concerns here. This is only a model bill; as noted earlier there is still much work to do, places where I think this goes too far, and other places where, if such a bill did somehow pass, no doubt compromises would happen even if they aren’t optimal.

But yes, as far as I can tell this is a serious, thoughtful model bill. That does not mean it or anything close to it will pass, or that it would be wise to do so, especially without improvements and compromises where needed. I do think the chances of this type of framework happening very much went up.


on-the-proposed-california-sb-1047

On the Proposed California SB 1047

California Senator Scott Wiener of San Francisco introduces SB 1047 to regulate AI. I have put up a market on how likely it is to become law.

“If Congress at some point is able to pass a strong pro-innovation, pro-safety AI law, I’ll be the first to cheer that, but I’m not holding my breath,” Wiener said in an interview. “We need to get ahead of this so we maintain public trust in AI.”

Congress is certainly highly dysfunctional. I am still generally against California trying to act like it is the federal government, even when the cause is good, but I understand.

Can California effectively impose its will here?

On the biggest players, for now, presumably yes.

In the longer run, when things get actively dangerous, then my presumption is no.

There is a potential trap here, if we put our rules in a place where someone with enough upside can ignore them, and then we never pass anything in Congress.

So what does it do, according to the bill’s author?

California Senator Scott Wiener: SB 1047 does a few things:

  1. Establishes clear, predictable, common-sense safety standards for developers of the largest and most powerful AI systems. These standards apply only to the largest models, not startups.

  2. Establish CalCompute, a public AI cloud compute cluster. CalCompute will be a resource for researchers, startups, & community groups to fuel innovation in CA, bring diverse perspectives to bear on AI development, & secure our continued dominance in AI.

  3. prevent price discrimination & anticompetitive behavior

  4. institute know-your-customer requirements

  5. protect whistleblowers at large AI companies

@geoffreyhinton called SB 1047 “a very sensible approach” to balancing these needs. Leaders representing a broad swathe of the AI community have expressed support.

People are rightfully concerned that the immense power of AI models could present serious risks. For these models to succeed the way we need them to, users must trust that AI models are safe and aligned w/ core values. Fulfilling basic safety duties is a good place to start.

With AI, we have the opportunity to apply the hard lessons learned over the past two decades. Allowing social media to grow unchecked without first understanding the risks has had disastrous consequences, and we should take reasonable precautions this time around.

As usual, RTFC (Read the Card, or here the bill) applies.

Section 1 names the bill.

Section 2 says California is winning in AI (see this song), and that AI has great potential but could do harm. A missed opportunity to mention existential risks.

Section 3 22602 offers definitions. I have some notes.

  1. Usual concerns with the broad definition of AI.

  2. Odd that ‘a model autonomously engaging in a sustained sequence of unsafe behavior’ only counts as an ‘AI safety incident’ if it is not ‘at the request of a user.’ If a user requests that, aren’t you supposed to ensure the model doesn’t do it? Sounds to me like a safety incident.

  3. Covered model is defined primarily via compute, not sure why this isn’t a ‘foundation’ model, I like the secondary extension clause: “The artificial intelligence model was trained using a quantity of computing power greater than 10^26 integer or floating-point operations in 2024, or a model that could reasonably be expected to have similar performance on benchmarks commonly used to quantify the performance of state-of-the-art foundation models, as determined by industry best practices and relevant standard setting organizations OR The artificial intelligence model has capability below the relevant threshold on a specific benchmark but is of otherwise similar general capability.”

  4. Critical harm is either mass casualties or $500 million in damage, or comparable.

  5. Full shutdown means full shutdown but only within your possession and control. So when we really need a full shutdown, this definition won’t work. The whole point of a shutdown is that it happens everywhere whether you control it or not.

  6. Open-source artificial intelligence model is defined to only include models that ‘may be freely modified and redistributed,’ which raises the question of whether ‘may’ means legally or practically. Such definitions need to be practical: if I can do it illegally but can clearly still do it, that needs to count.

  7. Definition (s): [“Positive safety determination” means a determination, pursuant to subdivision (a) or (c) of Section 22603, with respect to a covered model that is not a derivative model that a developer can reasonably exclude the possibility that a covered model has a hazardous capability or may come close to possessing a hazardous capability when accounting for a reasonable margin for safety and the possibility of posttraining modifications.]

    1. Very happy to see the mention of post-training modifications, which is later noted to include access to tools and data, so scaffolding explicitly counts.

Section 3 22603 (a) says that before you train a new non-derivative model, you need to determine whether you can make a positive safety determination.

I like that this happens before you start training. But of course, this raises the question of how you know how it will score on the benchmarks?

One thing I worry about is the concept that if you score below another model on various benchmarks, that this counts as a positive safety determination. There are at least four obvious failure modes for this.

  1. The developer might choose to sabotage performance against the benchmarks, either by excluding relevant data and training, or otherwise. Or, alternatively, a previous developer might have gamed the benchmarks, which happens all the time, such that all you have to do to score lower is to not game those benchmarks yourself.

  2. The model might have situational awareness, and choose to get a lower score. This could be various degrees of intentional on the part of the developers.

  3. The model might not adhere to your predictions or scaling laws. So perhaps you say it will score lower on benchmarks, but who is to say you are right?

  4. The benchmarks might simply not be good at measuring what we care about.

Similarly, it is good to make a safety determination before beginning training, but also if the model is worth training then you likely cannot actually know its safety in advance, especially since this is not only existential safety.

Section 3 22603 (b) covers what you must do if you cannot make the positive safety determination. Here are the main provisions:

  1. You must prevent unauthorized access.

  2. You must be capable of a full shutdown.

  3. You must implement all covered guidance. Okie dokie.

  4. You must implement a written and separate safety and security protocol, that provides ‘reasonable assurance’ that it would ensure the model will have safeguards that prevent critical harms. This has to include clear tests that verify if you have succeeded.

  5. You must say how you are going to do all that, how you would change how you are doing it, and what would trigger a shutdown.

  6. Provide a copy of your protocol and keep it updated.

You can then make a ‘positive safety determination’ after training and testing, subject to the safety protocol.

Section (d) says that if your model is ‘not subject to a positive safety determination,’ in order to deploy it (you can still deploy it at all?!) you need to implement ‘reasonable safeguards and requirements’ that allow you to prevent harms and to trace any harms that do happen. I worry this section is not taking such scenarios seriously. To not be subject to such a determination, the model needs to be breaking new ground in capabilities, and you were unable to assure that it wouldn’t be dangerous. So what are these ‘reasonable safeguards and requirements’ that would make deploying it acceptable? Perhaps I am misunderstanding here.

Section (g) says safety incidents must be reported.

Section (h) says if your positive safety determination is unreasonable it does not count, and that to be reasonable you need to consider any risk that has already been identified elsewhere.

Overall, this seems like a good start, but I worry it has loopholes, and I worry that it is not thinking about the future scenarios where the models are potentially existentially dangerous, or might exhibit unanticipated capabilities or situational awareness and so on. There is still the DC-style ‘anticipate and check specific harm’ approach throughout.

Section 22604 is about KYC: a large computing cluster has to collect customer information and check to see whether customers are trying to train a covered model.

Section 22605 requires sellers of inference or a computing cluster to provide a transparent, uniform, publicly available price schedule, banning price discrimination, and bans ‘unlawful discrimination or noncompetitive activity in determining price or access.’

I always wonder about laws that say ‘you cannot do things that are already illegal,’ I mean I thought that was the whole point of them already being illegal.

I am not sure to what extent this rule has an impact in practice, and whether it effectively means that anyone selling such services has to be a kind of common carrier unable to pick who gets its limited services, and unable to make deals of any kind. I see the appeal, but also I see clear economic downsides to forcing this.

Section 22606 covers penalties. The fines are relatively limited in scope, the main relief is injunction against and possible deletion of the model. I worry in practice that there is not enough teeth here.

Section 22607 is whistleblower protections. Odd that this is necessary, one would think there would be such protections universally by now? There are no unexpectedly strong provisions here, only the normal stuff.

Section 4 11547.6 tasks the new Frontier Model Division with its official business, including collecting reports and issuing guidance.

Section 5 11547.7 is for the CalCompute public cloud computing cluster. This seems like a terrible idea, there is no reason for public involvement here, also there is no stated or allocated budget. Assuming it is small, it does not much matter.

Sections 6-9 are standard boilerplate disclaimers and rules.

What should we think about all that?

It seems like a good faith effort to put forward a helpful bill. It has a lot of good ideas in it. I believe it would be net helpful. In particular, it is structured such that if your model is not near the frontier, your burden here is very small.

My worry is that this has potential loopholes in various places, and does not yet strongly address the nature of the future more existential threats. If you want to ignore this law, you probably can.

But it seems like a good beginning, especially on dealing with relatively mundane but still potentially catastrophic threats, without imposing an undue burden on developers. This could then be built upon.

Ah, Tyler Cowen has a link on this and it’s… California’s Effort to Strangle AI.

Because of course it is. We do this every time. People keep saying ‘this law will ban satire’ or spreadsheets or pictures of cute puppies or whatever, based on what on its best day would be a maximalist anti-realist reading of the proposal, if it were enacted straight with no changes and everyone actually enforced it to the letter.

Dean Ball: This week, California’s legislature introduced SB 1047: The Safe and Secure Innovation for Frontier Artificial Intelligence Systems Act. The bill, introduced by State Senator Scott Wiener (liked by many, myself included, for his pro-housing stance), would create a sweeping regulatory regime for AI, apply the precautionary principle to all AI development, and effectively outlaw all new open source AI models—possibly throughout the United States.

This is a line pulled out whenever anyone proposes that AI be governed by any regulatory regime whatsoever even with zero teeth of any kind. When someone says that someone, somewhere might be legally required to write an email.

At least one of myself and Dean Ball is extremely mistaken about what this bill says.

The definition of covered model seems to me to be clearly intended to apply only to models that are effectively at the frontier of model capabilities.

Let’s look again at the exact definition:

(1) The artificial intelligence model was trained using a quantity of computing power greater than 10^26 integer or floating-point operations in 2024, or a model that could reasonably be expected to have similar performance on benchmarks commonly used to quantify the performance of state-of-the-art foundation models, as determined by industry best practices and relevant standard setting organizations.

(2) The artificial intelligence model has capability below the relevant threshold on a specific benchmark but is of otherwise similar general capability.

That seems clear as day on what it means, and what it means is this (see the rough code sketch after the list):

  1. If your model is over 10^26 we assume it counts.

  2. If it isn’t, but it is as good as state-of-the-art current models, it counts.

  3. Being ‘as good as’ is a general capability thing, not hitting specific benchmarks.
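To make that concrete, here is a minimal sketch of the test in Python. This is purely illustrative: the flag names (`training_flops`, `matches_sota_benchmarks`, `similar_general_capability`) are my own hypothetical labels, and the bill’s actual determination rests on judgment calls about industry best practices and standards bodies, not a mechanical formula.

```python
# Purely illustrative sketch of the covered-model test as I read it.
# All names are hypothetical; the real determination involves judgment
# calls, not a formula.

COMPUTE_THRESHOLD_FLOPS = 1e26  # the 2024 compute threshold in the bill


def is_covered_model(training_flops: float,
                     matches_sota_benchmarks: bool,
                     similar_general_capability: bool) -> bool:
    """Rough approximation of the two-pronged covered-model definition."""
    # Prong 1: trained using more than 10^26 integer or floating-point operations.
    if training_flops > COMPUTE_THRESHOLD_FLOPS:
        return True
    # Prong 1, performance branch: reasonably expected to match state-of-the-art
    # foundation models on common benchmarks, regardless of compute used.
    if matches_sota_benchmarks:
        return True
    # Prong 2: below threshold on some specific benchmark, but of otherwise
    # similar general capability (so sandbagging one benchmark does not help).
    return similar_general_capability


# A small fine-tune far below the frontier is not covered:
assert not is_covered_model(1e23, False, False)
# A model matching frontier benchmarks is covered even at lower compute:
assert is_covered_model(5e25, True, False)
```

The only point of the sketch is that the second prong closes the ‘score low on one benchmark’ escape hatch; none of this actually reduces to a formula.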

Under this definition, if no one were actively gaming benchmarks, at most three existing models would plausibly qualify: GPT-4, Gemini Ultra and Claude. I am not even sure about Claude.

If the open source models are gaming the benchmarks so much that they end up looking like a handful of them are matching GPT-4 on benchmarks, then what can I say, maybe stop gaming the benchmarks?

Or point out quite reasonably that the real benchmark is user preference, and in those terms, you suck, so it is fine. Either way.

But notice that this isn’t what the bill does. The bill applies to large models and to any models that reach the same performance regardless of the compute budget required to make them. This means that the bill applies to startups as well as large corporations.

Um, no, because the open model weights models do not remotely reach the performance level of OpenAI?

Maybe some will in the future.

But this very clearly does not ‘ban all open source.’ There are zero existing open model weights models that this bans.

There are a handful of companies that might plausibly have to worry about this in the future, if OpenAI doesn’t release GPT-5 for a while, but we’re talking Mistral and Meta, not small start-ups. And we’re talking about them exactly because they would be trying to fully play with the big boys in that scenario.

Ball is also wrong about the precautionary principle being imposed before training.

I do not see any such rule here. What I see is that if you cannot show that your model will definitely be safe before training, then you have to wait until after the training run to certify that it is safe.

In other words, this is an escape clause. Are we seriously objecting to that?

Then, if you also can’t certify that it is safe after the training run, then we talk precautions. But no one is saying you cannot train, unless I am missing something?

As usual, people such as Ball are imagining a standard of ‘my product could never be used to do harm’ that no one is trying to apply here in any way. That is why any model not at the frontier can automatically get a positive safety determination, which flies in the face of this theory. Then, if you are at the frontier, you have to obey industry standard safety procedures and let California know what procedures you are following. Woe is you. And of course, the moment someone else has a substantially better model, guess who is now positively safe?

The ‘covered guidance’ that Ball claims to be alarmed about does not mean ‘do everything any safety organization says and if they are contradictory you are banned.’ The law does not work that way. Here is what it actually says:

(e) “Covered guidance” means any of the following:

(1) Applicable guidance issued by the National Institute of Standards and Technology and by the Frontier Model Division.

(2) Industry best practices, including relevant safety practices, precautions, or testing procedures undertaken by developers of comparable models, and any safety standards or best practices commonly or generally recognized by relevant experts in academia or the nonprofit sector.

(3) Applicable safety-enhancing standards set by standards setting organizations.

So what that means is, we will base our standards off an extension of NIST’s, and also we expect you to be liable to implement anything that is considered ‘industry best practice’ even if we did not include it in the requirements. But obviously it’s not going to be best practices if it is illegal. Then we have the third rule, which only counts ‘applicable’ standards. California will review them and decide what is applicable, so that is saying they will use outside help.

Also, note the term ‘non-derivative’ when talking about all the models. If you are a derivative model, then you are fine by default. And almost all models with open weights are derivative models, because of course that is the point, distillation and refinement rather than starting over all the time.

So here’s what the law would actually do, as far as I can tell (with a rough code sketch of the flow after the list):

  1. If your model is not projected to be at the state-of-the-art level, and it is not over the 10^26 limit that no one has hit yet and no one except the big three are anywhere near, this law has only trivial impact upon you: a trivial amount of paperwork. Every other business in America, and especially the state of California, is jealous.

  2. If your model is a derivative of an existing model, you’re fine, that’s it.

  3. If the model you want to train is projected to be state of the art, but you can show it is safe before you even train it, good job, you’re golden.

  4. If your model is projected to be state of the art, and can’t show it is safe before training it, you can still train it as long as you don’t release it and you make sure it isn’t stolen or released by others. Then if you show it is safe or show it is not state of the art, you’re golden again.

  5. If your model is state of the art, and you train it and still don’t know if it is ‘safe,’ and by safe we do not mean ‘no one ever does anything wrong’ we mean things more like ‘no one ever causes 500 million dollars in damages or mass casualties,’ then you have to implement a series of safety protocols (regulatory requirements) to be determined by California, and you have to tell them what you are doing to ensure safety.

  6. You have to have abilities like ‘shut down AIs running on computers under my control’ and ‘plausibly prevent unauthorized people from accessing the model if they are not supposed to.’ Which does not even apply to copies of the program you no longer control. Is that going to be a problem?

  7. You also have to report any ‘safety incidents’ that happen.

  8. Also some ‘pro-innovation’ stuff of unknown size and importance.
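Putting that list together, here is a minimal sketch of the resulting decision flow as I read it. The function and flag names are my own hypothetical labels, and the obligation strings are shorthand summaries, not the bill’s language.

```python
# Illustrative summary of the flow described above, under my reading of the bill.
# Names and simplifications are mine, not the bill's.

def obligations(is_derivative: bool,
                is_covered: bool,
                has_positive_safety_determination: bool) -> list[str]:
    if is_derivative:
        # Derivative models are fine by default.
        return ["nothing new beyond the original model's obligations"]
    if not is_covered:
        # Not near the frontier and under the 10^26 limit.
        return ["trivial paperwork"]
    if has_positive_safety_determination:
        return ["trivial paperwork", "report safety incidents"]
    # Covered, non-derivative, and you cannot (yet) show it is safe:
    return [
        "prevent unauthorized access to the model",
        "maintain full shutdown capability for copies you control",
        "implement covered guidance and a written safety protocol",
        "report safety incidents",
    ]


print(obligations(is_derivative=False, is_covered=True,
                  has_positive_safety_determination=False))
```

Again, this is shorthand for the structure, not the substance; the actual requirements are the ones listed above.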

Not only does SB 1047 not attempt to ‘strangle AI,’ not only does it not attempt regulatory capture or target startups, it would do essentially nothing to anyone but a handful of companies unless they have active safety incidents. If there are active safety incidents, then we get to know about them, which could introduce liability concerns or publicity concerns, and that seems like the main downside? That people might learn about your failures and existing laws might sometimes apply?

The arguments against such rules often come from the implicit assumption that we enforce our laws as written, reliably and without discretion. Which we don’t. What would happen if, as Eliezer recently joked, the law actually worked the way critics of such regulations claim that it does? If every law were strictly enforced as written, with no common sense used, as they warn will happen? And somehow our courts could handle the caseloads involved? Everyone would be in jail within the week.

When people see proposals for treating AI slightly more like anything else, and subjecting it to remarkably ordinary regulation, with an explicit and deliberate effort to only target frontier models that are exclusively fully closed, and they say that this ‘bans open source’ what are they talking about?

They are saying that Open Model Weights Are Unsafe and Nothing Can Fix This, and we want to do things that are patently and obviously unsafe, so asking any form of ‘is this safe?’ and having an issue with the answer being ‘no’ is a ban on open model weights. Or, alternatively, they are saying that their business model and distribution plans are utterly incompatible with complying with any rules whatsoever, so we should never pass any, or they should be exempt from any rules.

The idea that this would “spell the end of America’s leadership in AI” is laughable. If you think America’s technology industry cannot stand a whiff of regulation, I mean, do they know anything about America or California? And have they seen the other guy? Have they seen American innovation across the board, almost entirely in places with rules orders of magnitude more stringent? This here is so insanely nothing.

But then, when did such critics let that stop them? It’s the same rhetoric every time, no matter what. And some people seem willing to amplify such voices, without asking whether their words make sense.

What would happen if there was actually a wolf?
