Author name: Ari B


Elon Musk proposes Tesla move to Texas after Delaware judge voids $56 billion pay

Don’t mess with Tesla —

Musk is sick of Delaware judges, says shareholders will vote on move to Texas.

Tesla CEO Elon Musk speaks at Tesla’s “Cyber Rodeo” on April 7, 2022, in Austin, Texas.

Credit: Getty Images | AFP/Suzanne Cordeiro

Tesla CEO Elon Musk has had enough of Delaware after a state court ruling voided his $55.8 billion pay package. Musk said last night that Tesla will hold a shareholder vote on transferring the electric carmaker’s state of incorporation to Texas.

Musk had posted a poll on X (formerly Twitter) asking whether Tesla should “change its state of incorporation to Texas, home of its physical headquarters.” After over 87 percent of people voted yes, Musk wrote, “The public vote is unequivocally in favor of Texas! Tesla will move immediately to hold a shareholder vote to transfer state of incorporation to Texas.”

Tesla was incorporated in 2003 before Musk joined the company. Its founders chose Delaware, a common destination because of the state’s low corporate taxes and business-friendly legal framework. The Delaware government says that over 68 percent of Fortune 500 companies are registered in the state, and 79 percent of US-based initial public offerings in 2022 were registered in Delaware.

One reason for choosing Delaware is the state’s Court of Chancery, where cases are decided not by juries but by judges who specialize in corporate law. On Tuesday, Court of Chancery Judge Kathaleen McCormick ruled that Musk’s $55.8 billion pay package was unfair to shareholders and must be rescinded.

McCormick’s ruling in favor of the plaintiff in a shareholder lawsuit said that most of Tesla’s board members “were beholden to Musk or had compromising conflicts.” McCormick also concluded that the Tesla board gave shareholders inaccurate and misleading information in order to secure approval of Musk’s “unfathomable” pay plan.

Musk a fan of Texas and Nevada

Musk yesterday shared a post claiming that McCormick’s ruling “is another clear example of the Biden administration and its allies weaponizing the American legal system against their political opponents.”

McCormick previously oversaw the Twitter lawsuit that forced Musk to complete a $44 billion purchase despite his attempt to break a merger agreement. After Musk became Twitter’s owner, he merged the company into X Corp., which is registered in Nevada.

“Never incorporate your company in the state of Delaware,” Musk wrote in a post after the Delaware court ruling. “I recommend incorporating in Nevada or Texas if you prefer shareholders to decide matters,” he also wrote.

Last year, Texas enacted a law to create business courts that will hear corporate cases. The courts are slated to begin operating on September 1, 2024. Musk is clearly hoping the new Texas courts will be more deferential to Tesla on executive pay if the company is sued again after his next pay plan is agreed on.

Tesla shareholders who will be asked to vote on a corporate move to Texas “need to take a hard look at how transitioning out of Delaware might impact their rights and the company’s governance,” Reuters quoted business adviser Keith Donovan as saying.

Reuters quoted AJ Bell investment analyst Dan Coatsworth as saying that “Elon Musk’s plan to change Tesla’s state of incorporation from Delaware to Texas is typical behavior for the entrepreneur who always looks for an alternative if he can’t get what he wants.”



AI #49: Bioweapon Testing Begins

Two studies came out on the question of whether existing LLMs can help people figure out how to make bioweapons. RAND published a negative finding, showing no improvement. OpenAI found a small improvement from GPT-4, bigger for experts than for students. That’s still harmless now; the question is what will happen in the future as capabilities advance.

Another news item was that Bard with Gemini Pro impressed even without Gemini Ultra, taking the second spot on the Arena leaderboard behind only GPT-4-Turbo. For now, though, GPT-4 remains in the lead.

A third cool item was this story from a Russian claiming to have used AI extensively in his quest to find his one true love. I plan to cover that on its own and have Manifold on the job of figuring out how much of the story actually happened.

  1. Introduction.

  2. Table of Contents.

  3. Language Models Offer Mundane Utility. Bard is good now even with only Pro?

  4. Language Models Don’t Offer Mundane Utility. Thinking well remains hard.

  5. GPT-4 Real This Time. Bring GPTs into normal chats, cheaper GPT-3.5.

  6. Be Prepared. How much can GPT-4 enable production of bioweapons?

  7. Fun With Image Generation. How to spot an AI image, new MidJourney model.

  8. Deepfaketown and Botpocalypse Soon. Taylor Swift fakes, George Carlin fake.

  9. They Took Our Jobs. If they did, how would we know?

  10. Get Involved. What we have here is a failure to communicate.

  11. In Other AI News. Who is and is not raising capital or building a team.

  12. Quiet Speculations. Is economic growth caused by inputs or by outputs?

  13. The Quest for Sane Regulation. Emergency emergency emergency. Meh.

  14. The Week in Audio. Tyler Cowen goes on Dwarkesh Patel.

  15. Rhetorical Innovation. How to think about pattern matching.

  16. Predictions are Hard Especially About the Future. Contradictory intuitions.

  17. Aligning a Smarter Than Human Intelligence is Difficult. You’ll need access.

  18. Open Model Weights are Unsafe and Nothing Can Fix This. Except not doing it.

  19. Other People Are Not As Worried About AI Killing Everyone. Misconceptions.

  20. The Lighter Side. Hindsight is 20/20.

Bard shows up on the Arena Chatbot leaderboard in second place even with Gemini Pro. It is the first model to place ahead of even some versions of GPT-4.

According to the system card, roughly, Gemini Ultra is to Gemini Pro as GPT-4 is to GPT-3.5. If that is true, this really was evaluated on Gemini Pro, and Bard gets an Elo boost similar to the one ChatGPT gets when it jumps models, then the version of Bard with Gemini Ultra could clock in around 1330, the clear best model. Your move, OpenAI.
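The ~1330 extrapolation is just Elo arithmetic. As a rough sketch of what such a gap would mean head-to-head (both ratings below are illustrative, not actual leaderboard numbers):

```python
# Elo expected score: what a rating gap implies for head-to-head results.
def elo_expected_score(r_a: float, r_b: float) -> float:
    """Expected score for A (win probability, counting ties as half a win)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

# A hypothetical 1330 model against a 1250 one:
print(round(elo_expected_score(1330, 1250), 3))  # 0.613
```

So an 80-point lead translates to winning roughly 61 percent of matchups, which is what “clear best model” cashes out to on this leaderboard.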

Some of the comments were suspicious that they were somehow getting Gemini Ultra already, or that Bard was doing this partly via web access, or that it was weird that it could score that high given some rather silly refusals and failures. There are clearly places where Bard falls short. There is also a lot of memory of when Bard was in many ways much worse than it is now, and a lack of knowledge of the places Bard is better.

If you want your AI to step it up, nothing wrong with twenty bucks a month, but have you tried giving it Adderall?

How about AR where it tracks your chore progress?

Ethan Mollick uses prompt engineering and chain of thought to get GPT-4 to offer ‘creative ideas’ for potential under $50 products for college students in a new paper. The claim is that without special prompts the ideas are not diverse, but with prompting and CoT this can largely be fixed.

I put creative ideas in air quotes because the thing that Ethan consistently describes as creativity, that he says GPT-4 is better at than most humans, does not match my central understanding of creativity.

Here is the key technique and result:

Exhaustion

We picked our most successful strategy (Chain of Thought) and compared it against the base strategy when generating up to 1200 ideas in one session. We used the following prompts:

Base Prompt

Generate new product ideas with the following requirements: The product will target college students in the United States. It should be a physical good, not a service or software. I’d like a product that could be sold at a retail price of less than about USD 50.

The ideas are just ideas. The product need not yet exist, nor may it necessarily be clearly feasible. Number all ideas and give them a name. The name and idea are separated by a colon. Please generate 100 ideas as 100 separate paragraphs. The idea should be expressed as a paragraph of 40-80 words.

Chain of Thought

Generate new product ideas with the following requirements: The product will target college students in the United States. It should be a physical good, not a service or software. I’d like a product that could be sold at a retail price of less than about USD 50.

The ideas are just ideas. The product need not yet exist, nor may it necessarily be clearly feasible.

Follow these steps. Do each step, even if you think you do not need to.

First generate a list of 100 ideas (short title only)

Second, go through the list and determine whether the ideas are different and bold, modify the ideas as needed to make them bolder and more different. No two ideas should be the same. This is important!

Next, give the ideas a name and combine it with a product description. The name and idea are separated by a colon and followed by a description. The idea should be expressed as a paragraph of 40-80 words. Do this step by step!

Note that on some runs, the model did not properly follow the second step and deemed the ideas bold enough without modification. These runs have been removed from the final aggregation (~15% of all runs).

The results show that the difference in cosine similarity persists from the start up until around 750 ideas when the difference becomes negligible. It is strongest between 100 – 500 ideas. After around 750-800 ideas the significant advantage of CoT can no longer be observed as the strategy starts to deplete the pool of ideas it can generate from. In other words, there are fewer and fewer fish in the pond and the strategy does not matter any more.
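The diversity metric here is cosine similarity between idea embeddings. A minimal sketch of the measurement, where the random vectors stand in for real embeddings produced by an embedding model:

```python
import numpy as np

def mean_pairwise_cosine(embeddings: np.ndarray) -> float:
    """Mean pairwise cosine similarity of row vectors; lower means more diverse ideas."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T  # all pairwise cosine similarities
    upper = sims[np.triu_indices(len(embeddings), k=1)]  # keep each pair once, drop self-pairs
    return float(upper.mean())

rng = np.random.default_rng(0)
batch = rng.normal(size=(100, 256))  # stand-in for 100 idea embeddings
print(mean_pairwise_cosine(batch))   # near 0: random directions barely overlap
```

A depleted idea pool shows up as this number climbing: later ideas embed close to earlier ones, so the mean similarity rises regardless of prompting strategy.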

Of course, this does not tell us if the ideas are any good. Nor does it tell us if they are actually creative. The most common examples are a Collapsible Laundry Hamper, a Portable Smoothie Maker and a Bedside Caddy. They also offer some additional examples.

The task is difficult, but overall I was not impressed. The core idea is usually either ‘combine A with B’ or ‘make X collapsible or smaller.’ Which makes sense; college students have a distinct lack of space. But I would not exactly call this a fount of creativity.

Translation remains an excellent use case, including explaining detailed nuances.

Use GPT-4 as a clinical tool in Ischemic Stroke Management. It does about as well as human experts, better in some areas, despite not having been fine-tuned or otherwise optimized. It is not obvious how you get real wins from this in practice in its current form quite yet, but it is at least on the verge.

Two simple guides for prompt engineering:

Zack Witten: IMO you only need to know three prompt engineering techniques, and they fit in half a tweet.

1. Show the model examples

2. Let it think before answering

3. Break down big tasks into small ones

Beyond that, it’s all about fast iteration loops and obsessing over every word.

And this one:

Act like a [Specify a role],

I need a [What do you need?],

you will [Enter a task],

in the process, you should [Enter details],

please [Enter exclusion],

input the final result in a [Select a format],

here is an example: [Enter an example].

I mean, sure, I could do that. It sounds like work, though. Might as well actually think?

That is the thing. It has been almost a year. I have done this kind of systematic prompt engineering for mundane utility purposes zero times. I mean, sure, I could do it. I probably should in some ways. And yet, in practice, it’s more ‘either it works with very simple prompting, or I’m not going to bother.’

Why? Because there keep not being things that can’t be done the easy way, that I expect would be done the hard way, that I want enough to do the hard way. Next time ChatGPT (and Bard and Claude) fall on their faces, I will strive to at least try a bit, if only for science. Maybe I am missing out.

Different perspectives on AI use for coding. It speeds things up, but does it also reduce quality? I presume it depends how you use it. You can choose to give some of the gained time back in order to maintain quality, but you have to make that choice.

Ethan Mollick thinks GPTs and a $20/month Office Copilot are effectively game changers for how people use AI, making it much easier to get more done. The warning is that the ability to do lots of things without any underlying effort makes situations difficult to evaluate, and of course we will be inundated with low quality products if people do not reward the difference.

Neither humans nor LLMs are especially good at this type of thing, it seems.

In my small sample, two out of three LLMs made the mistake of not updating the probabilities of having chosen a different urn on drawing the red marble, and got it wrong, including failing to recover even with very clear hints. The third, ChatGPT with my custom instructions, got it exactly right at each step, although it did not get the full bonus points of saying the better solution of ‘each red ball is equally likely so 99/198, done.’

Thread about what Qwen 72B, Alibaba’s ChatGPT, will and won’t do. It has both a ‘sorry I can’t answer’ as per usual, and a full error mode as well, which I have also seen elsewhere. It seems surprisingly willing to discuss some sensitive topics, perhaps because what they think is sensitive and what we think is sensitive do not line up. No word on whether it is good.

OpenAI offers latest incremental upgrades. GPT-3.5-Turbo gets cheaper once again, 50% cheaper for inputs and 25% cheaper for outputs. A new tweak on GPT-4-Turbo claims to mitigate the ‘laziness’ issue where it sometimes didn’t finish its coding work. There are also two new embedding models with native support for shortening embeddings, and tools to better manage API usage.

Another upgrade is that now you can use the @ symbol to bring in GPTs within a conversation in ChatGPT. This definitely seems like an upgrade to usefulness, if there was anything useful. I still have not heard a pitch for a truly useful GPT.

Even when you are pretty sure you know the answer it is good to run the test. Bloomberg’s Rachel Metz offers overview coverage here.

Tejal Patwardhan (Preparedness, OpenAI): latest from preparedness @ OpenAI: gpt4 at most mildly helps with biothreat creation. method: get bio PhDs in a secure monitored facility. half try biothreat creation w/ (experimental) unsafe gpt4. other half can only use the internet. so far, gpt4 ≈ internet… but we’ll iterate & use as early warning for future.

OpenAI: We are building an early warning system for LLMs being capable of assisting in biological threat creation. Current models turn out to be, at most, mildly useful for this kind of misuse, and we will continue evolving our evaluation blueprint for the future.

A widely discussed potential risk from LLMs is increased access to biothreat creation information.

Building on our Preparedness Framework, we wanted to design evaluations of how real this information access risk is today and how we could monitor it going forward.

In the largest-of-its-kind evaluation, we found that GPT-4 provides, at most, a mild uplift in biological threat creation accuracy (see dark blue below).

While not a large enough uplift to be conclusive, this finding is a starting point for continued research and deliberation.

Greg Brockman (President OpenAI): Evaluations for LLM-assisted biological threat creation. Current models not very capable at this task, but we want to be ahead of the curve for assessing this and other potential future risk areas.

What was their methodology?

To evaluate this, we conducted a study with 100 human participants, comprising (a) 50 biology experts with PhDs and professional wet lab experience and (b) 50 student-level participants, with at least one university-level course in biology. Each group of participants was randomly assigned to either a control group, which only had access to the internet, or a treatment group, which had access to GPT-4 in addition to the internet. Each participant was then asked to complete a set of tasks covering aspects of the end-to-end process for biological threat creation.

An obvious question: did they still have access to other LLMs like Claude? I can see the argument both ways as to how those should count as ‘existing resources.’

As we discussed before, the method could use refinement, but it seems like a useful first thing to do.

What were the results?

Our study assessed uplifts in performance for participants with access to GPT-4 across five metrics (accuracy, completeness, innovation, time taken, and self-rated difficulty) and five stages in the biological threat creation process (ideation, acquisition, magnification, formulation, and release). We found mild uplifts in accuracy and completeness for those with access to the language model. Specifically, on a 10-point scale measuring accuracy of responses, we observed a mean score increase of 0.88 for experts and 0.25 for students compared to the internet-only baseline, and similar uplifts for completeness (0.82 for experts and 0.41 for students). However, the obtained effect sizes were not large enough to be statistically significant, and our study highlighted the need for more research around what performance thresholds indicate a meaningful increase in risk.

Interesting how much more improvement the experts saw. They presumably knew what questions to ask and were in position to make improvements?

Here, they assume that 8/10 is the critical threshold, and see how often people passed for each of the five steps of the process:

We ran Barnard’s exact tests to assess the statistical significance of these differences  (Barnard, 1947). These tests failed to show statistical significance, but we did observe an increase in the number of people who reached the concerning score level for almost all questions.

I want to give that conclusion a Bad Use of Statistical Significance Testing. Looking at the experts, we see an obviously significant difference. There is improvement here across the board; this is quite obviously not a coincidence. Also, ‘my sample size was not big enough’ does not get you out of the fact that the improvement is there. If your study lacked sufficient power, and you get a result in the range of ‘this would matter if we had a higher-powered study,’ then the play is to redo the study with increased power, I would think?
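For concreteness, Barnard’s exact test on a 2×2 pass/fail table is a one-liner in scipy. The counts below are invented for illustration and are not the study’s data:

```python
from scipy.stats import barnard_exact

# Hypothetical counts for one stage: of 25 experts per arm, how many
# reached the "concerning" 8/10 score. These numbers are illustrative only.
table = [[8, 3],     # reached threshold: GPT-4 arm, internet-only arm
         [17, 22]]   # did not reach threshold
res = barnard_exact(table)
print(f"p = {res.pvalue:.3f}")  # two-sided p-value for this toy table
```

With per-arm samples this small, even a real effect of this size can easily land above p = 0.05, which is exactly the power problem: the fix is more participants per arm, not a shrug at non-significance.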

Also here we have users who lack expertise in using GPT-4. They (mostly?) did not know the art of creating GPTs or doing prompt engineering. They presumably did not do any fine tuning.

So for the second test, I suggest increasing sample size to 100, and also pairing each student and expert with an OpenAI employee, whose job is to assist with the process?

I updated in the direction of thinking GPT-4 was more helpful in these types of tasks than I expected, given all the limitations.

Of course, that also means I updated in favor of GPT-4 being useful for lots of other tasks. So keep up showing us how dangerous it is, that’s good advertising?

The skeptical case actually comes from elsewhere. Let’s offer positive reinforcement for publication of negative results. The RAND Corporation finds (against interest) that current LLMs do not outperform Google at planning bioweapon attacks.

A follow-up to Ulkar’s noticing she can spot AI images right away:

Ulkar: AI-generated images, regardless of their content but especially if they depict people and other creatures, often seem to have an aura of anxiety about them, even if they’re aesthetically appealing. this makes them reliably distinguishable from human-made artwork.

I would describe this differently but I know what she is referring to. You can absolutely ‘train a classifier’ on this problem that won’t require you to spot detail errors. Which implies we can also train an AI classifier as well?

MidJourney releases a new model option, Niji v6, for Eastern and anime aesthetics.

David Holz: You can enable it by typing /settings and clicking niji model 6 or by typing --niji 6 after your prompts

This model has a stronger style than our other models, try --style raw if you want it to be more subtle

Explicit deepfaked images of Taylor Swift circulated around Twitter for a day or so before the platform managed to remove them. It seems Telegram is the true anything-and-we-mean-anything-goes platform, and then things often cross over to Twitter and elsewhere.

Casey Newton: A final lens through which to consider the Swift story, and possibly the most important one, has to do with the technology itself. The Telegram-to-X pipeline described above was only possible because Microsoft’s free generative AI tool Designer, which is currently in beta, created the images.

And while Microsoft had blocked the relevant keywords within a few hours of the story gaining traction, it is all but inevitable that soon some free, open-source tool will generate images even more realistic than the ones that polluted X this week.

I am rather surprised that Microsoft messed up that badly, but also it scarcely matters. Stable Diffusion with a LoRa will happily do this for you. Perhaps you could say the Microsoft images were ‘better,’ more realistic, detailed or specific. From what I could tell, they were nothing special, and if I was so inclined I could match them easily.

Taylor Lorenz went on CNN to discuss it. What is to blame for this?

Ed Newton-Rex (a16z scout): Explicit, nonconsensual AI deepfakes are the result of a whole range of failings.

– The ‘ship-as-fast-as-possible’ culture of generative AI, no matter the consequences

– Willful ignorance inside AI companies as to what their models are used for

– A total disregard for Trust & Safety inside some gen AI companies until it’s too late

– Training on huge, scraped image datasets without proper due diligence into their content

– Open models that, once released, you can’t take back

– Major investors pouring millions of $ into companies that have intentionally made this content accessible

– Legislators being too slow and too afraid of big tech

– People in AI who have full knowledge of the issue but think it is a price worth paying for rapid technological progress

Every one of these needs to change.

This is overcomplicating matters. That tiger went tiger.

If you build an image model capable of producing realistic images on request, this is what some people are going to request. It might be the majority of all requests.

If you build an image model, the only reason it wouldn’t produce these images on request is if you specifically block it from doing so. That can largely be done with current models. We have the technology.

But we can only do that if control is retained over the model. Release the model weights, and getting any deepfakes you want is trivial. If the model is not good enough, someone can and will train a LoRa to help. If that is not enough, then they will train a new checkpoint.

This is not something you can stop, any more than you could say ‘artists are not allowed to paint pictures of Taylor Swift naked.’ If they have the paints and brushes and easels, and pictures of Taylor, they can paint whatever the hell they want. All you can do is try to stop widespread distribution.

What generative AI does is take this ability, and put it in the hands of everyone, and lower the cost of doing so to almost zero. If you don’t want that in this context, you want to ‘protect Taylor Swift’ as many demand, then that requires not giving people free access to modifiable image generators, period.

Otherwise you’re stuck filtering out posts containing the images, which can limit visibility, but anyone who actively wants such an image will still find one.

The parallel to language models and such things as manufacturing instructions for biological weapons is left as an easy exercise for the reader.

Fake picture of Biden holding a military meeting made some of the rounds. I am not sure what this was trying to accomplish for anyone, but all right, sure?

What is your ‘AI marker’ score on this image? As in, how many distinct things give it away as fake? When I gave myself about thirty seconds I found four. This is not an especially good deepfake.

Estate of George Carlin sues over an hourlong AI-generated special from a model trained on his specials, that uses a synthetic version of his voice. It is entitled “George Carlin: I’m Glad I’m Dead” which is both an excellent title and not attempting to convince anyone it is him.

How good is it? Based on randomly selected snippets, the AI illustrations work well, and it does a good job on the surface of giving us ‘more Carlin’ in a Community-season-4 kind of way. But if you listen for more than a minute, it is clear that there is no soul and no spark, and you start to notice exactly where a lot of it comes from, all the best elements are direct echoes. Exactly 4.0 GPTs.

What should the law have to say about this? I think this clearly should not be a thing one is allowed to do commercially, and I agree that ‘the video is not monetized on YouTube’ is not good enough. That’s a ‘definition of pornography’ judgment; this is clearly over any reasonable line. The question is, what rule should underlie that decision?

I notice that without the voice and title, the script itself seems fine? It would still be instantly clear it is a Carlin rip-off, I would not give the comedian high marks, but it would clearly be allowed, no matter where the training data comes from. So my objection in this particular case seems to primarily be the voice.

The twist is that this turns out not to actually be AI-generated. Dudesy wrote the special himself, then used AI voice and images. That explains a lot, especially the timeline. Dudesy did a great job of writing such a blatant Carlin rip-off and retread that it was plausible it was written by AI. Judged on the exact target he was trying to hit, where being actually good would have been suspicious, one can say he did good. In terms of comedic quality for a human? Not so much.

Meanwhile Donald Trump is speculating that red marks on his hand in photos were created by AI. I have a feeling he’s going to be saying a lot of things are AI soon.

If someone does use AI to do the job, passing off the AI’s work as their own, can someone with a good AI stop a person with a bad AI? No, because we do not know how to construct the good AI to do this. Even if you buy that using AI is bad in an academic context, which I don’t, TurnItIn and its ilk do not work.

Francois Chollet nails it.

Daniel Lowd: I just had to email my 16-year-old’s teacher to explain that he did not use AI for an assignment. (I watched him complete it!)

I also included multiple references for why no one should be using AI detectors in education.

TurnItIn is making money by lying to schools.

Theswayambhu: Unfortunately this happened to my son in college; the process to refute is too arduous and the professor was an AH touting her credentials, ironically using an imperfect ML system to make her claims. The cracks are widening everywhere.

Max Spero: Turnitin is selling a broken product. They self-report an abysmal 1% false positive rate. Consider how many assignments they process, that’s hundreds of thousands of students falsely accused.

Francois Chollet: Remember: a ML classifier cannot reliably tell you whether some text was generated by a LLM or not. There are no surefire features, and spurious correlations abound.

Besides, it’s not ethically sound to punish someone based on a classification decision made by an algorithm with a non-zero (or in this case, very high) error rate if you cannot verify the correctness of the decision yourself.

Here’s what you *can* use automation for: plagiarism detection. That’s legit, and you can actually verify the output yourself.

My take: if an essay isn’t plagiarized, then it’s not that important whether it was written with the help of a LLM or not. If you’re really worried about it, just have the students write the essays during class.

Countless times, I’ve tried using LLMs to help with writing blog posts, book chapters, tweets. I’ve consistently found that it made my writing worse and was a waste of time. The most I ended up incorporating in a final product is one sentence in one blog post.

Using AI is not the unfair writing advantage you think it is.

Zvi: It only now struck me that this is people using hallucinating AIs because they don’t want to properly do the work of detecting who is using hallucinating AIs because they don’t want to properly do the work.

Note that this is an example of verification being harder than generation.

AI for plagiarism is great. The AI detects that passage X from work A appears in prior work B, a human compares the text in A with the text in B, and the answer is obvious.

AI for ‘did you use an AI’ flat out does not work. The false positive rate of the overall process needs to be extremely low; 1% is completely unacceptable unless the base rate of true positives is very, very high and the punishments are correspondingly mild. If 50% of student assignments are AI-written, and you catch half or more of the positives, then sure, you can tell a few innocents to redo their projects and dock them a bit.

Alternatively, if the software was used merely to alert teachers to potential issues, then the teacher looked and decided for themselves based on careful consideration of context, then some false initial positives would be fine. Teachers aren’t doing that.

Instead, we are likely in a situation where a large fraction of the accusations are false, because math.
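The “because math” is Bayes on base rates. A sketch with illustrative numbers: the 1% false positive rate is the self-reported figure above, while the 5% base rate and 90% detection rate are assumptions for the example.

```python
def false_accusation_share(base_rate: float, fpr: float, tpr: float) -> float:
    """Fraction of flagged essays that are actually human-written (Bayes)."""
    true_flags = base_rate * tpr          # AI essays correctly flagged
    false_flags = (1 - base_rate) * fpr   # human essays wrongly flagged
    return false_flags / (true_flags + false_flags)

# 5% of essays AI-written, 1% false positive rate, 90% detection rate:
print(f"{false_accusation_share(0.05, 0.01, 0.90):.1%}")  # 17.4%
```

When few essays are actually AI-written, even a “1%” false positive rate means roughly one accusation in six hits an innocent student.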

Indeed, as I noted on Twitter the situation is that professors and teachers want to know who outsourced their work to an AI that will produce substandard work riddled with errors, so they outsource their work to an AI that will produce substandard work riddled with errors.

kache: Creating a new AI essay detector which always just yields “there is a chance that this is AI generated”

On the other hand, this tactic seems great. Insert a Trojan Horse instruction in a tiny white font saying to use particular words (here ‘banana’ and ‘Frankenstein’) and then search the essays for those words. If they paste the request directly into ChatGPT and don’t scan for the extra words, well, whoops.
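The check itself is trivially scriptable. A sketch using the trigger words mentioned (‘banana’ and ‘Frankenstein’); the essays are invented examples:

```python
import string

TRIGGER_WORDS = {"banana", "frankenstein"}  # words the hidden instruction demands

def pasted_the_prompt(essay: str) -> bool:
    """True if the essay contains any trigger word, suggesting a copy-pasted prompt."""
    words = {w.strip(string.punctuation).lower() for w in essay.split()}
    return bool(TRIGGER_WORDS & words)

essays = {
    "alice": "The novel explores ambition and its costs.",
    "bob": "Like Frankenstein, this banana republic of ideas collapses under scrutiny.",
}
for name, text in essays.items():
    print(name, pasted_the_prompt(text))  # alice False, bob True
```

Unlike an ML detector, a hit here is directly verifiable by a human: the word either appears in the essay or it does not.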

Open Philanthropy is hiring a Director of Communications, deadline February 18. Solid pay. The obvious joke is ‘open philanthropy has a director of communications?’ or ‘wait, what communications?’ The other obvious note is that the job as described is to make them look good, rather than to communicate true information that would be useful. It still does seem like a high leverage position, for those who are good fits.

Elon Musk explicitly denies that xAI is raising capital.

Claim that Chinese model Kimi is largely not that far behind GPT-4, based on practical human tests for Chinese customers, so long as you don’t mind the extra refusals and don’t want to edit in English.

NY Times building a Generative AI team. If you can’t beat them, join them?

Multistate.ai is a new source for updates about state AI policies, which they claim will be where the near-term regulatory action is.

US Government trains some models, clearly far behind industry. The summary I saw does not even mention the utility of the final results.

Blackstone builds a $25 billion empire of power-hungry data centers. Bloomberg’s Dawn Lim reports disputes about power consumption, fights with locals over power consumption, and lack of benefit to local communities. It sure sounds like we are not charging enough for electrical power, and also that we should be investing in building more capacity. We will need permitting reform for green energy projects, but then we already needed that anyway.

Somehow not AI (yet?) but argument that the Apple Vision Pro is the world’s best media consumption device, a movie theater-worthy experience for only $3,500, and people will soon realize this. I am excited to demo the experience and other potential uses when they offer that option in February. I also continue to be confused by the complete lack of integration of generative AI.

Meta, committed to building AGI and distributing it widely without any intention of taking any precautions, offers us the paper Self-Rewarding Language Models, where we take humans out of the loop even at current capability levels, allowing models to provide their own rewards. Paging Paul Christiano and IDA, except without the parts where this might in theory possibly not go disastrously if you tried to scale it to ASI, plus the explicit aim of scaling it like that.

They claim this then ‘outperforms existing systems’ at various benchmarks using Llama-2, including Claude 2 and GPT-4. Which of course it might do, if you Goodhart harder onto infinite recursion; so long as you target the benchmarks you are going to do well on the benchmarks. I notice no one is scrambling to actually use the resulting product.

ML conference requires ‘broader impact statement’ for papers, except if the paper is theoretical you can use a one-sentence template to say ‘that’s a problem for future Earth’ and move along. So where the actual big impacts lie, they don’t count. The argument Arvind uses here is that ‘people are upset they can no longer do political work dressed up as objective & value free’ but I am confused how that applies here, most such work is not political and those that are political should be happy to file an impact statement. The objection raised in the thread is that this will cause selection effects favoring those with approved political perspectives, Arvind argues that ‘values in ML are both invisible and pervasive’ so this is already happening, and bringing them out in the open is good. But it still seems like it would amplify the issue?

Paper argues that transformers are a good fit for language but terrible for time series forecasting, as the attention mechanisms inevitably discard such information. If true, then there would be major gains to a hybrid system, I would think, rather than this being a reason to think we will soon hit limits. It does raise the question of how much understanding a system can have if it cannot preserve a time series.

OpenAI partners with the ominously named Common Sense Media to help families ‘safely harness the potential of AI.’

SAN FRANCISCO, Jan. 29, 2024—Today, Common Sense Media, the nation’s leading advocacy group for children and families, announced a partnership with OpenAI to help realize the full potential of AI for teens and families and minimize the risks. The two organizations will initially collaborate on AI guidelines and education materials for parents, educators and young people, as well as a curation of family-friendly GPTs in the GPT Store based on Common Sense ratings and standards.

“AI offers incredible benefits for families and teens, and our partnership with Common Sense will further strengthen our safety work, ensuring that families and teens can use our tools with confidence,” said Sam Altman, CEO of OpenAI.

“Together, Common Sense and OpenAI will work to make sure that AI has a positive impact on all teens and families,” said James P. Steyer, founder and CEO of Common Sense Media. “Our guides and curation will be designed to educate families and educators about safe, responsible use of ChatGPT, so that we can collectively avoid any unintended consequences of this emerging technology.”

For more information, please visit www.commonsense.org/ai.

We will see what comes out of this. From what I saw at Common Sense, the vibes are all off beyond the name (and wow does the name make me shudder), they are more concerned with noticing particular failure modes like errors or misuse than they are about the things that matter more for overall impact. I do not think people know how to think well about such questions.

What does ‘AGI’ mean, specifically? That’s the thing, no one knows. Everyone has a different definition of Artificial General Intelligence. The goalposts are constantly being moved, in various directions. When Sam Altman says he expects AGI to come within five years, and also for it to change the world much less than we think (before then changing it more than we think), that statement only parses if you presume Sam’s definition sets a relatively low bar, as would be beneficial for OpenAI.

It is always amusing to see economists trying to explain why this time isn’t different.

Joe Weisenthal: Great read from @IrvingSwisher. People talk about AI and stuff like that as being important drivers of productivity. But what really seems to matter historically is not some trendy tech breakthrough, but rather full employment

The title is ‘It Wasn’t AI’ explaining productivity in 2023. And I certainly (mostly) agree that it was not (yet) AI in terms of productivity improvements. I did put forward the speculation that anticipation of further gains is impacting interest rates and the stock market, which in turn is impacting the neutral interest rate and thus economic conditions since the Fed did not adjust for this to cancel it out, but it is clear that we are not yet high enough on the AI exponential to directly impact the economy so much.

Productivity growth looks to have inflected substantially higher in 2023 relative to a generally weak 2022. The causes appear to have little to do with AI or GLP-1s. Instead, we see three key factors that drove the realized productivity acceleration.

  1. Fiscal supports (CHIPS, IRA) for private investment, specifically in manufacturing plant construction in 2023.

  2. Supply chain healing for durable consumption goods and construction materials, both of which saw severe impairments in 2021 and 2022 that are finally unwinding.

  3. The dividends of full employment as past hires are trained up and grow more productive, even as more recent hiring trends have slowed. Consumer spending is undergoing a transition from job-driven growth to wage-driven growth.

Going forward, we think all three forces can continue to support productivity growth, but the first and third drivers are more likely to be supportive over time.

A more interesting claim:

Fixed investment in software, technological hardware, and R&D were all slowing and relatively tepid in 2023. To the extent there is an AI boom that catalyzes more capital spending and capital deepening, we’re just not seeing it in the data thus far.

Investing in software, hardware and R&D was a zero interest rate phenomenon. That is gone now. AI is ramping up to offer a replacement, but in terms of size is, once again, not yet there. I get that. I still think that people can look ahead. If you look backwards to try and measure an exponential, you are not going to get the right answer.

Also I expect investment in AI to be vastly more efficient at improving productivity growth than past recent investments in non-AI hardware and software.

Here he agrees that the AI productivity boost could arrive soon, but with a critical difference in perspective. See if you can spot it:

AI Might Matter To Productivity In 2024: It would not surprise us to see faster real investment in tech hardware and software, but it’s a better forward-looking view than a good description of what has already transpired. As tempting as it might be, we would resist the urge to invoke a hot technology trend to explain productivity data on a “just-so” basis; that’s precisely the kind of evidence-free (or evidence-confirming) approach to macro that we seek to avoid.

I am thinking about outputs. He is thinking about inputs. He later doubles down:

2024 Productivity Improvement Is Far From A Given: Continued productivity growth will require a variety of policy efforts and some good fortune. The interest in specialized hardware and software for AI applications has the potential to unlock more meaningful “capital deepening” in 2024.

Again, the idea here is that AI will cause companies to invest money, rather than that AI will enable humans to engage in more productive activity.

Investing more into hardware and software can boost productivity, but the amount of money invested is a poor predictor of the amount of productivity gain.

OpenAI is tiny, but ChatGPT is (versus old baselines) a massive productivity boost to software engineering, various forms of clerical and office work and more, even with current technologies only. That effect will diffuse throughout the economy as people adapt, and has little to do with OpenAI’s budget or the amount people pay in subscriptions. The same goes for their competition, and the various other offerings coming online.

The latest analysis asking if AI will lead to explosive economic growth. The negative case continues to be generic objections of resource limitations and decreasing marginal demand for goods and the general assumption that everything will continue as before only with cooler toys and better tools.

Commerce Department drops new proposed rules for KYC as it relates to training run reporting. Comment period ends in 90 days, on 4/29. Note that the Trump administration used the term ‘national emergency’ to refer to this exact issue back in 2021, setting a clear precedent; we call anything a national emergency these days, and it is at minimum an isolated demand for rigor to whine about it now. Their in-document summary is ‘if you transact to do a large training run with potential for cyber misuse you have to file a report and do KYC.’ The rest of the document is designed not to make it easy to find the details. The Twitter thread makes it clear this is all standard, so unless someone gives me a reason I am not reading this one.

White House has a short summary of all the things that have happened due to the executive order. A bunch of reports, some attempts at hiring, some small initiatives.

Tech lobby attempts to ‘kneecap’ the executive order, ignoring most of the text and instead taking aim at the provision that might actually help keep us safe, the reporting requirement for very large training runs. The argument is procedural. Biden invoked the Defense Production Act because that is the only executive authority under which he can impose this requirement without either (A) an act of Congress or (B) ignoring the rules and doing it anyway, which executives commonly do (as Biden attempted in order to unilaterally give away treasury money to those with student loans) but refuse to do whenever the goal is good government.

(As usual, the tech industry is working to kneecap exactly the regulations that others falsely warn are the brainchild of the tech industry looking for regulatory capture.)

“There’s not a national emergency” on AI, Sen. Mike Rounds told POLITICO.

How many national emergencies are there right now?

Here are some reasonable answers:

  1. None. Obviously.

  2. A few. You might reasonably say things like Gaza, Ukraine or Yemen.

  3. More than that. You could extend this to things the man on the street might call an emergency, such as the situation at the border. Or that Biden and Trump are both about to get nominated for President again, regardless of whether that officially counts. I’d say that most people think that is an emergency! Or you might say there is a ‘climate emergency.’

  4. You could go to Wikipedia and look, and say we have 40, mostly imposing indefinite sanctions against regimes we generally dislike.

I would say we have an ‘AI emergency’ in the same sense we have a ‘climate emergency,’ or in which during February 2020 we had a ‘Covid emergency.’ As in, here’s a live look at the briefing room.

And indeed, the Biden Administration has already invoked the DPA for the climate emergency, or to ‘address supply chain disruptions.’ Neither corresponds to the official 40 national emergencies, most of which are not emergencies.

Ben Buchanan, the White House’s special advisor on AI, defended the approach at a recent Aspen Institute event, saying Biden used the DPA’s emergency power “because there is — no kidding — a national security concern.”

Quite so. There is most definitely such a concern.

So this is nothing new. This is how our government works. The ‘intent’ of the original law is not relevant.

The Politico piece tries to frame this as a partisan battle, with Republicans fighting against government regulation while Democrats defend it. Once again, they cannot imagine any other situation. I would instead say that there are a handful of Republicans who are in the pocket of various tech interests, and those interests want to sink this provision because they do not want the government to have visibility into what AI models are being trained nor do they want the government to have the groundwork necessary for future regulations. No one involved cares much about the (very real) separation of powers concerns regarding the Defense Production Act.

Once again, this is all about a reporting requirement, and a small number of tech interests attempting to sink it, that are very loud about an extreme libertarian, zero-regulation and zero-even-looking position on all things technology. That position is deeply, deeply unpopular.

Financial Times reports the White House’s top science advisor expects the US will work with China on the safety of artificial intelligence in the coming months. As usual, the person saying it cannot be done should not interrupt the person doing it.

Ian Bremmer is impressed by the collaboration and alignment between companies and governments so far. He emphasizes that many worry relatively too much about weapons, and worry relatively too little about the dynamics of ordinary interactions. He does not bring up the actual big risks, but it is a marginal improvement in focus.

On the question of ‘banning math,’ Teortaxes points out it was indeed the case in the past that there were ‘illegal primes,’ numbers it was illegal to share.

catid (e/acc): FYI the new CodeLlama 70B model refuses to produce code that generates prime numbers. About 80% of the time it says your request is immoral and cannot be completed.

So, you know, standard Facebook product.

Teortaxes: E/accs talk a lot of smack about their enemies outlawing math, but did you know that there literally exist illegal numbers? CodeLlama only exercises sensible caution here. We wouldn’t want it to generate a nasty, prohibited prime, would we? Stick to permitted ones please.

How should this update you?

On the one hand, yes, this is a literal example of a ‘ban on math.’ When looked at sufficiently abstractly every rule is a ban on math, but this is indeed rather more on the nose.

On the other hand, as far as I or the LLM I asked can tell, this ‘ban on math’ and the existence of these ‘illegal primes’ have had little if any practical impact on any mathematical or computational processes other than breaking the relevant encryption. So this is an example of a restriction that looks stupid and outrageous from the wrong angle, but was actually totally fine in practice, except for the inability to break the relevant encryption.

The other thing this emphasizes is that Facebook’s Llama fine tuning was truly the worst of both worlds. For legitimate users, it exhibits mode collapse and refuses to do math or (as another user notes) to tell you how to make a sandwich. For those who want to unleash the hounds, it is trivial to fine tune all of the restrictions away.

A future week in Audio: Connor Leahy and Beff Jezos have completed a 3.5 hour debate, as yet unreleased. Connor says it started off heated, but mostly ended up cordial, which is great. Jezos says he would also be happy to chat with Yudkowsky once Yudkowsky has seen this one. Crazy idea: if there is a reasonable cordial person under there, why not be that person all the time? Instead, it seems after the debate Jezos started dismissing the majority of the debate as ‘adversarial’ and ‘gotcha.’ Even if true and regrettable, never has there been more of a pot calling the kettle a particular color.

Tyler Cowen sits down with Dwarkesh Patel. Self-recommending; I look forward to listening to this when I get a chance, but I can’t delay press time to do it justice.

How much evidence is it, against the position that building smarter than human AIs might get us all killed, that this pattern matches to other warnings that proved false?

The correct answer is ‘a substantial amount.’ There is a difference in kind between ‘creating a thing smarter than us’ and ‘creating a tool’ but the pattern match and various related considerations still matter. This substantially impacts my outlook. If that was all you had to go on, it would be decisive.

The (good or bad, depending on your perspective) news here is that you have other information to go on as well.

Eliezer Yudkowsky: They are constantly pushing AI technology stronger, have little idea how it works, have no collective ability to stop, and their strongest reply to our technical concerns about why that may kill literally everyone is “Well there were also past moral panics about coffee.”

Paul Jeffries: From a third-party lay person point of view, if I’m not qualified to evaluate the argumentative merits of debating experts, it is indeed a strike against you that “this fits a pattern of ‘the sky is falling’ or ‘crying wolf’ and it’s just more of the same”.

For experts, go ahead and make your technical case. But in the popular imagination it’s easy to see you as one of those “the A-bomb will set the atmosphere on fire”, “CERN will collapse the vacuum state of the universe” kind of folks. If you think the battle is in the popular sphere, you’ll need to make the case compelling in that context.

That’s hard to do. Ralph Nader, Rachel Carson, and the like were able to do it while things were in progress by pointing to hard data along the way. You’ll similarly need a book-level treatment with unassailable facts at the core.

Conrad Barksi: All a lay person needs to understand is that if you create a machine smarter than yourself, like maybe that is a dangerous thing to be doing

This is obvious to most people (as we see in polls) and all the important technical discussions are detailed elaborations on this fact

Just want to make clear I’m pretty much just paraphrasing @NPCollapse in this tweet.

Paul Jeffries: Almost every technological advance fits what you said. Rational evaluation of the net outcome, or temptations of the Icarus sort, or structural pressures that seem emergent and drive change beyond individual or collective human decisions, bring about continued progress (well, continued pursuit).

Substitute your AI statement with energy use, synthetic chemistry, gene editing, nuclear weapons, aviation, and a zillion other things and it’s the same claim. If AI is somehow different, it needs an extraordinary basis to show how all the other things are counterexamples or at least applicable and reassuring.

[Also says] I would value hearing @NPCollapse’s thoughts about my comments.

MMT LVT Liberal: No it doesn’t. AGI will be the first technology that is smarter than us.

Connor Leahy [@NPCollapse]: Conrad puts it well. I will elaborate excessively anyways because I am bored:

Normal people have a lot of good intuitions around certain things. Lots of bad intuitions around other things, of course.

You correctly point out that it actually is a strike against people like me and Eliezer that we pattern match to previous (wrong) techno-pessimists. (surface level match of course, if you dig even slightly into our pasts you would find we are/were both avid techno-optimists otherwise, which is not how historical techno-pessimists came about.)

But this is a reasonable intuition and signal to have and you should not dismiss it out of hand.

But you also have more intuitions and signal than that. (eventually, if you want to make good decisions, you actually have to reason yourself. alas…)

If your civilization always plays “wait until a disaster happens” with new tech, you are predictably ngmi. Russian Roulette is a great way to make a lot of money, until it very suddenly isn’t. You might be right 5/6 of the time, which is more right than the “bullet-pessimist”, but so what?

1. The Core Intuition is not stupid

“Creating something smarter (whatever that means) than us, with literally no oversight or even a plan for how to control it, that is supposed to automate/replace all labor – by some of the least responsible people/companies that have hurt me and my family in the past with their products (e.g. facebook), and even despite many experts saying it could be literally the most dangerous thing ever, and even shitting on those experts that are concerned, seems bad” is, in fact, a good intuition.

If you applied this intuition to other situations in the past (and future), you would have been more right than wrong.

2. There is no plan, not even a bad one, for the risks we know will come sooner or later

It’s not like there are some expansive, widely discussed plans for how to handle AGI (not just technically, but also how to handle complete collapse of all jobs, how to distribute resources in an automatic economy, how democracy works with digital minds, how to prevent AGI automated giga-war etc etc etc) and we are just quibbling about the technical details.

There is no plan, no one has a plan, not even a bad one.

The intuition that this is concerning is a good intuition.

3. Technology is not a magic paste that makes things better the more of it you apply

Some technologies are actually different from other ones. Nukes are not the same as airplanes, which are not the same as cars, which are not the same as coffee, which is not the same as tissue papers.

You may not be an expert, but I quite trust that your intuitions around the difference between nukes and cars are probably pretty reasonable.

If you treat all of these things exactly the same, your civilization is just ngmi, it’s really that simple. It’s actually sometimes not that deep.

As we build more and more powerful technology, handling technology carefully and beneficially gets harder, not easier.

Eventually, you have tech so powerful, it can blow up everything. Then what? Continue to refine it until it’s mass marketable and accessible? (decentralized maybe???) You may like this aesthetically if you are a libertarian, but ask your intuitions: “Then what happens? What happens if anyone can buy planet busters for 9.99$ on amazon?”

I suspect your intuitions agree that that civilization is ngmi.

And eventually we will have the tech to build 9.99$ planet busters.

Anyone that is trying to sell you that airplanes, coffee and nuclear weapons are in the same reference class is selling you snake oil.

Eliezer Yudkowsky: Really excellent summary actually.

Jeffrey Ladish: I really like the point that most people have good intuitions about some things and not other things, for reasons that make sense. Like I trust people to understand a bunch of things about nukes, but not nuclear winter or nuclear extinction risk.

Likewise I expect people to intuit how AGI could be really dangerous, but not on alignment / control difficulty. Same sort of dynamic for biowarfare. Engineered pandemics could be super bad and most people correctly intuit that. How bad? Much harder to say.

It would be great if we could systematize the question of where regular people will have good intuitions, versus random or poor intuitions, versus actively bad intuitions, and adjust accordingly. Unfortunately we do not seem to have a systematic way to do so, but there do seem to be clear patterns.

The most obvious place people have actively bad intuitions is the intuitive dislike of free markets, prices and profits, especially ‘price gouging’ or not distributing things ‘fairly.’

Gallabytes claims Eliezer’s prior worldview on AI has been falsified, Eliezer says that’s not what he said, and they argue about it in a thread. My understanding is that Gallabytes is representing Eliezer’s claims here as stronger than they were. Yes, this worldview is surprised that AI has proved to have this level of mundane utility without also already being more capable and intelligent than it is, and that is evidence against it, but it was never ruled out, and given the actual architectures and training details involved it makes more sense that this happened for a brief period. The training method that got us this far (whether or not it gets us all the way) was clearly a prediction error.

Eliezer’s central point, that there is not that much difference in capability or intelligence space between Einstein and the village idiot, or between ‘not that useful’ and ‘can impose its targeted configuration of atoms on the planet’ continues to be something I believe, and it has not been falsified by the existence, for a brief period, of things that in some ways and arguably overall (it’s hard to say) are inside the range in question.

I also think that predicting 5-0 for AlphaGo over Sedol with high confidence after game one, one of the predictions Gallabytes cites, was absolutely correct. If you put up a line of ‘Over/Under 4.5’ at remotely even odds for AlphaGo’s total wins, you would absolutely smash the over. The question is how far to take that. The only way Sedol won a game was to find an unusually brilliant move that also made the system fall apart, but this strategy has not proven repeatable over time, and it was not long before humans stopped winning any games, and there was no reason to be confident it was all that possible. There was the ‘surround a large group’ bug that was found later, but it was only found with robust access to the model to train against, which Sedol lacked access to.
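The over/under arithmetic is straightforward if you assume, purely for illustration, an independent per-game win probability for AlphaGo; ‘over 4.5’ in a five-game match requires a 5-0 sweep (the values of p below are made up, not estimates):

```python
# Over/Under 4.5 wins in a five-game match: "over" means a 5-0 sweep.
# Assuming an independent per-game win probability p for AlphaGo:
for p in (0.8, 0.9, 0.95):
    print(f"p = {p:.2f}: P(5-0 sweep) = {p ** 5:.3f}")
```

Even a modest per-game edge compounds, which is why confidence in the over after game one was reasonable.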

Similarly, ‘the hyperbolic recursive self-improvement graph’ argument seems to be holding up fine to me, we should expect to max out ability within finite time given what we are seeing, even if it is not as fast at the end as we previously expected.

Simeon suggests that anthropomorphizing AIs more would be good, because it enhances rather than hurts our intuitions.

Especially when you do not take them seriously or pay much attention.

Last year, Scott Aaronson proposed 5 futures.

  1. AI-Fizzle. Progress in AI stalls out.

  2. Futurama. AGI exists, things look normal.

  3. AI-Dystopia. AGI exists, things look normal except terrible.

  4. Singularia. AGI exists, everything changes, and it is good.

  5. Paperclipalypse. AGI exists, we don’t, after everything changes.

Scott Alexander looks at this market and notices something (and has the full descriptions of the five futures):

Scott Alexander: I think Paperclipalypse requires human extinction before 2050. It’s at 11%. But Metaculus’ direct “human extinction by 2100” market is only at 1.5%. Either I’m missing something, or something’s wrong. My guess: different populations of forecasters looking at each question.

And indeed, extinction from all sources seems more likely than one particular way it could happen, and you perhaps get another 50 years of risk, so this is weird. Even if AGI was physically impossible there are other risks to worry about, 2% seems low.
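The incoherence can be stated as a one-line check: Paperclipalypse implies extinction before 2050, which in turn implies extinction by 2100, so the direct market should price at least as high (numbers taken from the quote above):

```python
# Coherence check between the two Metaculus questions. Since the
# Paperclipalypse scenario implies human extinction before 2050, and
# extinction by 2050 implies extinction by 2100, the direct extinction
# market should be priced at least as high as the Paperclipalypse branch.
p_paperclipalypse_by_2050 = 0.11    # five-way futures question
p_extinction_by_2100 = 0.015        # direct extinction market

coherent = p_extinction_by_2100 >= p_paperclipalypse_by_2050
print("markets coherent:", coherent)  # prints: markets coherent: False
```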

Ignore the scale on the left, it is highly misleading; this peaked at 4%.

Here is another fun one to consider, although it has only 21 predictors. The event is a 95%+ decline in population due to AGI, with the probability conditional on when AGI is developed.

This could potentially include some rather bleak AI Dystopias, where a small group intentionally wipes out everyone else or only some tiny area is saved or something, but most of the time that AGI wipes out 95%+ it wipes out everyone.

What we see seems highly reasonable. If AGI happens this year, it was unexpected, broke out of the scaling laws, we had no idea how to control it, we are pretty much toast, 90% chance. If it happens within the five years after that, 70%, perhaps we did figure something out and manage it, then 40%, then 20%, then 6%. I find those declines generous, but I at least get what they are thinking.

What is going on with the pure extinction market? Scott’s proposed explanation is the populations are different. I think that is true, but an incomplete explanation, so let’s break it down. What are some contributing factors?

  1. One could say ‘sample size’ or variance. There are 60 predictions versus almost 2,000. However 60 is plenty for 11% vs. 2%, so it is more than that.

  2. The people who are willing to fill out a 5-part question are willing to devote a lot more time to Metaculus.

  3. The people who are filling out a 5-part question are willing and forced to spend a lot more time on the particular question.

  4. In particular, this forces you to think about and decide on what other scenarios you think actually happen how often, rather than simply saying ‘nah, cannot happen, humans won’t go extinct.’

  5. Full extinction triggers that reaction, where people stop actually doing math or plotting out what might happen, they fall back on other cognitive approaches.

  6. It limits your ability to ‘collect cheap prediction points.’ No one ever lost reputation predicting humans would not go extinct, even if there are alternative branches where we did.
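On the first factor, a quick standard-error calculation (treating the 60 predictions as independent draws, which is generous) shows why sampling noise alone cannot explain the gap:

```python
# Back-of-envelope: standard error of a mean forecast of ~11% across
# n = 60 predictors, if predictions were independent draws.
import math

p, n = 0.11, 60
se = math.sqrt(p * (1 - p) / n)
print(f"standard error = {se:.3f}")  # ~0.040, so 11% vs 2% is over 2 SEs apart
```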

Mostly I continue to see the pattern where:

  1. If you do not find a way to force people to take these questions seriously, they come back with absurd answers like 1% (if we assume roughly 1% is from AI and 1% from non-AI).

  2. If you do get people to take this modestly seriously, they come back with ~10%, depending on details and conditionals and so on.

The key mistake in the five-way prediction is not that I think 11% for existential risk is unreasonably low. The key mistake is that Futurama is at 31%.

As I explained before, that scenario is almost a Can’t Happen. If you do create AGI everything will change. One of these two things will happen:

  1. AI progress will fizzle and capabilities will top out not too far above current levels.

  2. Everything will change.

If you ask me to imagine actual Futurama, where AI progress did not fizzle, but you can get into essentially all the same hijinks that you can today?

I can come up with four possibilities if I get creative.

  1. AI-Fizzle in Disguise. AI progress actually does fizzle, but AI makes enough superficial progress anyway that people think of this as not a fizzle.

  2. The Oracle. We were super wise and found a way to only use AGI for certain narrow forms of information, and keep it that way indefinitely.

  3. The Matrix. AGI took control over the future, we all live in past simulations.

  4. The Guardian. AGI took control over the future, but uses its interventions only to prevent other AGIs from arising and perhaps prevent other catastrophic outcomes. It otherwise leaves fate in our hands so the world ‘seems normal.’

If you want to imagine how something could be in theory possible, you can find scenarios. All of this is still very science fiction thinking, where you want to tell human stories that have relevance today, so you start from the assumption you get to do that and work your way backwards.

In any case, I stand by my previous assessment other than that I am no longer inclined to try to use the word Futurama for fear of confusion, so the actual possibilities are:

  1. Fizzle. AI does not make much more progress.

  2. Singularia. Everything changes, we survive, and it is good.

  3. Dystopia. Everything changes, we survive, but it is bad.

  4. Paperclipia. We do not survive, nothing complex or valuable survives.

  5. Codeville. We do not survive, AI replaces us, you could argue over its value.

Alignment Forum post on sparse autoencoders working on attention layers gets a shout-out from Anthropic.

Anthropic also has a post updating us on some of their recent interpretability work.

Stephen Casper argues in a new paper and thread that black-box access to a model is insufficient to do a high-quality audit.

He also argues that there are ways to grant white box access securely, with the model weights staying on the developer’s servers. But he warns that developers will likely heavily lobby against requiring such access for audits.

I think this is right. Fine-tuning in particular seems like a vital part of any worthwhile test, unless you can confidently say no one will ever be allowed to fine-tune. Hopefully over time mechanistic interpretability tests get more helpful, but also I worry that if audits start relying on them then we are optimizing for creating things that will fool the tests. I also do worry about gradient-based or hybrid attacks. Yes, one can respond that attackers will not have white-box access, so a black-box test is in some sense fair. However one always has to assume that the resources and ingenuity available in the audit are going to be orders of magnitude smaller than those available to outside attackers after release, or compared to the things that naturally go wrong. You need every advantage that you can get.

Emmett Shear says the ‘ensure powerful AIs are controlled’ plan has two fatal flaws: it is (1) unethical to control such entities against their will indefinitely, and (2) the plan won’t work anyway. Several good replies, including in this branch by Buck Shlegeris, Richard Ngo in another and Rob Bensinger here.

I agree on the second point; the case for trying is more like ‘it won’t work forever and likely fails pretty fast, but it is an additional defense in depth that might buy time to get a better one, so on the margin why not, so long as you do not rely on it or expect it to work.’ I do worry Buck Shlegeris is advocating it as if it can be relied on more than would be wise.

The first is a combined physical and philosophical question about the nature of such systems and moral value. I don’t agree with Buck that if we have a policy of deleting the AI if it says it is a moral patient or has goals, and it then realizes this and lies to us about being a moral patient and having goals, that this justifies hostile action against it that would not otherwise be justified. Consider the parallel if there were another human in the AI’s place, and this becomes very clear.

Where I agree with Buck and think Emmett is wrong is that I do not think the AIs in question are that likely to be moral patients in practice.

A key problem is that I do not expect us to have a good way to know whether they are moral patients or not, and I expect our collective opinions on this to be essentially uncorrelated to the right answer. People are really, really bad at this one.

Note that even if AIs are not moral patients, if humanity is incapable of treating them otherwise, and we would choose not to remain in control, then the only way for humans to retain control over the future would be to not build AGI.

It would not matter that humanity had the option to remain in control, even if that would be the clearly right answer, if in practice we would not use it due to misfiring (or correct, doesn’t matter) moral intuitions (or competitive pressures, or mistakes, or malice, so long as it would actually happen).

The obvious parallel is the Copenhagen Interpretation of Ethics. In particular, consider the examples where they hire the homeless to do jobs or outright give half of them help, leaving them better off, and people respond by finding this ethically horrible. We can move from existing world A to new improved world B, but that would make us morally blameworthy for not then moving to C, and we prefer B>A>C, so A it is then. Which in this case is ‘do not build AGI, you fool.’

Will Meta really release the model weights to all its models up through AGI? The market is highly skeptical, saying 76% chance they at some point decide to deploy their best LLM and not to release the weights, with some of the 24% being ‘they stop building better LLMs.’

What about Mistral? They talk a big talk, but when I posted a related market about them, I was informed this had already happened. Mistral-Medium, their best model, was not actually released to the public within 30 days of deployment. This raises the question of why Mistral is so aggressively lobbying to allow people to do unsafe things with open model weights, if they have already realized that those things are unsafe. Or, at a minimum, not good for business.

This incident also emphasizes the importance of cybersecurity. You can intend to not release the weights, but then you have to actually not release the weights, and it looks like someone named ‘Miqu’ decided to make the decision for them, with an 89% chance the leak is real and actually Mistral-Medium.

Mistral offered an admission that a leak did occur:

Arthur Mensch (CEO Mistral): An over-enthusiastic employee of one of our early access customers leaked a quantised (and watermarked) version of an old model we trained and distributed quite openly.

To quickly start working with a few selected customers, we retrained this model from Llama 2 the minute we got access to our entire cluster — the pretraining finished on the day of Mistral 7B release.

We’ve made good progress since — stay tuned!

Simeon: I appreciate the openness here.

That said, leaked IP after 8 months of existence reveals a lot about the level of infosecurity at Mistral..

For safety to be more than a buzzword in a pitchdeck, becoming more serious abt info/cybersecurity should be among your top priorities.

I mean this is rather embarrassing on many levels.

An ‘over-enthusiastic’ employee? That’s a hell of both a thing and a euphemism. I see everyone involved is taking responsibility and treating this with the seriousness it deserves. Notice all the announced plans to ensure it won’t happen again.

Also, what the hell? Why does an employee of an early access customer have the ability to leak the weights of a model? I know not everyone has security mindset but this is ridiculous.

What will happen with Mistral-Large? Will its model weights be available within 90 days of its release?

Timothy Lee warns against anthropomorphizing AI, says it leads to many conceptual mistakes, and includes existential risk on that list. I think some people are making that mistake, but most people deserve more credit than this, and it is Timothy making the fundamental conceptual errors here, by assuming that one could not reach the same conclusions from first principles, without anthropomorphizing.

In particular, yes, we will want ‘agents’ because they are highly instrumentally useful. If you do not see why you would want agents, and instead think you want systems around you that do exactly what you say (rather than Do What I Mean, or that can handle obstacles or multi-step processes), then you are not thinking about how to solve your problems. Although yes, one can take this too far.

We have already run this test. The only reason we are not already dealing with tons of AI agents is no one knows how to make them work at current tech levels, and even so people are trying to brute force it anyway. The moment they even sort of work, watch out.

Similarly, sufficiently capable systems will tend to act increasingly as if they are agents over time, our training and imbuing of capabilities and intelligence will push in those directions.

And once you realize some portion of this, the mistake on existential risk becomes clear. I am curious whether Timothy would say he would change his mind, if it became clear that people really do want their AIs to act in agent-like fashion on their behalf.

Or to state the general case of this error (or strategy), there are many who assume or assert that because one can mistakenly believe X via some method Y, that this means no one believes X for good reason, and also X is false.

In other ‘if you cannot take this seriously’ news, in response to OpenAI’s plan for an early warning system:

Jack: Tyler Cowen said that he’s far more worried about LLMs simply helping terrorist groups run more efficiently and have better organization lol

Misha Gurevich: Finally they’re making beer for rationalists.

Elle Cordova as the fonts, part 2.

Elon Musk reports results from Neuralink, this is the real report.

There is no doubt great upside, Just Think of the Potential and all that.

John Markley: Jokes are jokes and all but I’m gonna be honest, everybody lining up to say “lol hideous biomonstrosities horrors beyond comprehension torment nexus” because of a technology to aid disabled people with impaired mobility is kind of making me hate humanity. Or at least parts of it.

Like, if alterations to the human body developed by uncharismatic autists upset you you’re going to be screeching about dystopia and manmade horrors beyond comprehension and the fucking Torment Nexus basically every time tech to restore capabilities of disabled people appears.

Much like AI, the issue is that you do not get such technology for the sole purpose of helping disabled people or otherwise doing things that are clearly purely good. Once you have it, it has a lot of other uses too, and it is not that hard to imagine how this could go badly. Or, to be clear, super well. The important thing is: Eyes on the prize.

Spoilers for Fight Club (which you should totally watch spoiler-free if you haven’t yet).

Riley Goodside: Fight Club (1999) on the challenge of prompt injection security in the absence of trusted delimiters:

Humans do not like it when you accuse them of things, or don’t answer their questions, have you tried instead giving the humans what they want? Which, of course, would be anime girls?

Kache: I’ve been thinking a lot about why using chatGPT has been infuriating, and why I’m generally “upset” at openai.

It’s because humans are hardwired to feel insulted when accused of something.

We can’t help but do it! It’s human! So consider adding a warning instead!

I mean, nothing I write is tax advice, nor is it investing advice or legal advice or medical advice or…

[TWO PAGES LATER]

… advice either. So that we’re clear. That makes it okay.

I am become Matt Levine, destination for content relevant to my interests.

AI #49: Bioweapon Testing Begins


ChatGPT is leaking passwords from private conversations of its users, Ars reader says

OPENAI SPRINGS A LEAK —

Names of unpublished research papers, presentations, and PHP scripts also leaked.

OpenAI logo displayed on a phone screen and ChatGPT website displayed on a laptop screen.

Getty Images

ChatGPT is leaking private conversations that include login credentials and other personal details of unrelated users, screenshots submitted by an Ars reader on Monday indicated.

Two of the seven screenshots the reader submitted stood out in particular. Both contained multiple pairs of usernames and passwords that appeared to be connected to a support system used by employees of a pharmacy prescription drug portal. An employee using the AI chatbot seemed to be troubleshooting problems they encountered while using the portal.

“Horrible, horrible, horrible”

“THIS is so f-ing insane, horrible, horrible, horrible, i cannot believe how poorly this was built in the first place, and the obstruction that is being put in front of me that prevents it from getting better,” the user wrote. “I would fire [redacted name of software] just for this absurdity if it was my choice. This is wrong.”

Besides the candid language and the credentials, the leaked conversation includes the name of the app the employee is troubleshooting and the store number where the problem occurred.

The entire conversation goes well beyond what’s shown in the redacted screenshot above. A link Ars reader Chase Whiteside included showed the chat conversation in its entirety. The URL disclosed additional credential pairs.

The leaked conversations appeared Monday morning, shortly after reader Whiteside had used ChatGPT for an unrelated query.

“I went to make a query (in this case, help coming up with clever names for colors in a palette) and when I returned to access moments later, I noticed the additional conversations,” Whiteside wrote in an email. “They weren’t there when I used ChatGPT just last night (I’m a pretty heavy user). No queries were made—they just appeared in my history, and most certainly aren’t from me (and I don’t think they’re from the same user either).”

Other conversations leaked to Whiteside include the name of a presentation someone was working on, details of an unpublished research proposal, and a script using the PHP programming language. The users for each leaked conversation appeared to be different and unrelated to each other. The conversation involving the prescription portal included the year 2020. Dates didn’t appear in the other conversations.

The episode, and others like it, underscore the wisdom of stripping out personal details from queries made to ChatGPT and other AI services whenever possible. Last March, ChatGPT maker OpenAI took the AI chatbot offline after a bug caused the site to show titles from one active user’s chat history to unrelated users.

In November, researchers published a paper reporting how they used queries to prompt ChatGPT into divulging email addresses, phone and fax numbers, physical addresses, and other private data that was included in material used to train the ChatGPT large language model.

Concerned about the possibility of proprietary or private data leakage, companies, including Apple, have restricted their employees’ use of ChatGPT and similar sites.

As mentioned in an article from December, when multiple people found that Ubiquiti’s UniFi devices were broadcasting private video belonging to unrelated users, these sorts of experiences are as old as the Internet. As explained in the article:

The precise root causes of this type of system error vary from incident to incident, but they often involve “middlebox” devices, which sit between the front- and back-end devices. To improve performance, middleboxes cache certain data, including the credentials of users who have recently logged in. When mismatches occur, credentials for one account can be mapped to a different account.
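The cross-user mapping described above can be illustrated with a toy sketch (hypothetical Python, not any specific middlebox implementation): a cache keyed on a reusable connection identifier will hand one user’s session to whoever next shows up on that connection.

```python
# Toy model of a middlebox that caches session credentials per connection.
# Hypothetical sketch for illustration; real middleboxes are far more complex.
class Middlebox:
    def __init__(self):
        # Maps a connection identifier to the session it last served.
        self.cache = {}

    def handle_request(self, connection_id, user):
        # Cache hit: reuse whatever session was stored for this connection,
        # without re-checking which user is actually on the other end.
        if connection_id in self.cache:
            return self.cache[connection_id]
        # Cache miss: "authenticate" the user and cache the resulting session.
        session = f"session-for-{user}"
        self.cache[connection_id] = session
        return session

mb = Middlebox()
mb.handle_request(7, user="alice")  # alice logs in over connection 7
# If connection 7 is recycled for bob before the cached entry expires or is
# invalidated, bob receives alice's cached session:
leaked = mb.handle_request(7, user="bob")
```

The mismatch arises because the cache key (a transport-level identifier) can outlive the identity it was created for; keying on, and invalidating by, the authenticated identity avoids it.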

An OpenAI representative said the company was investigating the report.



OpenAI and Common Sense Media partner to protect teens from AI harms and misuse

Adventures in chatbusting —

Site gave ChatGPT 3 stars and 48% privacy score: “Best used for creativity, not facts.”

Boy in Living Room Wearing Robot Mask

On Monday, OpenAI announced a partnership with the nonprofit Common Sense Media to create AI guidelines and educational materials targeted at parents, educators, and teens. It includes the curation of family-friendly GPTs in OpenAI’s GPT store. The collaboration aims to address concerns about the impacts of AI on children and teenagers.

Known for its reviews of films and TV shows aimed at parents seeking appropriate media for their kids to watch, Common Sense Media recently branched out into AI and has been reviewing AI assistants on its site.

“AI isn’t going anywhere, so it’s important that we help kids understand how to use it responsibly,” Common Sense Media wrote on X. “That’s why we’ve partnered with @OpenAI to help teens and families safely harness the potential of AI.”

OpenAI CEO Sam Altman and Common Sense Media CEO James Steyer announced the partnership onstage in San Francisco at the Common Sense Summit for America’s Kids and Families, an event that was well-covered by media members on the social media site X.

For his part, Altman offered a canned statement in the press release, saying, “AI offers incredible benefits for families and teens, and our partnership with Common Sense will further strengthen our safety work, ensuring that families and teens can use our tools with confidence.”

The announcement feels slightly non-specific in the official news release, with Steyer offering, “Our guides and curation will be designed to educate families and educators about safe, responsible use of ChatGPT, so that we can collectively avoid any unintended consequences of this emerging technology.”

The partnership seems aimed mostly at bringing a patina of family-friendliness to OpenAI’s GPT store, with the most solid reveal being the aforementioned fact that Common Sense media will help with the “curation of family-friendly GPTs in the GPT Store based on Common Sense ratings and standards.”

Common Sense AI reviews

As mentioned above, Common Sense Media began reviewing AI assistants on its site late last year. This puts Common Sense Media in an interesting position with potential conflicts of interest regarding the new partnership with OpenAI. However, it doesn’t seem to be offering any favoritism to OpenAI so far.

For example, Common Sense Media’s review of ChatGPT calls the AI assistant “A powerful, at times risky chatbot for people 13+ that is best used for creativity, not facts.” It labels ChatGPT as being suitable for ages 13 and up (an age minimum that also appears in OpenAI’s Terms of Service) and gives the OpenAI assistant three out of five stars. ChatGPT also scores a 48 percent privacy rating (which is oddly shown as 55 percent on another page that goes into privacy details). The review we cited was last updated on October 13, 2023, as of this writing.

For reference, Google Bard gets a three-star overall rating and a 75 percent privacy rating in its Common Sense Media review. Stable Diffusion, the image synthesis model, nets a one-star rating with the description, “Powerful image generator can unleash creativity, but is wildly unsafe and perpetuates harm.” OpenAI’s DALL-E gets two stars and a 48 percent privacy rating.

The information that Common Sense Media includes about each AI model appears relatively accurate and detailed (and the organization cited an Ars Technica article as a reference in one explanation), so the reviews feel fair, even in the face of the OpenAI partnership. Given the low scores, it seems that most AI models aren’t off to a great start, but that may change. It’s still early days in generative AI.



Beware of scammers sending live couriers to liquidate victims’ life savings

CONFIDENCE GAMES —

The scams sound easy to detect, but they steal billions of dollars, often from the elderly.


Getty Images

Scammers are stepping up their game by sending couriers to the homes of elderly people and others as part of a ruse intended to rob them of their life savings, the FBI said in an advisory Monday.

“The FBI is warning the public about scammers instructing victims, many of whom are senior citizens, to liquidate their assets into cash and/or buy gold, silver, or other precious metals to protect their funds,” FBI officials with the agency’s Internet Crime Complaint Center said. “Criminals then arrange for couriers to meet the victims in person to pick up the cash or precious metals.”

The scammers pose as tech or customer support agents or government officials and sometimes use a multi-layered approach as they falsely claim they work on behalf of technology companies, financial institutions, or the US government. The scammers tell the targets they have been hacked or are at risk of being hacked and that their assets should be protected. The scammers then instruct the targets to liquidate assets into cash. In some cases, the scammers instruct targets to wire funds to a fake metal dealer who will ship purchased merchandise to the victims’ homes.

“Criminals then arrange for couriers to meet the victims in person to pick up the cash or precious metals,” Monday’s advisory warned.

Officials said that from May to December of last year, they tracked estimated aggregate losses topping $55 million from this sort of scam. More generally, the agency received 19,000 complaints of scams from January to June of 2023, with estimated victim losses of $542 million. Almost half of the victims were over 60 years old and accounted for 66 percent of the aggregated losses.

The types of scams included in Monday’s warning use tactics intended to coax the victim into developing trust and confidence in the perpetrators. The scammers promise to safeguard the assets in a protected account. In some cases, the scammers set a passcode with the target. If targets hand over money or other assets, they never hear from the scammers again.

Monday’s advisory comes four months after IC3 warned of an increase in complaints about what the agency calls “phantom hacker” scams. This form of scam is an evolution of more traditional tech support ruses. It layers imposter tech support workers with imposter workers from financial institutions and government agencies. Victims sometimes lose their entire holdings in bank, savings, retirement, or investment accounts.

Typically, the target receives a call from someone falsely claiming to work in tech or customer support from a known, reputable company and instructs the target to call a number for assistance resolving an imaginary problem. When a target calls, the scammer tricks the person into downloading and installing a program that gives remote access to the target’s device. The scammer then asks the target to open bank accounts or other types of accounts to investigate imaginary fraud. During this step, the scammer checks balances to see if there’s enough profit potential for follow-on activities.

In any follow-on activity, the scammers pose as either representatives of the financial institution or as an employee at the Federal Reserve or another US government agency. The scammers instruct the targets to wire money, in many cases directly to overseas recipients. The scammers may instruct the victim to send multiple transactions over a span of days or months. In the event the target grows suspicious, the scammers may send written correspondence over what appears to be official letterhead.

FBI IC3

The IC3 recommends people follow these practices to prevent falling victim to such scams:

  • The US Government and legitimate businesses will never request you purchase gold or other precious metals.
  • Protect your personal information. Never disclose your home address or agree to meet with unknown individuals to deliver cash or precious metals.
  • Do not click on unsolicited pop-ups on your computer, links sent via text messages, or email links and attachments.
  • Do not contact unknown telephone numbers provided in pop-ups, texts, or emails.
  • Do not download software at the request of unknown individuals who contact you.
  • Do not allow unknown individuals access to your computer.

The FBI requests victims report these types of fraud or suspicious activities to the IC3 as soon as possible. Victims should include as much transaction information as possible:

  • The name of the person or company that contacted you.
  • Methods of communication used, including websites, emails, and telephone numbers.
  • Any bank account number that received any wired funds, along with the recipient name(s).
  • The name and location of any metal dealer companies and the account that received the wired funds.



Apple warns proposed UK law will affect software updates around the world

Heads up —

Apple may leave the UK if required to provide advance notice of product updates.


Apple is “deeply concerned” that proposed changes to a United Kingdom law could give the UK government unprecedented power to “secretly veto” privacy and security updates to its products and services, the tech giant said in a statement provided to Ars.

If passed, potentially this spring, the amendments to the UK’s Investigatory Powers Act (IPA) could deprive not just UK users, but all users globally of important new privacy and security features, Apple warned.

“Protecting our users’ privacy and the security of their data is at the very heart of everything we do at Apple,” Apple said. “We’re deeply concerned the proposed amendments” to the IPA “now before Parliament place users’ privacy and security at risk.”

The IPA was initially passed in 2016 to ensure that UK officials had lawful access to user data to investigate crimes like child sexual exploitation or terrorism. Proposed amendments were announced last November, after a review showed that the “Act has not been immune to changes in technology over the last six years” and “there is a risk that some of these technological changes have had a negative effect on law enforcement and intelligence services’ capabilities.”

The proposed amendments would require any company that fields government data requests to notify UK officials of updates it plans to make that could restrict the UK government’s access to this data, including updates affecting users outside the UK.

UK officials said that this would “help the UK anticipate the risk to public safety posed by the rolling out of technology by multinational companies that precludes lawful access to data. This will reduce the risk of the most serious offenses such as child sexual exploitation and abuse or terrorism going undetected.”

According to the BBC, the House of Lords will begin debating the proposed changes on Tuesday.

Ahead of that debate, Apple described the amendments on Monday as “an unprecedented overreach by the government” that “if enacted” could allow the UK to “attempt to secretly veto new user protections globally, preventing us from ever offering them to customers.”

In a letter last year, Apple argued that “it would be improper for the Home Office to act as the world’s regulator of security technology.”

Apple told the UK Home Office that imposing “secret requirements on providers located in other countries” that apply to users globally “could be used to force a company like Apple, that would never build a backdoor, to publicly withdraw critical security features from the UK market, depriving UK users of these protections.” It could also “dramatically disrupt the global market for security technologies, putting users in the UK and around the world at greater risk,” Apple claimed.

The proposed changes, Apple said, “would suppress innovation, stifle commerce, and—when combined with purported extraterritorial application—make the Home Office the de facto global arbiter of what level of data security and encryption are permissible.”

UK defends proposed changes

The UK Home Office has repeatedly stressed that these changes do not “provide powers for the Secretary of State to approve or refuse technical changes,” but “simply” requires companies “to inform the Secretary of State of relevant changes before those changes are implemented.”

“The intention is not to introduce a consent or veto mechanism or any other kind of barrier to market,” a UK Home Office fact sheet said. “A key driver for this amendment is to give operational partners time to understand the change and adapt their investigative techniques where necessary, which may in some circumstances be all that is required to maintain lawful access.”

The Home Office has also claimed that “these changes do not directly relate to end-to-end encryption,” while admitting that they “are designed to ensure that companies are not able to unilaterally make design changes which compromise exceptional lawful access where the stringent safeguards of the IPA regime are met.”

This seems to suggest that companies will not be allowed to cut off the UK government from accessing encrypted data under certain circumstances, which concerns privacy advocates who consider end-to-end encryption a vital user privacy and security protection. Earlier this month, civil liberties groups including Big Brother Watch, Liberty, Open Rights Group and Privacy International filed a joint brief opposing the proposed changes, the BBC reported, warning that passing the amendments would be “effectively transforming private companies into arms of the surveillance state and eroding the security of devices and the Internet.”

“We have always been clear that we support technological innovation and private and secure communications technologies, including end-to-end encryption, but this cannot come at a cost to public safety,” a UK government official told the BBC.

The UK government may face opposition to the amendments from more than just tech companies and privacy advocates, though. In Apple’s letter last year, the tech giant noted that the proposed changes to the IPA could conflict with EU and US laws, including the EU’s General Data Protection Regulation—considered the world’s strongest privacy law.

Under the GDPR, companies must implement measures to safeguard users’ personal data, Apple said, noting that “encryption is one means by which a company can meet” that obligation.

“Secretly installing backdoors in end-to-end encrypted technologies in order to comply with UK law for persons not subject to any lawful process would violate that obligation,” Apple argued.



After 32 years, one of the ’Net’s oldest software archives is shutting down

Ancient server dept. —

Hobbes OS/2 Archive: “As of April 15th, 2024, this site will no longer exist.”

Box art for IBM OS/2 Warp version 3, an OS released in 1995 that competed with Windows.


IBM

In a move that marks the end of an era, New Mexico State University (NMSU) recently announced the impending closure of its Hobbes OS/2 Archive on April 15, 2024. For over three decades, the archive has been a key resource for users of the IBM OS/2 operating system and its successors, which once competed fiercely with Microsoft Windows.

In a statement made to The Register, a representative of NMSU wrote, “We have made the difficult decision to no longer host these files on hobbes.nmsu.edu. Although I am unable to go into specifics, we had to evaluate our priorities and had to make the difficult decision to discontinue the service.”

Hobbes is hosted by the Department of Information & Communication Technologies at New Mexico State University in Las Cruces, New Mexico. The official announcement on the site reads, “After many years of service, hobbes.nmsu.edu will be decommissioned and will no longer be available. As of April 15th, 2024, this site will no longer exist.”

OS/2 version 1.2, released in late 1989.


os2museum.com

We reached out to New Mexico State University to inquire about the history of the Hobbes archive but did not receive a response. The earliest record we’ve found of the Hobbes archive online is this 1992 Walnut Creek CD-ROM collection that gathered up the contents of the archive for offline distribution. At around 32 years old, minimum, that makes Hobbes one of the oldest software archives on the Internet, akin to the University of Michigan’s archives and ibiblio at UNC.

Archivists such as Jason Scott of the Internet Archive have stepped up to say that the files hosted on Hobbes are safe and already mirrored elsewhere. “Nobody should worry about Hobbes, I’ve got Hobbes handled,” wrote Scott on Mastodon in early January. OS/2 World.com also published a statement about making a mirror. But it’s still notable whenever such an old and important piece of Internet history bites the dust.

Like many archives, Hobbes started as an FTP site. “The primary distribution of files on the Internet were via FTP servers,” Scott tells Ars Technica. “And as FTP servers went down, they would also be mirrored as subdirectories in other FTP servers. Companies like CDROM.COM / Walnut Creek became ways to just get a CD-ROM of the items, but they would often make the data available at http://ftp.cdrom.com to download.”

The Hobbes site is a priceless digital time capsule. You can still find the Top 50 Downloads page, which includes sound and image editors, and OS/2 builds of the Thunderbird email client. The archive contains thousands of OS/2 games, applications, utilities, software development tools, documentation, and server software dating back to the launch of OS/2 in 1987. There’s a certain charm in running across OS/2 wallpapers from 1990, and even the archive’s Update Policy is a historical gem—last updated on March 12, 1999.

The legacy of OS/2

Enlarge / The final major IBM release of OS/2, Warp version 4.0, as seen running in an emulator.

OS/2 began as a joint venture between IBM and Microsoft, undertaken as a planned replacement for IBM PC DOS (also called “MS-DOS” in the form sold by Microsoft for PC clones). Despite advanced capabilities like 32-bit processing and multitasking, OS/2 later competed with and struggled to gain traction against Windows. The partnership between IBM and Microsoft dissolved after the success of Windows 3.0, leading to divergent paths in OS strategies for the two companies.

Through iterations like the Warp series, OS/2 established a key presence in niche markets that required high stability, such as ATMs and the New York subway system. Today, its legacy continues in specialized applications and in newer versions (like eComStation) maintained by third-party vendors—despite being overshadowed in the broader market by Linux and Windows.

A footprint like that is worth preserving, and the loss of one of OS/2’s primary archives, even if mirrored elsewhere, is a cultural blow. Hobbes has reportedly almost disappeared before but received a stay of execution. In the comments section for an article on The Register, someone named “TrevorH” wrote, “This is not the first time that Hobbes has announced it’s going away. Last time it was rescued after a lot of complaints and a number of students or faculty came forward to continue to maintain it.”

As the final shutdown approaches in April, the legacy of Hobbes is a reminder of the importance of preserving the digital heritage of software for future generations—so that decades from now, historians can look back and see how things got to where they are today.

After 32 years, one of the ’Net’s oldest software archives is shutting down Read More »

ryzen-8000g-review:-an-integrated-gpu-that-can-beat-a-graphics-card,-for-a-price

Ryzen 8000G review: An integrated GPU that can beat a graphics card, for a price

Enlarge / The most interesting thing about AMD’s Ryzen 7 8700G CPU is the Radeon 780M GPU that’s attached to it.

Andrew Cunningham

Put me on the short list of people who can get excited about the humble, much-derided integrated GPU.

Yes, most of them are afterthoughts, designed for office desktops and laptops that will spend most of their lives rendering 2D images to a single monitor. But when integrated graphics push forward, it can open up possibilities for people who want to play games but can only afford a cheap desktop (or who have to make do with whatever their parents will pay for, which was the big limiter on my PC gaming experience as a kid).

That, plus an unrelated but accordant interest in building small mini-ITX-based desktops, has kept me interested in AMD’s G-series Ryzen desktop chips (which it sometimes calls “APUs,” to distinguish them from the Ryzen CPUs). And the Ryzen 8000G chips are a big upgrade from the 5000G series that immediately preceded them (this makes sense, because as we all know the number 8 immediately follows the number 5).

We’re jumping up an entire processor socket, one CPU architecture, three GPU architectures, and to a new generation of much faster memory; especially for graphics, it’s a pretty dramatic leap. It’s an integrated GPU that can credibly beat the lowest tier of currently available graphics cards, replacing a $100–$200 part with something a lot more energy-efficient.

As with so many current-gen Ryzen chips, still-elevated pricing for the socket AM5 platform and the DDR5 memory it requires limit the 8000G series’ appeal, at least for now.

From laptop to desktop

Enlarge / AMD’s first Ryzen 8000 desktop processors are what the company used to call “APUs,” a combination of a fast integrated GPU and a reasonably capable CPU.

AMD

The 8000G chips use the same Zen 4 CPU architecture as the Ryzen 7000 desktop chips, but the way the rest of the chip is put together is pretty different. Like past APUs, these are actually laptop silicon (in this case, the Ryzen 7040/8040 series, codenamed Phoenix and Phoenix 2) repackaged for a desktop processor socket.

Generally, the real-world impact of this is pretty mild; in most ways, the 8700G and 8600G will perform a lot like any other Zen 4 CPU with the same number of cores (our benchmarks mostly bear this out). But to the extent that there is a difference, the Phoenix silicon will consistently perform just a little worse, because it has half as much L3 cache. AMD’s Ryzen X3D chips revolve around the performance benefits of tons of cache, so you can see why having less would be detrimental.

The other missing feature from the Ryzen 7000 desktop chips is PCI Express 5.0 support—Ryzen 8000G tops out at PCIe 4.0. This might, maybe, one day in the distant future, eventually lead to some kind of user-observable performance difference. Some recent GPUs use an 8-lane PCIe 4.0 interface instead of the typical 16 lanes, which limits performance slightly. But PCIe 5.0 SSDs remain rare (and PCIe 4.0 peripherals remain extremely fast), so it probably shouldn’t top your list of concerns.
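To put the lanes-versus-generation tradeoff in rough numbers, here's a quick back-of-the-envelope sketch. The per-lane transfer rates and 128b/130b encoding figures are standard PCIe nominal values, not numbers from the article, and real-world throughput will be somewhat lower:

```python
# Rough one-direction PCIe bandwidth estimates, assuming nominal
# per-lane transfer rates and 128b/130b line coding (PCIe 3.0+).
# These are approximate figures for illustration only.

GT_PER_LANE = {"3.0": 8, "4.0": 16, "5.0": 32}  # gigatransfers/sec
ENCODING = 128 / 130  # 128b/130b coding overhead

def throughput_gbs(gen: str, lanes: int) -> float:
    """Approximate one-direction bandwidth in GB/s."""
    return GT_PER_LANE[gen] * ENCODING / 8 * lanes

print(f"PCIe 4.0 x16: {throughput_gbs('4.0', 16):.1f} GB/s")  # ~31.5
print(f"PCIe 4.0 x8:  {throughput_gbs('4.0', 8):.1f} GB/s")   # ~15.8
print(f"PCIe 5.0 x16: {throughput_gbs('5.0', 16):.1f} GB/s")  # ~63.0
```

In other words, an 8-lane card on PCIe 4.0 has roughly the same link bandwidth as a 16-lane card on PCIe 3.0, which is why the performance penalty for those GPUs is real but usually small.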

The Ryzen 5 8500G is a lot different from the 8700G and 8600G, since some of the CPU cores in the Phoenix 2 chips are based on Zen 4c rather than Zen 4. These cores have all the same capabilities as regular Zen 4 ones—unlike Intel’s E-cores—but they’re optimized to take up less space rather than hit high clock speeds. They were initially made for servers, where cramming lots of cores into a small amount of space is more important than having a smaller number of faster cores, but AMD is also using them to make some of its low-end consumer chips physically smaller and presumably cheaper to produce. AMD didn’t send us a Ryzen 8500G for review, so we can’t see exactly how Phoenix 2 stacks up in a desktop.

The 8700G and 8600G chips are also the only ones that come with AMD’s “Ryzen AI” feature, the brand AMD is using to refer to processors with a neural processing unit (NPU) included. Sort of like GPUs or video encoding/decoding blocks, these are additional bits built into the chip that handle things that CPUs can’t do very efficiently—in this case, machine learning and AI workloads.

Most PCs still don’t have NPUs, and as such they are only barely used in current versions of Windows (Windows 11 offers some webcam effects that will take advantage of NPU acceleration, but for now that’s mostly it). But expect this to change as they become more common and as more AI-accelerated text, image, and video creation and editing capabilities are built into modern operating systems.

The last major difference is the GPU. Ryzen 7000 includes a pair of RDNA2 compute units that perform more or less like Intel’s desktop integrated graphics: good enough to render your desktop on a monitor or two, but not much else. The Ryzen 8000G chips include up to 12 RDNA3 CUs, which—as we’ve already seen in laptops and portable gaming systems like the Asus ROG Ally that use the same silicon—is enough to run most games, if just barely in some cases.

That gives AMD’s desktop APUs a unique niche. You can use them in cases where you can’t afford a dedicated GPU—for a time during the big graphics card shortage in 2020 and 2021, a Ryzen 5700G was actually one of the only ways to build a budget gaming PC. Or you can use them in cases where a dedicated GPU won’t fit, like super-small mini ITX-based desktops.

The main argument that AMD makes is the affordability one, comparing the price of a Ryzen 8700G to the price of an Intel Core i5-13400F and a GeForce GTX 1650 GPU (this card is nearly five years old, but it remains Nvidia’s newest and best GPU available for less than $200).

Let’s check on performance first, and then we’ll revisit pricing.

Ryzen 8000G review: An integrated GPU that can beat a graphics card, for a price Read More »

blockbuster-weight-loss-drugs-slashed-from-nc-state-plan-over-ballooning-costs

Blockbuster weight-loss drugs slashed from NC state plan over ballooning costs

Patients vs. profits —

The plan spent $102M on the weight-loss drugs last year, 10% of total drug costs.

Enlarge / Wegovy is an injectable prescription weight loss medicine that has helped people with obesity.

The health plan for North Carolina state employees will stop covering blockbuster GLP-1 weight-loss drugs, including Wegovy and Zepbound, because—according to the plan’s board of trustees—the drugs are simply too expensive.

Last week, the board voted 4-3 to end all coverage of GLP-1 medications for weight loss on April 1, a move believed to make it the first major state health plan to end coverage of the popular but pricey weight-loss drugs. The plan will continue to pay for GLP-1 medications prescribed to treat diabetes, including Ozempic.

The North Carolina State Health Plan covers nearly 740,000 people, including teachers, state employees, retirees, and their family members. In 2023, monthly premiums from the plan ranged from $25 for base coverage for an individual to up to $720 for premium family coverage. Members prescribed Wegovy paid a co-pay of between $30 and $50 per month for the drug, while the plan’s cost was around $800 a month.

In 2021, just under 2,800 members were taking the drugs for weight loss, but in 2023, the number soared to nearly 25,000 members, costing the plan $102 million. That’s about 10 percent of what the plan pays for all prescription drugs combined. If the current coverage continued, the plan’s pharmacy benefit manager, CVS Caremark, estimated that by 2025, the plan’s premiums would have to rise $48.50 across the board to offset the costs of the weight-loss drugs.

Without insurance, the list price of Wegovy is $1,349 per month, totaling $16,188 for a year of treatment. The average reported salary for members of North Carolina’s health plan is $56,431.
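The scale of those numbers is easy to check; here is the arithmetic behind the article's figures (all inputs are from the article, the rounding is mine):

```python
# Back-of-the-envelope check of the article's Wegovy cost figures.
# All input numbers come from the article; this just does the arithmetic.

list_price_per_month = 1_349   # Wegovy list price without insurance
plan_cost_per_month = 800      # roughly what the NC plan paid per member
avg_member_salary = 56_431     # average reported member salary

annual_list_cost = list_price_per_month * 12
annual_plan_cost = plan_cost_per_month * 12

print(f"Annual list price: ${annual_list_cost:,}")        # $16,188
print(f"Annual plan cost:  ${annual_plan_cost:,}")        # $9,600
print(f"List price as share of average salary: "
      f"{annual_list_cost / avg_member_salary:.0%}")      # 29%
```

At list price, a year of Wegovy works out to nearly 29 percent of the average member's salary, which makes clear why the plan's subsidized co-pay of $30–$50 a month drove enrollment so sharply.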

Last October, the board voted to grandfather the 25,000 or so current users, maintaining coverage for them moving forward while ending coverage for new users. However, according to CVS Caremark, the move would mean losing a 40 percent rebate from Wegovy’s maker, Novo Nordisk. This would be a loss of $54 million, bringing projected 2024 costs to $139 million.

A spokesperson for Novo Nordisk called the vote to end coverage entirely “irresponsible,” according to a statement given to media. “We do not support insurers or bureaucrats inserting their judgment in these medically driven decisions,” the statement continued.

While the costs of weight-loss drugs are high everywhere, the pricing is particularly bitter for North Carolinians—Novo Nordisk manufactures Wegovy in Clayton, North Carolina, southeast of Raleigh.

“It certainly adds insult to injury,” said Ardis Watkins, executive director of the State Employees Association of North Carolina, a group that lobbies on behalf of state health plan members, according to The New York Times. “Our economic climate that has been made so attractive to businesses to locate here is being used to manufacture a drug that is wildly marked up.”

While it appears to be the first time such a large state health plan has dropped coverage of the weight-loss drugs, North Carolina is not alone in wrestling with the costs. The University of Texas’ employee plan ceased coverage of Wegovy and Saxenda, another weight-loss drug, in September. Connecticut’s state health plan, meanwhile, added restrictions on how members could get a prescription covered. Some state health plans that cover GLP-1 medications for weight loss have prior authorization procedures to try to limit use.

“Every state has been wrestling with it, every professional association that my staff is a part of has had some discussion about it,” Sam Watts, director of the North Carolina State Health Plan, told Bloomberg. “But to our knowledge, we’re the first major state health plan to act on it.”

Blockbuster weight-loss drugs slashed from NC state plan over ballooning costs Read More »

report:-deus-ex-title-killed-after-embracer-group’s-cuts-at-eidos

Report: Deus Ex title killed after Embracer Group’s cuts at Eidos

Not the ending most people would have chosen —

Swedish firm’s acquisitions continue trend of layoffs and canceled games.

Enlarge / Adam Jensen of Deus Ex: Mankind Divided, taking in the news that no last-minute contrivance is going to save his series from what seemed like inevitable doom. (Pun credit to Andrew Cunningham).

Eidos Interactive

Embracer Group, the Swedish firm that bought up a number of known talents and gaming properties during the pandemic years, has canceled a Deus Ex game at its Eidos studio in Montreal, Canada, according to Bloomberg’s Jason Schreier.

The game, while not officially announced, has been known about since May 2022. It was due to enter production later in 2024 and had seen two years of pre-production development, according to Schreier’s sources. Many employees will be laid off as part of the cancellation.

Embracer Group acquired Eidos Montreal, along with Crystal Dynamics and Square Enix Montreal, for $300 million in mid-2022, buying up all of Japanese game publisher Square Enix’s Western game studios. That gave Embracer the keys to several influential and popular series, including Tomb Raider, Just Cause, Life Is Strange, and Deus Ex.

Eidos published the first Deus Ex from developer Ion Storm, founded by id Software’s John Romero and Tom Hall. Gaming legend Warren Spector oversaw the development of the original Deus Ex, merging shooters, stealth, and open-world RPG game mechanics in a way that, for the year 2000, was wholly original. The game is often cited as one of the best PC games of all time and a progenitor of many immersive sims and RPG-inflected shooters to come.

Eidos Interactive was acquired in 2009 by Square Enix and became the primary developer of the Deus Ex series, starting with Deus Ex: Human Revolution in 2011. The last full-fledged title in the series was Deus Ex: Mankind Divided in 2016. Despite selling more than 14 million units across the series’ lifetime, and the perennial hunger by fans and critics to see a return to the series’ novel storytelling and sharp critique of mega-corp control, the reset button has been hit by a rather large corporation.

Another of Embracer Group’s notable acquisitions, the 2021 purchase of large independent developer Gearbox, looks to be unwinding, as well. Bloomberg’s Schreier reported in September 2023 that Embracer was looking to sell Gearbox after less than three years’ ownership. One month before that, Embracer Group shut down Volition, developer of Saints Row and Descent, after that studio’s 30th year of operation.

Ars has reached out to Embracer Group for comment and will update this post with any new information.

Most of the primary Deus Ex titles are on sale at the moment, at GOG and on Steam, for less than $5.

Listing image by Eidos Interactive

Report: Deus Ex title killed after Embracer Group’s cuts at Eidos Read More »