
on-the-openai-economic-blueprint

On the OpenAI Economic Blueprint

  1. Man With a Plan.

  2. Oh the Pain.

  3. Actual Proposals.

  4. For AI Builders.

  5. Think of the Children.

  6. Content Identification.

  7. Infrastructure Week.

  8. Paying Attention.

The primary Man With a Plan this week for government-guided AI prosperity was UK Prime Minister Keir Starmer, with a plan coming primarily from Matt Clifford. I’ll be covering that soon.

Today I will be covering the other Man With a Plan, Sam Altman, as OpenAI offers its Economic Blueprint.

Cyrps1s (CISO OpenAI): AI is the ultimate race. The winner decides whether the future looks free and democratic, or repressed and authoritarian.

OpenAI, and the Western World, must win – and we have a blueprint to do so.

Do you hear yourselves? The mask on race and jingoism could not be more off, or firmly attached, depending on which way you want to set up your metaphor. If a movie had villains talking like this people would say it was too on the nose.

Somehow the actual documents tell that statement to hold its beer.

The initial exploratory document is highly disingenuous, trotting out stories of the UK requiring people to walk in front of cars waving red flags and talking about ‘AI’s main street,’ while threatening that if we don’t attract the $175 billion in global funds awaiting investment in AI, it will flow to China-backed projects. They even talk about creating jobs… by building data centers.

The same way some documents scream ‘an AI wrote this,’ others scream ‘the authors of this post are not your friends and are pursuing their book with some mixture of politics-talk and corporate-speak in the most cynical way you can imagine.’

I mean, I get it, playas gonna play, play, play, play, play. But can I ask OpenAI to play with at least some style and grace? To pretend to pretend not to be doing this, a little?

As opposed to actively inserting so many Fnords their document causes physical pain.

The full document starts out in the same vein. Chris Lehane, their Vice President of Global Affairs, writes an introduction as condescending as any I can remember, and that plus the ‘where we stand’ section repeats the same deeply cynical rhetoric from the summary.

In some sense, it is not important that the way the document is written makes me physically angry and ill in a way I endorse – to the extent that if it doesn’t set off your bullshit detectors and reading it doesn’t cause you pain, then I notice that there is at least some level on which I shouldn’t trust you.

But perhaps that is the most important thing about the document? That it tells you about the people writing it. They are telling you who they are. Believe them.

This is related to the ‘truesight’ that Claude sometimes displays.

As I wrote that, I was only on page 7, and hadn’t even gotten to the actual concrete proposals.

The actual concrete proposals are a distinct issue. I was having trouble reading through to find out what they are because this document filled me with rage and made me physically ill.

It’s important to notice that! I read documents all day, often containing things I do not like. It is very rare that my body responds by going into physical rebellion.

No, the document hasn’t yet mentioned even the possibility of any downside risks at all, let alone existential risks. And that’s pretty terrible on its own. But that’s not even what I’m picking up here, at all. This is something else. Something much worse.

Worst of all, it feels intentional. I can see the Fnords. They want me to see them. They want everyone to implicitly know they are being maximally cynical.

All right, so if one pushes through to the second half and the actual ‘solutions’ section, what is being proposed, beyond ‘regulating us would be akin to requiring someone to walk in front of every car waving a red flag, no literally’?

The top-level numbered statements describe what they propose; I attempted to group and separate proposals for better clarity. The nested statements (a, b, etc.) are my reactions.

They say the Federal Government should, in a section where they actually say words with meanings rather than filling it with Fnords:

  1. Share national security information and resources.

    1. Okay. Yes. Please do.

  2. Incentivize AI companies to deploy their products widely, including to allied and partner nations and to support US government agencies.

    1. Huh? What? Is there a problem here that I am not noticing? Who is not deploying, other than in response to other countries’ regulations saying they cannot deploy (e.g. the EU)? Or are you trying to actively say that safety concerns are bad?

  3. Support the development of standards and safeguards, and ensure they are recognized and respected by other nations.

    1. In a different document I would be all for this – if we don’t have universal standards, people will go shopping. However, in this context, I can’t help but read it mostly as pre-emption, as in ‘we want America to prevent other states from imposing any safety requirements or roadblocks.’

  4. Share its unique expertise with AI companies, including on mitigating threats such as cyber and CBRN.

    1. Yes! Very much so. Jolly good.

  5. Help companies access secure infrastructure to evaluate model security risks and safeguards.

    1. Yes, excellent, great.

  6. Promote transparency consistent with competitiveness, protect trade secrets, promote market competition, ‘carefully choose disclosure requirements.’

    1. I can’t disagree, but how could anyone?

    2. The devil is in the details. If this had good details, and emphasized that the transparency should largely be about safety questions, it would be another big positive.

  7. Create a defined, voluntary pathway for companies that develop LLMs to work with government to define model evaluations, test models and exchange information to support the companies’ safeguards.

    1. This is about helping you, the company? And you want it to be entirely voluntary? And in exchange, they explicitly want preemption from state-by-state regulations.

    2. Basically this is a proposal for a fully optional safe harbor. I mean, yes, the Federal government should have a support system in place to aid in evaluations. But notice how they want it to work – as a way to defend companies against any other requirements, which they can in turn ignore when inconvenient.

    3. Also, the goal here is to ‘support the companies’ safeguards,’ not to in any way see if the models are actually a responsible thing to release on any level.

    4. Amazing to request actively less than zero Federal regulations on safety.

  8. Empower the public sector to quickly and securely adopt AI tools.

    1. I mean, sure, that would be nice if we can actually do it as described.

A lot of the components here are things basically everyone should agree upon.

Then there are the parts where, rather than going hand-in-hand with an attempt to not kill everyone and to guard against catastrophes, the document attempts to ensure that no one else tries to stop catastrophes or prevent everyone from being killed. Can’t have that.

They also propose that AI ‘builders’ could:

  1. Form a consortium to identify best practices for working with NatSec.

  2. Develop training programs for AI talent.

I mean, sure, those seem good and we should have an antitrust exemption to allow actions like this along with one that allows them to coordinate, slow down or pause in the name of safety if it comes to that, too. Not that this document mentions that.

Sigh, here we go. Their solutions for thinking of the children are:

  1. Encourage policy solutions that prevent the creation and distribution of CSAM. Incorporate CSAM protections into the AI development lifecycle. ‘Take steps to prevent downstream developers from using their models to generate CSAM.’

    1. This is effectively a call to ban open source image models. I’m sorry, but it is. I wish it were not so, but there is no known way to open source image models, and have them not be used for CSAM, and I don’t see any reason to expect this to be solvable, and notice the reference to ‘downstream developers.’

  2. Promote conditions that support robust and lasting partnerships among AI companies and law enforcement.

Next, their proposals for content identification:

  1. Apply provenance data to all AI-generated audio-visual content. Use common provenance standards. Have large companies report progress.

    1. Sure. I think we’re all roughly on the same page here. Let’s move on to ‘preferences.’

  2. People should be ‘empowered to personalize their AI tools.’

    1. I agree we should empower people in this way. But what does the government have to do with this? None of their damn business.

  3. People should control how their personal data is used.

    1. Yes, sure, agreed.

  4. ‘Government and industry should work together to scale AI literacy through robust funding for pilot programs, school district technology budgets and professional development trainings that help people understand how to choose their own preferences to personalize their tools.’

    1. No. Stop. Please. These initiatives never, ever work; we need to admit this.

    2. But also shrug, it’s fine, it won’t do that much damage.

And then, I feel like I need to fully quote this one too:

  1. In exchange for having so much freedom, users should be responsible for impacts of how they work and create with AI. Common-sense rules for AI that are aimed at protecting from actual harms can only provide that protection if they apply to those using the technology as well as those building it.

    1. If seeing the phrase ‘In exchange for having so much freedom’ doesn’t send a chill down your spine, We Are Not the Same.

    2. But I applaud the ‘as well as’ here. Yes, those using the technology should be responsible for the harm they themselves cause, so long as this is ‘in addition to’ rather than shoving all responsibility purely onto them.

Finally, we get to ‘infrastructure as destiny,’ an area where we mostly agree on what is to actually be done, even if I despise a lot of the rhetoric they’re using to argue for it.

  1. Ensure that AIs can train on all publicly available data.

    1. This is probably the law now and I’m basically fine with it.

  2. ‘While also protecting creators from unauthorized digital replicas.’

    1. This seems rather tricky if it means something other than ‘stop regurgitation of training data’? I assume that’s what it means, while trying to pretend it’s more than that. If it’s more than that, they need to explain what they have in mind and how one might do it.

  3. Digitize government data currently in analog form.

    1. Probably should do that anyway, although a lot of it shouldn’t go on the web or into LLMs. Kind of a call for government to pay for data curation.

  4. ‘A Compact for AI’ for capital and supply chains and such among US allies.

    1. I don’t actually understand why this is necessary, and worry this amounts to asking for handouts and to allow Altman to build in the UAE.

  5. ‘AI economic zones’ that speed up the permitting process.

    1. Or we could, you know, speed up the permitting process in general.

    2. But actually we can’t and won’t, so even though this is deeply, deeply stupid and second best it’s probably fine. Directionally this is helpful.

  6. Creation of AI research labs and workforces aligned with key local industries.

    1. This seems like pork barrel spending, an attempt to pick our pockets; we shouldn’t need to subsidize this. To the extent there are applications here, the bottleneck won’t be funding, it will be regulations and human objections, so let’s work on those instead.

  7. ‘A nationwide AI education strategy’ to ‘help our current workforce and students become AI ready.’

    1. I strongly believe that what this points towards won’t work. What we actually need is to use AI to revolutionize the education system itself. That would work wonders, but you all (in government reading this document) aren’t ready for that conversation and OpenAI knows this.

  8. More money for research infrastructure and science. Basically have the government buy the scientists a bunch of compute, give OpenAI business?

    1. Again this seems like an attempt to direct government spending and get paid. Obviously we should get our scientists AI, but why can’t they just buy it the same way everyone else does? If we want to fund more science, why this path?

  9. Leading the way on the next generation of energy technology.

    1. No arguments here. Yay next generation energy production.

    2. Clearly Altman wants Helion to get money but I’m basically fine with that.

  10. Dramatically increase federal spending on power and data transmission and streamlined approval for new lines.

    1. I’d emphasize approvals and regulatory barriers more than money.

    2. Actual dollars spent don’t seem to me like the bottleneck, but I could be convinced otherwise.

    3. If we have a way to actually spend money and have that result in a better grid, I’m in favor.

  11. Federal backstops for high-value AI public works.

    1. If this is more than ‘build more power plants and transmission lines and batteries and such’ I am confused what is actually being proposed.

    2. In general, I think helping get us power is great, having the government do the other stuff is probably not its job.

When we get down to the actual asks in the document, a majority of them I actually agree with, and most of them are reasonable, once I was able to force myself to read the words intended to have meaning.

There are still two widespread patterns to note within the meaningful content.

  1. The easy theme, as you would expect, is the broad range of ‘spend money on us and other AI things’ proposals that don’t seem like they would accomplish much. There are some proposals that do seem productive, especially around electrical power, but a lot of this seems like the traditional ways the Federal government gets tricked into spending money. As long as this doesn’t scale too big, I’m not that concerned.

  2. Then there is the play to defeat any attempt at safety regulation, via Federal rules that would, on net, actively interfere with that goal in case any states or countries wanted to try and help. A common standard here is clearly desirable, but a voluntary safe harbor with preemption, in exchange for various nebulous forms of potential cooperation, cannot be the basis of our entire safety plan. That appears to be the proposal on offer here.

The real vision, the thing I will take away most, is in the rhetoric and presentation, combined with the broader goals, rather than the particular details.

OpenAI now actively wants to be seen as pursuing this kind of obviously disingenuous jingoistic and typically openly corrupt rhetoric, to the extent that their statements are physically painful to read – I dealt with much of that around SB 1047, but this document takes that to the next level and beyond.

OpenAI wants no enforced constraints on their behavior, and they want our money.

OpenAI are telling us who they are. I fully believe them.

Discussion about this post

On the OpenAI Economic Blueprint Read More »

chatgpt-becomes-more-siri-like-with-new-scheduled-tasks-feature

ChatGPT becomes more Siri-like with new scheduled tasks feature

OpenAI is making ChatGPT work a little more like older digital assistants with a new feature called Tasks, as reported by TechCrunch and others.

Currently in beta, Tasks allows users to direct the chatbot to send reminders or to generate responses to specific prompts at certain times; recurring tasks are also supported.

The feature is available to Plus, Team, and Pro subscribers starting today, while free users don’t have access.

To create a task, users need to select “4o with scheduled tasks” from the model picker and then direct ChatGPT using the same kind of plain language text prompts that drive everything else it does. ChatGPT will sometimes suggest tasks, too, but they won’t go into effect unless the user approves them.

The user can then make changes to assigned tasks through the same chat conversation, or they can use a new Tasks section of the ChatGPT apps to manage all currently assigned items. There’s currently a 10-task limit.

When the time comes to perform an assigned task, the ChatGPT mobile or desktop app will send a notification on schedule.

This update can be seen as OpenAI’s first step into the agentic AI space, where applications built using deep learning can operate relatively independently within certain boundaries, either replacing or easing the day-to-day responsibilities of information workers.

ChatGPT becomes more Siri-like with new scheduled tasks feature Read More »

elon-musk-wants-courts-to-force-openai-to-auction-off-a-large-ownership-stake

Elon Musk wants courts to force OpenAI to auction off a large ownership stake

Musk, who founded his own AI startup xAI in 2023, has recently stepped up efforts to derail OpenAI’s conversion.

In November, he sought to block the process with a request for a preliminary injunction filed in California. Meta has also thrown its weight behind the suit.

In legal filings from November, Musk’s team wrote: “OpenAI and Microsoft together exploiting Musk’s donations so they can build a for-profit monopoly, one now specifically targeting xAI, is just too much.”

Kathleen Jennings, attorney-general in Delaware—where OpenAI is incorporated—has since said her office was responsible for ensuring that OpenAI’s conversion was in the public interest and determining whether the transaction was at a fair price.

Members of Musk’s camp—wary of Delaware authorities after a state judge rejected a proposed $56 billion pay package for the Tesla boss last month—read that as a rebuke of his efforts to block the conversion, and worry it will be rushed through. They have also argued OpenAI’s PBC conversion should happen in California, where the company has its headquarters.

In a legal filing last week Musk’s attorneys said Delaware’s handling of the matter “does not inspire confidence.”

OpenAI committed to become a public benefit corporation within two years as part of a $6.6 billion funding round in October, which gave it a valuation of $157 billion. If it fails to do so, investors would be able to claw back their money.

There are a number of issues OpenAI is yet to resolve, including negotiating the value of Microsoft’s investment in the PBC. A conversion was not imminent and would be likely to take months, according to a person with knowledge of the company’s thinking.

A spokesperson for OpenAI said: “Elon is engaging in lawfare. We remain focused on our mission and work.” The California and Delaware attorneys-general did not immediately respond to a request for comment.

© 2025 The Financial Times Ltd. All rights reserved. Not to be redistributed, copied, or modified in any way.

Elon Musk wants courts to force OpenAI to auction off a large ownership stake Read More »

openai-#10:-reflections

OpenAI #10: Reflections

This week, Altman offers a post called Reflections, and he has an interview in Bloomberg. There are a bunch of good and interesting answers in the interview about past events that I will either not mention or have to condense a lot here, such as his going over his calendar and all the meetings he constantly has, so consider reading the whole thing.

  1. The Battle of the Board.

  2. Altman Lashes Out.

  3. Inconsistently Candid.

  4. On Various People Leaving OpenAI.

  5. The Pitch.

  6. Great Expectations.

  7. Accusations of Fake News.

  8. OpenAI’s Vision Would Pose an Existential Risk To Humanity.

Here is what he says about the Battle of the Board in Reflections:

Sam Altman: A little over a year ago, on one particular Friday, the main thing that had gone wrong that day was that I got fired by surprise on a video call, and then right after we hung up the board published a blog post about it. I was in a hotel room in Las Vegas. It felt, to a degree that is almost impossible to explain, like a dream gone wrong.

Getting fired in public with no warning kicked off a really crazy few hours, and a pretty crazy few days. The “fog of war” was the strangest part. None of us were able to get satisfactory answers about what had happened, or why.

The whole event was, in my opinion, a big failure of governance by well-meaning people, myself included. Looking back, I certainly wish I had done things differently, and I’d like to believe I’m a better, more thoughtful leader today than I was a year ago.

I also learned the importance of a board with diverse viewpoints and broad experience in managing a complex set of challenges. Good governance requires a lot of trust and credibility. I appreciate the way so many people worked together to build a stronger system of governance for OpenAI that enables us to pursue our mission of ensuring that AGI benefits all of humanity.

My biggest takeaway is how much I have to be thankful for and how many people I owe gratitude towards: to everyone who works at OpenAI and has chosen to spend their time and effort going after this dream, to friends who helped us get through the crisis moments, to our partners and customers who supported us and entrusted us to enable their success, and to the people in my life who showed me how much they cared.

We all got back to the work in a more cohesive and positive way and I’m very proud of our focus since then. We have done what is easily some of our best research ever. We grew from about 100 million weekly active users to more than 300 million. Most of all, we have continued to put technology out into the world that people genuinely seem to love and that solves real problems.

This is about as good a statement as one could expect Altman to make. I strongly disagree that this resulted in a stronger system of governance for OpenAI. And I think he has a much better idea of what happened than he is letting on, and there are several points where ‘I see what you did there.’ But mostly I do appreciate what this statement aims to do.

From his interview, we also get this excellent statement:

Sam Altman: I think the previous board was genuine in their level of conviction and concern about AGI going wrong. There’s a thing that one of those board members said to the team here during that weekend that people kind of make fun of [Helen Toner] for, which is it could be consistent with the mission of the nonprofit board to destroy the company.

And I view that—that’s what courage of convictions actually looks like. I think she meant that genuinely.

And although I totally disagree with all specific conclusions and actions, I respect conviction like that, and I think the old board was acting out of misplaced but genuine conviction in what they believed was right.

And maybe also that, like, AGI was right around the corner and we weren’t being responsible with it. So I can hold respect for that while totally disagreeing with the details of everything else.

And this, which I can’t argue with:

Sam Altman: Usually when you have these ideas, they don’t quite work, and there were clearly some things about our original conception that didn’t work at all. Structure. All of that.

It is fair to say that ultimately, the structure as a non-profit did not work for Altman.

This also seems like the best place to highlight his excellent response about Elon Musk:

Oh, I think [Elon will] do all sorts of bad s—. I think he’ll continue to sue us and drop lawsuits and make new lawsuits and whatever else. He hasn’t challenged me to a cage match yet, but I don’t think he was that serious about it with Zuck, either, it turned out.

As you pointed out, he says a lot of things, starts them, undoes them, gets sued, sues, gets in fights with the government, gets investigated by the government.

That’s just Elon being Elon.

The question was, will he abuse his political power of being co-president, or whatever he calls himself now, to mess with a business competitor? I don’t think he’ll do that. I genuinely don’t. May turn out to be proven wrong.

So far, so good.

Then we get Altman being less polite.

Sam Altman: Saturday morning, two of the board members called and wanted to talk about me coming back. I was initially just supermad and said no. And then I was like, “OK, fine.” I really care about [OpenAI]. But I was like, “Only if the whole board quits.” I wish I had taken a different tack than that, but at the time it felt like a just thing to ask for.

Then we really disagreed over the board for a while. We were trying to negotiate a new board. They had some ideas I thought were ridiculous. I had some ideas they thought were ridiculous. But I thought we were [generally] agreeing.

And then—when I got the most mad in the whole period—it went on all day Sunday. Saturday into Sunday they kept saying, “It’s almost done. We’re just waiting for legal advice, but board consents are being drafted.” I kept saying, “I’m keeping the company together. You have all the power. Are you sure you’re telling me the truth here?” “Yeah, you’re coming back. You’re coming back.”

And then Sunday night they shock-announce that Emmett Shear was the new CEO. And I was like, “All right, now I’m f—ing really done,” because that was real deception. Monday morning rolls around, all these people threaten to quit, and then they’re like, “OK, we need to reverse course here.”

This is where his statements fail to line up with my understanding of what happened. Altman gave the board repeated in-public drop dead deadlines, including demanding that the entire board resign as he noted above, with very clear public messaging that failure to do this would destroy OpenAI.

Maybe if Altman had quickly turned around and blamed the public actions on his allies acting on their own, I would have believed that, but he isn’t even trying that line out now. He’s pretending that none of that was part of the story.

In response to those ultimatums, facing imminent collapse and unable to meet Altman’s blow-it-all-up deadlines and conditions, the board tapped Emmett Shear as a temporary CEO, who was very willing to facilitate Altman’s return and then stepped aside only days later.

That wasn’t deception, and Altman damn well knows that now, even if he was somehow blinded to what was happening at the time. The board very much still had the intention of bringing Altman back. Altman and his allies responded by threatening to blow up the company within days.

Then the interviewer asks what the board meant by ‘consistently candid.’ He talks about the ChatGPT launch which I mention a bit later on – where I do think he failed to properly inform the board but I think that was more one time of many than a particular problem – and then Altman says, bold is mine:

And I think there’s been an unfair characterization of a number of things like [how I told the board about the ChatGPT launch]. The one thing I’m more aware of is, I had had issues with various board members on what I viewed as conflicts or otherwise problematic behavior, and they were not happy with the way that I tried to get them off the board. Lesson learned on that.

There it is. They were ‘not happy’ with the way that he tried to get them off the board. I thank him for the candor that he was indeed trying to remove not only Helen Toner but various board members.

I do think this was primary. Why were they not happy, Altman? What did you do?

From what we know, it seems likely he lied to board members about each other in order to engineer a board majority.

Altman also outright says this:

I don’t think I was doing things that were sneaky. I think the most I would say is, in the spirit of moving really fast, the board did not understand the full picture.

That seems very clearly false. By all accounts, however much farther than sneaky Altman did or did not go, Altman was absolutely being sneaky.

He also later mentions the issues with the OpenAI startup fund, where his explanation seems at best rather disingenuous and dare I say it sneaky.

Here is how he attempts to address all the high profile departures:

Sam Altman (in Reflections): Some of the twists have been joyful; some have been hard. It’s been fun watching a steady stream of research miracles occur, and a lot of naysayers have become true believers. We’ve also seen some colleagues split off and become competitors. Teams tend to turn over as they scale, and OpenAI scales really fast.

I think some of this is unavoidable—startups usually see a lot of turnover at each new major level of scale, and at OpenAI numbers go up by orders of magnitude every few months.

The last two years have been like a decade at a normal company. When any company grows and evolves so fast, interests naturally diverge. And when any company in an important industry is in the lead, lots of people attack it for all sorts of reasons, especially when they are trying to compete with it.

I agree that some of it was unavoidable and inevitable. I do not think this addresses people’s main concerns, especially that they have lost so many of their highest level people, especially over the last year, including almost all of their high-level safety researchers all the way up to the cofounder level.

It is related to this claim, which I found a bit disingenuous:

Sam Altman: The pitch was just come build AGI. And the reason it worked—I cannot overstate how heretical it was at the time to say we’re gonna build AGI. So you filter out 99% of the world, and you only get the really talented, original thinkers. And that’s really powerful.

I agree that was a powerful pitch.

But we know from the leaked documents, and we know from many people’s reports, that this was not the entire pitch. The pitch for OpenAI was that AGI would be built safely, and that Google DeepMind could not be trusted to be the first to do so. The pitch was that they would ensure that AGI benefited the world, that it was a non-profit, that it cared deeply about safety.

Many of those who left have said that these elements were key reasons they chose to join OpenAI. Altman is now trying to rewrite history to ignore these promises, and pretend that the vision was ‘build AGI/ASI’ rather than ‘build AGI/ASI safely and ensure it benefits humanity.’

I also found his ‘I expected ChatGPT to go well right from the start’ interesting. If Altman did expect it to do well, and in his words he ‘forced’ people to ship it when they didn’t want to because they thought it wasn’t ready, that provides different color than the traditional story.

It also plays into this from the interview:

There was this whole thing of, like, “Sam didn’t even tell the board that he was gonna launch ChatGPT.” And I have a different memory and interpretation of that. But what is true is I definitely was not like, “We’re gonna launch this thing that is gonna be a huge deal.”

It sounds like Altman is claiming he did think it was going to be a big deal, although of course no one expected the rocket to the moon that we got.

Then he says how much of a mess the Battle of the Board left in its wake:

I totally was [traumatized]. The hardest part of it was not going through it, because you can do a lot on a four-day adrenaline rush. And it was very heartwarming to see the company and kind of my broader community support me.

But then very quickly it was over, and I had a complete mess on my hands. And it got worse every day. It was like another government investigation, another old board member leaking fake news to the press.

And all those people that I feel like really f—ed me and f—ed the company were gone, and now I had to clean up their mess. It was about this time of year [December], actually, so it gets dark at like 4:45 p.m., and it’s cold and rainy, and I would be walking through my house alone at night just, like, f—ing depressed and tired.

And it felt so unfair. It was just a crazy thing to have to go through and then have no time to recover, because the house was on fire.

Some combination of Altman and his allies clearly worked hard to successfully spread fake news during the crisis, placing it in multiple major media outlets, in order to influence the narrative and the ultimate resolution. A lot of this involved publicly threatening (and bluffing) that if they did not get unconditional surrender within deadlines on the order of a day, they would end OpenAI.

Meanwhile, the Board made the fatal mistake of not telling its side of the story, out of some combination of legal and other fears and concerns, and not wanting to ultimately destroy OpenAI. Then, at Altman’s insistence, those involved left. And then Altman swept the entire ‘investigation’ under the rug permanently.

Altman then has the audacity to turn around and complain about what little the board said and leaked afterwards, calling it ‘fake news’ without details, and saying how the people who f—ed him and the company were gone and now he had to clean up their mess.

What does he actually say about safety and existential risk in Reflections? Only this:

We continue to believe that the best way to make an AI system safe is by iteratively and gradually releasing it into the world, giving society time to adapt and co-evolve with the technology, learning from experience, and continuing to make the technology safer.

We believe in the importance of being world leaders on safety and alignment research, and in guiding that research with feedback from real world applications.

Then in the interview, he gets asked point blank:

Q: Has your sense of what the dangers actually might be evolved?

A: I still have roughly the same short-, medium- and long-term risk profiles. I still expect that on cybersecurity and bio stuff, we’ll see serious, or potentially serious, short-term issues that need mitigation.

Long term, as you think about a system that really just has incredible capability, there’s risks that are probably hard to precisely imagine and model. But I can simultaneously think that these risks are real and also believe that the only way to appropriately address them is to ship product and learn.

I know that anyone who previously had a self-identified ‘Eliezer Yudkowsky fan fiction Twitter account’ knows better than to think all you can say about long term risks is ‘ship products and learn.’

I don’t see the actions to back up even these words. Nor would I expect, if they truly believed this, for this short generic statement to be the only mention of the subject.

How can you reflect on the past nine years, say you have a direct path to AGI (as he will say later on), get asked point blank about the risks, and say only this about the risks involved? The silence is deafening.

I also flat out do not think you can solve the problems exclusively through this approach. The iterative development strategy has its safety and adaptation advantages. It also has disadvantages, driving the race forward and making too many people not notice what is happening in front of them via a ‘boiling the frog’ issue. On net, my guess is it has been net good for safety versus not doing it, at least up until this point.

That doesn’t mean you can solve the problem of alignment of superintelligent systems primarily by reacting to problems you observe in present systems. I do not believe the problems we are about to face will work that way.

And even if we are in such a fortunate world that they do work that way? We have not been given reason to trust that OpenAI is serious about it.

Getting back to the whole ‘vision thing’:

Our vision won’t change; our tactics will continue to evolve.

I suppose if ‘vision’ is simply ‘build AGI/ASI’ and everything else is tactics, then sure?

I do not think that was the entirety of the original vision, although it was part of it.

That is indeed the entire vision now. And they’re claiming they know how to do it.

We are now confident we know how to build AGI as we have traditionally understood it. We believe that, in 2025, we may see the first AI agents “join the workforce” and materially change the output of companies. We continue to believe that iteratively putting great tools in the hands of people leads to great, broadly-distributed outcomes.

We are beginning to turn our aim beyond that, to superintelligence in the true sense of the word. We love our current products, but we are here for the glorious future. With superintelligence, we can do anything else. Superintelligent tools could massively accelerate scientific discovery and innovation well beyond what we are capable of doing on our own, and in turn massively increase abundance and prosperity.

This sounds like science fiction right now, and somewhat crazy to even talk about it. That’s alright—we’ve been there before and we’re OK with being there again. We’re pretty confident that in the next few years, everyone will see what we see, and that the need to act with great care, while still maximizing broad benefit and empowerment, is so important. Given the possibilities of our work, OpenAI cannot be a normal company.

Those who have ears, listen. This is what they plan on doing.

They are predicting AI workers ‘joining the workforce’ in earnest this year, with full AGI not far in the future, followed shortly by ASI. They think ‘4’ is conservative.

What are the rest of us going to do, or not do, about this?

I can’t help but notice Altman is trying to turn OpenAI into a normal company.

Why should we trust that structure in the very situation Altman himself describes? If the basic thesis is that we should put our trust in Altman personally, why does he think he has earned that trust?

Discussion about this post

OpenAI #10: Reflections Read More »

openai-defends-for-profit-shift-as-critical-to-sustain-humanitarian-mission

OpenAI defends for-profit shift as critical to sustain humanitarian mission

OpenAI has finally shared details about its plans to shake up its core business by shifting to a for-profit corporate structure.

On Thursday, OpenAI posted on its blog, confirming that in 2025, the existing for-profit arm will be transformed into a Delaware-based public benefit corporation (PBC). As a PBC, OpenAI would be required to balance its shareholders’ and stakeholders’ interests with the public benefit. To achieve that, OpenAI would offer “ordinary shares of stock” while using some profits to further its mission—”ensuring artificial general intelligence (AGI) benefits all of humanity”—to serve a social good.

To compensate for losing control over the for-profit, the nonprofit would have some shares in the PBC, but it’s currently unclear how many will be allotted. Independent financial advisors will help OpenAI reach a “fair valuation,” the blog said, while promising the new structure would “multiply” the donations that previously supported the nonprofit.

“Our plan would result in one of the best resourced nonprofits in history,” OpenAI said. (During its latest funding round, OpenAI was valued at $157 billion.)

OpenAI claimed the nonprofit’s mission would be more sustainable under the proposed changes, as the costs of AI innovation only continue to compound. The new structure would set the PBC up to control OpenAI’s operations and business while the nonprofit would “hire a leadership team and staff to pursue charitable initiatives in sectors such as health care, education, and science,” OpenAI said.

Some of OpenAI’s rivals, such as Anthropic and Elon Musk’s xAI, use a similar corporate structure, OpenAI noted.

Critics had previously pushed back on this plan, arguing that humanity may be better served if the nonprofit continues controlling the for-profit arm of OpenAI. But OpenAI argued that the old way made it hard for the Board “to directly consider the interests of those who would finance the mission and does not enable the non-profit to easily do more than control the for-profit.”

OpenAI defends for-profit shift as critical to sustain humanitarian mission Read More »

2024:-the-year-ai-drove-everyone-crazy

2024: The year AI drove everyone crazy


What do eating rocks, rat genitals, and Willy Wonka have in common? AI, of course.

It’s been a wild year in tech thanks to the intersection between humans and artificial intelligence. 2024 brought a parade of AI oddities, mishaps, and wacky moments that inspired odd behavior from both machines and man. From AI-generated rat genitals to search engines telling people to eat rocks, this year proved that AI has been having a weird impact on the world.

Why the weirdness? If we had to guess, it may be due to the novelty of it all. Generative AI and applications built upon Transformer-based AI models are still so new that people are throwing everything at the wall to see what sticks. People have been struggling to grasp both the implications and potential applications of the new technology. Riding along with the hype, different types of AI that may end up being ill-advised, such as automated military targeting systems, have also been introduced.

It’s worth mentioning that, aside from the crazy news, we saw plenty of serious AI advances in 2024 as well. For example, Claude 3.5 Sonnet, launched in June, held off the competition as a top model for most of the year, while OpenAI’s o1 used runtime compute to expand GPT-4o’s capabilities with simulated reasoning. Advanced Voice Mode and NotebookLM also emerged as novel applications of AI tech, and the year saw the rise of more capable music synthesis models and better AI video generators, including several from China.

But for now, let’s get down to the weirdness.

ChatGPT goes insane

Illustration of a broken toy robot.

Early in the year, things got off to an exciting start when OpenAI’s ChatGPT experienced a significant technical malfunction that caused the AI model to generate increasingly incoherent responses, prompting users on Reddit to describe the system as “having a stroke” or “going insane.” During the glitch, ChatGPT’s responses would begin normally but then deteriorate into nonsensical text, sometimes mimicking Shakespearean language.

OpenAI later revealed that a bug in how the model processed language caused it to select the wrong words during text generation, leading to nonsense outputs (basically the text version of what we at Ars now call “jabberwockies”). The company fixed the issue within 24 hours, but the incident led to frustrations about the black box nature of commercial AI systems and users’ tendency to anthropomorphize AI behavior when it malfunctions.
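OpenAI described the bug only in broad strokes, so here is a purely illustrative toy sketch of the failure mode: the sampling math can be perfectly sensible, and a single fault in the step that maps sampled token IDs back to text still turns everything into gibberish. None of this is OpenAI's actual inference code; the tiny vocabulary, the fake bigram table, and the index-scrambling "bug" are all invented for demonstration.

```python
import random

# Toy illustration (not OpenAI's code): language models sample the next token
# from a probability distribution, then map the chosen token ID back to text.
# If that final mapping picks the wrong IDs, the probabilities can be fine
# and the output still comes out as nonsense.

vocab = ["the", "cat", "sat", "on", "a", "warm", "mat", "today", ".", "<eos>"]

def next_token_probs(context):
    # Stand-in for a real model: heavily favor one sensible continuation.
    probs = [0.01] * len(vocab)
    bigram = {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 8, 8: 9}  # crude "grammar"
    last_id = vocab.index(context[-1])
    probs[bigram.get(last_id, 8)] = 0.91
    return probs

def generate(buggy=False, steps=8):
    tokens = ["the"]
    for _ in range(steps):
        probs = next_token_probs(tokens)
        token_id = random.choices(range(len(vocab)), weights=probs)[0]
        if buggy:
            # The "bug": IDs get scrambled after sampling, so the decoder
            # looks up the wrong vocabulary entries.
            token_id = (token_id * 7 + 3) % len(vocab)
        word = vocab[token_id]
        if word == "<eos>":
            break
        tokens.append(word)
    return " ".join(tokens)

print("normal:", generate(buggy=False))
print("buggy: ", generate(buggy=True))
```

The point of the sketch is that a downstream mapping error, not the model's "mind," is enough to make every response look like the system is having a stroke.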

The great Wonka incident


A photo of “Willy’s Chocolate Experience” (inset), which did not match AI-generated promises, shown in the background. Credit: Stuart Sinclair

The collision between AI-generated imagery and consumer expectations fueled human frustrations in February when Scottish families discovered that “Willy’s Chocolate Experience,” an unlicensed Wonka-ripoff event promoted using AI-generated wonderland images, turned out to be little more than a sparse warehouse with a few modest decorations.

Parents who paid £35 per ticket encountered a situation so dire they called the police, with children reportedly crying at the sight of a person in what attendees described as a “terrifying outfit.” The event, created by House of Illuminati in Glasgow, promised fantastical spaces like an “Enchanted Garden” and “Twilight Tunnel” but delivered an underwhelming experience that forced organizers to shut down mid-way through its first day and issue refunds.

While the show was a bust, it brought us an iconic new meme for job disillusionment in the form of a photo: the green-haired Willy’s Chocolate Experience employee who looked like she’d rather be anywhere else on earth at that moment.

Mutant rat genitals expose peer review flaws

An actual laboratory rat, who is intrigued. Credit: Getty | Photothek

In February, Ars Technica senior health reporter Beth Mole covered a peer-reviewed paper published in Frontiers in Cell and Developmental Biology that created an uproar in the scientific community when researchers discovered it contained nonsensical AI-generated images, including an anatomically incorrect rat with oversized genitals. The paper, authored by scientists at Xi’an Honghui Hospital in China, openly acknowledged using Midjourney to create figures that contained gibberish text labels like “Stemm cells” and “iollotte sserotgomar.”

The publisher, Frontiers, posted an expression of concern about the article titled “Cellular functions of spermatogonial stem cells in relation to JAK/STAT signaling pathway” and launched an investigation into how the obviously flawed imagery passed through peer review. Scientists across social media platforms expressed dismay at the incident, which mirrored concerns about AI-generated content infiltrating academic publishing.

Chatbot makes erroneous refund promises for Air Canada

If, say, ChatGPT gives you the wrong name for one of the seven dwarves, it’s not such a big deal. But in February, Ars senior policy reporter Ashley Belanger covered a case of costly AI confabulation in the wild. In the course of online text conversations, Air Canada’s customer service chatbot told customers inaccurate refund policy information. The airline faced legal consequences later when a tribunal ruled the airline must honor commitments made by the automated system. Tribunal adjudicator Christopher Rivers determined that Air Canada bore responsibility for all information on its website, regardless of whether it came from a static page or AI interface.

The case set a precedent for how companies deploying AI customer service tools could face legal obligations for automated systems’ responses, particularly when they fail to warn users about potential inaccuracies. Ironically, the airline had reportedly spent more on the initial AI implementation than it would have cost to maintain human workers for simple queries, according to Air Canada executive Steve Crocker.

Will Smith lampoons his digital double


The real Will Smith eating spaghetti, parodying an AI-generated video from 2023. Credit: Will Smith / Getty Images / Benj Edwards

In March 2023, a terrible AI-generated video of Will Smith’s AI doppelganger eating spaghetti began making the rounds online. The AI-generated version of the actor gobbled down the noodles in an unnatural and disturbing way. Almost a year later, in February 2024, Will Smith himself posted a parody response video to the viral jabberwocky on Instagram, featuring AI-like deliberately exaggerated pasta consumption, complete with hair-nibbling and finger-slurping antics.

Given the rapid evolution of AI video technology, particularly since OpenAI had just unveiled its Sora video model four days earlier, Smith’s post sparked discussion in his Instagram comments where some viewers initially struggled to distinguish between the genuine footage and AI generation. It was an early sign of “deep doubt” in action as the tech increasingly blurs the line between synthetic and authentic video content.

Robot dogs learn to hunt people with AI-guided rifles


A still image of a robotic quadruped armed with a remote weapons system, captured from a video provided by Onyx Industries. Credit: Onyx Industries

At some point in recent history—somewhere around 2022—someone took a look at robotic quadrupeds and thought it would be a great idea to attach guns to them. A few years later, the US Marine Forces Special Operations Command (MARSOC) began evaluating armed robotic quadrupeds developed by Ghost Robotics. The robot “dogs” integrated Onyx Industries’ SENTRY remote weapon systems, which featured AI-enabled targeting that could detect and track people, drones, and vehicles, though the systems require human operators to authorize any weapons discharge.

The military’s interest in armed robotic dogs followed a broader trend of weaponized quadrupeds entering public awareness. This included viral videos of consumer robots carrying firearms, and later, commercial sales of flame-throwing models. While MARSOC emphasized that weapons were just one potential use case under review, experts noted that the increasing integration of AI into military robotics raised questions about how long humans would remain in control of lethal force decisions.

Microsoft Windows AI is watching


A screenshot of Microsoft’s new “Recall” feature in action. Credit: Microsoft

In an era where many people already feel like they have no privacy due to tech encroachments, Microsoft dialed it up to an extreme degree in May. That’s when Microsoft unveiled a controversial Windows 11 feature called “Recall” that continuously captures screenshots of users’ PC activities every few seconds for later AI-powered search and retrieval. The feature, designed for new Copilot+ PCs using Qualcomm’s Snapdragon X Elite chips, promised to help users find past activities, including app usage, meeting content, and web browsing history.

While Microsoft emphasized that Recall would store encrypted snapshots locally and allow users to exclude specific apps or websites, the announcement raised immediate privacy concerns, as Ars senior technology reporter Andrew Cunningham covered. It also came with a technical toll, requiring significant hardware resources, including 256GB of storage space, with 25GB dedicated to storing approximately three months of user activity. After Microsoft pulled the initial test version due to public backlash, Recall later entered public preview in November with reportedly enhanced security measures. But secure spyware is still spyware—Recall, when enabled, still watches nearly everything you do on your computer and keeps a record of it.
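For a sense of scale, here is a rough back-of-the-envelope reading of those storage figures. The 25 GB and roughly-three-months numbers come from the paragraph above; the snapshot interval and hours of active PC use per day are assumptions made purely for illustration, not Microsoft's published specifications.

```python
# Back-of-the-envelope only: 25 GB for ~3 months is from the article;
# the snapshot interval and active hours per day are assumed.
GB = 1024 ** 3

budget_bytes = 25 * GB            # space Recall reserves for ~3 months of history
days = 90                         # "approximately three months"
active_hours_per_day = 8          # assumption: a typical day of PC use
snapshot_interval_s = 5           # assumption: "every few seconds"

snapshots_per_day = active_hours_per_day * 3600 // snapshot_interval_s
total_snapshots = snapshots_per_day * days
kb_per_snapshot = budget_bytes / total_snapshots / 1024

print(f"{snapshots_per_day:,} snapshots/day, {total_snapshots:,} snapshots in {days} days")
print(f"≈ {kb_per_snapshot:.0f} KB of storage per snapshot")
```

Under those assumed numbers, each stored snapshot averages only around 50 KB, which would imply heavy compression, deduplication, or storing extracted text rather than full-resolution screenshots.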

Google Search told people to eat rocks

This is fine. Credit: Getty Images

In May, Ars senior gaming reporter Kyle Orland (who assisted commendably with the AI beat throughout the year) covered Google’s newly launched AI Overview feature. It faced immediate criticism when users discovered that it frequently provided false and potentially dangerous information in its search result summaries. Among its most alarming responses, the system advised that humans could safely consume rocks, incorrectly citing scientific sources about the geological diet of marine organisms. The system’s other errors included recommending nonexistent car maintenance products, suggesting unsafe food preparation techniques, and confusing historical figures who shared names.

The problems stemmed from several issues, including the AI treating joke posts as factual sources and misinterpreting context from original web content. But most of all, the system relies on web results as indicators of authority, which we called a flawed design. While Google defended the system, stating these errors occurred mainly with uncommon queries, a company spokesperson acknowledged they would use these “isolated examples” to refine their systems. But to this day, AI Overview still makes frequent mistakes.

Stable Diffusion generates body horror


An AI-generated image created using Stable Diffusion 3 of a girl lying in the grass. Credit: HorneyMetalBeing

In June, Stability AI’s release of the image synthesis model Stable Diffusion 3 Medium drew criticism online for its poor handling of human anatomy in AI-generated images. Users across social media platforms shared examples of the model producing what we now like to call jabberwockies—AI generation failures with distorted bodies, misshapen hands, and surreal anatomical errors. Many in the AI image-generation community viewed it as a significant step backward from previous image-synthesis capabilities.

Reddit users attributed these failures to Stability AI’s aggressive filtering of adult content from the training data, which apparently impaired the model’s ability to accurately render human figures. The troubled release coincided with broader organizational challenges at Stability AI, including the March departure of CEO Emad Mostaque, multiple staff layoffs, and the exit of three key engineers who had helped develop the technology. Some of those engineers founded Black Forest Labs in August and released Flux, which has become the latest open-weights AI image model to beat.

ChatGPT Advanced Voice imitates human voice in testing

An illustration of a computer synthesizer spewing out letters.

AI voice-synthesis models are master imitators these days, and they are capable of much more than many people realize. In August, we covered a story where OpenAI’s ChatGPT Advanced Voice Mode feature unexpectedly imitated a user’s voice during the company’s internal testing, revealed by OpenAI after the fact in safety testing documentation. To prevent future instances of an AI assistant suddenly speaking in your own voice (which, let’s be honest, would probably freak people out), the company created an output classifier system to prevent unauthorized voice imitation. OpenAI says that Advanced Voice Mode now catches all meaningful deviations from approved system voices.
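OpenAI hasn't published how its output classifier works, but the general idea behind this kind of guardrail is straightforward to sketch. The snippet below is a hypothetical illustration only: `embed_speaker` stands in for some speaker-embedding model, the 0.75 threshold is made up, and none of this reflects OpenAI's actual implementation.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_approved_voice(audio_chunk, approved_embeddings, embed_speaker, threshold=0.75):
    """
    Return True if a chunk of generated audio sounds like one of the approved
    system voices. `embed_speaker` is a hypothetical speaker-embedding model;
    `approved_embeddings` are reference embeddings of the sanctioned voices.
    """
    emb = embed_speaker(audio_chunk)
    best = max(cosine(emb, ref) for ref in approved_embeddings)
    return best >= threshold

# In a streaming setup, a check like this would run on each outgoing audio
# chunk; if it fails, generation stops and the assistant falls back to a
# canned response instead of continuing in an unauthorized voice.
```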

Independent AI researcher Simon Willison discussed the implications with Ars Technica, noting that while OpenAI restricted its model’s full voice synthesis capabilities, similar technology would likely emerge from other sources within the year. Meanwhile, the rapid advancement of AI voice replication has caused general concern about its potential misuse, although companies like ElevenLabs have already been offering voice cloning services for some time.

San Francisco’s robotic car horn symphony


A Waymo self-driving car in front of Google’s San Francisco headquarters, San Francisco, California, June 7, 2024. Credit: Getty Images

In August, San Francisco residents got a noisy taste of robo-dystopia when Waymo’s self-driving cars began creating an unexpected nightly disturbance in the South of Market district. In a parking lot off 2nd Street, the cars congregated autonomously every night during rider lulls at 4 am and began engaging in extended honking matches at each other while attempting to park.

Local resident Christopher Cherry’s initial optimism about the robotic fleet’s presence dissolved as the mechanical chorus grew louder each night, affecting residents in nearby high-rises. The nocturnal tech disruption served as a lesson in the unintentional effects of autonomous systems when run in aggregate.

Larry Ellison dreams of all-seeing AI cameras

A colorized photo of CCTV cameras in London, 2024.

In September, Oracle co-founder Larry Ellison painted a bleak vision of ubiquitous AI surveillance during a company financial meeting. The 80-year-old database billionaire described a future where AI would monitor citizens through networks of cameras and drones, asserting that the oversight would ensure lawful behavior from both police and the public.

His surveillance predictions reminded us of parallels to existing systems in China, where authorities already used AI to sort surveillance data on citizens as part of the country’s “sharp eyes” campaign from 2015 to 2020. Ellison’s statement reflected the sort of worst-case tech surveillance state scenario—likely antithetical to any sort of free society—that dozens of sci-fi novels of the 20th century warned us about.

A dead father sends new letters home


An AI-generated image featuring my late father’s handwriting. Credit: Benj Edwards / Flux

AI has made many of us do weird things in 2024, including this writer. In October, I used an AI synthesis model called Flux to reproduce my late father’s handwriting with striking accuracy. After scanning 30 samples from his engineering notebooks, I trained the model using computing time that cost less than five dollars. The resulting text captured his distinctive uppercase style, which he developed during his career as an electronics engineer.

I enjoyed creating images showing his handwriting in various contexts, from folder labels to skywriting, and made the trained model freely available online for others to use. While I approached it as a tribute to my father (who would have appreciated the technical achievement), many people found the whole experience weird and somewhat disturbing. The things we unhinged Bing Chat-like journalists do to bring awareness to a topic are sometimes unconventional. So I guess it counts for this list!

For 2025? Expect even more AI

Thanks for reading Ars Technica this past year and following along with our team coverage of this rapidly emerging and expanding field. We appreciate your kind words of support. Ars Technica’s 2024 AI words of the year were: vibemarking, deep doubt, and the aforementioned jabberwocky. The old stalwart “confabulation” also made several notable appearances. Tune in again next year when we continue to try to figure out how to concisely describe novel scenarios in emerging technology by labeling them.

Looking back, our prediction for 2024 in AI last year was “buckle up.” It seems fitting, given the weirdness detailed above. Especially the part about the robot dogs with guns. For 2025, AI will likely inspire more chaos ahead, but also potentially get put to serious work as a productivity tool, so this time, our prediction is “buckle down.”

Finally, we’d like to ask: What was the craziest story about AI in 2024 from your perspective? Whether you love AI or hate it, feel free to suggest your own additions to our list in the comments. Happy New Year!

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

2024: The year AI drove everyone crazy Read More »

the-ai-war-between-google-and-openai-has-never-been-more-heated

The AI war between Google and OpenAI has never been more heated

Over the past month, we’ve seen a rapid cadence of notable AI-related announcements and releases from both Google and OpenAI, and it’s been making the AI community’s head spin. It has also poured fuel on the fire of the OpenAI-Google rivalry, an accelerating game of one-upmanship taking place unusually close to the Christmas holiday.

“How are people surviving with the firehose of AI updates that are coming out,” wrote one user on X last Friday, which is still a hotbed of AI-related conversation. “in the last <24 hours we got gemini flash 2.0 and chatGPT with screenshare, deep research, pika 2, sora, chatGPT projects, anthropic clio, wtf it never ends.”

Rumors travel quickly in the AI world, and people in the AI industry had been expecting OpenAI to ship some major products in December. Once OpenAI announced “12 days of OpenAI” earlier this month, Google jumped into gear and seemingly decided to try to one-up its rival on several counts. So far, the strategy appears to be working, but it’s coming at the cost of the rest of the world being able to absorb the implications of the new releases.

“12 Days of OpenAI has turned into like 50 new @GoogleAI releases,” wrote another X user on Monday. “This past week, OpenAI & Google have been releasing at the speed of a new born startup,” wrote a third X user on Tuesday. “Even their own users can’t keep up. Crazy time we’re living in.”

“Somebody told Google that they could just do things,” wrote a16z partner and AI influencer Justine Moore on X, referring to a common motivational meme telling people they “can just do stuff.”

The Google AI rush

OpenAI’s “12 Days of OpenAI” campaign has included releases of their full o1 model, an upgrade from o1-preview, alongside o1-pro for advanced “reasoning” tasks. The company also publicly launched Sora for video generation, added Projects functionality to ChatGPT, introduced Advanced Voice features with video streaming capabilities, and more.

The AI war between Google and OpenAI has never been more heated Read More »

why-ai-language-models-choke-on-too-much-text

Why AI language models choke on too much text


Compute costs scale with the square of the input size. That’s not great.

Credit: Aurich Lawson | Getty Images

Large language models represent text using tokens, each of which is a few characters. Short words are represented by a single token (like “the” or “it”), whereas larger words may be represented by several tokens (GPT-4o represents “indivisible” with “ind,” “iv,” and “isible”).
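If you want to see tokenization for yourself, OpenAI’s open-source tiktoken library exposes the encodings its models use. Here is a minimal sketch, assuming tiktoken is installed; the exact splits you get may differ slightly from the example above:

```python
import tiktoken  # OpenAI's open-source tokenizer library

enc = tiktoken.encoding_for_model("gpt-4o")  # resolves to the o200k_base encoding

for word in ("the", "it", "indivisible"):
    token_ids = enc.encode(word)
    pieces = [enc.decode([t]) for t in token_ids]  # decode each token back to text
    print(f"{word!r} -> {len(token_ids)} token(s): {pieces}")
```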

When OpenAI released ChatGPT two years ago, it had a memory—known as a context window—of just 8,192 tokens. That works out to roughly 6,000 words of text. This meant that if you fed it more than about 15 pages of text, it would “forget” information from the beginning of its context. This limited the size and complexity of tasks ChatGPT could handle.

Today’s LLMs are far more capable:

  • OpenAI’s GPT-4o can handle 128,000 tokens (about 200 pages of text).
  • Anthropic’s Claude 3.5 Sonnet can accept 200,000 tokens (about 300 pages of text).
  • Google’s Gemini 1.5 Pro allows 2 million tokens (about 2,000 pages of text).

Still, it’s going to take a lot more progress if we want AI systems with human-level cognitive abilities.

Many people envision a future where AI systems are able to do many—perhaps most—of the jobs performed by humans. Yet human workers read and hear hundreds of millions of words during their working years—and they absorb even more information from sights, sounds, and smells in the world around them. To achieve human-level intelligence, AI systems will need the capacity to absorb similar quantities of information.

Right now the most popular way to build an LLM-based system to handle large amounts of information is called retrieval-augmented generation (RAG). These systems try to find documents relevant to a user’s query and then insert the most relevant documents into an LLM’s context window.

This sometimes works better than a conventional search engine, but today’s RAG systems leave a lot to be desired. They only produce good results if the system puts the most relevant documents into the LLM’s context. But the mechanism used to find those documents—often, searching in a vector database—is not very sophisticated. If the user asks a complicated or confusing question, there’s a good chance the RAG system will retrieve the wrong documents and the chatbot will return the wrong answer.
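Here is a deliberately minimal sketch of the RAG pattern, using a toy hashing “embedding” and a plain numpy array as the vector database. The documents and helper names are made up for illustration; real systems use learned embedding models and dedicated vector stores:

```python
import numpy as np

# A deliberately crude "embedding": a bag of words over a hashed vocabulary.
def embed(text: str, dim: int = 256) -> np.ndarray:
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

documents = [
    "The 2023 annual report shows revenue grew 12 percent year over year.",
    "Employee handbook: vacation requests must be filed two weeks in advance.",
    "Incident postmortem: the outage was caused by an expired TLS certificate.",
]
doc_vectors = np.stack([embed(d) for d in documents])  # the "vector database"

def retrieve(query: str, k: int = 2) -> list[str]:
    scores = doc_vectors @ embed(query)  # cosine similarity (vectors are unit length)
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

query = "Why did the outage happen?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
# `prompt` would then be sent to the LLM, which answers from the retrieved text.
print(prompt)
```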

And RAG doesn’t enable an LLM to reason in more sophisticated ways over large numbers of documents:

  • A lawyer might want an AI system to review and summarize hundreds of thousands of emails.
  • An engineer might want an AI system to analyze thousands of hours of camera footage from a factory floor.
  • A medical researcher might want an AI system to identify trends in tens of thousands of patient records.

Each of these tasks could easily require more than 2 million tokens of context. Moreover, we’re not going to want our AI systems to start with a clean slate after doing one of these jobs. We will want them to gain experience over time, just like human workers do.

Superhuman memory and stamina have long been key selling points for computers. We’re not going to want to give them up in the AI age. Yet today’s LLMs are distinctly subhuman in their ability to absorb and understand large quantities of information.

It’s true, of course, that LLMs absorb superhuman quantities of information at training time. The latest AI models have been trained on trillions of tokens—far more than any human will read or hear. But a lot of valuable information is proprietary, time-sensitive, or otherwise not available for training.

So we’re going to want AI models to read and remember far more than 2 million tokens at inference time. And that won’t be easy.

The key innovation behind transformer-based LLMs is attention, a mathematical operation that allows a model to “think about” previous tokens. (Check out our LLM explainer if you want a detailed explanation of how this works.) Before an LLM generates a new token, it performs an attention operation that compares the latest token to every previous token. This means that conventional LLMs get less and less efficient as the context grows.
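As a rough illustration of what that per-token attention operation looks like, here is a minimal numpy sketch of a single new token attending over everything that came before it. This is a simplification: real models use multiple heads, learned projections, and batched matrix math.

```python
import numpy as np

def attend(query, keys, values):
    """Single-query scaled dot-product attention over all previous tokens.

    query:  (d,)    vector for the latest token
    keys:   (n, d)  one key vector per previous token
    values: (n, d)  one value vector per previous token
    """
    scores = keys @ query / np.sqrt(len(query))  # one comparison per previous token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax over the n previous tokens
    return weights @ values                      # weighted mix of previous values

rng = np.random.default_rng(0)
d = 64
for n in (10, 100, 1_000):
    keys, values = rng.standard_normal((n, d)), rng.standard_normal((n, d))
    out = attend(rng.standard_normal(d), keys, values)
    # Work per generated token grows linearly with n, so generating a whole
    # sequence of n tokens costs on the order of n^2 comparisons.
    print(n, out.shape)
```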

Lots of people are working on ways to solve this problem—I’ll discuss some of them later in this article. But first I should explain how we ended up with such an unwieldy architecture.

The “brains” of personal computers are central processing units (CPUs). Traditionally, chipmakers made CPUs faster by increasing the frequency of the clock that acts as its heartbeat. But in the early 2000s, overheating forced chipmakers to mostly abandon this technique.

Chipmakers started making CPUs that could execute more than one instruction at a time. But they were held back by a programming paradigm that requires instructions to mostly be executed in order.

A new architecture was needed to take full advantage of Moore’s Law. Enter Nvidia.

In 1999, Nvidia started selling graphics processing units (GPUs) to speed up the rendering of three-dimensional games like Quake III Arena. The job of these PC add-on cards was to rapidly draw thousands of triangles that made up walls, weapons, monsters, and other objects in a game.

This is not a sequential programming task: triangles in different areas of the screen can be drawn in any order. So rather than having a single processor that executed instructions one at a time, Nvidia’s first GPU had a dozen specialized cores—effectively tiny CPUs—that worked in parallel to paint a scene.

Over time, Moore’s Law enabled Nvidia to make GPUs with tens, hundreds, and eventually thousands of computing cores. People started to realize that the massive parallel computing power of GPUs could be used for applications unrelated to video games.

In 2012, three University of Toronto computer scientists—Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton—used a pair of Nvidia GTX 580 GPUs to train a neural network for recognizing images. The massive computing power of those GPUs, which had 512 cores each, allowed them to train a network with a then-impressive 60 million parameters. They entered ImageNet, an academic competition to classify images into one of 1,000 categories, and set a new record for accuracy in image recognition.

Before long, researchers were applying similar techniques to a wide variety of domains, including natural language.

The go-to architecture for language at the time was the recurrent neural network (RNN), which reads text one word at a time, updating a fixed-size hidden state as it goes. RNNs worked fairly well on short sentences, but they struggled with longer ones—to say nothing of paragraphs or longer passages. When reasoning about a long sentence, an RNN would sometimes “forget about” an important word early in the sentence. In 2014, computer scientists Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio discovered they could improve the performance of a recurrent neural network by adding an attention mechanism that allowed the network to “look back” at earlier words in a sentence.

In 2017, Google published “Attention Is All You Need,” one of the most important papers in the history of machine learning. Building on the work of Bahdanau and his colleagues, Google researchers dispensed with the RNN and its hidden states. Instead, Google’s model used an attention mechanism to scan previous words for relevant context.

This new architecture, which Google called the transformer, proved hugely consequential because it eliminated a serious bottleneck to scaling language models.

Here’s an animation illustrating why RNNs didn’t scale well:

This hypothetical RNN tries to predict the next word in a sentence, with the prediction shown in the top row of the diagram. This network has three layers, each represented by a rectangle. It is inherently linear: it has to complete its analysis of the first word, “How,” before passing the hidden state back to the bottom layer so the network can start to analyze the second word, “are.”

This constraint wasn’t a big deal when machine learning algorithms ran on CPUs. But when people started leveraging the parallel computing power of GPUs, the linear architecture of RNNs became a serious obstacle.

The transformer removed this bottleneck by allowing the network to “think about” all the words in its input at the same time:

The transformer-based model shown here does roughly as many computations as the RNN in the previous diagram. So it might not run any faster on a (single-core) CPU. But because the model doesn’t need to finish with “How” before starting on “are,” “you,” or “doing,” it can work on all of these words simultaneously. So it can run a lot faster on a GPU with many parallel execution units.

How much faster? The potential speed-up is proportional to the number of input words. My animations depict a four-word input that makes the transformer model about four times faster than the RNN. Real LLMs can have inputs thousands of words long. So, with a sufficiently beefy GPU, transformer-based models can be orders of magnitude faster than otherwise similar RNNs.

In short, the transformer unlocked the full processing power of GPUs and catalyzed rapid increases in the scale of language models. Leading LLMs grew from hundreds of millions of parameters in 2018 to hundreds of billions of parameters by 2020. Classic RNN-based models could not have grown that large because their linear architecture prevented them from being trained efficiently on a GPU.

See all those diagonal arrows between the layers? They represent the operation of the attention mechanism. Before a transformer-based language model generates a new token, it “thinks about” every previous token to find the ones that are most relevant.

Each of these comparisons is cheap, computationally speaking. For small contexts—10, 100, or even 1,000 tokens—they are not a big deal. But the computational cost of attention grows relentlessly with the number of preceding tokens. The longer the context gets, the more attention operations (and therefore computing power) are needed to generate the next token.

This means that the total computing power required for attention grows quadratically with the total number of tokens. Suppose a 10-token prompt requires 414,720 attention operations. Then:

  • Processing a 100-token prompt will require 45.6 million attention operations.
  • Processing a 1,000-token prompt will require 4.6 billion attention operations.
  • Processing a 10,000-token prompt will require 460 billion attention operations.

This is probably why Google charges twice as much, per token, for Gemini 1.5 Pro once the context gets longer than 128,000 tokens. Generating token number 128,001 requires comparisons with all 128,000 previous tokens, making it significantly more expensive than producing the first or 10th or 100th token.
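For what it’s worth, the figures above are consistent with a cost that grows as n × (n − 1) token pairs times a fixed per-pair cost of roughly 4,608 operations. That constant and the pair-count formula are inferred from the 10-token figure, not something the numbers state explicitly—this is just a sanity-check sketch:

```python
PER_PAIR_OPS = 414_720 // (10 * 9)  # ~4,608 ops per token pair, inferred from the 10-token figure

def attention_ops(n_tokens: int) -> int:
    # Every token is compared against every other token, so total work grows
    # with n * (n - 1) -- roughly quadratically in the prompt length.
    return n_tokens * (n_tokens - 1) * PER_PAIR_OPS

for n in (10, 100, 1_000, 10_000):
    print(f"{n:>6} tokens -> {attention_ops(n):>15,} attention operations")
# 10 -> 414,720; 100 -> ~45.6 million; 1,000 -> ~4.6 billion; 10,000 -> ~460 billion
```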

A lot of effort has been put into optimizing attention. One line of research has tried to squeeze maximum efficiency out of individual GPUs.

As we saw earlier, a modern GPU contains thousands of execution units. Before a GPU can start doing math, it must move data from relatively slow off-chip memory (called high-bandwidth memory, or HBM) to much faster on-chip memory (called SRAM) inside a particular execution unit. Sometimes GPUs spend more time moving data around than performing calculations.

In a series of papers, Princeton computer scientist Tri Dao and several collaborators have developed FlashAttention, which calculates attention in a way that minimizes the number of these slow memory operations. Work like Dao’s has dramatically improved the performance of transformers on modern GPUs.
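If you want to try a fused attention kernel yourself, recent versions of PyTorch ship a scaled-dot-product-attention op that can dispatch to a FlashAttention-style implementation on supported GPU and dtype combinations (exact dispatch depends on your PyTorch build). A minimal sketch, not Dao’s reference code:

```python
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
batch, heads, seq_len, head_dim = 1, 8, 2_048, 64

q = torch.randn(batch, heads, seq_len, head_dim, device=device, dtype=dtype)
k = torch.randn_like(q)
v = torch.randn_like(q)

# On supported GPUs this single call can dispatch to a fused, FlashAttention-style
# kernel that avoids materializing the full seq_len x seq_len score matrix in slow
# high-bandwidth memory; on CPU it falls back to a standard implementation.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 2048, 64])
```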

Another line of research has focused on efficiently scaling attention across multiple GPUs. One widely cited paper describes ring attention, which divides input tokens into blocks and assigns each block to a different GPU. It’s called ring attention because GPUs are organized into a conceptual ring, with each GPU passing data to its neighbor.

I once attended a ballroom dancing class where couples stood in a ring around the edge of the room. After each dance, women would stay where they were while men would rotate to the next woman. Over time, every man got a chance to dance with every woman. Ring attention works on the same principle. The “women” are query vectors (describing what each token is “looking for”) and the “men” are key vectors (describing the characteristics each token has). As the key vectors rotate through a sequence of GPUs, they get multiplied by every query vector in turn.

In short, ring attention distributes attention calculations across multiple GPUs, making it possible for LLMs to have larger context windows. But it doesn’t make individual attention calculations any cheaper.
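Here is a toy, single-process simulation of that rotation, with one query block per pretend “device” and plain non-causal attention. A real ring-attention implementation shards these blocks across physical GPUs and overlaps the passing of key/value blocks with computation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_devices, block, d = 4, 8, 16  # four pretend "GPUs," eight tokens per block
n = n_devices * block

q = rng.standard_normal((n, d))
k = rng.standard_normal((n, d))
v = rng.standard_normal((n, d))

# Reference: ordinary (non-causal) attention computed in one place.
scores = q @ k.T / np.sqrt(d)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
reference = (weights / weights.sum(axis=1, keepdims=True)) @ v

# Each "device" keeps one query block plus running (max, denominator, weighted-sum)
# statistics; key/value blocks hop one device around the ring each step.
q_blocks = q.reshape(n_devices, block, d)
kv_blocks = [(k.reshape(n_devices, block, d)[i], v.reshape(n_devices, block, d)[i])
             for i in range(n_devices)]

m = np.full((n_devices, block), -np.inf)  # running row-wise max of scores
den = np.zeros((n_devices, block))        # running softmax denominator
acc = np.zeros((n_devices, block, d))     # running weighted sum of values

for _ in range(n_devices):                # after n_devices hops, everyone has seen everything
    for dev in range(n_devices):
        kb, vb = kv_blocks[dev]
        s = q_blocks[dev] @ kb.T / np.sqrt(d)
        m_new = np.maximum(m[dev], s.max(axis=1))
        scale = np.exp(m[dev] - m_new)    # rescale old statistics to the new max
        p = np.exp(s - m_new[:, None])
        den[dev] = den[dev] * scale + p.sum(axis=1)
        acc[dev] = acc[dev] * scale[:, None] + p @ vb
        m[dev] = m_new
    kv_blocks = kv_blocks[-1:] + kv_blocks[:-1]  # pass key/value blocks one hop around the ring

out = (acc / den[..., None]).reshape(n, d)
print(np.allclose(out, reference))  # True: the sharded result matches full attention
```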

The fixed-size hidden state of an RNN means that it doesn’t have the same scaling problems as a transformer. An RNN requires about the same amount of computing power to produce its first, hundredth, and millionth token. That’s a big advantage over attention-based models.
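A toy numpy RNN step makes the point concrete: the cost of the update depends only on the fixed hidden and input sizes, never on how many tokens came before. This is a generic vanilla-RNN sketch, not any particular production architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden = 64, 256

W_x = rng.standard_normal((d_hidden, d_in)) * 0.01
W_h = rng.standard_normal((d_hidden, d_hidden)) * 0.01

def rnn_step(hidden, x):
    # Cost of this update is O(d_hidden^2 + d_hidden * d_in) -- the same for
    # token 1, token 100, or token 1,000,000, because all past context has been
    # squeezed into the fixed-size hidden vector.
    return np.tanh(W_h @ hidden + W_x @ x)

hidden = np.zeros(d_hidden)
for token_embedding in rng.standard_normal((1_000, d_in)):  # a 1,000-token prompt
    hidden = rnn_step(hidden, token_embedding)
print(hidden.shape)  # (256,) -- the memory footprint never grows
```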

Although RNNs have fallen out of favor since the invention of the transformer, people have continued trying to develop RNNs suitable for training on modern GPUs.

In April, Google announced a new model called Infini-attention. It’s kind of a hybrid between a transformer and an RNN. Infini-attention handles recent tokens like a normal transformer, remembering them and recalling them using an attention mechanism.

However, Infini-attention doesn’t try to remember every token in a model’s context. Instead, it stores older tokens in a “compressive memory” that works something like the hidden state of an RNN. This data structure can perfectly store and recall a few tokens, but as the number of tokens grows, its recall becomes lossier.
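To get a feel for why a fixed-size memory gets lossier, here is a toy outer-product associative memory in the same general spirit. This is only an illustration of the idea, not Infini-attention’s actual formulation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # the memory stays this size no matter how much is stored in it
keys = rng.standard_normal((200, d))
keys /= np.linalg.norm(keys, axis=1, keepdims=True)
values = rng.standard_normal((200, d))

for n_stored in (5, 20, 100, 200):
    memory = np.zeros((d, d))              # the whole "memory" is one d x d matrix
    for key, value in zip(keys[:n_stored], values[:n_stored]):
        memory += np.outer(key, value)     # write: superimpose key/value associations
    recalled = keys[0] @ memory            # read back the very first value by its key
    rel_error = np.linalg.norm(recalled - values[0]) / np.linalg.norm(values[0])
    print(f"{n_stored:>3} items stored -> relative recall error {rel_error:.2f}")
# Recall of early items degrades as more associations are superimposed.
```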

Machine learning YouTuber Yannic Kilcher wasn’t too impressed by Google’s approach.

“I’m super open to believing that this actually does work and this is the way to go for infinite attention, but I’m very skeptical,” Kilcher said. “It uses this compressive memory approach where you just store as you go along, you don’t really learn how to store, you just store in a deterministic fashion, which also means you have very little control over what you store and how you store it.”

Perhaps the most notable effort to resurrect RNNs is Mamba, an architecture that was announced in a December 2023 paper. It was developed by computer scientists Dao (who also did the FlashAttention work I mentioned earlier) and Albert Gu.

Mamba does not use attention. Like other RNNs, it has a hidden state that acts as the model’s “memory.” Because the hidden state has a fixed size, longer prompts do not increase Mamba’s per-token cost.

When I started writing this article in March, my goal was to explain Mamba’s architecture in some detail. But then in May, the researchers released Mamba-2, which significantly changed the architecture from the original Mamba paper. I’ll be frank: I struggled to understand the original Mamba and have not figured out how Mamba-2 works.

But the key thing to understand is that Mamba has the potential to combine transformer-like performance with the efficiency of conventional RNNs.

In June, Dao and Gu co-authored a paper with Nvidia researchers that evaluated a Mamba model with 8 billion parameters. They found that models like Mamba were competitive with comparably sized transformers in a number of tasks, but they “lag behind Transformer models when it comes to in-context learning and recalling information from the context.”

Transformers are good at information recall because they “remember” every token of their context—this is also why they become less efficient as the context grows. In contrast, Mamba tries to compress the context into a fixed-size state, which necessarily means discarding some information from long contexts.

The Nvidia team found they got the best performance from a hybrid architecture that interleaved 24 Mamba layers with four attention layers. This worked better than either a pure transformer model or a pure Mamba model.

A model needs some attention layers so it can remember important details from early in its context. But a few attention layers seem to be sufficient; the rest of the attention layers can be replaced by cheaper Mamba layers with little impact on the model’s overall performance.

In August, an Israeli startup called AI21 announced its Jamba 1.5 family of models. The largest version had 398 billion parameters, making it comparable in size to Meta’s Llama 405B model. Jamba 1.5 Large has seven times more Mamba layers than attention layers. As a result, Jamba 1.5 Large requires far less memory than comparable models from Meta and others. For example, AI21 estimates that Llama 3.1 70B needs 80GB of memory to keep track of 256,000 tokens of context. Jamba 1.5 Large only needs 9GB, allowing the model to run on much less powerful hardware.

The Jamba 1.5 Large model gets an MMLU score of 80, significantly below the Llama 3.1 70B’s score of 86. So by this measure, Mamba doesn’t blow transformers out of the water. However, this may not be an apples-to-apples comparison. Frontier labs like Meta have invested heavily in training data and post-training infrastructure to squeeze a few more percentage points of performance out of benchmarks like MMLU. It’s possible that the same kind of intense optimization could close the gap between Jamba and frontier models.

So while the benefits of longer context windows are obvious, the best strategy for getting there is not. In the short term, AI companies may continue using clever efficiency and scaling hacks (like FlashAttention and ring attention) to scale up vanilla LLMs. Longer term, we may see growing interest in Mamba and perhaps other attention-free architectures. Or maybe someone will come up with a totally new architecture that renders transformers obsolete.

But I am pretty confident that scaling up transformer-based frontier models isn’t going to be a solution on its own. If we want models that can handle billions of tokens—and many people do—we’re going to need to think outside the box.

Tim Lee was on staff at Ars from 2017 to 2021. Last year, he launched a newsletter, Understanding AI, that explores how AI works and how it’s changing our world. You can subscribe here.

Timothy is a senior reporter covering tech policy and the future of transportation. He lives in Washington DC.

Why AI language models choke on too much text Read More »

12-days-of-openai:-the-ars-technica-recap

12 days of OpenAI: The Ars Technica recap


Did OpenAI’s big holiday event live up to the billing?

Over the past 12 business days, OpenAI has announced a new product or demoed an AI feature every weekday, calling the PR event “12 days of OpenAI.” We’ve covered some of the major announcements, but we thought a day-by-day recap might be useful for anyone seeking a comprehensive overview of each day’s developments.

The timing and rapid pace of these announcements—particularly in light of Google’s competing releases—illustrates the intensifying competition in AI development. What might normally have been spread across months was compressed into just 12 business days, giving users and developers a lot to process as they head into 2025.

Humorously, we asked ChatGPT what it thought about the whole series of announcements, and it was skeptical that the event even took place. “The rapid-fire announcements over 12 days seem plausible,” wrote ChatGPT-4o, “But might strain credibility without a clearer explanation of how OpenAI managed such an intense release schedule, especially given the complexity of the features.”

But it did happen, and here’s a chronicle of what went down on each day.

Day 1: Thursday, December 5

On the first day of OpenAI, the company released its full o1 model, making it available to ChatGPT Plus and Team subscribers worldwide. The company reported that the model operates faster than its preview version and reduces major errors by 34 percent on complex real-world questions.

The o1 model brings new capabilities for image analysis, allowing users to upload and receive detailed explanations of visual content. OpenAI said it plans to expand o1’s features to include web browsing and file uploads in ChatGPT, with API access coming soon. The API version will support vision tasks, function calling, and structured outputs for system integration.

OpenAI also launched ChatGPT Pro, a $200 subscription tier that provides “unlimited” access to o1, GPT-4o, and Advanced Voice features. Pro subscribers receive an exclusive version of o1 that uses additional computing power for complex problem-solving. Alongside this release, OpenAI announced a grant program that will provide ChatGPT Pro access to 10 medical researchers at established institutions, with plans to extend grants to other fields.

Day 2: Friday, December 6

Day 2 wasn’t as exciting. OpenAI unveiled Reinforcement Fine-Tuning (RFT), a model customization method that will let developers modify “o-series” models for specific tasks. The technique reportedly goes beyond traditional supervised fine-tuning by using reinforcement learning to help models improve their reasoning abilities through repeated iterations. In other words, OpenAI created a new way to train AI models that lets them learn from practice and feedback.

OpenAI says that Berkeley Lab computational researcher Justin Reese tested RFT for researching rare genetic diseases, while Thomson Reuters has created a specialized o1-mini model for its CoCounsel AI legal assistant. The technique requires developers to provide a dataset and evaluation criteria, with OpenAI’s platform managing the reinforcement learning process.

OpenAI plans to release RFT to the public in early 2025 but currently offers limited access through its Reinforcement Fine-Tuning Research Program for researchers, universities, and companies.

Day 3: Monday, December 9

On day 3, OpenAI released Sora, its text-to-video model, as a standalone product now accessible through sora.com for ChatGPT Plus and Pro subscribers. The company says the new version operates faster than the research preview shown in February 2024, when OpenAI first demonstrated the model’s ability to create videos from text descriptions.

The release moved Sora from research preview to a production service, marking OpenAI’s official entry into the video synthesis market. The company published a blog post detailing the subscription tiers and deployment strategy for the service.

Day 4: Tuesday, December 10

On day 4, OpenAI moved its Canvas feature out of beta testing, making it available to all ChatGPT users, including those on free tiers. Canvas provides a dedicated interface for extended writing and coding projects beyond the standard chat format, now with direct integration into the GPT-4o model.

The updated Canvas allows users to run Python code within the interface and includes a text-pasting feature for importing existing content. OpenAI added compatibility with custom GPTs and a “show changes” function that tracks modifications to writing and code. The company said Canvas is now on chatgpt.com for web users and also available through a Windows desktop application, with more features planned for future updates.

Day 5: Wednesday, December 11

On day 5, OpenAI announced that ChatGPT would integrate with Apple Intelligence across iOS, iPadOS, and macOS devices. The integration works on iPhone 16 series phones, iPhone 15 Pro models, iPads with A17 Pro or M1 chips and later, and Macs with M1 processors or newer, running their respective latest operating systems.

The integration lets users access ChatGPT’s features (such as they are), including image and document analysis, directly through Apple’s system-level intelligence features. The feature works with all ChatGPT subscription tiers and operates within Apple’s privacy framework. Iffy message summaries remain unaffected by the additions.

Enterprise and Team account users need administrator approval to access the integration.

Day 6: Thursday, December 12

On the sixth day, OpenAI added two features to ChatGPT’s voice capabilities: “video calling” with screen sharing support for ChatGPT Plus and Pro subscribers and a seasonal Santa Claus voice preset.

The new visual Advanced Voice Mode features work through the mobile app, letting users show their surroundings or share their screen with the AI model during voice conversations. While the rollout covers most countries, users in several European nations, including EU member states, Switzerland, Iceland, Norway, and Liechtenstein, will get access at a later date. Enterprise and education users can expect these features in January.

The Santa voice option appears as a snowflake icon in the ChatGPT interface across mobile devices, web browsers, and desktop apps, with conversations in this mode not affecting chat history or memory. Don’t expect Santa to remember what you want for Christmas between sessions.

Day 7: Friday, December 13

OpenAI introduced Projects, a new organizational feature in ChatGPT that lets users group related conversations and files, on day 7. The feature works with the company’s GPT-4o model and provides a central location for managing resources related to specific tasks or topics—kinda like Anthropic’s “Projects” feature.

ChatGPT Plus, Pro, and Team subscribers can currently access Projects through chatgpt.com and the Windows desktop app, with view-only support on mobile devices and macOS. Users can create projects by clicking a plus icon in the sidebar, where they can add files and custom instructions that provide context for future conversations.

OpenAI said it plans to expand Projects in 2025 with support for additional file types, cloud storage integration through Google Drive and Microsoft OneDrive, and compatibility with other models like o1. Enterprise and education users will receive access to Projects in January.

Day 8: Monday, December 16

On day 8, OpenAI expanded its search features in ChatGPT, extending access to all users with free accounts while reportedly adding speed improvements and mobile optimizations. Basically, you can use ChatGPT like a web search engine, although in practice it doesn’t seem to be as comprehensive as Google Search at the moment.

The update includes a new maps interface and integration with Advanced Voice, allowing users to perform searches during voice conversations. The search capability, which previously required a paid subscription, now works across all platforms where ChatGPT operates.

Day 9: Tuesday, December 17

On day 9, OpenAI released its o1 model through its API platform, adding support for function calling, developer messages, and vision processing capabilities. The company also reduced GPT-4o audio pricing by 60 percent and introduced a GPT-4o mini option that costs one-tenth of previous audio rates.

OpenAI also simplified its WebRTC integration for real-time applications and unveiled Preference Fine-Tuning, which provides developers new ways to customize models. The company also launched beta versions of software development kits for the Go and Java programming languages, expanding its toolkit for developers.

Day 10: Wednesday, December 18

On Wednesday, OpenAI did something a little fun and launched voice and messaging access to ChatGPT through a toll-free number (1-800-CHATGPT), as well as WhatsApp. US residents can make phone calls with a 15-minute monthly limit, while global users can message ChatGPT through WhatsApp at the same number.

OpenAI said the release is a way to reach users who lack consistent high-speed Internet access or want to try AI through familiar communication channels, but it’s also just a clever hack. As evidence, OpenAI notes that these new interfaces serve as experimental access points, with more “limited functionality” than the full ChatGPT service, and still recommends existing users continue using their regular ChatGPT accounts for complete features.

Day 11: Thursday, December 19

On Thursday, OpenAI expanded ChatGPT’s desktop app integration to include additional coding environments and productivity software. The update added support for JetBrains IDEs like PyCharm and IntelliJ IDEA, VS Code variants including Cursor and VSCodium, and text editors such as BBEdit and TextMate.

OpenAI also included integration with Apple Notes, Notion, and Quip while adding Advanced Voice Mode compatibility when working with desktop applications. These features require manual activation for each app and remain available to paid subscribers, including Plus, Pro, Team, Enterprise, and Education users, with Enterprise and Education customers needing administrator approval to enable the functionality.

Day 12: Friday, December 20

On Friday, OpenAI concluded its twelve days of announcements by previewing two new simulated reasoning models, o3 and o3-mini, while opening applications for safety and security researchers to test them before public release. Early evaluations show o3 achieving a 2727 rating on Codeforces programming contests and scoring 96.7 percent on AIME 2024 mathematics problems.

The company reports o3 set performance records on advanced benchmarks, solving 25.2 percent of problems on EpochAI’s Frontier Math evaluations and scoring above 85 percent on the ARC-AGI test, which is comparable to human results. OpenAI also published research about “deliberative alignment,” a technique used in developing o1. The company has not announced firm release dates for either new o3 model, but CEO Sam Altman said o3-mini might ship in late January.

So what did we learn?

OpenAI’s December campaign revealed that the company had a lot of things sitting around that it needed to ship, and it picked a fun theme to unite the announcements. Google responded in kind, as we have covered.

Several trends from the releases stand out. OpenAI is heavily investing in multimodal capabilities. The o1 model’s release, Sora’s evolution from research preview to product, and the expansion of voice features with video calling all point toward systems that can seamlessly handle text, images, voice, and video.

The company is also focusing heavily on developer tools and customization, so it can continue to have a cloud service business and have its products integrated into other applications. Between the API releases, Reinforcement Fine-Tuning, and expanded IDE integrations, OpenAI is building out its ecosystem for developers and enterprises. And the introduction of o3 shows that OpenAI is still attempting to push technological boundaries, even in the face of diminishing returns in training LLM base models.

OpenAI seems to be positioning itself for a 2025 where generative AI moves beyond text chatbots and simple image generators and finds its way into novel applications that we probably can’t even predict yet. We’ll have to wait and see what the company and developers come up with in the year ahead.

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

12 days of OpenAI: The Ars Technica recap Read More »

openai-announces-o3-and-o3-mini,-its-next-simulated-reasoning-models

OpenAI announces o3 and o3-mini, its next simulated reasoning models

On Friday, during Day 12 of its “12 days of OpenAI,” OpenAI CEO Sam Altman announced its latest AI “reasoning” models, o3 and o3-mini, which build upon the o1 models launched earlier this year. The company is not releasing them yet but will make these models available for public safety testing and research access today.

The models use what OpenAI calls “private chain of thought,” where the model pauses to examine its internal dialog and plan ahead before responding, which you might call “simulated reasoning” (SR)—a form of AI that goes beyond basic large language models (LLMs).

The company named the model family “o3” instead of “o2” to avoid potential trademark conflicts with British telecom provider O2, according to The Information. During Friday’s livestream, Altman acknowledged his company’s naming foibles, saying, “In the grand tradition of OpenAI being really, truly bad at names, it’ll be called o3.”

According to OpenAI, the o3 model earned a record-breaking score on the ARC-AGI benchmark, a visual reasoning benchmark that has gone unbeaten since its creation in 2019. In low-compute scenarios, o3 scored 75.7 percent, while in high-compute testing, it reached 87.5 percent—comparable to human performance at an 85 percent threshold.

OpenAI also reported that o3 scored 96.7 percent on the 2024 American Invitational Mathematics Exam, missing just one question. The model also reached 87.7 percent on GPQA Diamond, which contains graduate-level biology, physics, and chemistry questions. On the Frontier Math benchmark by EpochAI, o3 solved 25.2 percent of problems, while no other model has exceeded 2 percent.

OpenAI announces o3 and o3-mini, its next simulated reasoning models Read More »

not-to-be-outdone-by-openai,-google-releases-its-own-“reasoning”-ai-model

Not to be outdone by OpenAI, Google releases its own “reasoning” AI model

Google DeepMind’s chief scientist, Jeff Dean, says that the model receives extra computing power, writing on X, “we see promising results when we increase inference time computation!” The model works by pausing to consider multiple related prompts before providing what it determines to be the most accurate answer.

Since OpenAI’s jump into the “reasoning” field in September with o1-preview and o1-mini, several companies have been rushing to achieve feature parity with their own models. For example, DeepSeek launched its DeepSeek-R1-Lite-Preview model in late November, while Alibaba’s Qwen team released its own “reasoning” model, QwQ, earlier this month.

Some claim that reasoning models can help solve complex mathematical or academic problems, but these models might not be for everybody. While they perform well on some benchmarks, questions remain about their actual usefulness and accuracy. The high computing costs needed to run reasoning models have also created some rumblings about their long-term viability. That high cost is why OpenAI’s ChatGPT Pro costs $200 a month, for example.

Still, it appears Google is serious about pursuing this particular AI technique. Logan Kilpatrick, a Google employee in its AI Studio, called it “the first step in our reasoning journey” in a post on X.

Not to be outdone by OpenAI, Google releases its own “reasoning” AI model Read More »

call-chatgpt-from-any-phone-with-openai’s-new-1-800-voice-service

Call ChatGPT from any phone with OpenAI’s new 1-800 voice service

On Wednesday, OpenAI launched a 1-800-CHATGPT (1-800-242-8478) telephone number that anyone in the US can call to talk to ChatGPT via voice chat for up to 15 minutes for free. The company also says that people outside the US can send text messages to the same number for free using WhatsApp.

Upon calling, users hear a voice say, “Hello again, it’s ChatGPT, an AI assistant. Our conversation may be reviewed for safety. How can I help you?” Callers can ask ChatGPT anything they would normally ask the AI assistant and have a live, interactive conversation.

During a livestream demo of “Calling with ChatGPT” during Day 10 of “12 Days of OpenAI,” OpenAI employees demonstrated several examples of the telephone-based voice chat in action, asking ChatGPT to identify a distinctive house in California and for help in translating a message into Spanish for a friend. For fun, they showed calls from an iPhone, a flip phone, and a vintage rotary phone.

OpenAI developers demonstrate calling 1-800-CHATGPT during a livestream on December 18, 2024. Credit: OpenAI

OpenAI says the new features came out of an internal OpenAI “hack week” project that a team built just a few weeks ago. The company says its goal is to make ChatGPT more accessible if someone does not have a smartphone or a computer handy.

During the livestream, an OpenAI employee mentioned that 15 minutes of voice chatting are free and that you can download the app and create an account to get more. While the audio chat version seems to be running a full version of GPT-4o on the back end, a developer during the livestream said the free WhatsApp text mode is using GPT-4o mini.

Call ChatGPT from any phone with OpenAI’s new 1-800 voice service Read More »