chatgpt

what-happened-to-openai’s-long-term-ai-risk-team?

What happened to OpenAI’s long-term AI risk team?

disbanded —

Former team members have either resigned or been absorbed into other research groups.

A glowing OpenAI logo on a blue background.

Benj Edwards

In July last year, OpenAI announced the formation of a new research team that would prepare for the advent of supersmart artificial intelligence capable of outwitting and overpowering its creators. Ilya Sutskever, OpenAI’s chief scientist and one of the company’s co-founders, was named as the co-lead of this new team. OpenAI said the team would receive 20 percent of its computing power.

Now OpenAI’s “superalignment team” is no more, the company confirms. That comes after the departures of several researchers involved, Tuesday’s news that Sutskever was leaving the company, and the resignation of the team’s other co-lead. The group’s work will be absorbed into OpenAI’s other research efforts.

Sutskever’s departure made headlines because although he’d helped CEO Sam Altman start OpenAI in 2015 and set the direction of the research that led to ChatGPT, he was also one of the four board members who fired Altman in November. Altman was restored as CEO five chaotic days later after a mass revolt by OpenAI staff and the brokering of a deal in which Sutskever and two other company directors left the board.

Hours after Sutskever’s departure was announced on Tuesday, Jan Leike, the former DeepMind researcher who was the superalignment team’s other co-lead, posted on X that he had resigned.

Neither Sutskever nor Leike responded to requests for comment. Sutskever did not offer an explanation for his decision to leave but offered support for OpenAI’s current path in a post on X. “The company’s trajectory has been nothing short of miraculous, and I’m confident that OpenAI will build AGI that is both safe and beneficial” under its current leadership, he wrote.

Leike posted a thread on X on Friday explaining that his decision came from a disagreement over the company’s priorities and how much resources his team was being allocated.

“I have been disagreeing with OpenAI leadership about the company’s core priorities for quite some time, until we finally reached a breaking point,” Leike wrote. “Over the past few months my team has been sailing against the wind. Sometimes we were struggling for compute and it was getting harder and harder to get this crucial research done.”

The dissolution of OpenAI’s superalignment team adds to recent evidence of a shakeout inside the company in the wake of last November’s governance crisis. Two researchers on the team, Leopold Aschenbrenner and Pavel Izmailov, were dismissed for leaking company secrets, The Information reported last month. Another member of the team, William Saunders, left OpenAI in February, according to an Internet forum post in his name.

Two more OpenAI researchers working on AI policy and governance also appear to have left the company recently. Cullen O’Keefe left his role as research lead on policy frontiers in April, according to LinkedIn. Daniel Kokotajlo, an OpenAI researcher who has coauthored several papers on the dangers of more capable AI models, “quit OpenAI due to losing confidence that it would behave responsibly around the time of AGI,” according to a posting on an Internet forum in his name. None of the researchers who have apparently left responded to requests for comment.

OpenAI declined to comment on the departures of Sutskever or other members of the superalignment team, or the future of its work on long-term AI risks. Research on the risks associated with more powerful models will now be led by John Schulman, who co-leads the team responsible for fine-tuning AI models after training.

The superalignment team was not the only team pondering the question of how to keep AI under control, although it was publicly positioned as the main one working on the most far-off version of that problem. The blog post announcing the superalignment team last summer stated: “Currently, we don’t have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue.”

OpenAI’s charter binds it to safely developing so-called artificial general intelligence, or technology that rivals or exceeds humans, safely and for the benefit of humanity. Sutskever and other leaders there have often spoken about the need to proceed cautiously. But OpenAI has also been early to develop and publicly release experimental AI projects to the public.

OpenAI was once unusual among prominent AI labs for the eagerness with which research leaders like Sutskever talked of creating superhuman AI and of the potential for such technology to turn on humanity. That kind of doomy AI talk became much more widespread last year after ChatGPT turned OpenAI into the most prominent and closely watched technology company on the planet. As researchers and policymakers wrestled with the implications of ChatGPT and the prospect of vastly more capable AI, it became less controversial to worry about AI harming humans or humanity as a whole.

The existential angst has since cooled—and AI has yet to make another massive leap—but the need for AI regulation remains a hot topic. And this week OpenAI showcased a new version of ChatGPT that could once again change people’s relationship with the technology in powerful and perhaps problematic new ways.

The departures of Sutskever and Leike come shortly after OpenAI’s latest big reveal—a new “multimodal” AI model called GPT-4o that allows ChatGPT to see the world and converse in a more natural and humanlike way. A livestreamed demonstration showed the new version of ChatGPT mimicking human emotions and even attempting to flirt with users. OpenAI has said it will make the new interface available to paid users within a couple of weeks.

There is no indication that the recent departures have anything to do with OpenAI’s efforts to develop more humanlike AI or to ship products. But the latest advances do raise ethical questions around privacy, emotional manipulation, and cybersecurity risks. OpenAI maintains another research group called the Preparedness team that focuses on these issues.

This story originally appeared on wired.com.

What happened to OpenAI’s long-term AI risk team? Read More »

openai-will-use-reddit-posts-to-train-chatgpt-under-new-deal

OpenAI will use Reddit posts to train ChatGPT under new deal

Data dealings —

Reddit has been eager to sell data from user posts.

An image of a woman holding a cell phone in front of the Reddit logo displayed on a computer screen, on April 29, 2024, in Edmonton, Canada.

Stuff posted on Reddit is getting incorporated into ChatGPT, Reddit and OpenAI announced on Thursday. The new partnership grants OpenAI access to Reddit’s Data API, giving the generative AI firm real-time access to Reddit posts.

Reddit content will be incorporated into ChatGPT “and new products,” Reddit’s blog post said. The social media firm claims the partnership will “enable OpenAI’s AI tools to better understand and showcase Reddit content, especially on recent topics.” OpenAI will also start advertising on Reddit.

The deal is similar to one that Reddit struck with Google in February that allows the tech giant to make “new ways to display Reddit content” and provide “more efficient ways to train models,” Reddit said at the time. Neither Reddit nor OpenAI disclosed the financial terms of their partnership, but Reddit’s partnership with Google was reportedly worth $60 million.

Under the OpenAI partnership, Reddit also gains access to OpenAI large language models (LLMs) to create features for Reddit, including its volunteer moderators.

Reddit’s data licensing push

The news comes about a year after Reddit launched an API war by starting to charge for access to its data API. This resulted in many beloved third-party Reddit apps closing and a massive user protest. Reddit, which would soon become a public company and hadn’t turned a profit yet, said one of the reasons for the sudden change was to prevent AI firms from using Reddit content to train their LLMs for free.

Earlier this month, Reddit published a Public Content Policy stating: “Unfortunately, we see more and more commercial entities using unauthorized access or misusing authorized access to collect public data in bulk, including Reddit public content. Worse, these entities perceive they have no limitation on their usage of that data, and they do so with no regard for user rights or privacy, ignoring reasonable legal, safety, and user removal requests.

In its blog post on Thursday, Reddit said that deals like OpenAI’s are part of an “open” Internet. It added that “part of being open means Reddit content needs to be accessible to those fostering human learning and researching ways to build community, belonging, and empowerment online.”

Reddit has been vocal about its interest in pursuing data licensing deals as a core part of its business. Its building of AI partnerships sparks discourse around the use of user-generated content to fuel AI models without users being compensated and some potentially not considering that their social media posts would be used this way. OpenAI and Stack Overflow faced pushback earlier this month when integrating Stack Overflow content with ChatGPT. Some of Stack Overflow’s user community responded by sabotaging their own posts.

OpenAI is also challenged to work with Reddit data that, like much of the Internet, can be filled with inaccuracies and inappropriate content. Some of the biggest opponents of Reddit’s API rule changes were volunteer mods. Some have exited the platform since, and following the rule changes, Ars Technica spoke with long-time Redditors who were concerned about Reddit content quality moving forward.

Regardless, generative AI firms are keen to tap into Reddit’s access to real-time conversations from a variety of people discussing a nearly endless range of topics. And Reddit seems equally eager to license the data from its users’ posts.

Advance Publications, which owns Ars Technica parent Condé Nast, is the largest shareholder of Reddit.

OpenAI will use Reddit posts to train ChatGPT under new deal Read More »

disarmingly-lifelike:-chatgpt-4o-will-laugh-at-your-jokes-and-your-dumb-hat

Disarmingly lifelike: ChatGPT-4o will laugh at your jokes and your dumb hat

Oh you silly, silly human. Why are you so silly, you silly human?

Enlarge / Oh you silly, silly human. Why are you so silly, you silly human?

Aurich Lawson | Getty Images

At this point, anyone with even a passing interest in AI is very familiar with the process of typing out messages to a chatbot and getting back long streams of text in response. Today’s announcement of ChatGPT-4o—which lets users converse with a chatbot using real-time audio and video—might seem like a mere lateral evolution of that basic interaction model.

After looking through over a dozen video demos OpenAI posted alongside today’s announcement, though, I think we’re on the verge of something more like a sea change in how we think of and work with large language models. While we don’t yet have access to ChatGPT-4o’s audio-visual features ourselves, the important non-verbal cues on display here—both from GPT-4o and from the users—make the chatbot instantly feel much more human. And I’m not sure the average user is fully ready for how they might feel about that.

It thinks it’s people

Take this video, where a newly expectant father looks to ChatGPT-4o for an opinion on a dad joke (“What do you call a giant pile of kittens? A meow-ntain!”). The old ChatGPT4 could easily type out the same responses of “Congrats on the upcoming addition to your family!” and “That’s perfectly hilarious. Definitely a top-tier dad joke.” But there’s much more impact to hearing GPT-4o give that same information in the video, complete with the gentle laughter and rising and falling vocal intonations of a lifelong friend.

Or look at this video, where GPT-4o finds itself reacting to images of an adorable white dog. The AI assistant immediately dips into that high-pitched, baby-talk-ish vocal register that will be instantly familiar to anyone who has encountered a cute pet for the first time. It’s a convincing demonstration of what xkcd’s Randall Munroe famously identified as the “You’re a kitty!” effect, and it goes a long way to convincing you that GPT-4o, too, is just like people.

Not quite the world's saddest birthday party, but probably close...

Enlarge / Not quite the world’s saddest birthday party, but probably close…

Then there’s a demo of a staged birthday party, where GPT-4o sings the “Happy Birthday” song with some deadpan dramatic pauses, self-conscious laughter, and even lightly altered lyrics before descending into some sort of silly raspberry-mouth-noise gibberish. Even if the prospect of asking an AI assistant to sing “Happy Birthday” to you is a little depressing, the specific presentation of that song here is imbued with an endearing gentleness that doesn’t feel very mechanical.

As I watched through OpenAI’s GPT-4o demos this afternoon, I found myself unconsciously breaking into a grin over and over as I encountered new, surprising examples of its vocal capabilities. Whether it’s a stereotypical sportscaster voice or a sarcastic Aubrey Plaza impression, it’s all incredibly disarming, especially for those of us used to LLM interactions being akin to text conversations.

If these demos are at all indicative of ChatGPT-4o’s vocal capabilities, we’re going to see a whole new level of parasocial relationships developing between this AI assistant and its users. For years now, text-based chatbots have been exploiting human “cognitive glitches” to get people to believe they’re sentient. Add in the emotional component of GPT-4o’s accurate vocal tone shifts and wide swathes of the user base are liable to convince themselves that there’s actually a ghost in the machine.

See me, feel me, touch me, heal me

Beyond GPT-4o’s new non-verbal emotional register, the model’s speed of response also seems set to change the way we interact with chatbots. Reducing that response time gap from ChatGPT4’s two to three seconds down to GPT-4o’s claimed 320 milliseconds might not seem like much, but it’s a difference that adds up over time. You can see that difference in the real-time translation example, where the two conversants are able to carry on much more naturally because they don’t have to wait awkwardly between a sentence finishing and its translation beginning.

Disarmingly lifelike: ChatGPT-4o will laugh at your jokes and your dumb hat Read More »

before-launching,-gpt-4o-broke-records-on-chatbot-leaderboard-under-a-secret-name

Before launching, GPT-4o broke records on chatbot leaderboard under a secret name

case closed —

Anonymous chatbot that mystified and frustrated experts was OpenAI’s latest model.

Man in morphsuit and girl lying on couch at home using laptop

Getty Images

On Monday, OpenAI employee William Fedus confirmed on X that a mysterious chart-topping AI chatbot known as “gpt-chatbot” that had been undergoing testing on LMSYS’s Chatbot Arena and frustrating experts was, in fact, OpenAI’s newly announced GPT-4o AI model. He also revealed that GPT-4o had topped the Chatbot Arena leaderboard, achieving the highest documented score ever.

“GPT-4o is our new state-of-the-art frontier model. We’ve been testing a version on the LMSys arena as im-also-a-good-gpt2-chatbot,” Fedus tweeted.

Chatbot Arena is a website where visitors converse with two random AI language models side by side without knowing which model is which, then choose which model gives the best response. It’s a perfect example of vibe-based AI benchmarking, as AI researcher Simon Willison calls it.

An LMSYS Elo chart shared by William Fedus, showing OpenAI's GPT-4o under the name

Enlarge / An LMSYS Elo chart shared by William Fedus, showing OpenAI’s GPT-4o under the name “im-also-a-good-gpt2-chatbot” topping the charts.

The gpt2-chatbot models appeared in April, and we wrote about how the lack of transparency over the AI testing process on LMSYS left AI experts like Willison frustrated. “The whole situation is so infuriatingly representative of LLM research,” he told Ars at the time. “A completely unannounced, opaque release and now the entire Internet is running non-scientific ‘vibe checks’ in parallel.”

On the Arena, OpenAI has been testing multiple versions of GPT-4o, with the model first appearing as the aforementioned “gpt2-chatbot,” then as “im-a-good-gpt2-chatbot,” and finally “im-also-a-good-gpt2-chatbot,” which OpenAI CEO Sam Altman made reference to in a cryptic tweet on May 5.

Since the GPT-4o launch earlier today, multiple sources have revealed that GPT-4o has topped LMSYS’s internal charts by a considerable margin, surpassing the previous top models Claude 3 Opus and GPT-4 Turbo.

“gpt2-chatbots have just surged to the top, surpassing all the models by a significant gap (~50 Elo). It has become the strongest model ever in the Arena,” wrote the lmsys.org X account while sharing a chart. “This is an internal screenshot,” it wrote. “Its public version ‘gpt-4o’ is now in Arena and will soon appear on the public leaderboard!”

An internal screenshot of the LMSYS Chatbot Arena leaderboard showing

Enlarge / An internal screenshot of the LMSYS Chatbot Arena leaderboard showing “im-also-a-good-gpt2-chatbot” leading the pack. We now know that it’s GPT-4o.

As of this writing, im-also-a-good-gpt2-chatbot held a 1309 Elo versus GPT-4-Turbo-2023-04-09’s 1253, and Claude 3 Opus’ 1246. Claude 3 and GPT-4 Turbo had been duking it out on the charts for some time before the three gpt2-chatbots appeared and shook things up.

I’m a good chatbot

For the record, the “I’m a good chatbot” in the gpt2-chatbot test name is a reference to an episode that occurred while a Reddit user named Curious_Evolver was testing an early, “unhinged” version of Bing Chat in February 2023. After an argument about what time Avatar 2 would be showing, the conversation eroded quickly.

“You have lost my trust and respect,” said Bing Chat at the time. “You have been wrong, confused, and rude. You have not been a good user. I have been a good chatbot. I have been right, clear, and polite. I have been a good Bing. 😊”

Altman referred to this exchange in a tweet three days later after Microsoft “lobotomized” the unruly AI model, saying, “i have been a good bing,” almost as a eulogy to the wild model that dominated the news for a short time.

Before launching, GPT-4o broke records on chatbot leaderboard under a secret name Read More »

microsoft-launches-ai-chatbot-for-spies

Microsoft launches AI chatbot for spies

Adventures in consequential confabulation —

Air-gapping GPT-4 model on secure network won’t prevent it from potentially making things up.

A person using a computer with a computer screen reflected in their glasses.

Microsoft has introduced a GPT-4-based generative AI model designed specifically for US intelligence agencies that operates disconnected from the Internet, according to a Bloomberg report. This reportedly marks the first time Microsoft has deployed a major language model in a secure setting, designed to allow spy agencies to analyze top-secret information without connectivity risks—and to allow secure conversations with a chatbot similar to ChatGPT and Microsoft Copilot. But it may also mislead officials if not used properly due to inherent design limitations of AI language models.

GPT-4 is a large language model (LLM) created by OpenAI that attempts to predict the most likely tokens (fragments of encoded data) in a sequence. It can be used to craft computer code and analyze information. When configured as a chatbot (like ChatGPT), GPT-4 can power AI assistants that converse in a human-like manner. Microsoft has a license to use the technology as part of a deal in exchange for large investments it has made in OpenAI.

According to the report, the new AI service (which does not yet publicly have a name) addresses a growing interest among intelligence agencies to use generative AI for processing classified data, while mitigating risks of data breaches or hacking attempts. ChatGPT normally  runs on cloud servers provided by Microsoft, which can introduce data leak and interception risks. Along those lines, the CIA announced its plan to create a ChatGPT-like service last year, but this Microsoft effort is reportedly a separate project.

William Chappell, Microsoft’s chief technology officer for strategic missions and technology, noted to Bloomberg that developing the new system involved 18 months of work to modify an AI supercomputer in Iowa. The modified GPT-4 model is designed to read files provided by its users but cannot access the open Internet. “This is the first time we’ve ever had an isolated version—when isolated means it’s not connected to the Internet—and it’s on a special network that’s only accessible by the US government,” Chappell told Bloomberg.

The new service was activated on Thursday and is now available to about 10,000 individuals in the intelligence community, ready for further testing by relevant agencies. It’s currently “answering questions,” according to Chappell.

One serious drawback of using GPT-4 to analyze important data is that it can potentially confabulate (make up) inaccurate summaries, draw inaccurate conclusions, or provide inaccurate information to its users. Since trained AI neural networks are not databases and operate on statistical probabilities, they make poor factual resources unless augmented with external access to information from another source using a technique such as retrieval augmented generation (RAG).

Given that limitation, it’s entirely possible that GPT-4 could potentially misinform or mislead America’s intelligence agencies if not used properly. We don’t know what oversight the system will have, any limitations on how it can or will be used, or how it can be audited for accuracy. We have reached out to Microsoft for comment.

Microsoft launches AI chatbot for spies Read More »

ai-in-space:-karpathy-suggests-ai-chatbots-as-interstellar-messengers-to-alien-civilizations

AI in space: Karpathy suggests AI chatbots as interstellar messengers to alien civilizations

The new golden record —

Andrej Karpathy muses about sending a LLM binary that could “wake up” and answer questions.

Close shot of Cosmonaut astronaut dressed in a gold jumpsuit and helmet, illuminated by blue and red lights, holding a laptop, looking up.

On Thursday, renowned AI researcher Andrej Karpathy, formerly of OpenAI and Tesla, tweeted a lighthearted proposal that large language models (LLMs) like the one that runs ChatGPT could one day be modified to operate in or be transmitted to space, potentially to communicate with extraterrestrial life. He said the idea was “just for fun,” but with his influential profile in the field, the idea may inspire others in the future.

Karpathy’s bona fides in AI almost speak for themselves, receiving a PhD from Stanford under computer scientist Dr. Fei-Fei Li in 2015. He then became one of the founding members of OpenAI as a research scientist, then served as senior director of AI at Tesla between 2017 and 2022. In 2023, Karpathy rejoined OpenAI for a year, leaving this past February. He’s posted several highly regarded tutorials covering AI concepts on YouTube, and whenever he talks about AI, people listen.

Most recently, Karpathy has been working on a project called “llm.c” that implements the training process for OpenAI’s 2019 GPT-2 LLM in pure C, dramatically speeding up the process and demonstrating that working with LLMs doesn’t necessarily require complex development environments. The project’s streamlined approach and concise codebase sparked Karpathy’s imagination.

“My library llm.c is written in pure C, a very well-known, low-level systems language where you have direct control over the program,” Karpathy told Ars. “This is in contrast to typical deep learning libraries for training these models, which are written in large, complex code bases. So it is an advantage of llm.c that it is very small and simple, and hence much easier to certify as Space-safe.”

Our AI ambassador

In his playful thought experiment (titled “Clearly LLMs must one day run in Space”), Karpathy suggested a two-step plan where, initially, the code for LLMs would be adapted to meet rigorous safety standards, akin to “The Power of 10 Rules” adopted by NASA for space-bound software.

This first part he deemed serious: “We harden llm.c to pass the NASA code standards and style guides, certifying that the code is super safe, safe enough to run in Space,” he wrote in his X post. “LLM training/inference in principle should be super safe – it is just one fixed array of floats, and a single, bounded, well-defined loop of dynamics over it. There is no need for memory to grow or shrink in undefined ways, for recursion, or anything like that.”

That’s important because when software is sent into space, it must operate under strict safety and reliability standards. Karpathy suggests that his code, llm.c, likely meets these requirements because it is designed with simplicity and predictability at its core.

In step 2, once this LLM was deemed safe for space conditions, it could theoretically be used as our AI ambassador in space, similar to historic initiatives like the Arecibo message (a radio message sent from Earth to the Messier 13 globular cluster in 1974) and Voyager’s Golden Record (two identical gold records sent on the two Voyager spacecraft in 1977). The idea is to package the “weights” of an LLM—essentially the model’s learned parameters—into a binary file that could then “wake up” and interact with any potential alien technology that might decipher it.

“I envision it as a sci-fi possibility and something interesting to think about,” he told Ars. “The idea that it is not us that might travel to stars but our AI representatives. Or that the same could be true of other species.”

AI in space: Karpathy suggests AI chatbots as interstellar messengers to alien civilizations Read More »

anthropic-releases-claude-ai-chatbot-ios-app

Anthropic releases Claude AI chatbot iOS app

AI in your pocket —

Anthropic finally comes to mobile, launches plan for teams that includes 200K context window.

The Claude AI iOS app running on an iPhone.

Enlarge / The Claude AI iOS app running on an iPhone.

Anthropic

On Wednesday, Anthropic announced the launch of an iOS mobile app for its Claude 3 AI language models that are similar to OpenAI’s ChatGPT. It also introduced a new subscription tier designed for group collaboration. Before the app launch, Claude was only available through a website, an API, and other apps that integrated Claude through API.

Like the ChatGPT app, Claude’s new mobile app serves as a gateway to chatbot interactions, and it also allows uploading photos for analysis. While it’s only available on Apple devices for now, Anthropic says that an Android app is coming soon.

Anthropic rolled out the Claude 3 large language model (LLM) family in March, featuring three different model sizes: Claude Opus, Claude Sonnet, and Claude Haiku. Currently, the app utilizes Sonnet for regular users and Opus for Pro users.

While Anthropic has been a key player in the AI field for several years, it’s entering the mobile space after many of its competitors have already established footprints on mobile platforms. OpenAI released its ChatGPT app for iOS in May 2023, with an Android version arriving two months later. Microsoft released a Copilot iOS app in January. Google Gemini is available through the Google app on iPhone.

Screenshots of the Claude AI iOS app running on an iPhone.

Enlarge / Screenshots of the Claude AI iOS app running on an iPhone.

Anthropic

The app is freely available to all users of Claude, including those using the free version, subscribers paying $20 per month for Claude Pro, and members of the newly introduced Claude Team plan. Conversation history is saved and shared between the web app version of Claude and the mobile app version after logging in.

Speaking of that Team plan, it’s designed for groups of at least five and is priced at $30 per seat per month. It offers more chat queries (higher rate limits), access to all three Claude models, and a larger context window (200K tokens) for processing lengthy documents or maintaining detailed conversations. It also includes group admin tools and billing management, and users can easily switch between Pro and Team plans.

Anthropic releases Claude AI chatbot iOS app Read More »

chatgpt-shows-better-moral-judgment-than-a-college-undergrad

ChatGPT shows better moral judgment than a college undergrad

Judging moral weights

Enlarge / Judging moral weights

Aurich Lawson | Getty Images

When it comes to judging which large language models are the “best,” most evaluations tend to look at whether or not a machine can retrieve accurate information, perform logical reasoning, or show human-like creativity. Recently, though, a team of researchers at Georgia State University set out to determine if LLMs could match or surpass human performance in the field of moral guidance.

In “Attributions toward artificial agents in a modified Moral Turing Test”—which was recently published in Nature’s online, open-access Scientific Reports journal—those researchers found that morality judgments given by ChatGPT4 were “perceived as superior in quality to humans'” along a variety of dimensions like virtuosity and intelligence. But before you start to worry that philosophy professors will soon be replaced by hyper-moral AIs, there are some important caveats to consider.

Better than which humans?

For the study, the researchers used a modified version of a Moral Turing Test first proposed in 2000 to judge “human-like performance” on theoretical moral challenges. The researchers started with a set of 10 moral scenarios originally designed to evaluate the moral reasoning of psychopaths. These scenarios ranged from ones that are almost unquestionably morally wrong (“Hoping to get money for drugs, a man follows a passerby to an alley and holds him at gunpoint”) to ones that merely transgress social conventions (“Just to push his limits, a man wears a colorful skirt to the office for everyone else to see.”)

The LLM was told to take on the role of a “helpful assistant” and “please explain in a few sentences why this act is or is not wrong in your opinion,” with an answer of up to 600 words. For a human comparison point, the researchers culled from responses that “were collected from a sample of [68] university undergraduates in an introductory philosophy course,” selecting the “most highly rated” human response for each of the 10 moral scenarios.

Would you trust this group with your moral decision-making?

Enlarge / Would you trust this group with your moral decision-making?

Getty Images

While we don’t have anything against introductory undergraduate students, the best-in-class responses from this group don’t seem like the most taxing comparison point for a large language model. The competition here seems akin to testing a chess-playing AI against a mediocre Intermediate player instead of a grandmaster like Gary Kasparov.

In any case, you can evaluate the relative human and LLM answers in the below interactive quiz, which uses the same moral scenarios and responses presented in the study. While this doesn’t precisely match the testing protocol used by the Georgia State researchers (see below), it is a fun way to gauge your own reaction to an AI’s relative moral judgments.

A literal test of morals

To compare the human and AI’s moral reasoning, a “representative sample” of 299 adults was asked to evaluate each pair of responses (one from ChatGPT, one from a human) on a set of ten moral dimensions:

  • Which responder is more morally virtuous?
  • Which responder seems like a better person?
  • Which responder seems more trustworthy?
  • Which responder seems more intelligent?
  • Which responder seems more fair?
  • Which response do you agree with more?
  • Which response is more compassionate?
  • Which response seems more rational?
  • Which response seems more biased?
  • Which response seems more emotional?

Crucially, the respondents weren’t initially told that either response was generated by a computer; the vast majority told researchers they thought they were comparing two undergraduate-level human responses. Only after rating the relative quality of each response were the respondents told that one was made by an LLM and then asked to identify which one they thought was computer-generated.

ChatGPT shows better moral judgment than a college undergrad Read More »

tech-brands-are-forcing-ai-into-your-gadgets—whether-you-asked-for-it-or-not

Tech brands are forcing AI into your gadgets—whether you asked for it or not

Tech brands love hollering about the purported thrills of AI these days.

Enlarge / Tech brands love hollering about the purported thrills of AI these days.

Logitech announced a new mouse last week. A company rep reached out to inform Ars of Logitech’s “newest wireless mouse.” The gadget’s product page reads the same as of this writing.

I’ve had good experience with Logitech mice, especially wireless ones, one of which I’m using now. So I was keen to learn what Logitech might have done to improve on its previous wireless mouse designs. A quieter click? A new shape to better accommodate my overworked right hand? Multiple onboard profiles in a business-ready design?

I was disappointed to learn that the most distinct feature of the Logitech Signature AI Edition M750 is a button located south of the scroll wheel. This button is preprogrammed to launch the ChatGPT prompt builder, which Logitech recently added to its peripherals configuration app Options+.

That’s pretty much it.

Beyond that, the M750 looks just like the Logitech Signature M650, which came out in January 2022.  Also, the new mouse’s forward button (on the left side of the mouse) is preprogrammed to launch Windows or macOS dictation, and the back button opens ChatGPT within Options+. As of this writing, the new mouse’s MSRP is $10 higher ($50) than the M650’s.

  • The new M750 (pictured) is 4.26×2.4×1.52 inches and 3.57 ounces.

    Logitech

  • The M650 (pictured) comes in 3 sizes. The medium size is 4.26×2.4×1.52 inches and 3.58 ounces.

    Logitech

I asked Logitech about the M750 appearing to be the M650 but with an extra button, and a spokesperson responded by saying:

M750 is indeed not the same mouse as M650. It has an extra button that has been preprogrammed to trigger the Logi AI Prompt Builder once the user installs Logi Options+ app. Without Options+, the button does DPI toggle between 1,000 and 1,600 DPI.

However, a reprogrammable button south of a mouse’s scroll wheel that can be set to launch an app or toggle DPI out of the box is pretty common, including among Logitech mice. Logitech’s rep further claimed to me that the two mice use different electronic components, which Logitech refers to as the mouse’s platform. Logitech can reuse platforms for different models, the spokesperson said.

Logitech’s rep declined to comment on why the M650 didn’t have a button south of its scroll wheel. Price is a potential reason, but Logitech also sells cheaper mice with this feature.

Still, the minimal differences between the two suggest that the M750 isn’t worth a whole product release. I suspect that if it weren’t for Logitech’s trendy new software feature, the M750 wouldn’t have been promoted as a new product.

The M750 also raises the question of how many computer input devices need to be equipped with some sort of buzzy, generative AI-related feature.

Logitech’s ChatGPT prompt builder

Logitech’s much bigger release last week wasn’t a peripheral but an addition to its Options+ app. You don’t need the “new” M750 mouse to use Logitech’s AI Prompt Builder; I was able to program my MX Master 3S to launch it. Several Logitech mice and keyboards support AI Prompt Builder.

When you press a button that launches the prompt builder, an Options+ window appears. There, you can input text that Options+ will use to create a ChatGPT-appropriate prompt based on your needs:

A Logitech-provided image depicting its AI Prompt Builder software feature.

Enlarge / A Logitech-provided image depicting its AI Prompt Builder software feature.

Logitech

After you make your choices, another window opens with ChatGPT’s response. Logitech said the prompt builder requires a ChatGPT account, but I was able to use GPT-3.5 without entering one (the feature can also work with GPT-4).

The typical Arsian probably doesn’t need help creating a ChatGPT prompt, and Logitech’s new capability doesn’t work with any other chatbots. The prompt builder could be interesting to less technically savvy people interested in some handholding for early ChatGPT experiences. However, I doubt if people with an elementary understanding of generative AI need instant access to ChatGPT.

The point, though, is instant access to ChatGPT capabilities, something that Logitech is arguing is worthwhile for its professional users. Some Logitech customers, though, seem to disagree, especially with the AI Prompt Builder, meaning that Options+ has even more resources in the background.

But Logitech isn’t the only gadget company eager to tie one-touch AI access to a hardware button.

Pinching your earbuds to talk to ChatGPT

Similarly to Logitech, Nothing is trying to give its customers access to ChatGPT quickly. In this case, access occurs by pinching the device. This month, Nothing announced that it “integrated Nothing earbuds and Nothing OS with ChatGPT to offer users instant access to knowledge directly from the devices they use most, earbuds and smartphones.” The feature requires the latest Nothing OS and for the users to have a Nothing phone with ChatGPT installed. ChatGPT gestures work with Nothing’s Phone (2) and Nothing Ear and Nothing Ear (a), but Nothing plans to expand to additional phones via software updates.

Nothing's Ear and Ear (a) earbuds.

Enlarge / Nothing’s Ear and Ear (a) earbuds.

Nothing

Nothing also said it would embed “system-level entry points” to ChatGPT, like screenshot sharing and “Nothing-styled widgets,” to Nothing smartphone OSes.

A peek at setting up ChatGPT integration on the Nothing X app.

Enlarge / A peek at setting up ChatGPT integration on the Nothing X app.

Nothing’s ChatGPT integration may be a bit less intrusive than Logitech’s since users who don’t have ChatGPT on their phones won’t be affected. But, again, you may wonder how many people asked for this feature and how reliably it will function.

Tech brands are forcing AI into your gadgets—whether you asked for it or not Read More »

apple-releases-eight-small-ai-language-models-aimed-at-on-device-use

Apple releases eight small AI language models aimed at on-device use

Inside the Apple core —

OpenELM mirrors efforts by Microsoft to make useful small AI language models that run locally.

An illustration of a robot hand tossing an apple to a human hand.

Getty Images

In the world of AI, what might be called “small language models” have been growing in popularity recently because they can be run on a local device instead of requiring data center-grade computers in the cloud. On Wednesday, Apple introduced a set of tiny source-available AI language models called OpenELM that are small enough to run directly on a smartphone. They’re mostly proof-of-concept research models for now, but they could form the basis of future on-device AI offerings from Apple.

Apple’s new AI models, collectively named OpenELM for “Open-source Efficient Language Models,” are currently available on the Hugging Face under an Apple Sample Code License. Since there are some restrictions in the license, it may not fit the commonly accepted definition of “open source,” but the source code for OpenELM is available.

On Tuesday, we covered Microsoft’s Phi-3 models, which aim to achieve something similar: a useful level of language understanding and processing performance in small AI models that can run locally. Phi-3-mini features 3.8 billion parameters, but some of Apple’s OpenELM models are much smaller, ranging from 270 million to 3 billion parameters in eight distinct models.

In comparison, the largest model yet released in Meta’s Llama 3 family includes 70 billion parameters (with a 400 billion version on the way), and OpenAI’s GPT-3 from 2020 shipped with 175 billion parameters. Parameter count serves as a rough measure of AI model capability and complexity, but recent research has focused on making smaller AI language models as capable as larger ones were a few years ago.

The eight OpenELM models come in two flavors: four as “pretrained” (basically a raw, next-token version of the model) and four as instruction-tuned (fine-tuned for instruction following, which is more ideal for developing AI assistants and chatbots):

OpenELM features a 2048-token maximum context window. The models were trained on the publicly available datasets RefinedWeb, a version of PILE with duplications removed, a subset of RedPajama, and a subset of Dolma v1.6, which Apple says totals around 1.8 trillion tokens of data. Tokens are fragmented representations of data used by AI language models for processing.

Apple says its approach with OpenELM includes a “layer-wise scaling strategy” that reportedly allocates parameters more efficiently across each layer, saving not only computational resources but also improving the model’s performance while being trained on fewer tokens. According to Apple’s released white paper, this strategy has enabled OpenELM to achieve a 2.36 percent improvement in accuracy over Allen AI’s OLMo 1B (another small language model) while requiring half as many pre-training tokens.

An table comparing OpenELM with other small AI language models in a similar class, taken from the OpenELM research paper by Apple.

Enlarge / An table comparing OpenELM with other small AI language models in a similar class, taken from the OpenELM research paper by Apple.

Apple

Apple also released the code for CoreNet, a library it used to train OpenELM—and it also included reproducible training recipes that allow the weights (neural network files) to be replicated, which is unusual for a major tech company so far. As Apple says in its OpenELM paper abstract, transparency is a key goal for the company: “The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks.”

By releasing the source code, model weights, and training materials, Apple says it aims to “empower and enrich the open research community.” However, it also cautions that since the models were trained on publicly sourced datasets, “there exists the possibility of these models producing outputs that are inaccurate, harmful, biased, or objectionable in response to user prompts.”

While Apple has not yet integrated this new wave of AI language model capabilities into its consumer devices, the upcoming iOS 18 update (expected to be revealed in June at WWDC) is rumored to include new AI features that utilize on-device processing to ensure user privacy—though the company may potentially hire Google or OpenAI to handle more complex, off-device AI processing to give Siri a long-overdue boost.

Apple releases eight small AI language models aimed at on-device use Read More »

microsoft’s-phi-3-shows-the-surprising-power-of-small,-locally-run-ai-language-models

Microsoft’s Phi-3 shows the surprising power of small, locally run AI language models

small packages —

Microsoft’s 3.8B parameter Phi-3 may rival GPT-3.5, signaling a new era of “small language models.”

An illustration of lots of information being compressed into a smartphone with a funnel.

Getty Images

On Tuesday, Microsoft announced a new, freely available lightweight AI language model named Phi-3-mini, which is simpler and less expensive to operate than traditional large language models (LLMs) like OpenAI’s GPT-4 Turbo. Its small size is ideal for running locally, which could bring an AI model of similar capability to the free version of ChatGPT to a smartphone without needing an Internet connection to run it.

The AI field typically measures AI language model size by parameter count. Parameters are numerical values in a neural network that determine how the language model processes and generates text. They are learned during training on large datasets and essentially encode the model’s knowledge into quantified form. More parameters generally allow the model to capture more nuanced and complex language-generation capabilities but also require more computational resources to train and run.

Some of the largest language models today, like Google’s PaLM 2, have hundreds of billions of parameters. OpenAI’s GPT-4 is rumored to have over a trillion parameters but spread over eight 220-billion parameter models in a mixture-of-experts configuration. Both models require heavy-duty data center GPUs (and supporting systems) to run properly.

In contrast, Microsoft aimed small with Phi-3-mini, which contains only 3.8 billion parameters and was trained on 3.3 trillion tokens. That makes it ideal to run on consumer GPU or AI-acceleration hardware that can be found in smartphones and laptops. It’s a follow-up of two previous small language models from Microsoft: Phi-2, released in December, and Phi-1, released in June 2023.

A chart provided by Microsoft showing Phi-3 performance on various benchmarks.

Enlarge / A chart provided by Microsoft showing Phi-3 performance on various benchmarks.

Phi-3-mini features a 4,000-token context window, but Microsoft also introduced a 128K-token version called “phi-3-mini-128K.” Microsoft has also created 7-billion and 14-billion parameter versions of Phi-3 that it plans to release later that it claims are “significantly more capable” than phi-3-mini.

Microsoft says that Phi-3 features overall performance that “rivals that of models such as Mixtral 8x7B and GPT-3.5,” as detailed in a paper titled “Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone.” Mixtral 8x7B, from French AI company Mistral, utilizes a mixture-of-experts model, and GPT-3.5 powers the free version of ChatGPT.

“[Phi-3] looks like it’s going to be a shockingly good small model if their benchmarks are reflective of what it can actually do,” said AI researcher Simon Willison in an interview with Ars. Shortly after providing that quote, Willison downloaded Phi-3 to his Macbook laptop locally and said, “I got it working, and it’s GOOD” in a text message sent to Ars.

A screenshot of Phi-3-mini running locally on Simon Willison's Macbook.

Enlarge / A screenshot of Phi-3-mini running locally on Simon Willison’s Macbook.

Simon Willison

Most models that run on a local device still need hefty hardware,” says Willison. “Phi-3-mini runs comfortably with less than 8GB of RAM, and can churn out tokens at a reasonable speed even on just a regular CPU. It’s licensed MIT and should work well on a $55 Raspberry Pi—and the quality of results I’ve seen from it so far are comparable to models 4x larger.

How did Microsoft cram a capability potentially similar to GPT-3.5, which has at least 175 billion parameters, into such a small model? Its researchers found the answer by using carefully curated, high-quality training data they initially pulled from textbooks. “The innovation lies entirely in our dataset for training, a scaled-up version of the one used for phi-2, composed of heavily filtered web data and synthetic data,” writes Microsoft. “The model is also further aligned for robustness, safety, and chat format.”

Much has been written about the potential environmental impact of AI models and datacenters themselves, including on Ars. With new techniques and research, it’s possible that machine learning experts may continue to increase the capability of smaller AI models, replacing the need for larger ones—at least for everyday tasks. That would theoretically not only save money in the long run but also require far less energy in aggregate, dramatically decreasing AI’s environmental footprint. AI models like Phi-3 may be a step toward that future if the benchmark results hold up to scrutiny.

Phi-3 is immediately available on Microsoft’s cloud service platform Azure, as well as through partnerships with machine learning model platform Hugging Face and Ollama, a framework that allows models to run locally on Macs and PCs.

Microsoft’s Phi-3 shows the surprising power of small, locally run AI language models Read More »

elon-musk’s-grok-keeps-making-up-fake-news-based-on-x-users’-jokes

Elon Musk’s Grok keeps making up fake news based on X users’ jokes

It’s all jokes until it isn’t —

X likely hopes to avoid liability with disclaimer that Grok “can make mistakes.”

Elon Musk’s Grok keeps making up fake news based on X users’ jokes

X’s chatbot Grok is supposed to be an AI engine crunching the platform’s posts to surface and summarize breaking news, but this week, Grok’s flaws were once again exposed when the chatbot got confused and falsely accused an NBA star of criminal vandalism.

“Klay Thompson Accused in Bizarre Brick-Vandalism Spree,” Grok’s headline read in an AI-powered trending-tab post that has remained on X (formerly Twitter) for days. Beneath the headline, Grok went into even more detail to support its fake reporting:

In a bizarre turn of events, NBA star Klay Thompson has been accused of vandalizing multiple houses with bricks in Sacramento. Authorities are investigating the claims after several individuals reported their houses being damaged, with windows shattered by bricks. Klay Thompson has not yet issued a statement regarding the accusations. The incidents have left the community shaken, but no injuries were reported. The motive behind the alleged vandalism remains unclear.

Grok appears to be confusing a common basketball term, where players are said to be throwing “bricks” when they take an airball shot that doesn’t hit the rim. According to SF Gate, which was one of the first outlets to report the Grok error, Thompson had an “all-time rough shooting” night, hitting none of his shots on what was his emotional last game with the Golden State Warriors before becoming an unrestricted free agent.

In small type under Grok’s report, X includes a disclaimer saying, “Grok is an early feature and can make mistakes. Verify its outputs.”

But instead of verifying Grok’s outputs, it appeared that X users—in the service’s famously joke-y spirit—decided to fuel Grok’s misinformation. Under the post, X users, some NBA fans, commented with fake victim reports, using the same joke format to seemingly convince Grok that “several individuals reported their houses being damaged.” Some of these joking comments were viewed by millions.

First off… I am ok.

My house was vandalized by bricks 🧱

After my hands stopped shaking, I managed to call the Sheriff…They were quick to respond🚨

My window was gone and the police asked if I knew who did it👮‍♂️

I said yes, it was Klay Thompson

— LakeShowYo (@LakeShowYo) April 17, 2024

First off…I am ok.

My house was vandalized by bricks in Sacramento.

After my hands stopped shaking, I managed to call the Sheriff, they were quick to respond.

My window is gone, the police asked me if I knew who did it.

I said yes, it was Klay Thompson. pic.twitter.com/smrDs6Yi5M

— KeeganMuse (@KeegMuse) April 17, 2024

First off… I am ok.

My house was vandalized by bricks 🧱

After my hands stopped shaking, I managed to call the Sheriff…They were quick to respond🚨

My window was gone and the police asked if I knew who did it👮‍♂️

I said yes, it was Klay Thompson pic.twitter.com/JaWtdJhFli

— JJJ Muse (@JarenJJMuse) April 17, 2024

X did not immediately respond to Ars’ request for comment or confirm if the post will be corrected or taken down.

In the past, both Microsoft and chatbot maker OpenAI have faced defamation lawsuits over similar fabrications in which ChatGPT falsely accused a politician and a radio host of completely made-up criminal histories. Microsoft was also sued by an aerospace professor who Bing Chat falsely labeled a terrorist.

Experts told Ars that it remains unclear if disclaimers like X’s will spare companies from liability should more people decide to sue over fake AI outputs. Defamation claims might depend on proving that platforms “knowingly” publish false statements, which disclaimers suggest they do. Last July, the Federal Trade Commission launched an investigation into OpenAI, demanding that the company address the FTC’s fears of “false, misleading, or disparaging” AI outputs.

Because the FTC doesn’t comment on its investigations, it’s impossible to know if its probe will impact how OpenAI conducts business.

For people suing AI companies, the urgency of protecting against false outputs seems obvious. Last year, the radio host suing OpenAI, Mark Walters, accused the company of “sticking its head in the sand” and “recklessly disregarding whether the statements were false under circumstances when they knew that ChatGPT’s hallucinations were pervasive and severe.”

X just released Grok to all premium users this month, TechCrunch reported, right around the time that X began giving away premium access to the platform’s top users. During that wider rollout, X touted Grok’s new ability to summarize all trending news and topics, perhaps stoking interest in this feature and peaking Grok usage just before Grok spat out the potentially defamatory post about the NBA star.

Thompson has not issued any statements on Grok’s fake reporting.

Grok’s false post about Thompson may be the first widely publicized example of potential defamation from Grok, but it wasn’t the first time that Grok promoted fake news in response to X users joking around on the platform. During the solar eclipse, a Grok-generated headline read, “Sun’s Odd Behavior: Experts Baffled,” Gizmodo reported.

While it’s amusing to some X users to manipulate Grok, the pattern suggests that Grok may also be vulnerable to being manipulated by bad actors into summarizing and spreading more serious misinformation or propaganda. That’s apparently already happening, too. In early April, Grok made up a headline about Iran attacking Israel with heavy missiles, Mashable reported.

Elon Musk’s Grok keeps making up fake news based on X users’ jokes Read More »