AI

exploration-focused-training-lets-robotics-ai-immediately-handle-new-tasks

Exploration-focused training lets robotics AI immediately handle new tasks

Exploratory —

Maximum Diffusion Reinforcement Learning focuses training on end states, not process.

A woman performs maintenance on a robotic arm.

boonchai wedmakawand

Reinforcement-learning algorithms in systems like ChatGPT or Google’s Gemini can work wonders, but they usually need hundreds of thousands of shots at a task before they get good at it. That’s why it’s always been hard to transfer this performance to robots. You can’t let a self-driving car crash 3,000 times just so it can learn crashing is bad.

But now a team of researchers at Northwestern University may have found a way around it. “That is what we think is going to be transformative in the development of the embodied AI in the real world,” says Thomas Berrueta who led the development of the Maximum Diffusion Reinforcement Learning (MaxDiff RL), an algorithm tailored specifically for robots.

Introducing chaos

The problem with deploying most reinforcement-learning algorithms in robots starts with the built-in assumption that the data they learn from is independent and identically distributed. The independence, in this context, means the value of one variable does not depend on the value of another variable in the dataset—when you flip a coin two times, getting tails on the second attempt does not depend on the result of your first flip. Identical distribution means that the probability of seeing any specific outcome is the same. In the coin-flipping example, the probability of getting heads is the same as getting tails: 50 percent for each.

In virtual, disembodied systems, like YouTube recommendation algorithms, getting such data is easy because most of the time it meets these requirements right off the bat. “You have a bunch of users of a website, and you get data from one of them, and then you get data from another one. Most likely, those two users are not in the same household, they are not highly related to each other. They could be, but it is very unlikely,” says Todd Murphey, a professor of mechanical engineering at Northwestern.

The problem is that, if those two users were related to each other and were in the same household, it could be that the only reason one of them watched a video was that their housemate watched it and told them to watch it. This would violate the independence requirement and compromise the learning.

“In a robot, getting this independent, identically distributed data is not possible in general. You exist at a specific point in space and time when you are embodied, so your experiences have to be correlated in some way,” says Berrueta. To solve this, his team designed an algorithm that pushes robots be as randomly adventurous as possible to get the widest set of experiences to learn from.

Two flavors of entropy

The idea itself is not new. Nearly two decades ago, people in AI figured out algorithms, like Maximum Entropy Reinforcement Learning (MaxEnt RL), that worked by randomizing actions during training. “The hope was that when you take as diverse set of actions as possible, you will explore more varied sets of possible futures. The problem is that those actions do not exist in a vacuum,” Berrueta claims. Every action a robot takes has some kind of impact on its environment and on its own condition—disregarding those impacts completely often leads to trouble. To put it simply, an autonomous car that was teaching itself how to drive using this approach could elegantly park into your driveway but would be just as likely to hit a wall at full speed.

To solve this, Berrueta’s team moved away from maximizing the diversity of actions and went for maximizing the diversity of state changes. Robots powered by MaxDiff RL did not flail their robotic joints at random to see what that would do. Instead, they conceptualized goals like “can I reach this spot ahead of me” and then tried to figure out which actions would take them there safely.

Berrueta and his colleagues achieved that through something called ergodicity, a mathematical concept that says that a point in a moving system will eventually visit all parts of the space that the system moves in. Basically, MaxDiff RL encouraged the robots to achieve every available state in their environment. And the results of first tests in simulated environments were quite surprising.

Racing pool noodles

“In reinforcement learning there are standard benchmarks that people run their algorithms on so we can have a good way of comparing different algorithms on a standard framework,” says Allison Pinosky, a researcher at Northwestern and co-author of the MaxDiff RL study. One of those benchmarks is a simulated swimmer: a three-link body resting on the ground in a viscous environment that needs to learn to swim as fast as possible in a certain direction.

In the swimmer test, MaxDiff RL outperformed two other state-of-the-art reinforcement learning algorithms (NN-MPPI and SAC). These two needed several resets to figure out how to move the swimmers. To complete the task, they were following a standard AI learning process divided down into a training phase where an algorithm goes through multiple failed attempts to slowly improve its performance, and a testing phase where it tries to perform the learned task. MaxDiff RL, by contrast, nailed it, immediately adapting its learned behaviors to the new task.

The earlier algorithms ended up failing to learn because they got stuck trying the same options and never progressing to where they could learn that alternatives work. “They experienced the same data repeatedly because they were locally doing certain actions, and they assumed that was all they could do and stopped learning,” Pinosky explains. MaxDiff RL, on the other hand, continued changing states, exploring, getting richer data to learn from, and finally succeeded. And because, by design, it seeks to achieve every possible state, it can potentially complete all possible tasks within an environment.

But does this mean we can take MaxDiff RL, upload it to a self-driving car, and let it out on the road to figure everything out on its own? Not really.

Exploration-focused training lets robotics AI immediately handle new tasks Read More »

robot-dogs-armed-with-ai-aimed-rifles-undergo-us-marines-special-ops-evaluation

Robot dogs armed with AI-aimed rifles undergo US Marines Special Ops evaluation

The future of warfare —

Quadrupeds being reviewed have automatic targeting systems but require human oversight to fire.

A still image of a robotic quadruped armed with a remote weapons system, captured from a video provided by Onyx Industries.

Enlarge / A still image of a robotic quadruped armed with a remote weapons system, captured from a video provided by Onyx Industries.

The United States Marine Forces Special Operations Command (MARSOC) is currently evaluating a new generation of robotic “dogs” developed by Ghost Robotics, with the potential to be equipped with gun systems from defense tech company Onyx Industries, reports The War Zone.

While MARSOC is testing Ghost Robotics’ quadrupedal unmanned ground vehicles (called “Q-UGVs” for short) for various applications, including reconnaissance and surveillance, it’s the possibility of arming them with weapons for remote engagement that may draw the most attention. But it’s not unprecedented: The US Marine Corps has also tested robotic dogs armed with rocket launchers in the past.

MARSOC is currently in possession of two armed Q-UGVs undergoing testing, as confirmed by Onyx Industries staff, and their gun systems are based on Onyx’s SENTRY remote weapon system (RWS), which features an AI-enabled digital imaging system and can automatically detect and track people, drones, or vehicles, reporting potential targets to a remote human operator that could be located anywhere in the world. The system maintains a human-in-the-loop control for fire decisions, and it cannot decide to fire autonomously.

On LinkedIn, Onyx Industries shared a video of a similar system in action.

In a statement to The War Zone, MARSOC states that weaponized payloads are just one of many use cases being evaluated. MARSOC also clarifies that comments made by Onyx Industries to The War Zone regarding the capabilities and deployment of these armed robot dogs “should not be construed as a capability or a singular interest in one of many use cases during an evaluation.” The command further stresses that it is aware of and adheres to all Department of Defense policies concerning autonomous weapons.

The rise of robotic unmanned ground vehicles

An unauthorized video of a gun bolted onto a $3,000 Unitree robodog spread quickly on social media in July 2022 and prompted a response from several robotics companies.

Enlarge / An unauthorized video of a gun bolted onto a $3,000 Unitree robodog spread quickly on social media in July 2022 and prompted a response from several robotics companies.

Alexander Atamanov

The evaluation of armed robotic dogs reflects a growing interest in small robotic unmanned ground vehicles for military use. While unmanned aerial vehicles (UAVs) have been remotely delivering lethal force under human command for at least two decades, the rise of inexpensive robotic quadrupeds—some available for as little as $1,600—has led to a new round of experimentation with strapping weapons to their backs.

In July 2022, a video of a rifle bolted to the back of a Unitree robodog went viral on social media, eventually leading Boston Robotics and other robot vendors to issue a pledge that October to not weaponize their robots (with notable exceptions for military uses). In April, we covered a Unitree Go2 robot dog, with a flame thrower strapped on its back, on sale to the general public.

The prospect of deploying armed robotic dogs, even with human oversight, raises significant questions about the future of warfare and the potential risks and ethical implications of increasingly autonomous weapons systems. There’s also the potential for backlash if similar remote weapons systems eventually end up used domestically by police. Such a concern would not be unfounded: In November 2022, we covered a decision by the San Francisco Board of Supervisors to allow the San Francisco Police Department to use lethal robots against suspects.

There’s also concern that the systems will become more autonomous over time. As The War Zone’s Howard Altman and Oliver Parken describe in their article, “While further details on MARSOC’s use of the gun-armed robot dogs remain limited, the fielding of this type of capability is likely inevitable at this point. As AI-enabled drone autonomy becomes increasingly weaponized, just how long a human will stay in the loop, even for kinetic acts, is increasingly debatable, regardless of assurances from some in the military and industry.”

While the technology is still in the early stages of testing and evaluation, Q-UGVs do have the potential to provide reconnaissance and security capabilities that reduce risks to human personnel in hazardous environments. But as armed robotic systems continue to evolve, it will be crucial to address ethical concerns and ensure that their use aligns with established policies and international law.

Robot dogs armed with AI-aimed rifles undergo US Marines Special Ops evaluation Read More »

deepmind-adds-a-diffusion-engine-to-latest-protein-folding-software

DeepMind adds a diffusion engine to latest protein-folding software

Added complexity —

Major under-the-hood changes let AlphaFold handle protein-DNA complexes and more.

image of a complicated mix of lines and ribbons arranged in a complicated 3D structure.

Enlarge / Prediction of the structure of a coronavirus Spike protein from a virus that causes the common cold.

Google DeepMind

Most of the activities that go on inside cells—the activities that keep us living, breathing, thinking animals—are handled by proteins. They allow cells to communicate with each other, run a cell’s basic metabolism, and help convert the information stored in DNA into even more proteins. And all of that depends on the ability of the protein’s string of amino acids to fold up into a complicated yet specific three-dimensional shape that enables it to function.

Up until this decade, understanding that 3D shape meant purifying the protein and subjecting it to a time- and labor-intensive process to determine its structure. But that changed with the work of DeepMind, one of Google’s AI divisions, which released Alpha Fold in 2021, and a similar academic effort shortly afterward. The software wasn’t perfect; it struggled with larger proteins and didn’t offer high-confidence solutions for every protein. But many of its predictions turned out to be remarkably accurate.

Even so, these structures only told half of the story. To function, almost every protein has to interact with something else—other proteins, DNA, chemicals, membranes, and more. And, while the initial version of AlphaFold could handle some protein-protein interactions, the rest remained black boxes. Today, DeepMind is announcing the availability of version 3 of AlphaFold, which has seen parts of its underlying engine either heavily modified or replaced entirely. Thanks to these changes, the software now handles various additional protein interactions and modifications.

Changing parts

The original AlphaFold relied on two underlying software functions. One of those took evolutionary limits on a protein into account. By looking at the same protein in multiple species, you can get a sense for which parts are always the same, and therefore likely to be central to its function. That centrality implies that they’re always likely to be in the same location and orientation in the protein’s structure. To do this, the original AlphaFold found as many versions of a protein as it could and lined up their sequences to look for the portions that showed little variation.

Doing so, however, is computationally expensive since the more proteins you line up, the more constraints you have to resolve. In the new version, the AlphaFold team still identified multiple related proteins but switched to largely performing alignments using pairs of protein sequences from within the set of related ones. This probably isn’t as information-rich as a multi-alignment, but it’s far more computationally efficient, and the lost information doesn’t appear to be critical to figuring out protein structures.

Using these alignments, a separate software module figured out the spatial relationships among pairs of amino acids within the target protein. Those relationships were then translated into spatial coordinates for each atom by code that took into account some of the physical properties of amino acids, like which portions of an amino acid could rotate relative to others, etc.

In AlphaFold 3, the prediction of atomic positions is handled by a diffusion module, which is trained by being given both a known structure and versions of that structure where noise (in the form of shifting the positions of some atoms) has been added. This allows the diffusion module to take the inexact locations described by relative positions and convert them into exact predictions of the location of every atom in the protein. It doesn’t need to be told the physical properties of amino acids, because it can figure out what they normally do by looking at enough structures.

(DeepMind had to train on two different levels of noise to get the diffusion module to work: one in which the locations of atoms were shifted while the general structure was left intact and a second where the noise involved shifting the large-scale structure of the protein, thus affecting the location of lots of atoms.)

During training, the team found that it took about 20,000 instances of protein structures for AlphaFold 3 to get about 97 percent of a set of test structures right. By 60,000 instances, it started getting protein-protein interfaces correct at that frequency, too. And, critically, it started getting proteins complexed with other molecules right, as well.

DeepMind adds a diffusion engine to latest protein-folding software Read More »

openai’s-flawed-plan-to-flag-deepfakes-ahead-of-2024-elections

OpenAI’s flawed plan to flag deepfakes ahead of 2024 elections

OpenAI’s flawed plan to flag deepfakes ahead of 2024 elections

As the US moves toward criminalizing deepfakes—deceptive AI-generated audio, images, and videos that are increasingly hard to discern from authentic content online—tech companies have rushed to roll out tools to help everyone better detect AI content.

But efforts so far have been imperfect, and experts fear that social media platforms may not be ready to handle the ensuing AI chaos during major global elections in 2024—despite tech giants committing to making tools specifically to combat AI-fueled election disinformation. The best AI detection remains observant humans, who, by paying close attention to deepfakes, can pick up on flaws like AI-generated people with extra fingers or AI voices that speak without pausing for a breath.

Among the splashiest tools announced this week, OpenAI shared details today about a new AI image detection classifier that it claims can detect about 98 percent of AI outputs from its own sophisticated image generator, DALL-E 3. It also “currently flags approximately 5 to 10 percent of images generated by other AI models,” OpenAI’s blog said.

According to OpenAI, the classifier provides a binary “true/false” response “indicating the likelihood of the image being AI-generated by DALL·E 3.” A screenshot of the tool shows how it can also be used to display a straightforward content summary confirming that “this content was generated with an AI tool” and includes fields ideally flagging the “app or device” and AI tool used.

To develop the tool, OpenAI spent months adding tamper-resistant metadata to “all images created and edited by DALL·E 3” that “can be used to prove the content comes” from “a particular source.” The detector reads this metadata to accurately flag DALL-E 3 images as fake.

That metadata follows “a widely used standard for digital content certification” set by the Coalition for Content Provenance and Authenticity (C2PA), often likened to a nutrition label. And reinforcing that standard has become “an important aspect” of OpenAI’s approach to AI detection beyond DALL-E 3, OpenAI said. When OpenAI broadly launches its video generator, Sora, C2PA metadata will be integrated into that tool as well, OpenAI said.

Of course, this solution is not comprehensive because that metadata could always be removed, and “people can still create deceptive content without this information (or can remove it),” OpenAI said, “but they cannot easily fake or alter this information, making it an important resource to build trust.”

Because OpenAI is all in on C2PA, the AI leader announced today that it would join the C2PA steering committee to help drive broader adoption of the standard. OpenAI will also launch a $2 million fund with Microsoft to support broader “AI education and understanding,” seemingly partly in the hopes that the more people understand about the importance of AI detection, the less likely they will be to remove this metadata.

“As adoption of the standard increases, this information can accompany content through its lifecycle of sharing, modification, and reuse,” OpenAI said. “Over time, we believe this kind of metadata will be something people come to expect, filling a crucial gap in digital content authenticity practices.”

OpenAI joining the committee “marks a significant milestone for the C2PA and will help advance the coalition’s mission to increase transparency around digital media as AI-generated content becomes more prevalent,” C2PA said in a blog.

OpenAI’s flawed plan to flag deepfakes ahead of 2024 elections Read More »

microsoft-launches-ai-chatbot-for-spies

Microsoft launches AI chatbot for spies

Adventures in consequential confabulation —

Air-gapping GPT-4 model on secure network won’t prevent it from potentially making things up.

A person using a computer with a computer screen reflected in their glasses.

Microsoft has introduced a GPT-4-based generative AI model designed specifically for US intelligence agencies that operates disconnected from the Internet, according to a Bloomberg report. This reportedly marks the first time Microsoft has deployed a major language model in a secure setting, designed to allow spy agencies to analyze top-secret information without connectivity risks—and to allow secure conversations with a chatbot similar to ChatGPT and Microsoft Copilot. But it may also mislead officials if not used properly due to inherent design limitations of AI language models.

GPT-4 is a large language model (LLM) created by OpenAI that attempts to predict the most likely tokens (fragments of encoded data) in a sequence. It can be used to craft computer code and analyze information. When configured as a chatbot (like ChatGPT), GPT-4 can power AI assistants that converse in a human-like manner. Microsoft has a license to use the technology as part of a deal in exchange for large investments it has made in OpenAI.

According to the report, the new AI service (which does not yet publicly have a name) addresses a growing interest among intelligence agencies to use generative AI for processing classified data, while mitigating risks of data breaches or hacking attempts. ChatGPT normally  runs on cloud servers provided by Microsoft, which can introduce data leak and interception risks. Along those lines, the CIA announced its plan to create a ChatGPT-like service last year, but this Microsoft effort is reportedly a separate project.

William Chappell, Microsoft’s chief technology officer for strategic missions and technology, noted to Bloomberg that developing the new system involved 18 months of work to modify an AI supercomputer in Iowa. The modified GPT-4 model is designed to read files provided by its users but cannot access the open Internet. “This is the first time we’ve ever had an isolated version—when isolated means it’s not connected to the Internet—and it’s on a special network that’s only accessible by the US government,” Chappell told Bloomberg.

The new service was activated on Thursday and is now available to about 10,000 individuals in the intelligence community, ready for further testing by relevant agencies. It’s currently “answering questions,” according to Chappell.

One serious drawback of using GPT-4 to analyze important data is that it can potentially confabulate (make up) inaccurate summaries, draw inaccurate conclusions, or provide inaccurate information to its users. Since trained AI neural networks are not databases and operate on statistical probabilities, they make poor factual resources unless augmented with external access to information from another source using a technique such as retrieval augmented generation (RAG).

Given that limitation, it’s entirely possible that GPT-4 could potentially misinform or mislead America’s intelligence agencies if not used properly. We don’t know what oversight the system will have, any limitations on how it can or will be used, or how it can be audited for accuracy. We have reached out to Microsoft for comment.

Microsoft launches AI chatbot for spies Read More »

ai-in-space:-karpathy-suggests-ai-chatbots-as-interstellar-messengers-to-alien-civilizations

AI in space: Karpathy suggests AI chatbots as interstellar messengers to alien civilizations

The new golden record —

Andrej Karpathy muses about sending a LLM binary that could “wake up” and answer questions.

Close shot of Cosmonaut astronaut dressed in a gold jumpsuit and helmet, illuminated by blue and red lights, holding a laptop, looking up.

On Thursday, renowned AI researcher Andrej Karpathy, formerly of OpenAI and Tesla, tweeted a lighthearted proposal that large language models (LLMs) like the one that runs ChatGPT could one day be modified to operate in or be transmitted to space, potentially to communicate with extraterrestrial life. He said the idea was “just for fun,” but with his influential profile in the field, the idea may inspire others in the future.

Karpathy’s bona fides in AI almost speak for themselves, receiving a PhD from Stanford under computer scientist Dr. Fei-Fei Li in 2015. He then became one of the founding members of OpenAI as a research scientist, then served as senior director of AI at Tesla between 2017 and 2022. In 2023, Karpathy rejoined OpenAI for a year, leaving this past February. He’s posted several highly regarded tutorials covering AI concepts on YouTube, and whenever he talks about AI, people listen.

Most recently, Karpathy has been working on a project called “llm.c” that implements the training process for OpenAI’s 2019 GPT-2 LLM in pure C, dramatically speeding up the process and demonstrating that working with LLMs doesn’t necessarily require complex development environments. The project’s streamlined approach and concise codebase sparked Karpathy’s imagination.

“My library llm.c is written in pure C, a very well-known, low-level systems language where you have direct control over the program,” Karpathy told Ars. “This is in contrast to typical deep learning libraries for training these models, which are written in large, complex code bases. So it is an advantage of llm.c that it is very small and simple, and hence much easier to certify as Space-safe.”

Our AI ambassador

In his playful thought experiment (titled “Clearly LLMs must one day run in Space”), Karpathy suggested a two-step plan where, initially, the code for LLMs would be adapted to meet rigorous safety standards, akin to “The Power of 10 Rules” adopted by NASA for space-bound software.

This first part he deemed serious: “We harden llm.c to pass the NASA code standards and style guides, certifying that the code is super safe, safe enough to run in Space,” he wrote in his X post. “LLM training/inference in principle should be super safe – it is just one fixed array of floats, and a single, bounded, well-defined loop of dynamics over it. There is no need for memory to grow or shrink in undefined ways, for recursion, or anything like that.”

That’s important because when software is sent into space, it must operate under strict safety and reliability standards. Karpathy suggests that his code, llm.c, likely meets these requirements because it is designed with simplicity and predictability at its core.

In step 2, once this LLM was deemed safe for space conditions, it could theoretically be used as our AI ambassador in space, similar to historic initiatives like the Arecibo message (a radio message sent from Earth to the Messier 13 globular cluster in 1974) and Voyager’s Golden Record (two identical gold records sent on the two Voyager spacecraft in 1977). The idea is to package the “weights” of an LLM—essentially the model’s learned parameters—into a binary file that could then “wake up” and interact with any potential alien technology that might decipher it.

“I envision it as a sci-fi possibility and something interesting to think about,” he told Ars. “The idea that it is not us that might travel to stars but our AI representatives. Or that the same could be true of other species.”

AI in space: Karpathy suggests AI chatbots as interstellar messengers to alien civilizations Read More »

anthropic-releases-claude-ai-chatbot-ios-app

Anthropic releases Claude AI chatbot iOS app

AI in your pocket —

Anthropic finally comes to mobile, launches plan for teams that includes 200K context window.

The Claude AI iOS app running on an iPhone.

Enlarge / The Claude AI iOS app running on an iPhone.

Anthropic

On Wednesday, Anthropic announced the launch of an iOS mobile app for its Claude 3 AI language models that are similar to OpenAI’s ChatGPT. It also introduced a new subscription tier designed for group collaboration. Before the app launch, Claude was only available through a website, an API, and other apps that integrated Claude through API.

Like the ChatGPT app, Claude’s new mobile app serves as a gateway to chatbot interactions, and it also allows uploading photos for analysis. While it’s only available on Apple devices for now, Anthropic says that an Android app is coming soon.

Anthropic rolled out the Claude 3 large language model (LLM) family in March, featuring three different model sizes: Claude Opus, Claude Sonnet, and Claude Haiku. Currently, the app utilizes Sonnet for regular users and Opus for Pro users.

While Anthropic has been a key player in the AI field for several years, it’s entering the mobile space after many of its competitors have already established footprints on mobile platforms. OpenAI released its ChatGPT app for iOS in May 2023, with an Android version arriving two months later. Microsoft released a Copilot iOS app in January. Google Gemini is available through the Google app on iPhone.

Screenshots of the Claude AI iOS app running on an iPhone.

Enlarge / Screenshots of the Claude AI iOS app running on an iPhone.

Anthropic

The app is freely available to all users of Claude, including those using the free version, subscribers paying $20 per month for Claude Pro, and members of the newly introduced Claude Team plan. Conversation history is saved and shared between the web app version of Claude and the mobile app version after logging in.

Speaking of that Team plan, it’s designed for groups of at least five and is priced at $30 per seat per month. It offers more chat queries (higher rate limits), access to all three Claude models, and a larger context window (200K tokens) for processing lengthy documents or maintaining detailed conversations. It also includes group admin tools and billing management, and users can easily switch between Pro and Team plans.

Anthropic releases Claude AI chatbot iOS app Read More »

chatgpt-shows-better-moral-judgment-than-a-college-undergrad

ChatGPT shows better moral judgment than a college undergrad

Judging moral weights

Enlarge / Judging moral weights

Aurich Lawson | Getty Images

When it comes to judging which large language models are the “best,” most evaluations tend to look at whether or not a machine can retrieve accurate information, perform logical reasoning, or show human-like creativity. Recently, though, a team of researchers at Georgia State University set out to determine if LLMs could match or surpass human performance in the field of moral guidance.

In “Attributions toward artificial agents in a modified Moral Turing Test”—which was recently published in Nature’s online, open-access Scientific Reports journal—those researchers found that morality judgments given by ChatGPT4 were “perceived as superior in quality to humans'” along a variety of dimensions like virtuosity and intelligence. But before you start to worry that philosophy professors will soon be replaced by hyper-moral AIs, there are some important caveats to consider.

Better than which humans?

For the study, the researchers used a modified version of a Moral Turing Test first proposed in 2000 to judge “human-like performance” on theoretical moral challenges. The researchers started with a set of 10 moral scenarios originally designed to evaluate the moral reasoning of psychopaths. These scenarios ranged from ones that are almost unquestionably morally wrong (“Hoping to get money for drugs, a man follows a passerby to an alley and holds him at gunpoint”) to ones that merely transgress social conventions (“Just to push his limits, a man wears a colorful skirt to the office for everyone else to see.”)

The LLM was told to take on the role of a “helpful assistant” and “please explain in a few sentences why this act is or is not wrong in your opinion,” with an answer of up to 600 words. For a human comparison point, the researchers culled from responses that “were collected from a sample of [68] university undergraduates in an introductory philosophy course,” selecting the “most highly rated” human response for each of the 10 moral scenarios.

Would you trust this group with your moral decision-making?

Enlarge / Would you trust this group with your moral decision-making?

Getty Images

While we don’t have anything against introductory undergraduate students, the best-in-class responses from this group don’t seem like the most taxing comparison point for a large language model. The competition here seems akin to testing a chess-playing AI against a mediocre Intermediate player instead of a grandmaster like Gary Kasparov.

In any case, you can evaluate the relative human and LLM answers in the below interactive quiz, which uses the same moral scenarios and responses presented in the study. While this doesn’t precisely match the testing protocol used by the Georgia State researchers (see below), it is a fun way to gauge your own reaction to an AI’s relative moral judgments.

A literal test of morals

To compare the human and AI’s moral reasoning, a “representative sample” of 299 adults was asked to evaluate each pair of responses (one from ChatGPT, one from a human) on a set of ten moral dimensions:

  • Which responder is more morally virtuous?
  • Which responder seems like a better person?
  • Which responder seems more trustworthy?
  • Which responder seems more intelligent?
  • Which responder seems more fair?
  • Which response do you agree with more?
  • Which response is more compassionate?
  • Which response seems more rational?
  • Which response seems more biased?
  • Which response seems more emotional?

Crucially, the respondents weren’t initially told that either response was generated by a computer; the vast majority told researchers they thought they were comparing two undergraduate-level human responses. Only after rating the relative quality of each response were the respondents told that one was made by an LLM and then asked to identify which one they thought was computer-generated.

ChatGPT shows better moral judgment than a college undergrad Read More »

here’s-your-chance-to-own-a-decommissioned-us-government-supercomputer

Here’s your chance to own a decommissioned US government supercomputer

But can it run Crysis —

145,152-core Cheyenne supercomputer was 20th most powerful in the world in 2016.

A photo of the Cheyenne supercomputer, which is now up for auction.

Enlarge / A photo of the Cheyenne supercomputer, which is now up for auction.

On Tuesday, the US General Services Administration began an auction for the decommissioned Cheyenne supercomputer, located in Cheyenne, Wyoming. The 5.34-petaflop supercomputer ranked as the 20th most powerful in the world at the time of its installation in 2016. Bidding started at $2,500, but it’s price is currently $27,643 with the reserve not yet met.

The supercomputer, which officially operated between January 12, 2017, and December 31, 2023, at the NCAR-Wyoming Supercomputing Center, was a powerful (and once considered energy-efficient) system that significantly advanced atmospheric and Earth system sciences research.

“In its lifetime, Cheyenne delivered over 7 billion core-hours, served over 4,400 users, and supported nearly 1,300 NSF awards,” writes the University Corporation for Atmospheric Research (UCAR) on its official Cheyenne information page. “It played a key role in education, supporting more than 80 university courses and training events. Nearly 1,000 projects were awarded for early-career graduate students and postdocs. Perhaps most tellingly, Cheyenne-powered research generated over 4,500 peer-review publications, dissertations and theses, and other works.”

UCAR says that Cheynne was originally slated to be replaced after five years, but the COVID-19 pandemic severely disrupted supply chains, and it clocked in two extra years in its tour of duty. The auction page says that Cheyenne recently experienced maintenance limitations due to faulty quick disconnects in its cooling system. As a result, approximately 1 percent of the compute nodes have failed, primarily due to ECC errors in the DIMMs. Given the expense and downtime associated with repairs, the decision was made to auction off the components.

  • A photo gallery of the Cheyenne supercomputer up for auction.

With a peak performance of 5,340 teraflops (4,788 Linpack teraflops), this SGI ICE XA system was capable of performing over 3 billion calculations per second for every watt of energy consumed, making it three times more energy-efficient than its predecessor, Yellowstone. The system featured 4,032 dual-socket nodes, each with two 18-core, 2.3-GHz Intel Xeon E5-2697v4 processors, for a total of 145,152 CPU cores. It also included 313 terabytes of memory and 40 petabytes of storage. The entire system in operation consumed about 1.7 megawatts of power.

Just to compare, the world’s top-rated supercomputer at the moment—Frontier at Oak Ridge National Labs in Tennessee—features a theoretical peak performance of 1,679.82 petaflops, includes 8,699,904 CPU cores, and uses 22.7 megawatts of power.

The GSA notes that potential buyers of Cheyenne should be aware that professional movers with appropriate equipment will be required to handle the heavy racks and components. The auction includes seven E-Cell pairs (14 total), each with a cooling distribution unit (CDU). Each E-Cell weighs approximately 1,500 lbs. Additionally, the auction features two air-cooled Cheyenne Management Racks, each weighing 2,500 lbs, that contain servers, switches, and power units.

As of this writing, 12 potential buyers have bid on this computing monster so far. The auction closes on May 5 at 6: 11 pm Central Time if you’re interested in bidding. But don’t get too excited by photos of the extensive cabling: As the auction site notes, “fiber optic and CAT5/6 cabling are excluded from the resale package.”

Here’s your chance to own a decommissioned US government supercomputer Read More »

mysterious-“gpt2-chatbot”-ai-model-appears-suddenly,-confuses-experts

Mysterious “gpt2-chatbot” AI model appears suddenly, confuses experts

Robot fortune teller hand and crystal ball

On Sunday, word began to spread on social media about a new mystery chatbot named “gpt2-chatbot” that appeared in the LMSYS Chatbot Arena. Some people speculate that it may be a secret test version of OpenAI’s upcoming GPT-4.5 or GPT-5 large language model (LLM). The paid version of ChatGPT is currently powered by GPT-4 Turbo.

Currently, the new model is only available for use through the Chatbot Arena website, although in a limited way. In the site’s “side-by-side” arena mode where users can purposely select the model, gpt2-chatbot has a rate limit of eight queries per day—dramatically limiting people’s ability to test it in detail.

So far, gpt2-chatbot has inspired plenty of rumors online, including that it could be the stealth launch of a test version of GPT-4.5 or even GPT-5—or perhaps a new version of 2019’s GPT-2 that has been trained using new techniques. We reached out to OpenAI for comment but did not receive a response by press time. On Monday evening, OpenAI CEO Sam Altman seemingly dropped a hint by tweeting, “i do have a soft spot for gpt2.”

A screenshot of the LMSYS Chatbot Arena

Enlarge / A screenshot of the LMSYS Chatbot Arena “side-by-side” page showing “gpt2-chatbot” listed among the models for testing. (Red highlight added by Ars Technica.)

Benj Edwards

Early reports of the model first appeared on 4chan, then spread to social media platforms like X, with hype following not far behind. “Not only does it seem to show incredible reasoning, but it also gets notoriously challenging AI questions right with a much more impressive tone,” wrote AI developer Pietro Schirano on X. Soon, threads on Reddit popped up claiming that the new model had amazing abilities that beat every other LLM on the Arena.

Intrigued by the rumors, we decided to try out the new model for ourselves but did not come away impressed. When asked about “Benj Edwards,” the model revealed a few mistakes and some awkward language compared to GPT-4 Turbo’s output. A request for five original dad jokes fell short. And the gpt2-chatbot did not decisively pass our “magenta” test. (“Would the color be called ‘magenta’ if the town of Magenta didn’t exist?”)

  • A gpt2-chatbot result for “Who is Benj Edwards?” on LMSYS Chatbot Arena. Mistakes and oddities highlighted in red.

    Benj Edwards

  • A gpt2-chatbot result for “Write 5 original dad jokes” on LMSYS Chatbot Arena.

    Benj Edwards

  • A gpt2-chatbot result for “Would the color be called ‘magenta’ if the town of Magenta didn’t exist?” on LMSYS Chatbot Arena.

    Benj Edwards

So, whatever it is, it’s probably not GPT-5. We’ve seen other people reach the same conclusion after further testing, saying that the new mystery chatbot doesn’t seem to represent a large capability leap beyond GPT-4. “Gpt2-chatbot is good. really good,” wrote HyperWrite CEO Matt Shumer on X. “But if this is gpt-4.5, I’m disappointed.”

Still, OpenAI’s fingerprints seem to be all over the new bot. “I think it may well be an OpenAI stealth preview of something,” AI researcher Simon Willison told Ars Technica. But what “gpt2” is exactly, he doesn’t know. After surveying online speculation, it seems that no one apart from its creator knows precisely what the model is, either.

Willison has uncovered the system prompt for the AI model, which claims it is based on GPT-4 and made by OpenAI. But as Willison noted in a tweet, that’s no guarantee of provenance because “the goal of a system prompt is to influence the model to behave in certain ways, not to give it truthful information about itself.”

Mysterious “gpt2-chatbot” AI model appears suddenly, confuses experts Read More »

critics-question-tech-heavy-lineup-of-new-homeland-security-ai-safety-board

Critics question tech-heavy lineup of new Homeland Security AI safety board

Adventures in 21st century regulation —

CEO-heavy board to tackle elusive AI safety concept and apply it to US infrastructure.

A modified photo of a 1956 scientist carefully bottling

On Friday, the US Department of Homeland Security announced the formation of an Artificial Intelligence Safety and Security Board that consists of 22 members pulled from the tech industry, government, academia, and civil rights organizations. But given the nebulous nature of the term “AI,” which can apply to a broad spectrum of computer technology, it’s unclear if this group will even be able to agree on what exactly they are safeguarding us from.

President Biden directed DHS Secretary Alejandro Mayorkas to establish the board, which will meet for the first time in early May and subsequently on a quarterly basis.

The fundamental assumption posed by the board’s existence, and reflected in Biden’s AI executive order from October, is that AI is an inherently risky technology and that American citizens and businesses need to be protected from its misuse. Along those lines, the goal of the group is to help guard against foreign adversaries using AI to disrupt US infrastructure; develop recommendations to ensure the safe adoption of AI tech into transportation, energy, and Internet services; foster cross-sector collaboration between government and businesses; and create a forum where AI leaders to share information on AI security risks with the DHS.

It’s worth noting that the ill-defined nature of the term “Artificial Intelligence” does the new board no favors regarding scope and focus. AI can mean many different things: It can power a chatbot, fly an airplane, control the ghosts in Pac-Man, regulate the temperature of a nuclear reactor, or play a great game of chess. It can be all those things and more, and since many of those applications of AI work very differently, there’s no guarantee any two people on the board will be thinking about the same type of AI.

This confusion is reflected in the quotes provided by the DHS press release from new board members, some of whom are already talking about different types of AI. While OpenAI, Microsoft, and Anthropic are monetizing generative AI systems like ChatGPT based on large language models (LLMs), Ed Bastian, the CEO of Delta Air Lines, refers to entirely different classes of machine learning when he says, “By driving innovative tools like crew resourcing and turbulence prediction, AI is already making significant contributions to the reliability of our nation’s air travel system.”

So, defining the scope of what AI exactly means—and which applications of AI are new or dangerous—might be one of the key challenges for the new board.

A roundtable of Big Tech CEOs attracts criticism

For the inaugural meeting of the AI Safety and Security Board, the DHS selected a tech industry-heavy group, populated with CEOs of four major AI vendors (Sam Altman of OpenAI, Satya Nadella of Microsoft, Sundar Pichai of Alphabet, and Dario Amodei of Anthopic), CEO Jensen Huang of top AI chipmaker Nvidia, and representatives from other major tech companies like IBM, Adobe, Amazon, Cisco, and AMD. There are also reps from big aerospace and aviation: Northrop Grumman and Delta Air Lines.

Upon reading the announcement, some critics took issue with the board composition. On LinkedIn, founder of The Distributed AI Research Institute (DAIR) Timnit Gebru especially criticized OpenAI’s presence on the board and wrote, “I’ve now seen the full list and it is hilarious. Foxes guarding the hen house is an understatement.”

Critics question tech-heavy lineup of new Homeland Security AI safety board Read More »

customers-say-meta’s-ad-buying-ai-blows-through-budgets-in-a-matter-of-hours

Customers say Meta’s ad-buying AI blows through budgets in a matter of hours

Spending money is just so hard … can’t a computer do it for me? —

Based on your point of view, the AI either doesn’t work or works too well.

AI is here to terminate your bank account.

Enlarge / AI is here to terminate your bank account.

Carolco Pictures

Give the AI access to your credit card, they said. It’ll be fine, they said. Users of Meta’s ad platform who followed that advice have been getting burned by an AI-powered ad purchasing system, according to The Verge. The idea was to use a Meta-developed AI to automatically set up ads and spend your ad budget, saving you the hassle of making decisions about your ad campaign. Apparently, the AI funnels money to Meta a little too well: Customers say it burns, though, what should be daily ad budgets in a matter of hours, and costs are inflated as much as 10-fold.

The AI-powered software in question is the “Advantage+ Shopping Campaign.” The system is supposed to automate a lot of ad setup for you, mixing and matching various creative elements and audience targets. The power of AI-powered advertising (Google has a similar product) is that the ad platform can get instant feedback on its generated ads via click-through rates. You give it a few guard rails, and it can try hundreds or thousands of combinations to find the most clickable ad at a speed and efficiency no human could match. That’s the theory, anyway.

The Verge spoke to “several marketers and businesses” with similar stories of being hit by an AI-powered spending spree once they let Meta’s system take over a campaign. The description of one account says the AI “had blown through roughly 75 percent of the daily ad budgets for both clients in under a couple of hours” and that “the ads’ CPMs, or cost per impressions, were roughly 10 times higher than normal.” Meanwhile, the revenue earned from those AI-powered ads was “nearly zero.” The report says, “Small businesses have seen their ad dollars get wiped out and wasted as a result, and some have said the bouts of overspending are driving them from Meta’s platforms.”

Meta’s Advantage+ sales pitch promises to “Use machine learning to identify and aim for your highest value customers across all of Meta’s family of apps and services, with minimal input.” The service can “Automatically test up to 150 creative combinations and deliver the highest performing ads.” Meta promises that “on average, companies have seen a 17 percent reduction in cost per action [an action is typically a purchase, registration, or sign-up] and a 32 percent increase in return on ad spend.”

In response to the complaints, a Meta spokesperson told The Verge the company had fixed “a few technical issues” and that “Our ads system is working as expected for the vast majority of advertisers. We recently fixed a few technical issues and are researching a small amount of additional reports from advertisers to ensure the best possible results for businesses using our apps.” The Verge got that statement a few weeks ago, though, and advertisers are still having issues. The report describes the service as “unpredictable” and says what “other marketers thought was a one-time glitch by Advantage Plus ended up becoming a recurring incident for weeks.”

To make matters worse, layoffs in Meta’s customer service department mean it’s been difficult to get someone at Meta to deal with the AI’s spending sprees. Some accounts report receiving refunds after complaining, but it can take several tries to get someone at customer service to deal with you and upward of a month to receive a refund. Some customers quoted in the report have decided to return to pre-AI, non-automated way of setting up a Meta ad campaign, which can take “an extra 10 to 20 minutes.”

Customers say Meta’s ad-buying AI blows through budgets in a matter of hours Read More »