AI


AI cannot be used to deny health care coverage, feds clarify to insurers

On Notice —

CMS worries AI could wrongfully deny care for those on Medicare Advantage plans.

A nursing home resident is pushed along a corridor by a nurse.


Health insurance companies cannot use algorithms or artificial intelligence to determine care or deny coverage to members on Medicare Advantage plans, the Centers for Medicare & Medicaid Services (CMS) clarified in a memo sent to all Medicare Advantage insurers.

The memo—formatted like an FAQ on Medicare Advantage (MA) plan rules—comes just months after patients filed lawsuits claiming that UnitedHealth and Humana have been using a deeply flawed, AI-powered tool to deny care to elderly patients on MA plans. The lawsuits, which seek class-action status, center on the same AI tool, called nH Predict, used by both insurers and developed by NaviHealth, a UnitedHealth subsidiary.

According to the lawsuits, nH Predict produces draconian estimates for how long a patient will need post-acute care in facilities like skilled nursing homes and rehabilitation centers after an acute injury, illness, or event, like a fall or a stroke. And NaviHealth employees face discipline for deviating from the estimates, even though they often don’t match prescribing physicians’ recommendations or Medicare coverage rules. For instance, while MA plans typically provide up to 100 days of covered care in a nursing home after a three-day hospital stay, using nH Predict, patients on UnitedHealth’s MA plan rarely stay in nursing homes for more than 14 days before receiving payment denials, the lawsuits allege.

Specific warning

It’s unclear how nH Predict works exactly, but it reportedly uses a database of 6 million patients to develop its predictions. Still, according to people familiar with the software, it only accounts for a small set of patient factors, not a full look at a patient’s individual circumstances.

This is a clear no-no, according to the CMS’s memo. For coverage decisions, insurers must “base the decision on the individual patient’s circumstances, so an algorithm that determines coverage based on a larger data set instead of the individual patient’s medical history, the physician’s recommendations, or clinical notes would not be compliant,” the CMS wrote.

The CMS then provided a hypothetical that matches the circumstances laid out in the lawsuits, writing:

In an example involving a decision to terminate post-acute care services, an algorithm or software tool can be used to assist providers or MA plans in predicting a potential length of stay, but that prediction alone cannot be used as the basis to terminate post-acute care services.

Instead, the CMS wrote, in order for an insurer to end coverage, the individual patient’s condition must be reassessed, and denial must be based on coverage criteria that are publicly posted on a website that is not password protected. In addition, insurers who deny care “must supply a specific and detailed explanation why services are either no longer reasonable and necessary or are no longer covered, including a description of the applicable coverage criteria and rules.”

In the lawsuits, patients claimed that when coverage of their physician-recommended care was unexpectedly and wrongfully denied, insurers didn’t give them full explanations.

Fidelity

In all, the CMS finds that AI tools can be used by insurers when evaluating coverage—but really only as a check to make sure the insurer is following the rules. An “algorithm or software tool should only be used to ensure fidelity,” with coverage criteria, the CMS wrote. And, because “publicly posted coverage criteria are static and unchanging, artificial intelligence cannot be used to shift the coverage criteria over time” or apply hidden coverage criteria.

The CMS sidesteps any debate about what qualifies as artificial intelligence by offering a broad warning about algorithms and artificial intelligence. “There are many overlapping terms used in the context of rapidly developing software tools,” the CMS wrote.

Algorithms can imply a decisional flow chart of a series of if-then statements (i.e., if the patient has a certain diagnosis, they should be able to receive a test), as well as predictive algorithms (predicting the likelihood of a future admission, for example). Artificial intelligence has been defined as a machine-based system that can, for a given set of human-defined objectives, make predictions, recommendations, or decisions influencing real or virtual environments. Artificial intelligence systems use machine- and human-based inputs to perceive real and virtual environments; abstract such perceptions into models through analysis in an automated manner; and use model inference to formulate options for information or action.

The CMS also openly worried that the use of either of these types of tools can reinforce discrimination and biases—which has already happened with racial bias. The CMS warned insurers to ensure any AI tool or algorithm they use “is not perpetuating or exacerbating existing bias, or introducing new biases.”

While the memo overall was an explicit clarification of existing MA rules, the CMS ended by putting insurers on notice that it is increasing its audit activities and “will be monitoring closely whether MA plans are utilizing and applying internal coverage criteria that are not found in Medicare laws.” Non-compliance can result in warning letters, corrective action plans, monetary penalties, and enrollment and marketing sanctions.



Your current PC probably doesn’t have an AI processor, but your next one might

Intel's Core Ultra chips are some of the first x86 PC processors to include built-in NPUs. Software support will slowly follow.


Intel

When it announced the new Copilot key for PC keyboards last month, Microsoft declared 2024 “the year of the AI PC.” On one level, this is just an aspirational PR-friendly proclamation, meant to show investors that Microsoft intends to keep pushing the AI hype cycle that has put it in competition with Apple for the title of most valuable publicly traded company.

But on a technical level, it is true that PCs made and sold in 2024 and beyond will generally include AI and machine-learning processing capabilities that older PCs don’t. The main thing is the neural processing unit (NPU), a specialized block on recent high-end Intel and AMD CPUs that can accelerate some kinds of generative AI and machine-learning workloads more quickly (or while using less power) than the CPU or GPU could.

Qualcomm’s Windows PCs were some of the first to include an NPU, since the Arm processors used in most smartphones have included some kind of machine-learning acceleration for a few years now (Apple’s M-series chips for Macs all have them, too, going all the way back to 2020’s M1). But the Arm version of Windows is an insignificantly tiny sliver of the entire PC market; x86 PCs with Intel’s Core Ultra chips, AMD’s Ryzen 7040/8040-series laptop CPUs, or the Ryzen 8000G desktop CPUs will be many mainstream PC users’ first exposure to this kind of hardware.

Right now, even if your PC has an NPU in it, Windows can’t use it for much, aside from webcam background blurring and a handful of other video effects. But that’s slowly going to change, and part of that will be making it relatively easy for developers to create NPU-agnostic apps in the same way that PC game developers currently make GPU-agnostic games.

The gaming example is instructive, because that’s basically how Microsoft is approaching DirectML, its API for machine-learning operations. Though up until now it has mostly been used to run these AI workloads on GPUs, Microsoft announced last week that it was adding DirectML support for Intel’s Meteor Lake NPUs in a developer preview, starting in DirectML 1.13.1 and ONNX Runtime 1.17.

Though the preview will only run an unspecified “subset of machine learning models that have been targeted for support,” and some of those “may not run at all or may have high latency or low accuracy,” it opens the door for more third-party apps to start taking advantage of built-in NPUs. Intel says that Samsung is using Intel’s NPU and DirectML for facial recognition features in its photo gallery app, something that Apple also uses its Neural Engine for in macOS and iOS.
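As a rough illustration of the hardware-agnostic approach described above, here is a minimal Python sketch of how an app built on ONNX Runtime might prefer the DirectML execution provider and fall back to the CPU. The helper name `choose_providers` and the preference order are assumptions made for this example; `get_available_providers()` and the `providers` argument are existing ONNX Runtime calls. Whether DirectML then dispatches work to a GPU or an NPU depends on device-selection details that this sketch does not cover.

```python
# Preference order is an assumption for illustration: try DirectML first,
# then fall back to the CPU provider.
PREFERRED = ["DmlExecutionProvider", "CPUExecutionProvider"]


def choose_providers(available, preferred=PREFERRED):
    """Return the preferred providers that are actually available, in order.

    `choose_providers` is a hypothetical helper, not part of ONNX Runtime.
    """
    chosen = [p for p in preferred if p in available]
    # Fall back to the CPU provider if nothing on the list is present.
    return chosen or ["CPUExecutionProvider"]


if __name__ == "__main__":
    try:
        import onnxruntime as ort  # optional dependency for this sketch
        available = ort.get_available_providers()
    except ImportError:
        available = ["CPUExecutionProvider"]
    print(choose_providers(available))
    # A session could then be created along these lines ("model.onnx" is a
    # placeholder path):
    # ort.InferenceSession("model.onnx", providers=choose_providers(available))
```

The point of the pattern is that the app names an API-level provider rather than a specific chip, in the same way a game targets DirectX rather than a particular graphics card.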

The benefits can be substantial, compared to running those workloads on a GPU or CPU.

“The NPU, at least in Intel land, will largely be used for power efficiency reasons,” Intel Senior Director of Technical Marketing Robert Hallock told Ars in an interview about Meteor Lake’s capabilities. “Camera segmentation, this whole background blurring thing… moving that to the NPU saves about 30 to 50 percent power versus running it elsewhere.”

Intel and Microsoft are both working toward a model where NPUs are treated pretty much like GPUs are today: developers generally target DirectX rather than a specific graphics card manufacturer or GPU architecture, and new features, one-off bug fixes, and performance improvements can all be addressed via GPU driver updates. Some GPUs run specific games better than others, and developers can choose to spend more time optimizing for Nvidia cards or AMD cards, but generally the model is hardware agnostic.

Similarly, Intel is already offering GPU-style driver updates for its NPUs. And Hallock says that Windows already essentially recognizes the NPU as “a graphics card with no rendering capability.”



Microsoft in deal with Semafor to create news stories with aid of AI chatbot

a meeting-deadline helper —

Collaboration comes as tech giant faces multibillion-dollar lawsuit from The New York Times.

Cube with Microsoft logo on top of their office building on 8th Avenue and 42nd Street near Times Square in New York City.


Microsoft is working with media startup Semafor to use its artificial intelligence chatbot to help develop news stories—part of a journalistic outreach that comes as the tech giant faces a multibillion-dollar lawsuit from the New York Times.

As part of the agreement, Microsoft is paying an undisclosed sum of money to Semafor to sponsor a breaking news feed called “Signals.” The companies would not share financial details, but the amount of money is “substantial” to Semafor’s business, said a person familiar with the matter.

Signals will offer a feed of breaking news and analysis on big stories, with about a dozen posts a day. The goal is to offer different points of view from across the globe—a key focus for Semafor since its launch in 2022.

Semafor co-founder Ben Smith emphasized that Signals will be written entirely by journalists, with artificial intelligence providing a research tool to inform posts.

Microsoft on Monday was also set to announce collaborations with journalist organizations including the Craig Newmark School of Journalism, the Online News Association, and the GroundTruth Project.

The partnerships come as media companies have become increasingly concerned over generative AI and its potential threat to their businesses. News publishers are grappling with how to use AI to improve their work and stay ahead of technology, while also fearing that they could lose traffic, and therefore revenue, to AI chatbots—which can churn out humanlike text and information in seconds.

The New York Times in December filed a lawsuit against Microsoft and OpenAI, alleging the tech companies have taken a “free ride” on millions of its articles to build their artificial intelligence chatbots, and seeking billions of dollars in damages.

Gina Chua, Semafor’s executive editor, has been involved in developing Semafor’s AI research tools, which are powered by ChatGPT and Microsoft’s Bing.

“Journalism has always used technology whether it’s carrier pigeons, the telegraph or anything else . . . this represents a real opportunity, a set of tools that are really a quantum leap above many of the other tools that have come along,” Chua said.

For a breaking news event, Semafor journalists will use AI tools to quickly search for reporting and commentary from other news sources across the globe in multiple languages. A Signals post might include perspectives from Chinese, Indian, or Russian media, for example, with Semafor’s reporters summarizing and contextualizing the different points of view, while citing its sources.

Noreen Gillespie, a former Associated Press journalist, joined Microsoft three months ago to forge relationships with news companies. “Journalists need to adopt these tools in order to survive and thrive for another generation,” she said.

Semafor was founded by Ben Smith, the former BuzzFeed editor, and Justin Smith, the former chief executive of Bloomberg Media.

Semafor, which is free to read, is funded by wealthy individuals, including 3G Capital founder Jorge Paulo Lemann and KKR co-founder Henry Kravis. The company made more than $10 million in revenue in 2023 and has more than 500,000 subscriptions to its free newsletters. Justin Smith said Semafor was “very close to a profit” in the fourth quarter of 2023.

“What we’re trying to go after is this really weird space of breaking news on the Internet now, in which you have these really splintered, fragmented, rushed efforts to get the first sentence of a story out for search engines . . . and then never really make any effort to provide context,” Ben Smith said.

“We’re trying to go the other way. Here are the confirmed facts. Here are three or four pieces of really sophisticated, meaningful analysis.”

© 2024 The Financial Times Ltd. All rights reserved. Please do not copy and paste FT articles and redistribute by email or post to the web.



ChatGPT’s new @-mentions bring multiple personalities into your AI convo

team of rivals —

Bring different AI roles into the same chatbot conversation history.

Illustration of a man juggling at symbols.

With so many choices, selecting the perfect GPT can be confusing.

On Tuesday, OpenAI announced a new feature in ChatGPT that allows users to pull custom personalities called “GPTs” into any ChatGPT conversation with the @ symbol. It allows a level of quasi-teamwork within ChatGPT among expert roles that was previously impractical, making collaborating with a team of AI agents within OpenAI’s platform one step closer to reality.

“You can now bring GPTs into any conversation in ChatGPT – simply type @ and select the GPT,” wrote OpenAI on the social media network X. “This allows you to add relevant GPTs with the full context of the conversation.”

OpenAI introduced GPTs in November as a way to create custom personalities or roles for ChatGPT to play. For example, users can build their own GPTs to focus on certain topics or certain skills. Paid ChatGPT subscribers can also freely download a host of GPTs developed by other ChatGPT users through the GPT Store.

Previously, if you wanted to share information between GPT profiles, you had to copy the text, select a new chat with the GPT, paste it, and explain the context of what the information means or what you want to do with it. Now, ChatGPT users can stay in the default ChatGPT window and bring in GPTs as needed without losing the history of the conversation.

For example, we created a “Wellness Guide” GPT that is crafted as an expert in human health conditions (of course, this being ChatGPT, always consult a human doctor if you’re having medical problems), and we created a “Canine Health Advisor” for dog-related health questions.

A screenshot of ChatGPT where we @-mentioned a human wellness advisor, then a dog advisor in the same conversation history.


Benj Edwards

We started in a default ChatGPT chat, hit the @ symbol, then typed the first few letters of “Wellness” and selected it from a list. It filled out the rest. We asked a question about food poisoning in humans, and then we switched to the canine advisor in the same way with an @ symbol and asked about the dog.

Using this feature, you could alternatively consult, say, an “ad copywriter” GPT and an “editor” GPT—ask the copywriter to write some text, then rope in the editor GPT to check it, looking at it from a different angle. Different system prompts (the instructions that define a GPT’s personality) make for significant behavior differences.

We also tried swapping between GPT profiles that write software and others designed to consult on historical tech subjects. Interestingly, ChatGPT does not differentiate between GPTs as different personalities as you change. It will still say, “I did this earlier” when a different GPT is talking about a previous GPT’s output in the same conversation history. From its point of view, it’s just ChatGPT and not multiple agents.

From our vantage point, this feature seems to represent baby steps toward a future where GPTs, as independent agents, could work together as a team to fulfill more complex tasks directed by the user. Similar experiments have been done outside of OpenAI in the past (using API access), but OpenAI has so far resisted a more agentic model for ChatGPT. As we’ve seen (first with GPTs and now with this), OpenAI seems to be slowly angling toward that goal itself, but only time will tell if or when we see true agentic teamwork in a shipping service.



Rhyming AI-powered clock sometimes lies about the time, makes up words

Confabulation time —

Poem/1 Kickstarter seeks $103K for fun ChatGPT-fed clock that may hallucinate the time.

A CAD render of the Poem/1 sitting on a bookshelf.


On Tuesday, product developer Matt Webb launched a Kickstarter funding project for a whimsical e-paper clock called the “Poem/1” that tells the current time using AI and rhyming poetry. It’s powered by the ChatGPT API, and Webb says that sometimes ChatGPT will lie about the time or make up words to make the rhymes work.

“Hey so I made a clock. It tells the time with a brand new poem every minute, composed by ChatGPT. It’s sometimes profound, and sometimes weird, and occasionally it fibs about what the actual time is to make a rhyme work,” Webb writes on his Kickstarter page.

The $126 clock is the product of Webb’s Acts Not Facts. Despite the net-connected service aspect of the clock, Webb says it will not require a subscription to function.

A labeled CAD rendering of the Poem/1 clock, representing its final shipping configuration.


There are 1,440 minutes in a day, so Poem/1 needs to display 1,440 unique poems to work. The clock features a monochrome e-paper screen and pulls its poetry rhymes via Wi-Fi from a central server run by Webb’s company. To save money, that server pulls poems from ChatGPT’s API and will share them out to many Poem/1 clocks at once. This prevents costly API fees that would add up if your clock were querying OpenAI’s servers 1,440 times a day, non-stop, forever. “I’m reserving a % of the retail price from each clock in a bank account to cover AI and server costs for 5 years,” Webb writes.
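The cost-saving idea described above can be sketched in a few lines of Python. This is an illustration only, not Webb's server code: the `PoemCache` class and the `fake_generate` function are hypothetical stand-ins, and the real service's design has not been published. The sketch generates at most one poem per minute and hands the same poem to every clock that asks during that minute, so upstream calls are capped at 1,440 per day no matter how many clocks exist.

```python
from datetime import datetime, timedelta

MINUTES_PER_DAY = 24 * 60  # 1,440


class PoemCache:
    """Serve one shared poem per minute to any number of clocks (hypothetical)."""

    def __init__(self, generate):
        self._generate = generate  # stand-in for a paid LLM API call
        self._key = None           # minute the cached poem belongs to
        self._poem = None
        self.upstream_calls = 0

    def poem_for(self, now):
        key = now.strftime("%Y-%m-%d %H:%M")
        if key != self._key:
            # First request in a new minute: generate once, then reuse.
            self._key, self._poem = key, self._generate(now)
            self.upstream_calls += 1
        return self._poem


def fake_generate(now):
    # Placeholder text; a real server would call an LLM here.
    return f"The time is {now:%H:%M}, or so I say."


cache = PoemCache(fake_generate)
start = datetime(2024, 1, 30, 0, 0)
for minute in range(MINUTES_PER_DAY):
    now = start + timedelta(minutes=minute)
    for _clock in range(50):  # 50 clocks poll during the same minute
        cache.poem_for(now)

print(cache.upstream_calls)  # 1440
```

Without the shared cache, the same 50 clocks would trigger 72,000 upstream calls per day (50 × 1,440).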

For hackers, Webb says that you’ll be able to change the back-end server URL of the Poem/1 from the default to whatever you want, so it can display custom text every minute of the day. Webb says he will document and publish the API when Poem/1 ships.

Hallucination time

A photo of a Poem/1 prototype with a hallucinated time, according to Webb.


Given the Poem/1’s large language model pedigree, it’s perhaps not surprising that Poem/1 may sometimes make up things (also called “hallucination” or “confabulation” in the AI field) to fulfill its task. The LLM that powers ChatGPT is always searching for the most likely next word in a sequence, and sometimes factuality comes second to fulfilling that mission.

Further down on the Kickstarter page, Webb provides a photo of his prototype Poem/1 where the screen reads, “As the clock strikes eleven forty two, / I rhyme the time, as I always do.” Just below, Webb warns, “Poem/1 fibs occasionally. I don’t believe it was actually 11.42 when this photo was taken. The AI hallucinated the time in order to make the poem work. What we do for art…”

In other clocks, the tendency to unreliably tell the time might be a fatal flaw. But judging by his humorous angle on the Kickstarter page, Webb apparently sees the clock as more of a fun art project than a precision timekeeping instrument. “Don’t rely on this clock in situations where timekeeping is vital,” Webb writes, “such as if you work in air traffic control or rocket launches or the finish line of athletics competitions.”

Poem/1 also sometimes takes poetic license with vocabulary to tell the time. During a humorous moment in the Kickstarter promotional video, Webb looks at his clock prototype and reads the rhyme, “A clock that defies all rhyme and reason / 4:30 PM, a temporal teason.” Then he says, “I had to look ‘teason’ up. It doesn’t mean anything, so it’s a made-up word.”



ChatGPT is leaking passwords from private conversations of its users, Ars reader says

OPENAI SPRINGS A LEAK —

Names of unpublished research papers, presentations, and PHP scripts also leaked.

OpenAI logo displayed on a phone screen and ChatGPT website displayed on a laptop screen.

Getty Images

ChatGPT is leaking private conversations that include login credentials and other personal details of unrelated users, screenshots submitted by an Ars reader on Monday indicated.

Two of the seven screenshots the reader submitted stood out in particular. Both contained multiple pairs of usernames and passwords that appeared to be connected to a support system used by employees of a pharmacy prescription drug portal. An employee using the AI chatbot seemed to be troubleshooting problems they encountered while using the portal.

“Horrible, horrible, horrible”

“THIS is so f-ing insane, horrible, horrible, horrible, i cannot believe how poorly this was built in the first place, and the obstruction that is being put in front of me that prevents it from getting better,” the user wrote. “I would fire [redacted name of software] just for this absurdity if it was my choice. This is wrong.”

Besides the candid language and the credentials, the leaked conversation includes the name of the app the employee is troubleshooting and the store number where the problem occurred.

The entire conversation goes well beyond what’s shown in the redacted screenshot above. A link Ars reader Chase Whiteside included showed the chat conversation in its entirety. The URL disclosed additional credential pairs.

The results appeared Monday morning shortly after reader Whiteside had used ChatGPT for an unrelated query.

“I went to make a query (in this case, help coming up with clever names for colors in a palette) and when I returned to access moments later, I noticed the additional conversations,” Whiteside wrote in an email. “They weren’t there when I used ChatGPT just last night (I’m a pretty heavy user). No queries were made—they just appeared in my history, and most certainly aren’t from me (and I don’t think they’re from the same user either).”

Other conversations leaked to Whiteside include the name of a presentation someone was working on, details of an unpublished research proposal, and a script using the PHP programming language. The users for each leaked conversation appeared to be different and unrelated to each other. The conversation involving the prescription portal included the year 2020. Dates didn’t appear in the other conversations.

The episode, and others like it, underscore the wisdom of stripping out personal details from queries made to ChatGPT and other AI services whenever possible. Last March, ChatGPT maker OpenAI took the AI chatbot offline after a bug caused the site to show titles from one active user’s chat history to unrelated users.

In November, researchers published a paper reporting how they used queries to prompt ChatGPT into divulging email addresses, phone and fax numbers, physical addresses, and other private data that was included in material used to train the ChatGPT large language model.

Concerned about the possibility of proprietary or private data leakage, companies, including Apple, have restricted their employees’ use of ChatGPT and similar sites.

As mentioned in an article from December when multiple people found that Ubiquiti’s UniFi devices broadcast private video belonging to unrelated users, these sorts of experiences are as old as the Internet is. As explained in the article:

The precise root causes of this type of system error vary from incident to incident, but they often involve “middlebox” devices, which sit between the front- and back-end devices. To improve performance, middleboxes cache certain data, including the credentials of users who have recently logged in. When mismatches occur, credentials for one account can be mapped to a different account.
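To make that class of error concrete, here is a deliberately simplified Python sketch of one way a caching layer can hand one user's data to another. It is not a description of what happened at OpenAI or Ubiquiti, whose root causes are not given here; the `Backend`, `LeakyCache`, and `SafeCache` classes are hypothetical. The bug is that the cache key ignores who is asking.

```python
class Backend:
    """Hypothetical back-end that returns per-user data."""

    def history(self, user):
        return f"chat history of {user}"


class LeakyCache:
    """Caches by request path only, ignoring the requesting user (the bug)."""

    def __init__(self, backend):
        self.backend = backend
        self.store = {}

    def get(self, path, user):
        if path not in self.store:
            self.store[path] = self.backend.history(user)
        return self.store[path]


class SafeCache(LeakyCache):
    """Includes the user's identity in the cache key."""

    def get(self, path, user):
        key = (path, user)
        if key not in self.store:
            self.store[key] = self.backend.history(user)
        return self.store[key]


leaky = LeakyCache(Backend())
leaky.get("/history", "alice")
print(leaky.get("/history", "bob"))  # chat history of alice

safe = SafeCache(Backend())
safe.get("/history", "alice")
print(safe.get("/history", "bob"))   # chat history of bob
```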

An OpenAI representative said the company was investigating the report.



OpenAI and Common Sense Media partner to protect teens from AI harms and misuse

Adventures in chatbusting —

Site gave ChatGPT 3 stars and 48% privacy score: “Best used for creativity, not facts.”

Boy in Living Room Wearing Robot Mask

On Monday, OpenAI announced a partnership with the nonprofit Common Sense Media to create AI guidelines and educational materials targeted at parents, educators, and teens. It includes the curation of family-friendly GPTs in OpenAI’s GPT store. The collaboration aims to address concerns about the impacts of AI on children and teenagers.

Known for its reviews of films and TV shows aimed at parents seeking appropriate media for their kids to watch, Common Sense Media recently branched out into AI and has been reviewing AI assistants on its site.

“AI isn’t going anywhere, so it’s important that we help kids understand how to use it responsibly,” Common Sense Media wrote on X. “That’s why we’ve partnered with @OpenAI to help teens and families safely harness the potential of AI.”

OpenAI CEO Sam Altman and Common Sense Media CEO James Steyer announced the partnership onstage in San Francisco at the Common Sense Summit for America’s Kids and Families, an event that was well-covered by media members on the social media site X.

For his part, Altman offered a canned statement in the press release, saying, “AI offers incredible benefits for families and teens, and our partnership with Common Sense will further strengthen our safety work, ensuring that families and teens can use our tools with confidence.”

The announcement feels slightly non-specific in the official news release, with Steyer offering, “Our guides and curation will be designed to educate families and educators about safe, responsible use of ChatGPT, so that we can collectively avoid any unintended consequences of this emerging technology.”

The partnership seems aimed mostly at bringing a patina of family-friendliness to OpenAI’s GPT store, with the most solid reveal being the aforementioned fact that Common Sense Media will help with the “curation of family-friendly GPTs in the GPT Store based on Common Sense ratings and standards.”

Common Sense AI reviews

As mentioned above, Common Sense Media began reviewing AI assistants on its site late last year. This puts Common Sense Media in an interesting position with potential conflicts of interest regarding the new partnership with OpenAI. However, it doesn’t seem to be offering any favoritism to OpenAI so far.

For example, Common Sense Media’s review of ChatGPT calls the AI assistant “A powerful, at times risky chatbot for people 13+ that is best used for creativity, not facts.” It labels ChatGPT as being suitable for ages 13 and up (which is in OpenAI’s Terms of Service) and gives the OpenAI assistant three out of five stars. ChatGPT also scores a 48 percent privacy rating (which is oddly shown as 55 percent on another page that goes into privacy details). The review we cited was last updated on October 13, 2023, as of this writing.

For reference, Google Bard gets a three-star overall rating and a 75 percent privacy rating in its Common Sense Media review. Stable Diffusion, the image synthesis model, nets a one-star rating with the description, “Powerful image generator can unleash creativity, but is wildly unsafe and perpetuates harm.” OpenAI’s DALL-E gets two stars and a 48 percent privacy rating.

The information that Common Sense Media includes about each AI model appears relatively accurate and detailed (and the organization cited an Ars Technica article as a reference in one explanation), so the reviews feel fair, even in the face of the OpenAI partnership. Given the low scores, it seems that most AI models aren’t off to a great start, but that may change. It’s still early days in generative AI.



OpenAI updates ChatGPT-4 model with potential fix for AI “laziness” problem

Break’s over —

Also, new GPT-3.5 Turbo model, lower API prices, and other model updates.

A lazy robot (a man with a box on his head) sits on the floor beside a couch.

On Thursday, OpenAI announced updates to the AI models that power its ChatGPT assistant. Amid less noteworthy updates, OpenAI tucked in a mention of a potential fix to a widely reported “laziness” problem seen in GPT-4 Turbo since its release in November. The company also announced a new GPT-3.5 Turbo model (with lower pricing), a new embedding model, an updated moderation model, and a new way to manage API usage.

“Today, we are releasing an updated GPT-4 Turbo preview model, gpt-4-0125-preview. This model completes tasks like code generation more thoroughly than the previous preview model and is intended to reduce cases of ‘laziness’ where the model doesn’t complete a task,” writes OpenAI in its blog post.

Since the launch of GPT-4 Turbo, a large number of ChatGPT users have reported that the ChatGPT-4 version of its AI assistant has been declining to do tasks (especially coding tasks) with the same exhaustive depth as it did in earlier versions of GPT-4. We’ve seen this behavior ourselves while experimenting with ChatGPT over time.

OpenAI has never offered an official explanation for this change in behavior, but OpenAI employees have previously acknowledged on social media that the problem is real, and the ChatGPT X account wrote in December, “We’ve heard all your feedback about GPT4 getting lazier! we haven’t updated the model since Nov 11th, and this certainly isn’t intentional. model behavior can be unpredictable, and we’re looking into fixing it.”

We reached out to OpenAI asking if it could provide an official explanation for the laziness issue but did not receive a response by press time.

New GPT-3.5 Turbo, other updates

Elsewhere in OpenAI’s blog update, the company announced a new version of GPT-3.5 Turbo (gpt-3.5-turbo-0125), which it says will offer “various improvements including higher accuracy at responding in requested formats and a fix for a bug which caused a text encoding issue for non-English language function calls.”

And the cost of GPT-3.5 Turbo through OpenAI’s API will decrease for the third time in the past year “to help our customers scale.” New input token prices are 50 percent less, at $0.0005 per 1,000 input tokens, and output prices are 25 percent less, at $0.0015 per 1,000 output tokens.
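To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. The two new prices come straight from the announcement; the old prices are only what the stated 50 and 25 percent cuts imply, and the workload size (1 million tokens each way) is a hypothetical figure we picked for illustration.

```python
# Prices quoted above, in US dollars per 1,000 tokens.
NEW_INPUT_PRICE_PER_1K = 0.0005
NEW_OUTPUT_PRICE_PER_1K = 0.0015

# Previous prices implied by the stated cuts (50 percent and 25 percent).
# These are derived values, not figures quoted in the announcement.
OLD_INPUT_PRICE_PER_1K = NEW_INPUT_PRICE_PER_1K / (1 - 0.50)
OLD_OUTPUT_PRICE_PER_1K = NEW_OUTPUT_PRICE_PER_1K / (1 - 0.25)


def workload_cost(input_tokens, output_tokens, input_price, output_price):
    """Return the dollar cost of a workload given per-1,000-token prices."""
    return (input_tokens / 1000) * input_price + (output_tokens / 1000) * output_price


# Hypothetical workload: 1 million input tokens and 1 million output tokens.
new_cost = workload_cost(1_000_000, 1_000_000,
                         NEW_INPUT_PRICE_PER_1K, NEW_OUTPUT_PRICE_PER_1K)
old_cost = workload_cost(1_000_000, 1_000_000,
                         OLD_INPUT_PRICE_PER_1K, OLD_OUTPUT_PRICE_PER_1K)

print(f"new: ${new_cost:.2f}, old: ${old_cost:.2f}")
```

Under those assumptions, the same workload drops from about $3 to about $2, so a bot with a different mix of input and output tokens would see a different overall saving.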

Lower token prices for GPT-3.5 Turbo will make operating third-party bots significantly less expensive, but the GPT-3.5 model is generally more likely to confabulate than GPT-4 Turbo. So we might see more scenarios like Quora’s bot telling people that eggs can melt (although the instance used a now-deprecated GPT-3 model called text-davinci-003). If GPT-4 Turbo API prices drop over time, some of those hallucination issues with third parties might eventually go away.

OpenAI also announced new embedding models, text-embedding-3-small and text-embedding-3-large, which convert content into numerical sequences, aiding in machine learning tasks like clustering and retrieval. And an updated moderation model, text-moderation-007, is part of the company’s API that “allows developers to identify potentially harmful text,” according to OpenAI.
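For readers wondering how “numerical sequences” help with retrieval, here is a minimal sketch. It does not call OpenAI’s API; the three-number vectors below are made up by us as stand-ins for real embeddings (which have far more dimensions), and cosine similarity is one common way to compare such vectors, not necessarily what any given product uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means 'pointing the same way'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings" (invented values, not output from any real model).
documents = {
    "how to boil an egg": [0.9, 0.1, 0.0],
    "poaching eggs for breakfast": [0.8, 0.2, 0.1],
    "resetting an API key": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a query such as "cooking eggs".
query = [0.85, 0.15, 0.05]

# Retrieval: rank documents from most to least similar to the query.
ranked = sorted(documents,
                key=lambda title: cosine_similarity(query, documents[title]),
                reverse=True)

for title in ranked:
    print(f"{cosine_similarity(query, documents[title]):.3f}  {title}")
```

In this toy setup, the two egg-related entries score far closer to the query than the API-key entry, which is the basic idea behind embedding-based search and clustering.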

Finally, OpenAI is rolling out improvements to its developer platform, introducing new tools for managing API keys and a new dashboard for tracking API usage. Developers can now assign permissions to API keys from the API keys page, helping to clamp down on misuse of API keys (if they get into the wrong hands) that can potentially cost developers lots of money. The API dashboard allows devs to “view usage on a per feature, team, product, or project level, simply by having separate API keys for each.”

As the media world seemingly swirls around the company with controversies and think pieces about the implications of its tech, releases like these show that the dev teams at OpenAI are still rolling along as usual with updates at a fairly regular pace. Despite the company almost completely falling apart late last year, it seems that, under the hood, it’s business as usual for OpenAI.

George Carlin’s heirs sue comedy podcast over “AI-generated” impression

AI’ll see you in court —

Suit alleges copyright infringement and illegal use of Carlin’s name and likeness.

A promotional image cited in the lawsuit uses Carlin’s name and image to promote the Dudesy podcast and special.

The estate of George Carlin has filed a federal lawsuit against the comedy podcast Dudesy for an hour-long comedy special sold as an AI-generated impression of the late comedian.

In the lawsuit, filed by Carlin manager Jerold Hamza in a California district court, the Carlin estate points out that the special, “George Carlin: I’m Glad I’m Dead,” presents itself as being created by an AI trained on decades’ worth of Carlin’s material. That training would, by definition, involve making “unauthorized copies” of “Carlin’s original, copyrighted routines” without permission in order “to fabricate a semblance of Carlin’s voice and generate a Carlin stand-up comedy routine,” according to the lawsuit.

“Defendants’ AI-generated ‘George Carlin Special’ is not a creative work,” the lawsuit reads, in part. “It is a piece of computer-generated click-bait which detracts from the value of Carlin’s comedic works and harms his reputation. It is a casual theft of a great American artist’s work.”

The Dudesy special “George Carlin: I’m Glad I’m Dead.”

The use of copyrighted material to train AI models is one of the most contentious and unsettled areas of law in the AI field at the moment. Just this month, media organizations testified before Congress to argue against AI makers’ claims that training on news content was legal under a “fair use” exemption.

The Dudesy special is presented as an “impression” of Carlin that the AI generated by “listening” to Carlin’s existing material “in the exact same way a human impressionist would.” But the lawsuit takes direct issue with this analogy, arguing that an AI model is just an “output generated by a technological process that is an unlawful appropriation of Carlin’s identity, which also damages the value of Carlin’s real work and his legacy.”

In his image

There is some debate as to whether the Dudesy special was actually written by a specially trained AI, as Ars laid out in detail this week. But even a special that was partially or fully human-written would be guilty of unauthorized use of Carlin’s name and likeness for promotional purposes, according to the lawsuit.

“Defendants always presented the Dudesy Special as an AI-generated George Carlin comedy special, where George Carlin was ‘resurrected’ with the use of modern technology,” the lawsuit argues. “In short, Defendants sought to capitalize on the name, reputation, and likeness of George Carlin in creating, promoting, and distributing the Dudesy Special and using generated images of Carlin, Carlin’s voice, and images designed to evoke Carlin’s presence on a stage.”

A Dudesy-generated image representing AI's impending replacement of human stand-up comedy.

While the special doesn’t present images or video of Carlin (AI-generated or not), the YouTube thumbnail for the video shows an AI-generated image of a comedian with Carlin’s signature gray ponytail looking out over an audience. The lawsuit also cites numerous social media posts where Carlin’s name and image are used to promote the special or the Dudesy podcast.

That creates an “association” between the Dudesy podcast and Carlin that is “harmful to Carlin’s reputation, his legacy, and to the value of his real work,” according to the lawsuit. “Worse, if not curtailed now, future AI models may incorrectly associate the Dudesy Special with Carlin, ultimately folding Defendants’ knockoff version in with Carlin’s actual creative output.”

Anticipating potential free speech defenses, the lawsuit argues that the special “has no comedic or creative value absent its self-proclaimed connection with George Carlin” and that it doesn’t “satirize him as a performer or offer an independent critique of society.”

Kelly Carlin, the late comedian’s daughter, told The Daily Beast earlier this month that she was talking to lawyers about potential legal action. “It’s not his material. It’s not his voice,” she said at the time. “So they need to take the name off because it is not George Carlin.”

“The ‘George Carlin’ in that video is not the beautiful human who defined his generation and raised me with love,” Kelly Carlin wrote in a statement obtained by Variety. “It is a poorly executed facsimile cobbled together by unscrupulous individuals to capitalize on the extraordinary goodwill my father established with his adoring fanbase.”

The lawsuit asks a court to force Dudesy to “remove, take down, and destroy any video or audio copies… of the ‘George Carlin Special,’ wherever they may be located,” as well as pay punitive damages.

Google’s latest AI video generator can render cute animals in implausible situations

An elephant with a party hat—underwater —

Lumiere generates five-second videos that “portray realistic, diverse and coherent motion.”

Still images of AI-generated video examples provided by Google for its Lumiere video synthesis model.

On Tuesday, Google announced Lumiere, an AI video generator that it calls “a space-time diffusion model for realistic video generation” in the accompanying preprint paper. But let’s not kid ourselves: It does a great job at creating videos of cute animals in ridiculous scenarios, such as using roller skates, driving a car, or playing a piano. Sure, it can do more, but it is perhaps the most advanced text-to-animal AI video generator yet demonstrated.

According to Google, Lumiere uses a unique architecture to generate a video’s entire temporal duration in one go. Or, as the company put it, “We introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once, through a single pass in the model. This is in contrast to existing video models which synthesize distant keyframes followed by temporal super-resolution—an approach that inherently makes global temporal consistency difficult to achieve.”

In layperson terms, Google’s tech is designed to handle both the space (where things are in the video) and time (how things move and change throughout the video) aspects simultaneously. So, instead of making a video by putting together many small parts or frames, it can create the entire video, from start to finish, in one smooth process.

The official promotional video accompanying the paper “Lumiere: A Space-Time Diffusion Model for Video Generation,” released by Google.

Lumiere can also do plenty of party tricks, which are laid out quite well with examples on Google’s demo page. For example, it can perform text-to-video generation (turning a written prompt into a video), convert still images into videos, generate videos in specific styles using a reference image, apply consistent video editing using text-based prompts, create cinemagraphs by animating specific regions of an image, and offer video inpainting capabilities (for example, it can change the type of dress a person is wearing).

In the Lumiere research paper, the Google researchers state that the AI model outputs five-second-long 1024×1024 pixel videos, which they describe as “low-resolution.” Despite those limitations, the researchers performed a user study and claim that Lumiere’s outputs were preferred over those of existing AI video synthesis models.

As for training data, Google doesn’t say where it got the videos it fed into Lumiere, writing, “We train our T2V [text to video] model on a dataset containing 30M videos along with their text caption. [sic] The videos are 80 frames long at 16 fps (5 seconds). The base model is trained at 128×128.”

A block diagram showing components of the Lumiere AI model, provided by Google.

AI-generated video is still in a primitive state, but it’s been progressing in quality over the past two years. In October 2022, we covered Google’s first publicly unveiled video synthesis model, Imagen Video. It could generate short 1280×768 video clips from a written prompt at 24 frames per second, but the results weren’t always coherent. Before that, Meta debuted its AI video generator, Make-A-Video. In June of last year, Runway’s Gen-2 video synthesis model enabled the creation of two-second video clips from text prompts, fueling the creation of surrealistic parody commercials. And in November, we covered Stable Video Diffusion, which can generate short clips from still images.

AI companies often demonstrate video generators with cute animals because generating coherent, non-deformed humans is currently difficult—especially since we, as humans (you are human, right?), are adept at noticing any flaws in human bodies or how they move. Just look at AI-generated Will Smith eating spaghetti.

Judging by Google’s examples (and not having used it ourselves), Lumiere appears to surpass these other AI video generation models. But since Google tends to keep its AI research models close to its chest, we’re not sure when, if ever, the public may have a chance to try it for themselves.

As always, whenever we see text-to-video synthesis models getting more capable, we can’t help but think of the future implications for our Internet-connected society, which is centered around sharing media artifacts—and the general presumption that “realistic” video typically represents real objects in real situations captured by a camera. Future video synthesis tools more capable than Lumiere will make deceptive deepfakes trivially easy to create.

To that end, in the “Societal Impact” section of the Lumiere paper, the researchers write, “Our primary goal in this work is to enable novice users to generate visual content in an creative and flexible way. [sic] However, there is a risk of misuse for creating fake or harmful content with our technology, and we believe that it is crucial to develop and apply tools for detecting biases and malicious use cases in order to ensure a safe and fair use.”

A “robot” should be chemical, not steel, argues man who coined the word

Dispatch from 1935 —

Čapek: “The world needed mechanical robots, for it believes in machines more than it believes in life.”

In 1921, Czech playwright Karel Čapek and his brother Josef invented the word “robot” in a sci-fi play called R.U.R. (short for Rossum’s Universal Robots). As Evan Ackerman in IEEE Spectrum points out, Čapek wasn’t happy about how the term’s meaning evolved to denote mechanical entities, straying from his original concept of artificial human-like beings based on chemistry.

In a newly translated column called “The Author of the Robots Defends Himself,” published in Lidové Noviny on June 9, 1935, Čapek expresses his frustration about how his original vision for robots was being subverted. His arguments still apply to both modern robotics and AI. In this column, he referred to himself in the third person:

For his robots were not mechanisms. They were not made of sheet metal and cogwheels. They were not a celebration of mechanical engineering. If the author was thinking of any of the marvels of the human spirit during their creation, it was not of technology, but of science. With outright horror, he refuses any responsibility for the thought that machines could take the place of people, or that anything like life, love, or rebellion could ever awaken in their cogwheels. He would regard this somber vision as an unforgivable overvaluation of mechanics or as a severe insult to life.

This recently resurfaced article comes courtesy of R.U.R. and the Vision of Artificial Life, a new English translation of Čapek’s play accompanied by 20 essays on robotics, philosophy, politics, and AI. The editor, Jitka Čejková, a professor at the Chemical Robotics Laboratory in Prague, aligns her research with Čapek’s original vision. She explores “chemical robots”—microparticles resembling living cells—which she calls “liquid robots.”

“An assistant of inventor Captain Richards works on the robot the Captain has invented, which speaks, answers questions, shakes hands, tells the time and sits down when it’s told to.” – September 1928

In his 1935 column, Čapek clarifies that his robots were not intended to be mechanical marvels, but organic products of modern chemistry, akin to living matter. He emphasizes that he did not want to glorify mechanical systems but to explore the potential of science, particularly chemistry. He rejects the idea that machines could replace humans or develop emotions and consciousness.

The author of the robots would regard it as an act of scientific bad taste if he had brought something to life with brass cogwheels or created life in the test tube; the way he imagined it, he created only a new foundation for life, which began to behave like living matter, and which could therefore have become a vehicle of life—but a life which remains an unimaginable and incomprehensible mystery. This life will reach its fulfillment only when (with the aid of considerable inaccuracy and mysticism) the robots acquire souls. From which it is evident that the author did not invent his robots with the technological hubris of a mechanical engineer, but with the metaphysical humility of a spiritualist.

The reason for the transition from chemical to mechanical in the public perception of robots isn’t entirely clear (though Čapek does mention a Russian film which went the mechanical route and was likely influential). The early 20th century was a period of rapid industrialization and technological advancement that saw the emergence of complex machinery and electronic automation, which probably influenced the public and scientific community’s perception of autonomous beings, leading them to associate the idea of robots with mechanical and electronic devices rather than chemical creations.

The 1935 piece is full of interesting quotes (you can read the whole thing in IEEE Spectrum), and we’ve grabbed a few highlights below that you can conveniently share with your robot-loving friends to blow their minds:

  • “He pronounces that his robots were created quite differently—that is, by a chemical path”
  • “He has learned, without any great pleasure, that genuine steel robots have started to appear”
  • “Well then, the author cannot be blamed for what might be called the worldwide humbug over the robots.”
  • “The world needed mechanical robots, for it believes in machines more than it believes in life; it is fascinated more by the marvels of technology than by the miracle of life.”

So it seems, over 100 years later, that we’ve gotten it wrong all along. Čapek’s vision, rooted in chemical synthesis and the philosophical mysteries of life, offers a different narrative from the predominant mechanical and electronic interpretation of robots we know today. But judging from what Čapek wrote, it sounds like he would be firmly against AI takeover scenarios. In fact, Čapek, who died in 1938, probably would have thought them impossible.

DeepMind AI rivals the world’s smartest high schoolers at geometry

Demis Hassabis, CEO of DeepMind Technologies and developer of AlphaGo, attends the AI Safety Summit at Bletchley Park on November 2, 2023, in Bletchley, England.

A system developed by Google’s DeepMind has set a new record for AI performance on geometry problems. DeepMind’s AlphaGeometry managed to solve 25 of the 30 geometry problems drawn from the International Mathematical Olympiad between 2000 and 2022.

That puts the software ahead of the vast majority of young mathematicians and just shy of IMO gold medalists. DeepMind estimates that the average gold medalist would have solved 26 out of 30 problems. Many view the IMO as the world’s most prestigious math competition for high school students.

“Because language models excel at identifying general patterns and relationships in data, they can quickly predict potentially useful constructs, but often lack the ability to reason rigorously or explain their decisions,” DeepMind writes. To overcome this difficulty, DeepMind paired a language model with a more traditional symbolic deduction engine that performs algebraic and geometric reasoning.

The research was led by Trieu Trinh, a computer scientist who recently earned his PhD from New York University. He was a resident at DeepMind between 2021 and 2023.

Evan Chen, a former Olympiad gold medalist who evaluated some of AlphaGeometry’s output, praised it as “impressive because it’s both verifiable and clean.” Whereas some earlier software generated complex geometry proofs that were hard for human reviewers to understand, the output of AlphaGeometry is similar to what a human mathematician would write.

AlphaGeometry is part of DeepMind’s larger project to improve the reasoning capabilities of large language models by combining them with traditional search algorithms. DeepMind has published several papers in this area over the last year.

How AlphaGeometry works

Let’s start with a simple example shown in the AlphaGeometry paper, which was published by Nature on Wednesday:

The goal is to prove that if a triangle has two equal sides (AB and AC), then the angles opposite those sides will also be equal. We can do this by creating a new point D at the midpoint of the third side of the triangle (BC). It’s easy to show that all three sides of triangle ABD are the same length as the corresponding sides of triangle ACD: AB equals AC by assumption, BD equals DC because D is the midpoint, and AD is shared by both triangles. And two triangles with equal sides always have equal angles.

Geometry problems from the IMO are much more complex than this toy problem, but fundamentally, they have the same structure. They all start with a geometric figure and some facts about the figure like “side AB is the same length as side AC.” The goal is to generate a sequence of valid inferences that conclude with a given statement like “angle ABC is equal to angle BCA.”

For many years, we’ve had software that can generate lists of valid conclusions that can be drawn from a set of starting assumptions. Simple geometry problems can be solved by “brute force”: mechanically listing every possible fact that can be inferred from the given assumptions, then listing every possible inference from those facts, and so on until you reach the desired conclusion.
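To make the brute-force idea concrete, here is a toy sketch in Python that applies it to the isosceles triangle example. This is not DeepMind’s deduction engine: the facts are plain strings, the four rules are hand-written by us for this one problem, and the function name forward_chain is our own. It also assumes the midpoint D has already been added to the figure.

```python
def forward_chain(facts, rules, goal):
    """Apply rules repeatedly until the goal appears or nothing new can be inferred.

    Each rule is a (premises, conclusion) pair: once every premise is a known
    fact, the conclusion becomes a known fact too.
    """
    known = set(facts)
    steps = []  # (premises, conclusion) pairs in the order they fired
    changed = True
    while changed and goal not in known:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                steps.append((sorted(premises), conclusion))
                changed = True
    return goal in known, steps


# Hand-written rules for the toy problem (illustrative, not a real geometry library).
RULES = [
    (frozenset({"D is the midpoint of BC"}), "BD = DC"),
    (frozenset(), "AD = AD"),  # a segment always equals itself
    (frozenset({"AB = AC", "BD = DC", "AD = AD"}),
     "triangle ABD is congruent to triangle ACD"),  # side-side-side
    (frozenset({"triangle ABD is congruent to triangle ACD"}),
     "angle ABC = angle BCA"),
]

GOAL = "angle ABC = angle BCA"

proved, steps = forward_chain({"AB = AC", "D is the midpoint of BC"}, RULES, GOAL)
for premises, conclusion in steps:
    print(f"{premises} => {conclusion}")
print("proved:", proved)

# Without the extra point D, the same search runs out of new facts and fails.
proved_without_d, _ = forward_chain({"AB = AC"}, RULES, GOAL)
print("proved without D:", proved_without_d)
```

The second call hints at the limitation of this approach: the search only succeeds here because someone already supplied point D.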

But this kind of brute-force search isn’t feasible for an IMO-level geometry problem because the search space is too large. Not only do harder problems require longer proofs, but sophisticated proofs often require the introduction of new elements to the initial figure—as with point D in the above proof. Once you allow for these kinds of “auxiliary points,” the space of possible proofs explodes and brute-force methods become impractical.
