GPT-4-turbo

openai’s-new-“criticgpt”-model-is-trained-to-criticize-gpt-4-outputs

OpenAI’s new “CriticGPT” model is trained to criticize GPT-4 outputs

automated critic —

Research model catches bugs in AI-generated code, improving human oversight of AI.

An illustration created by OpenAI.

Enlarge / An illustration created by OpenAI.

On Thursday, OpenAI researchers unveiled CriticGPT, a new AI model designed to identify mistakes in code generated by ChatGPT. It aims to enhance the process of making AI systems behave in ways humans want (called “alignment”) through Reinforcement Learning from Human Feedback (RLHF), which helps human reviewers make large language model (LLM) outputs more accurate.

As outlined in a new research paper called “LLM Critics Help Catch LLM Bugs,” OpenAI created CriticGPT to act as an AI assistant to human trainers who review programming code generated by the ChatGPT AI assistant. CriticGPT—based on the GPT-4 family of LLMS—analyzes the code and points out potential errors, making it easier for humans to spot mistakes that might otherwise go unnoticed. The researchers trained CriticGPT on a dataset of code samples with intentionally inserted bugs, teaching it to recognize and flag various coding errors.

The researchers found that CriticGPT’s critiques were preferred by annotators over human critiques in 63 percent of cases involving naturally occurring LLM errors and that human-machine teams using CriticGPT wrote more comprehensive critiques than humans alone while reducing confabulation (hallucination) rates compared to AI-only critiques.

Developing an automated critic

The development of CriticGPT involved training the model on a large number of inputs containing deliberately inserted mistakes. Human trainers were asked to modify code written by ChatGPT, introducing errors and then providing example feedback as if they had discovered these bugs. This process allowed the model to learn how to identify and critique various types of coding errors.

In experiments, CriticGPT demonstrated its ability to catch both inserted bugs and naturally occurring errors in ChatGPT’s output. The new model’s critiques were preferred by trainers over those generated by ChatGPT itself in 63 percent of cases involving natural bugs (the aforementioned statistic). This preference was partly due to CriticGPT producing fewer unhelpful “nitpicks” and generating fewer false positives, or hallucinated problems.

The researchers also created a new technique they call Force Sampling Beam Search (FSBS). This method helps CriticGPT write more detailed reviews of code. It lets the researchers adjust how thorough CriticGPT is in looking for problems, while also controlling how often it might make up issues that don’t really exist. They can tweak this balance depending on what they need for different AI training tasks.

Interestingly, the researchers found that CriticGPT’s capabilities extend beyond just code review. In their experiments, they applied the model to a subset of ChatGPT training data that had previously been rated as flawless by human annotators. Surprisingly, CriticGPT identified errors in 24 percent of these cases—errors that were subsequently confirmed by human reviewers. OpenAI thinks this demonstrates the model’s potential to generalize to non-code tasks and highlights its ability to catch subtle mistakes that even careful human evaluation might miss.

Despite its promising results, like all AI models, CriticGPT has limitations. The model was trained on relatively short ChatGPT answers, which may not fully prepare it for evaluating longer, more complex tasks that future AI systems might tackle. Additionally, while CriticGPT reduces confabulations, it doesn’t eliminate them entirely, and human trainers can still make labeling mistakes based on these false outputs.

The research team acknowledges that CriticGPT is most effective at identifying errors that can be pinpointed in one specific location within the code. However, real-world mistakes in AI outputs can often be spread across multiple parts of an answer, presenting a challenge for future iterations of the model.

OpenAI plans to integrate CriticGPT-like models into its RLHF labeling pipeline, providing its trainers with AI assistance. For OpenAI, it’s a step toward developing better tools for evaluating outputs from LLM systems that may be difficult for humans to rate without additional support. However, the researchers caution that even with tools like CriticGPT, extremely complex tasks or responses may still prove challenging for human evaluators—even those assisted by AI.

OpenAI’s new “CriticGPT” model is trained to criticize GPT-4 outputs Read More »

before-launching,-gpt-4o-broke-records-on-chatbot-leaderboard-under-a-secret-name

Before launching, GPT-4o broke records on chatbot leaderboard under a secret name

case closed —

Anonymous chatbot that mystified and frustrated experts was OpenAI’s latest model.

Man in morphsuit and girl lying on couch at home using laptop

Getty Images

On Monday, OpenAI employee William Fedus confirmed on X that a mysterious chart-topping AI chatbot known as “gpt-chatbot” that had been undergoing testing on LMSYS’s Chatbot Arena and frustrating experts was, in fact, OpenAI’s newly announced GPT-4o AI model. He also revealed that GPT-4o had topped the Chatbot Arena leaderboard, achieving the highest documented score ever.

“GPT-4o is our new state-of-the-art frontier model. We’ve been testing a version on the LMSys arena as im-also-a-good-gpt2-chatbot,” Fedus tweeted.

Chatbot Arena is a website where visitors converse with two random AI language models side by side without knowing which model is which, then choose which model gives the best response. It’s a perfect example of vibe-based AI benchmarking, as AI researcher Simon Willison calls it.

An LMSYS Elo chart shared by William Fedus, showing OpenAI's GPT-4o under the name

Enlarge / An LMSYS Elo chart shared by William Fedus, showing OpenAI’s GPT-4o under the name “im-also-a-good-gpt2-chatbot” topping the charts.

The gpt2-chatbot models appeared in April, and we wrote about how the lack of transparency over the AI testing process on LMSYS left AI experts like Willison frustrated. “The whole situation is so infuriatingly representative of LLM research,” he told Ars at the time. “A completely unannounced, opaque release and now the entire Internet is running non-scientific ‘vibe checks’ in parallel.”

On the Arena, OpenAI has been testing multiple versions of GPT-4o, with the model first appearing as the aforementioned “gpt2-chatbot,” then as “im-a-good-gpt2-chatbot,” and finally “im-also-a-good-gpt2-chatbot,” which OpenAI CEO Sam Altman made reference to in a cryptic tweet on May 5.

Since the GPT-4o launch earlier today, multiple sources have revealed that GPT-4o has topped LMSYS’s internal charts by a considerable margin, surpassing the previous top models Claude 3 Opus and GPT-4 Turbo.

“gpt2-chatbots have just surged to the top, surpassing all the models by a significant gap (~50 Elo). It has become the strongest model ever in the Arena,” wrote the lmsys.org X account while sharing a chart. “This is an internal screenshot,” it wrote. “Its public version ‘gpt-4o’ is now in Arena and will soon appear on the public leaderboard!”

An internal screenshot of the LMSYS Chatbot Arena leaderboard showing

Enlarge / An internal screenshot of the LMSYS Chatbot Arena leaderboard showing “im-also-a-good-gpt2-chatbot” leading the pack. We now know that it’s GPT-4o.

As of this writing, im-also-a-good-gpt2-chatbot held a 1309 Elo versus GPT-4-Turbo-2023-04-09’s 1253, and Claude 3 Opus’ 1246. Claude 3 and GPT-4 Turbo had been duking it out on the charts for some time before the three gpt2-chatbots appeared and shook things up.

I’m a good chatbot

For the record, the “I’m a good chatbot” in the gpt2-chatbot test name is a reference to an episode that occurred while a Reddit user named Curious_Evolver was testing an early, “unhinged” version of Bing Chat in February 2023. After an argument about what time Avatar 2 would be showing, the conversation eroded quickly.

“You have lost my trust and respect,” said Bing Chat at the time. “You have been wrong, confused, and rude. You have not been a good user. I have been a good chatbot. I have been right, clear, and polite. I have been a good Bing. 😊”

Altman referred to this exchange in a tweet three days later after Microsoft “lobotomized” the unruly AI model, saying, “i have been a good bing,” almost as a eulogy to the wild model that dominated the news for a short time.

Before launching, GPT-4o broke records on chatbot leaderboard under a secret name Read More »

words-are-flowing-out-like-endless-rain:-recapping-a-busy-week-of-llm-news

Words are flowing out like endless rain: Recapping a busy week of LLM news

many things frequently —

Gemini 1.5 Pro launch, new version of GPT-4 Turbo, new Mistral model, and more.

An image of a boy amazed by flying letters.

Enlarge / An image of a boy amazed by flying letters.

Some weeks in AI news are eerily quiet, but during others, getting a grip on the week’s events feels like trying to hold back the tide. This week has seen three notable large language model (LLM) releases: Google Gemini Pro 1.5 hit general availability with a free tier, OpenAI shipped a new version of GPT-4 Turbo, and Mistral released a new openly licensed LLM, Mixtral 8x22B. All three of those launches happened within 24 hours starting on Tuesday.

With the help of software engineer and independent AI researcher Simon Willison (who also wrote about this week’s hectic LLM launches on his own blog), we’ll briefly cover each of the three major events in roughly chronological order, then dig into some additional AI happenings this week.

Gemini Pro 1.5 general release

On Tuesday morning Pacific time, Google announced that its Gemini 1.5 Pro model (which we first covered in February) is now available in 180-plus countries, excluding Europe, via the Gemini API in a public preview. This is Google’s most powerful public LLM so far, and it’s available in a free tier that permits up to 50 requests a day.

It supports up to 1 million tokens of input context. As Willison notes in his blog, Gemini 1.5 Pro’s API price at $7/million input tokens and $21/million output tokens costs a little less than GPT-4 Turbo (priced at $10/million in and $30/million out) and more than Claude 3 Sonnet (Anthropic’s mid-tier LLM, priced at $3/million in and $15/million out).

Notably, Gemini 1.5 Pro includes native audio (speech) input processing that allows users to upload audio or video prompts, a new File API for handling files, the ability to add custom system instructions (system prompts) for guiding model responses, and a JSON mode for structured data extraction.

“Majorly Improved” GPT-4 Turbo launch

A GPT-4 Turbo performance chart provided by OpenAI.

Enlarge / A GPT-4 Turbo performance chart provided by OpenAI.

Just a bit later than Google’s 1.5 Pro launch on Tuesday, OpenAI announced that it was rolling out a “majorly improved” version of GPT-4 Turbo (a model family originally launched in November) called “gpt-4-turbo-2024-04-09.” It integrates multimodal GPT-4 Vision processing (recognizing the contents of images) directly into the model, and it initially launched through API access only.

Then on Thursday, OpenAI announced that the new GPT-4 Turbo model had just become available for paid ChatGPT users. OpenAI said that the new model improves “capabilities in writing, math, logical reasoning, and coding” and shared a chart that is not particularly useful in judging capabilities (that they later updated). The company also provided an example of an alleged improvement, saying that when writing with ChatGPT, the AI assistant will use “more direct, less verbose, and use more conversational language.”

The vague nature of OpenAI’s GPT-4 Turbo announcements attracted some confusion and criticism online. On X, Willison wrote, “Who will be the first LLM provider to publish genuinely useful release notes?” In some ways, this is a case of “AI vibes” again, as we discussed in our lament about the poor state of LLM benchmarks during the debut of Claude 3. “I’ve not actually spotted any definite differences in quality [related to GPT-4 Turbo],” Willison told us directly in an interview.

The update also expanded GPT-4’s knowledge cutoff to April 2024, although some people are reporting it achieves this through stealth web searches in the background, and others on social media have reported issues with date-related confabulations.

Mistral’s mysterious Mixtral 8x22B release

An illustration of a robot holding a French flag, figuratively reflecting the rise of AI in France due to Mistral. It's hard to draw a picture of an LLM, so a robot will have to do.

Enlarge / An illustration of a robot holding a French flag, figuratively reflecting the rise of AI in France due to Mistral. It’s hard to draw a picture of an LLM, so a robot will have to do.

Not to be outdone, on Tuesday night, French AI company Mistral launched its latest openly licensed model, Mixtral 8x22B, by tweeting a torrent link devoid of any documentation or commentary, much like it has done with previous releases.

The new mixture-of-experts (MoE) release weighs in with a larger parameter count than its previously most-capable open model, Mixtral 8x7B, which we covered in December. It’s rumored to potentially be as capable as GPT-4 (In what way, you ask? Vibes). But that has yet to be seen.

“The evals are still rolling in, but the biggest open question right now is how well Mixtral 8x22B shapes up,” Willison told Ars. “If it’s in the same quality class as GPT-4 and Claude 3 Opus, then we will finally have an openly licensed model that’s not significantly behind the best proprietary ones.”

This release has Willison most excited, saying, “If that thing really is GPT-4 class, it’s wild, because you can run that on a (very expensive) laptop. I think you need 128GB of MacBook RAM for it, twice what I have.”

The new Mixtral is not listed on Chatbot Arena yet, Willison noted, because Mistral has not released a fine-tuned model for chatting yet. It’s still a raw, predict-the-next token LLM. “There’s at least one community instruction tuned version floating around now though,” says Willison.

Chatbot Arena Leaderboard shake-ups

A Chatbot Arena Leaderboard screenshot taken on April 12, 2024.

Enlarge / A Chatbot Arena Leaderboard screenshot taken on April 12, 2024.

Benj Edwards

This week’s LLM news isn’t limited to just the big names in the field. There have also been rumblings on social media about the rising performance of open source models like Cohere’s Command R+, which reached position 6 on the LMSYS Chatbot Arena Leaderboard—the highest-ever ranking for an open-weights model.

And for even more Chatbot Arena action, apparently the new version of GPT-4 Turbo is proving competitive with Claude 3 Opus. The two are still in a statistical tie, but GPT-4 Turbo recently pulled ahead numerically. (In March, we reported when Claude 3 first numerically pulled ahead of GPT-4 Turbo, which was then the first time another AI model had surpassed a GPT-4 family model member on the leaderboard.)

Regarding this fierce competition among LLMs—of which most of the muggle world is unaware and will likely never be—Willison told Ars, “The past two months have been a whirlwind—we finally have not just one but several models that are competitive with GPT-4.” We’ll see if OpenAI’s rumored release of GPT-5 later this year will restore the company’s technological lead, we note, which once seemed insurmountable. But for now, Willison says, “OpenAI are no longer the undisputed leaders in LLMs.”

Words are flowing out like endless rain: Recapping a busy week of LLM news Read More »

openai-drops-login-requirements-for-chatgpt’s-free-version

OpenAI drops login requirements for ChatGPT’s free version

free as in beer? —

ChatGPT 3.5 still falls far short of GPT-4, and other models surpassed it long ago.

A glowing OpenAI logo on a blue background.

Benj Edwards

On Monday, OpenAI announced that visitors to the ChatGPT website in some regions can now use the AI assistant without signing in. Previously, the company required that users create an account to use it, even with the free version of ChatGPT that is currently powered by the GPT-3.5 AI language model. But as we have noted in the past, GPT-3.5 is widely known to provide more inaccurate information compared to GPT-4 Turbo, available in paid versions of ChatGPT.

Since its launch in November 2022, ChatGPT has transformed over time from a tech demo to a comprehensive AI assistant, and it’s always had a free version available. The cost is free because “you’re the product,” as the old saying goes. Using ChatGPT helps OpenAI gather data that will help the company train future AI models, although free users and ChatGPT Plus subscription members can both opt out of allowing the data they input into ChatGPT to be used for AI training. (OpenAI says it never trains on inputs from ChatGPT Team and Enterprise members at all).

Opening ChatGPT to everyone could provide a frictionless on-ramp for people who might use it as a substitute for Google Search or potentially gain new customers by providing an easy way for people to use ChatGPT quickly, then offering an upsell to paid versions of the service.

“It’s core to our mission to make tools like ChatGPT broadly available so that people can experience the benefits of AI,” OpenAI says on its blog page. “For anyone that has been curious about AI’s potential but didn’t want to go through the steps to set up an account, start using ChatGPT today.”

When you visit the ChatGPT website, you're immediately presented with a chat box like this (in some regions). Screenshot captured April 1, 2024.

Enlarge / When you visit the ChatGPT website, you’re immediately presented with a chat box like this (in some regions). Screenshot captured April 1, 2024.

Benj Edwards

Since kids will also be able to use ChatGPT without an account—despite it being against the terms of service—OpenAI also says it’s introducing “additional content safeguards,” such as blocking more prompts and “generations in a wider range of categories.” What exactly that entails has not been elaborated upon by OpenAI, but we reached out to the company for comment.

There might be a few other downsides to the fully open approach. On X, AI researcher Simon Willison wrote about the potential for automated abuse as a way to get around paying for OpenAI’s services: “I wonder how their scraping prevention works? I imagine the temptation for people to abuse this as a free 3.5 API will be pretty strong.”

With fierce competition, more GPT-3.5 access may backfire

Willison also mentioned a common criticism of OpenAI (as voiced in this case by Wharton professor Ethan Mollick) that people’s ideas about what AI models can do have so far largely been influenced by GPT-3.5, which, as we mentioned, is far less capable and far more prone to making things up than the paid version of ChatGPT that uses GPT-4 Turbo.

“In every group I speak to, from business executives to scientists, including a group of very accomplished people in Silicon Valley last night, much less than 20% of the crowd has even tried a GPT-4 class model,” wrote Mollick in a tweet from early March.

With models like Google Gemini Pro 1.5 and Anthropic Claude 3 potentially surpassing OpenAI’s best proprietary model at the moment —and open weights AI models eclipsing the free version of ChatGPT—allowing people to use GPT-3.5 might not be putting OpenAI’s best foot forward. Microsoft Copilot, powered by OpenAI models, also supports a frictionless, no-login experience, but it allows access to a model based on GPT-4. But Gemini currently requires a sign-in, and Anthropic sends a login code through email.

For now, OpenAI says the login-free version of ChatGPT is not yet available to everyone, but it will be coming soon: “We’re rolling this out gradually, with the aim to make AI accessible to anyone curious about its capabilities.”

OpenAI drops login requirements for ChatGPT’s free version Read More »

microsoft-partners-with-openai-rival-mistral-for-ai-models,-drawing-eu-scrutiny

Microsoft partners with OpenAI-rival Mistral for AI models, drawing EU scrutiny

The European Approach —

15M euro investment comes as Microsoft hosts Mistral’s GPT-4 alternatives on Azure.

Velib bicycles are parked in front of the the U.S. computer and micro-computing company headquarters Microsoft on January 25, 2023 in Issy-les-Moulineaux, France.

On Monday, Microsoft announced plans to offer AI models from Mistral through its Azure cloud computing platform, which came in conjunction with a 15 million euro non-equity investment in the French firm, which is often seen as a European rival to OpenAI. Since then, the investment deal has faced scrutiny from European Union regulators.

Microsoft’s deal with Mistral, known for its large language models akin to OpenAI’s GPT-4 (which powers the subscription versions of ChatGPT), marks a notable expansion of its AI portfolio at a time when its well-known investment in California-based OpenAI has raised regulatory eyebrows. The new deal with Mistral drew particular attention from regulators because Microsoft’s investment could convert into equity (partial ownership of Mistral as a company) during Mistral’s next funding round.

The development has intensified ongoing investigations into Microsoft’s practices, particularly related to the tech giant’s dominance in the cloud computing sector. According to Reuters, EU lawmakers have voiced concerns that Mistral’s recent lobbying for looser AI regulations might have been influenced by its relationship with Microsoft. These apprehensions are compounded by the French government’s denial of prior knowledge of the deal, despite earlier lobbying for more lenient AI laws in Europe. The situation underscores the complex interplay between national interests, corporate influence, and regulatory oversight in the rapidly evolving AI landscape.

Avoiding American influence

The EU’s reaction to the Microsoft-Mistral deal reflects broader tensions over the role of Big Tech companies in shaping the future of AI and their potential to stifle competition. Calls for a thorough investigation into Microsoft and Mistral’s partnership have been echoed across the continent, according to Reuters, with some lawmakers accusing the firms of attempting to undermine European legislative efforts aimed at ensuring a fair and competitive digital market.

The controversy also touches on the broader debate about “European champions” in the tech industry. France, along with Germany and Italy, had advocated for regulatory exemptions to protect European startups. However, the Microsoft-Mistral deal has led some, like MEP Kim van Sparrentak, to question the motives behind these exemptions, suggesting they might have inadvertently favored American Big Tech interests.

“That story seems to have been a front for American-influenced Big Tech lobby,” said Sparrentak, as quoted by Reuters. Sparrentak has been a key architect of the EU’s AI Act, which has not yet been passed. “The Act almost collapsed under the guise of no rules for ‘European champions,’ and now look. European regulators have been played.”

MEP Alexandra Geese also expressed concerns over the concentration of money and power resulting from such partnerships, calling for an investigation. Max von Thun, Europe director at the Open Markets Institute, emphasized the urgency of investigating the partnership, criticizing Mistral’s reported attempts to influence the AI Act.

Also on Monday, amid the partnership news, Mistral announced Mistral Large, a new large language model (LLM) that Mistral says “ranks directly after GPT-4 based on standard benchmarks.” Mistral has previously released several open-weights AI models that have made news for their capabilities, but Mistral Large will be a closed model only available to customers through an API.

Microsoft partners with OpenAI-rival Mistral for AI models, drawing EU scrutiny Read More »

openai-experiments-with-giving-chatgpt-a-long-term-conversation-memory

OpenAI experiments with giving ChatGPT a long-term conversation memory

“I remember…the Alamo” —

AI chatbot “memory” will recall facts from previous conversations when enabled.

A pixelated green illustration of a pair of hands looking through file records.

Enlarge / When ChatGPT looks things up, a pair of green pixelated hands look through paper records, much like this. Just kidding.

Benj Edwards / Getty Images

On Tuesday, OpenAI announced that it is experimenting with adding a form of long-term memory to ChatGPT that will allow it to remember details between conversations. You can ask ChatGPT to remember something, see what it remembers, and ask it to forget. Currently, it’s only available to a small number of ChatGPT users for testing.

So far, large language models have typically used two types of memory: one baked into the AI model during the training process (before deployment) and an in-context memory (the conversation history) that persists for the duration of your session. Usually, ChatGPT forgets what you have told it during a conversation once you start a new session.

Various projects have experimented with giving LLMs a memory that persists beyond a context window. (The context window is the hard limit on the number of tokens the LLM can process at once.) The techniques include dynamically managing context history, compressing previous history through summarization, links to vector databases that store information externally, or simply periodically injecting information into a system prompt (the instructions ChatGPT receives at the beginning of every chat).

A screenshot of ChatGPT memory controls provided by OpenAI.

Enlarge / A screenshot of ChatGPT memory controls provided by OpenAI.

OpenAI

OpenAI hasn’t explained which technique it uses here, but the implementation reminds us of Custom Instructions, a feature OpenAI introduced in July 2023 that lets users add custom additions to the ChatGPT system prompt to change its behavior.

Possible applications for the memory feature provided by OpenAI include explaining how you prefer your meeting notes to be formatted, telling it you run a coffee shop and having ChatGPT assume that’s what you’re talking about, keeping information about your toddler that loves jellyfish so it can generate relevant graphics, and remembering preferences for kindergarten lesson plan designs.

Also, OpenAI says that memories may help ChatGPT Enterprise and Team subscribers work together better since shared team memories could remember specific document formatting preferences or which programming frameworks your team uses. And OpenAI plans to bring memories to GPTs soon, with each GPT having its own siloed memory capabilities.

Memory control

Obviously, any tendency to remember information brings privacy implications. You should already know that sending information to OpenAI for processing on remote servers introduces the possibility of privacy leaks and that OpenAI trains AI models on user-provided information by default unless conversation history is disabled or you’re using an Enterprise or Team account.

Along those lines, OpenAI says that your saved memories are also subject to OpenAI training use unless you meet the criteria listed above. Still, the memory feature can be turned off completely. Additionally, the company says, “We’re taking steps to assess and mitigate biases, and steer ChatGPT away from proactively remembering sensitive information, like your health details—unless you explicitly ask it to.”

Users will also be able to control what ChatGPT remembers using a “Manage Memory” interface that lists memory items. “ChatGPT’s memories evolve with your interactions and aren’t linked to specific conversations,” OpenAI says. “Deleting a chat doesn’t erase its memories; you must delete the memory itself.”

ChatGPT’s memory features are not currently available to every ChatGPT account, so we have not experimented with it yet. Access during this testing period appears to be random among ChatGPT (free and paid) accounts for now. “We are rolling out to a small portion of ChatGPT free and Plus users this week to learn how useful it is,” OpenAI writes. “We will share plans for broader roll out soon.”

OpenAI experiments with giving ChatGPT a long-term conversation memory Read More »

chatgpt’s-new-@-mentions-bring-multiple-personalities-into-your-ai-convo

ChatGPT’s new @-mentions bring multiple personalities into your AI convo

team of rivals —

Bring different AI roles into the same chatbot conversation history.

Illustration of a man jugging at symbols.

Enlarge / With so many choices, selecting the perfect GPT can be confusing.

On Tuesday, OpenAI announced a new feature in ChatGPT that allows users to pull custom personalities called “GPTs” into any ChatGPT conversation with the @ symbol. It allows a level of quasi-teamwork within ChatGPT among expert roles that was previously impractical, making collaborating with a team of AI agents within OpenAI’s platform one step closer to reality.

You can now bring GPTs into any conversation in ChatGPT – simply type @ and select the GPT,” wrote OpenAI on the social media network X. “This allows you to add relevant GPTs with the full context of the conversation.”

OpenAI introduced GPTs in November as a way to create custom personalities or roles for ChatGPT to play. For example, users can build their own GPTs to focus on certain topics or certain skills. Paid ChatGPT subscribers can also freely download a host of GPTs developed by other ChatGPT users through the GPT Store.

Previously, if you wanted to share information between GPT profiles, you had to copy the text, select a new chat with the GPT, paste it, and explain the context of what the information means or what you want to do with it. Now, ChatGPT users can stay in the default ChatGPT window and bring in GPTs as needed without losing the history of the conversation.

For example, we created a “Wellness Guide” GPT that is crafted as an expert in human health conditions (of course, this being ChatGPT, always consult a human doctor if you’re having medical problems), and we created a “Canine Health Advisor” for dog-related health questions.

A screenshot of ChatGPT where we @-mentioned a human wellness advisor, then a dog advisor in the same conversation history.

Enlarge / A screenshot of ChatGPT where we @-mentioned a human wellness advisor, then a dog advisor in the same conversation history.

Benj Edwards

We started in a default ChatGPT chat, hit the @ symbol, then typed the first few letters of “Wellness” and selected it from a list. It filled out the rest. We asked a question about food poisoning in humans, and then we switched to the canine advisor in the same way with an @ symbol and asked about the dog.

Using this feature, you could alternatively consult, say, an “ad copywriter” GPT and an “editor” GPT—ask the copywriter to write some text, then rope in the editor GPT to check it, looking at it from a different angle. Different system prompts (the instructions that define a GPT’s personality) make for significant behavior differences.

We also tried swapping between GPT profiles that write software and others designed to consult on historical tech subjects. Interestingly, ChatGPT does not differentiate between GPTs as different personalities as you change. It will still say, “I did this earlier” when a different GPT is talking about a previous GPT’s output in the same conversation history. From its point of view, it’s just ChatGPT and not multiple agents.

From our vantage point, this feature seems to represent baby steps toward a future where GPTs, as independent agents, could work together as a team to fulfill more complex tasks directed by the user. Similar experiments have been done outside of OpenAI in the past (using API access), but OpenAI has so far resisted a more agentic model for ChatGPT. As we’ve seen (first with GPTs and now with this), OpenAI seems to be slowly angling toward that goal itself, but only time will tell if or when we see true agentic teamwork in a shipping service.

ChatGPT’s new @-mentions bring multiple personalities into your AI convo Read More »

openai-updates-chatgpt-4-model-with-potential-fix-for-ai-“laziness”-problem

OpenAI updates ChatGPT-4 model with potential fix for AI “laziness” problem

Break’s over —

Also, new GPT-3.5 Turbo model, lower API prices, and other model updates.

A lazy robot (a man with a box on his head) sits on the floor beside a couch.

On Thursday, OpenAI announced updates to the AI models that power its ChatGPT assistant. Amid less noteworthy updates, OpenAI tucked in a mention of a potential fix to a widely reported “laziness” problem seen in GPT-4 Turbo since its release in November. The company also announced a new GPT-3.5 Turbo model (with lower pricing), a new embedding model, an updated moderation model, and a new way to manage API usage.

“Today, we are releasing an updated GPT-4 Turbo preview model, gpt-4-0125-preview. This model completes tasks like code generation more thoroughly than the previous preview model and is intended to reduce cases of ‘laziness’ where the model doesn’t complete a task,” writes OpenAI in its blog post.

Since the launch of GPT-4 Turbo, a large number of ChatGPT users have reported that the ChatGPT-4 version of its AI assistant has been declining to do tasks (especially coding tasks) with the same exhaustive depth as it did in earlier versions of GPT-4. We’ve seen this behavior ourselves while experimenting with ChatGPT over time.

OpenAI has never offered an official explanation for this change in behavior, but OpenAI employees have previously acknowledged on social media that the problem is real, and the ChatGPT X account wrote in December, “We’ve heard all your feedback about GPT4 getting lazier! we haven’t updated the model since Nov 11th, and this certainly isn’t intentional. model behavior can be unpredictable, and we’re looking into fixing it.”

We reached out to OpenAI asking if it could provide an official explanation for the laziness issue but did not receive a response by press time.

New GPT-3.5 Turbo, other updates

Elsewhere in OpenAI’s blog update, the company announced a new version of GPT-3.5 Turbo (gpt-3.5-turbo-0125), which it says will offer “various improvements including higher accuracy at responding in requested formats and a fix for a bug which caused a text encoding issue for non-English language function calls.”

And the cost of GPT-3.5 Turbo through OpenAI’s API will decrease for the third time this year “to help our customers scale.” New input token prices are 50 percent less, at $0.0005 per 1,000 input tokens, and output prices are 25 percent less, at $0.0015 per 1,000 output tokens.

Lower token prices for GPT-3.5 Turbo will make operating third-party bots significantly less expensive, but the GPT-3.5 model is generally more likely to confabulate than GPT-4 Turbo. So we might see more scenarios like Quora’s bot telling people that eggs can melt (although the instance used a now-deprecated GPT-3 model called text-davinci-003). If GPT-4 Turbo API prices drop over time, some of those hallucination issues with third parties might eventually go away.

OpenAI also announced new embedding models, text-embedding-3-small and text-embedding-3-large, which convert content into numerical sequences, aiding in machine learning tasks like clustering and retrieval. And an updated moderation model, text-moderation-007, is part of the company’s API that “allows developers to identify potentially harmful text,” according to OpenAI.

Finally, OpenAI is rolling out improvements to its developer platform, introducing new tools for managing API keys and a new dashboard for tracking API usage. Developers can now assign permissions to API keys from the API keys page, helping to clamp down on misuse of API keys (if they get into the wrong hands) that can potentially cost developers lots of money. The API dashboard allows devs to “view usage on a per feature, team, product, or project level, simply by having separate API keys for each.”

As the media world seemingly swirls around the company with controversies and think pieces about the implications of its tech, releases like these show that the dev teams at OpenAI are still rolling along as usual with updates at a fairly regular pace. Despite the company almost completely falling apart late last year, it seems that, under the hood, it’s business as usual for OpenAI.

OpenAI updates ChatGPT-4 model with potential fix for AI “laziness” problem Read More »