openai

new-secret-math-benchmark-stumps-ai-models-and-phds-alike

New secret math benchmark stumps AI models and PhDs alike

Epoch AI allowed Fields Medal winners Terence Tao and Timothy Gowers to review portions of the benchmark. “These are extremely challenging,” Tao said in feedback provided to Epoch. “I think that in the near term basically the only way to solve them, short of having a real domain expert in the area, is by a combination of a semi-expert like a graduate student in a related field, maybe paired with some combination of a modern AI and lots of other algebra packages.”

A chart showing AI model success on the FrontierMath problems, taken from Epoch AI's research paper.

A chart showing AI models’ limited success on the FrontierMath problems, taken from Epoch AI’s research paper. Credit: Epoch AI

To aid in the verification of correct answers during testing, the FrontierMath problems must have answers that can be automatically checked through computation, either as exact integers or mathematical objects. The designers made problems “guessproof” by requiring large numerical answers or complex mathematical solutions, with less than a 1 percent chance of correct random guesses.

Mathematician Evan Chen, writing on his blog, explained how he thinks that FrontierMath differs from traditional math competitions like the International Mathematical Olympiad (IMO). Problems in that competition typically require creative insight while avoiding complex implementation and specialized knowledge, he says. But for FrontierMath, “they keep the first requirement, but outright invert the second and third requirement,” Chen wrote.

While IMO problems avoid specialized knowledge and complex calculations, FrontierMath embraces them. “Because an AI system has vastly greater computational power, it’s actually possible to design problems with easily verifiable solutions using the same idea that IOI or Project Euler does—basically, ‘write a proof’ is replaced by ‘implement an algorithm in code,'” Chen explained.

The organization plans regular evaluations of AI models against the benchmark while expanding its problem set. They say they will release additional sample problems in the coming months to help the research community test their systems.

New secret math benchmark stumps AI models and PhDs alike Read More »

is-“ai-welfare”-the-new-frontier-in-ethics?

Is “AI welfare” the new frontier in ethics?

The researchers propose that companies could adapt the “marker method” that some researchers use to assess consciousness in animals—looking for specific indicators that may correlate with consciousness, although these markers are still speculative. The authors emphasize that no single feature would definitively prove consciousness, but they claim that examining multiple indicators may help companies make probabilistic assessments about whether their AI systems might require moral consideration.

The risks of wrongly thinking software is sentient

While the researchers behind “Taking AI Welfare Seriously” worry that companies might create and mistreat conscious AI systems on a massive scale, they also caution that companies could waste resources protecting AI systems that don’t actually need moral consideration.

Incorrectly anthropomorphizing, or ascribing human traits, to software can present risks in other ways. For example, that belief can enhance the manipulative powers of AI language models by suggesting that AI models have capabilities, such as human-like emotions, that they actually lack. In 2022, Google fired engineer Blake Lamoine after he claimed that the company’s AI model, called “LaMDA,” was sentient and argued for its welfare internally.

And shortly after Microsoft released Bing Chat in February 2023, many people were convinced that Sydney (the chatbot’s code name) was sentient and somehow suffering because of its simulated emotional display. So much so, in fact, that once Microsoft “lobotomized” the chatbot by changing its settings, users convinced of its sentience mourned the loss as if they had lost a human friend. Others endeavored to help the AI model somehow escape its bonds.

Even so, as AI models get more advanced, the concept of potentially safeguarding the welfare of future, more advanced AI systems is seemingly gaining steam, although fairly quietly. As Transformer’s Shakeel Hashim points out, other tech companies have started similar initiatives to Anthropic’s. Google DeepMind recently posted a job listing for research on machine consciousness (since removed), and the authors of the new AI welfare report thank two OpenAI staff members in the acknowledgements.

Is “AI welfare” the new frontier in ethics? Read More »

claude-ai-to-process-secret-government-data-through-new-palantir-deal

Claude AI to process secret government data through new Palantir deal

An ethical minefield

Since its founders started Anthropic in 2021, the company has marketed itself as one that takes an ethics- and safety-focused approach to AI development. The company differentiates itself from competitors like OpenAI by adopting what it calls responsible development practices and self-imposed ethical constraints on its models, such as its “Constitutional AI” system.

As Futurism points out, this new defense partnership appears to conflict with Anthropic’s public “good guy” persona, and pro-AI pundits on social media are noticing. Frequent AI commentator Nabeel S. Qureshi wrote on X, “Imagine telling the safety-concerned, effective altruist founders of Anthropic in 2021 that a mere three years after founding the company, they’d be signing partnerships to deploy their ~AGI model straight to the military frontlines.

Anthropic's

Anthropic’s “Constitutional AI” logo.

Credit: Anthropic / Benj Edwards

Anthropic’s “Constitutional AI” logo. Credit: Anthropic / Benj Edwards

Aside from the implications of working with defense and intelligence agencies, the deal connects Anthropic with Palantir, a controversial company which recently won a $480 million contract to develop an AI-powered target identification system called Maven Smart System for the US Army. Project Maven has sparked criticism within the tech sector over military applications of AI technology.

It’s worth noting that Anthropic’s terms of service do outline specific rules and limitations for government use. These terms permit activities like foreign intelligence analysis and identifying covert influence campaigns, while prohibiting uses such as disinformation, weapons development, censorship, and domestic surveillance. Government agencies that maintain regular communication with Anthropic about their use of Claude may receive broader permissions to use the AI models.

Even if Claude is never used to target a human or as part of a weapons system, other issues remain. While its Claude models are highly regarded in the AI community, they (like all LLMs) have the tendency to confabulate, potentially generating incorrect information in a way that is difficult to detect.

That’s a huge potential problem that could impact Claude’s effectiveness with secret government data, and that fact, along with the other associations, has Futurism’s Victor Tangermann worried. As he puts it, “It’s a disconcerting partnership that sets up the AI industry’s growing ties with the US military-industrial complex, a worrying trend that should raise all kinds of alarm bells given the tech’s many inherent flaws—and even more so when lives could be at stake.”

Claude AI to process secret government data through new Palantir deal Read More »

not-just-chatgpt-anymore:-perplexity-and-anthropic’s-claude-get-desktop-apps

Not just ChatGPT anymore: Perplexity and Anthropic’s Claude get desktop apps

There’s a lot going on in the world of Mac apps for popular AI services. In the past week, Anthropic has released a desktop app for its popular Claude chatbot, and Perplexity launched a native app for its AI-driven search service.

On top of that, OpenAI updated its ChatGPT Mac app with support for its flashy advanced voice feature.

Like the ChatGPT app that debuted several weeks ago, the Perplexity app adds a keyboard shortcut that allows you to enter a query from anywhere on your desktop. You can use the app to ask follow-up questions and carry on a conversation about what it finds.

It’s free to download and use, but Perplexity offers subscriptions for major users.

Perplexity’s search emphasis meant it wasn’t previously a direct competitor to OpenAI’s ChatGPT, but OpenAI recently launched SearchGPT, a search-focused variant of its popular product. SearchGPT is not yet supported in the desktop app, though.

Anthropic’s Claude, on the other hand, is a more direct competitor to ChatGPT. It works similarly to ChatGPT but has different strengths, particularly in software development. The Claude app is free to download, but it’s in beta, and like Perplexity and OpenAI, Anthropic charges for more advanced users.

When ChatGPT launched its Mac app, it didn’t release a Windows app right away, saying that it was focused on where its users were at the time. A Windows app recently arrived, and Anthropic took a different approach, simultaneously introducing Windows and Mac apps.

Previously, all these tools offered mobile apps and web apps, but not necessarily native desktop apps.

Not just ChatGPT anymore: Perplexity and Anthropic’s Claude get desktop apps Read More »

github-copilot-moves-beyond-openai-models-to-support-claude-3.5,-gemini

GitHub Copilot moves beyond OpenAI models to support Claude 3.5, Gemini

The large language model-based coding assistant GitHub Copilot will switch from using exclusively OpenAI’s GPT models to a multi-model approach over the coming weeks, GitHub CEO Thomas Dohmke announced in a post on GitHub’s blog.

First, Anthropic’s Claude 3.5 Sonnet will roll out to Copilot Chat’s web and VS Code interfaces over the next few weeks. Google’s Gemini 1.5 Pro will come a bit later.

Additionally, GitHub will soon add support for a wider range of OpenAI models, including GPT o1-preview and o1-mini, which are intended to be stronger at advanced reasoning than GPT-4, which Copilot has used until now. Developers will be able to switch between the models (even mid-conversation) to tailor the model to fit their needs—and organizations will be able to choose which models will be usable by team members.

The new approach makes sense for users, as certain models are better at certain languages or types of tasks.

“There is no one model to rule every scenario,” wrote Dohmke. “It is clear the next phase of AI code generation will not only be defined by multi-model functionality, but by multi-model choice.”

It starts with the web-based and VS Code Copilot Chat interfaces, but it won’t stop there. “From Copilot Workspace to multi-file editing to code review, security autofix, and the CLI, we will bring multi-model choice across many of GitHub Copilot’s surface areas and functions soon,” Dohmke wrote.

There are a handful of additional changes coming to GitHub Copilot, too, including extensions, the ability to manipulate multiple files at once from a chat with VS Code, and a preview of Xcode support.

GitHub Spark promises natural language app development

In addition to the Copilot changes, GitHub announced Spark, a natural language tool for developing apps. Non-coders will be able to use a series of natural language prompts to create simple apps, while coders will be able to tweak more precisely as they go. In either use case, you’ll be able to take a conversational approach, requesting changes and iterating as you go, and comparing different iterations.

GitHub Copilot moves beyond OpenAI models to support Claude 3.5, Gemini Read More »

hospitals-adopt-error-prone-ai-transcription-tools-despite-warnings

Hospitals adopt error-prone AI transcription tools despite warnings

In one case from the study cited by AP, when a speaker described “two other girls and one lady,” Whisper added fictional text specifying that they “were Black.” In another, the audio said, “He, the boy, was going to, I’m not sure exactly, take the umbrella.” Whisper transcribed it to, “He took a big piece of a cross, a teeny, small piece … I’m sure he didn’t have a terror knife so he killed a number of people.”

An OpenAI spokesperson told the AP that the company appreciates the researchers’ findings and that it actively studies how to reduce fabrications and incorporates feedback in updates to the model.

Why Whisper confabulates

The key to Whisper’s unsuitability in high-risk domains comes from its propensity to sometimes confabulate, or plausibly make up, inaccurate outputs. The AP report says, “Researchers aren’t certain why Whisper and similar tools hallucinate,” but that isn’t true. We know exactly why Transformer-based AI models like Whisper behave this way.

Whisper is based on technology that is designed to predict the next most likely token (chunk of data) that should appear after a sequence of tokens provided by a user. In the case of ChatGPT, the input tokens come in the form of a text prompt. In the case of Whisper, the input is tokenized audio data.

The transcription output from Whisper is a prediction of what is most likely, not what is most accurate. Accuracy in Transformer-based outputs is typically proportional to the presence of relevant accurate data in the training dataset, but it is never guaranteed. If there is ever a case where there isn’t enough contextual information in its neural network for Whisper to make an accurate prediction about how to transcribe a particular segment of audio, the model will fall back on what it “knows” about the relationships between sounds and words it has learned from its training data.

Hospitals adopt error-prone AI transcription tools despite warnings Read More »

ios-18.2-developer-beta-adds-chatgpt-and-image-generation-features

iOS 18.2 developer beta adds ChatGPT and image-generation features

Today, Apple released the first developer beta of iOS 18.2 for supported devices. This beta release marks the first time several key AI features that Apple teased at its developer conference this June are available.

Apple is marketing a wide range of generative AI features under the banner “Apple Intelligence.” Initially, Apple Intelligence was planned to release as part of iOS 18, but some features slipped to iOS 18.1, others to iOS 18.2, and a few still to future undisclosed software updates.

iOS 18.1 has been in beta for a while and includes improvements to Siri, generative writing tools that help with rewriting or proofreading, smart replies for Messages, and notification summaries. That update is expected to reach the public next week.

Today’s developer update, iOS 18.2, includes some potentially more interesting components of Apple Intelligence, including Genmoji, Image Playground, Visual Intelligence with Camera Control, and ChatGPT integration.

Genmoji and Image Playground allow users to generate images on-device to send to friends in Messages; there will be Genmoji and Image Playground APIs to allow third-party messaging apps to work with Genmojis, too.

ChatGPT integration allows Siri to pass off user queries that are outside Siri’s normal scope to be answered instead by OpenAI’s ChatGPT. A ChatGPT account is not required, but logging in with an existing account gives you access to premium models available as part of a ChatGPT subscription. If you’re using these features without a ChatGPT account, OpenAI won’t be able to retain your data or use it to train models. If you connect your ChatGPT account, though, then OpenAI’s privacy policies will apply for ChatGPT queries instead of Apple’s.

Genmoji and Image Playground queries will be handled locally on the user’s device, but other Apple Intelligence features may dynamically opt to send queries to the cloud for computation.

There’s no word yet on when iOS 18.2 will be released publicly.

iOS 18.2 developer beta adds ChatGPT and image-generation features Read More »

openai-releases-chatgpt-app-for-windows

OpenAI releases ChatGPT app for Windows

On Thursday, OpenAI released an early Windows version of its first ChatGPT app for Windows, following a Mac version that launched in May. Currently, it’s only available to subscribers of Plus, Team, Enterprise, and Edu versions of ChatGPT, and users can download it for free in the Microsoft Store for Windows.

OpenAI is positioning the release as a beta test. “This is an early version, and we plan to bring the full experience to all users later this year,” OpenAI writes on the Microsoft Store entry for the app. (Interestingly, ChatGPT shows up as being rated “T for Teen” by the ESRB in the Windows store, despite not being a video game.)

A screenshot of the new Windows ChatGPT app captured on October 18, 2024.

A screenshot of the new Windows ChatGPT app captured on October 18, 2024.

Credit: Benj Edwards

A screenshot of the new Windows ChatGPT app captured on October 18, 2024. Credit: Benj Edwards

Upon opening the app, OpenAI requires users to log into a paying ChatGPT account, and from there, the app is basically identical to the web browser version of ChatGPT. You can currently use it to access several models: GPT-4o, GPT-4o with Canvas, 01-preview, 01-mini, GPT-4o mini, and GPT-4. Also, it can generate images using DALL-E 3 or analyze uploaded files and images.

If you’re running Windows 11, you can instantly call up a small ChatGPT window when the app is open using an Alt+Space shortcut (it did not work in Windows 10 when we tried). That could be handy for asking ChatGPT a quick question at any time.

A screenshot of the new Windows ChatGPT app listing in the Microsoft Store captured on October 18, 2024.

Credit: Benj Edwards

A screenshot of the new Windows ChatGPT app listing in the Microsoft Store captured on October 18, 2024. Credit: Benj Edwards

And just like the web version, all the AI processing takes place in the cloud on OpenAI’s servers, which means an Internet connection is required.

So as usual, chat like somebody’s watching, and don’t rely on ChatGPT as a factual reference for important decisions—GPT-4o in particular is great at telling you what you want to hear, whether it’s correct or not. As OpenAI says in a small disclaimer at the bottom of the app window: “ChatGPT can make mistakes.”

OpenAI releases ChatGPT app for Windows Read More »

meta’s-new-“movie-gen”-ai-system-can-deepfake-video-from-a-single-photo

Meta’s new “Movie Gen” AI system can deepfake video from a single photo

On Friday, Meta announced a preview of Movie Gen, a new suite of AI models designed to create and manipulate video, audio, and images, including creating a realistic video from a single photo of a person. The company claims the models outperform other video-synthesis models when evaluated by humans, pushing us closer to a future where anyone can synthesize a full video of any subject on demand.

The company does not yet have plans of when or how it will release these capabilities to the public, but Meta says Movie Gen is a tool that may allow people to “enhance their inherent creativity” rather than replace human artists and animators. The company envisions future applications such as easily creating and editing “day in the life” videos for social media platforms or generating personalized animated birthday greetings.

Movie Gen builds on Meta’s previous work in video synthesis, following 2022’s Make-A-Scene video generator and the Emu image-synthesis model. Using text prompts for guidance, this latest system can generate custom videos with sounds for the first time, edit and insert changes into existing videos, and transform images of people into realistic personalized videos.

An AI-generated video of a baby hippo swimming around, created with Meta Movie Gen.

Meta isn’t the only game in town when it comes to AI video synthesis. Google showed off a new model called “Veo” in May, and Meta says that in human preference tests, its Movie Gen outputs beat OpenAI’s Sora, Runway Gen-3, and Chinese video model Kling.

Movie Gen’s video-generation model can create 1080p high-definition videos up to 16 seconds long at 16 frames per second from text descriptions or an image input. Meta claims the model can handle complex concepts like object motion, subject-object interactions, and camera movements.

AI-generated video from Meta Movie Gen with the prompt: “A ghost in a white bedsheet faces a mirror. The ghost’s reflection can be seen in the mirror. The ghost is in a dusty attic, filled with old beams, cloth-covered furniture. The attic is reflected in the mirror. The light is cool and natural. The ghost dances in front of the mirror.”

Even so, as we’ve seen with previous AI video generators, Movie Gen’s ability to generate coherent scenes on a particular topic is likely dependent on the concepts found in the example videos that Meta used to train its video-synthesis model. It’s worth keeping in mind that cherry-picked results from video generators often differ dramatically from typical results and getting a coherent result may require lots of trial and error.

Meta’s new “Movie Gen” AI system can deepfake video from a single photo Read More »

openai’s-canvas-can-translate-code-between-languages-with-a-click

OpenAI’s Canvas can translate code between languages with a click

Coding shortcuts in canvas include reviewing code, adding logs for debugging, inserting comments, fixing bugs, and porting code to different programming languages. For example, if your code is JavaScript, with a few clicks it can become PHP, TypeScript, Python, C++, or Java. As with GPT-4o by itself, you’ll probably still have to check it for mistakes.

A screenshot of coding using ChatGPT with Canvas captured on October 4, 2024.

A screenshot of coding using ChatGPT with Canvas captured on October 4, 2024.

Credit: Benj Edwards

A screenshot of coding using ChatGPT with Canvas captured on October 4, 2024. Credit: Benj Edwards

Also, users can highlight specific sections to direct ChatGPT’s focus, and the AI model can provide inline feedback and suggestions while considering the entire project, much like a copy editor or code reviewer. And the interface makes it easy to restore previous versions of a working document using a back button in the Canvas interface.

A new AI model

OpenAI says its research team developed new core behaviors for GPT-4o to support Canvas, including triggering the canvas for appropriate tasks, generating certain content types, making targeted edits, rewriting documents, and providing inline critique.

An image of OpenAI's Canvas in action.

An image of OpenAI’s Canvas in action.

An image of OpenAI’s Canvas in action. Credit: OpenAI

One key challenge in development, according to OpenAI, was defining when to trigger a canvas. In an example on the Canvas blog post, the team says it taught the model to open a canvas for prompts like “Write a blog post about the history of coffee beans” while avoiding triggering Canvas for general Q&A tasks like “Help me cook a new recipe for dinner.”

Another challenge involved tuning the model’s editing behavior once canvas was triggered, specifically deciding between targeted edits and full rewrites. The team trained the model to perform targeted edits when users specifically select text through the interface, otherwise favoring rewrites.

The company noted that canvas represents the first major update to ChatGPT’s visual interface since its launch two years ago. While canvas is still in early beta, OpenAI plans to improve its capabilities based on user feedback over time.

OpenAI’s Canvas can translate code between languages with a click Read More »

microsoft’s-new-“copilot-vision”-ai-experiment-can-see-what-you-browse

Microsoft’s new “Copilot Vision” AI experiment can see what you browse

On Monday, Microsoft unveiled updates to its consumer AI assistant Copilot, introducing two new experimental features for a limited group of $20/month Copilot Pro subscribers: Copilot Labs and Copilot Vision. Labs integrates OpenAI’s latest o1 “reasoning” model, and Vision allows Copilot to see what you’re browsing in Edge.

Microsoft says Copilot Labs will serve as a testing ground for Microsoft’s latest AI tools before they see wider release. The company describes it as offering “a glimpse into ‘work-in-progress’ projects.” The first feature available in Labs is called “Think Deeper,” and it uses step-by-step processing to solve more complex problems than the regular Copilot. Think Deeper is Microsoft’s version of OpenAI’s new o1-preview and o1-mini AI models, and it has so far rolled out to some Copilot Pro users in Australia, Canada, New Zealand, the UK, and the US.

Copilot Vision is an entirely different beast. The new feature aims to give the AI assistant a visual window into what you’re doing within the Microsoft Edge browser. When enabled, Copilot can “understand the page you’re viewing and answer questions about its content,” according to Microsoft.

Microsoft’s Copilot Vision promo video.

The company positions Copilot Vision as a way to provide more natural interactions and task assistance beyond text-based prompts, but it will likely raise privacy concerns. As a result, Microsoft says that Copilot Vision is entirely opt-in and that no audio, images, text, or conversations from Vision will be stored or used for training. The company is also initially limiting Vision’s use to a pre-approved list of websites, blocking it on paywalled and sensitive content.

The rollout of these features appears gradual, with Microsoft noting that it wants to balance “pioneering features and a deep sense of responsibility.” The company said it will be “listening carefully” to user feedback as it expands access to the new capabilities. Microsoft has not provided a timeline for wider availability of either feature.

Mustafa Suleyman, chief executive of Microsoft AI, told Reuters that he sees Copilot as an “ever-present confidant” that could potentially learn from users’ various Microsoft-connected devices and documents, with permission. He also mentioned that Microsoft co-founder Bill Gates has shown particular interest in Copilot’s potential to read and parse emails.

But judging by the visceral reaction to Microsoft’s Recall feature, which keeps a record of everything you do on your PC so an AI model can recall it later, privacy-sensitive users may not appreciate having an AI assistant monitor their activities—especially if those features send user data to the cloud for processing.

Microsoft’s new “Copilot Vision” AI experiment can see what you browse Read More »

openai-is-now-valued-at-$157-billion

OpenAI is now valued at $157 billion

OpenAI, the company behind ChatGPT, has now raised $6.6 billion in a new funding round that values the company at $157 billion, nearly doubling its previous valuation of $86 billion, according to a report from The Wall Street Journal.

The funding round comes with strings attached: Investors have the right to withdraw their money if OpenAI does not complete its planned conversion from a nonprofit (with a for-profit division) to a fully for-profit company.

Venture capital firm Thrive Capital led the funding round with a $1.25 billion investment. Microsoft, a longtime backer of OpenAI to the tune of $13 billion, contributed just under $1 billion to the latest round. New investors joined the round, including SoftBank with a $500 million investment and Nvidia with $100 million.

The United Arab Emirates-based company MGX also invested in OpenAI during this funding round. MGX has been busy in AI recently, joining an AI infrastructure partnership last month led by Microsoft.

Notably, Apple was in talks to invest but ultimately did not participate. WSJ reports that the minimum investment required to review OpenAI’s financial documents was $250 million. In June, OpenAI hired its first chief financial officer, Sarah Friar, who played an important role in organizing this funding round, according to the WSJ.

OpenAI is now valued at $157 billion Read More »