Author name: Mike M.


New app releases for Apple Vision Pro have fallen dramatically since launch

Vision Pro, seen from below, in a display with a bright white light strip overhead.

Samuel Axon

Apple is struggling to attract fresh content for its innovative Vision Pro headset, with just a fraction of the apps available compared with the number developers created for the iPhone and iPad in their first few months.

The lack of a “killer app” to encourage customers to pay upwards of $3,500 for an unproven new product is seen as a problem for Apple, as the Vision Pro goes on sale in Europe on Friday.

Apple said recently that there were “more than 2,000” apps available for its “spatial computing” device, five months after it debuted in the US.

That compares with more than 20,000 iPad apps that had been created by mid-2010, a few months after the tablet first went on sale, and around 10,000 iPhone apps by the end of 2008, the year the App Store launched.

“The overall trajectory of the Vision Pro’s launch in February this year has been a lot slower than many hoped for,” said George Jijiashvili, analyst at market tracker Omdia.

“The reality is that most developers’ time and money will be dedicated to platforms with billions of users, rather than tens or hundreds of thousands.”

Apple believes the device will transform how millions work and play. The headset shifts between virtual reality, in which the wearer is immersed in a digital world, and a version of “augmented reality” that overlays images upon the real surroundings.

Omdia predicts that Apple will sell 350,000 Vision Pros this year. It forecasts an increase to 750,000 next year and 1.7 million in 2026, but those figures are far lower than the iPad’s; the tablet sold almost 20 million units in its first year.

Estimates from IDC, a tech market researcher, suggest Apple shipped fewer than 100,000 units of Vision Pro in the first quarter, less than half what rival Meta sold of its Quest headsets.

Because of the device’s high price, Apple captured more than 50 percent of the total VR headset market by dollar value, IDC found, but analyst Francisco Jeronimo added: “The Vision Pro’s success, regardless of its price, will ultimately depend on the content available.”

Early data suggests that new content is arriving slowly. According to Appfigures, which tracks App Store listings, the number of new apps launched for the Vision Pro has fallen dramatically since January and February.

Nearly 300 of the top iPhone developers, whose apps are downloaded more than 10 million times a year—including Google, Meta, Tencent, Amazon, and Netflix—are yet to bring any of their software or services to Apple’s latest device.

Steve Lee, chief executive of AmazeVR, which offers immersive concert experiences, said that the recent launch of the device in China and elsewhere in Asia resulted in an uptick in downloads of his app. “However, it was about one-third of the initial launch in the United States.”

Lee remains confident that Vision Pro will eventually become a mainstream consumer product.

Wamsi Mohan, equity analyst at Bank of America, said the Vision Pro had “just not quite hit the imagination of the consumer.”

“This is one of the slower starts for a new Apple product category, just given the price point,” he said. “It seems management is emphasizing the success in enterprise a lot more.”

Nonetheless, some app developers are taking a leap of faith and launching on the Vision Pro. Some are betting that customers who can afford the pricey headset will be more likely to splurge on software, too.

Others are playing a longer game, hoping that establishing an early position on Apple’s newest platform will bring returns in the years to come.


Captain America: Brave New World teaser introduces Red Hulk to the MCU

new world order —

There are quite a few familiar characters from 2008’s The Incredible Hulk.

Anthony Mackie wields the shield in Captain America: Brave New World.

Marvel Studios has dropped the first teaser for Captain America: Brave New World, star Anthony Mackie’s first cinematic appearance as the new Captain America since the 2021 Phase Four TV miniseries The Falcon and the Winter Soldier. This is the fifth film in the MCU’s Phase Five, directed by Julius Onah (The Cloverfield Paradox) and building on events not just from F&WS but also from the 2008 film The Incredible Hulk. The teaser feels like half superhero movie, half political thriller, and with the tantalizing introduction of Red Hulk, it promises to be an entertaining ride.

(Spoilers for Avengers: Endgame and The Falcon and the Winter Soldier below.)

As previously reported, F&WS picked up in the wake of Avengers: Endgame, when Steve Rogers (Chris Evans), having chosen to remain in the past and live out his life with Peggy Carter, handed his Captain America shield to Anthony Mackie’s Sam Wilson (The Falcon). Sam and Sebastian Stan’s Bucky Barnes (The Winter Soldier) had to grapple with losing Steve and the burden of his legacy. Meanwhile, the US government had named its own new Captain America, John Walker (Wyatt Russell), a decorated veteran and ultimate “good soldier” who thought he could better embody “American values” than Rogers.

All three men found themselves battling a terrorist group known as the Flag Smashers, many of whom had been enhanced with the Super Soldier Serum. Where did they get it? From a mysterious person known only as the Power Broker. The Flag Smashers were targeting the Global Repatriation Council (GRC) set up to help those who disappeared in the Snappening (or the Blip) and then returned and had to re-acclimate to a very different world. (Apparently, the Flag Smashers liked it better before everyone came back.) Everything culminated in a knock-down fight between a serum-enhanced Walker, Sam, and Bucky, ending with Sam’s wingsuit destroyed. Walker escaped with a broken arm, sans shield, and we saw him in a post-credits scene melting down his military medals to make a new shield.

A new adventure

Frankly, I didn’t love F&WS as much as other Ars staffers and critics; mostly I thought it was “meh” and wasted a lot of potential in terms of character development. But Russell’s performance as Walker was excellent, and who could forget that priceless scene with the evil Baron Helmut Zemo (Daniel Brühl) dancing? So I’m down for another Captain America adventure with Mackie wielding the shield.

Anthony Mackie is back as Sam Wilson, the new Captain America.

YouTube/Marvel Studios

Per the official premise:

After meeting with newly elected US President Thaddeus Ross, played by Harrison Ford in his Marvel Cinematic Universe debut, Sam finds himself in the middle of an international incident. He must discover the reason behind a nefarious global plot before the true mastermind has the entire world seeing red.

In addition to Mackie and Ford, the cast includes Liv Tyler as the president’s daughter, Betty Ross, and Tim Blake Nelson as Samuel Sterns, both reprising their roles from 2008’s The Incredible Hulk. (Ford replaces the late William Hurt, who played Ross in that earlier film.) Carl Lumbly plays Isaiah Bradley, reprising his F&WS role as a Korean War veteran who had been secretly imprisoned and given the Super Soldier Serum against his will, enduring 30 years of experimentation. (He told Sam he couldn’t imagine how any Black man could take up Captain America’s shield because of what it represented to people like him, and one could hardly blame him.)

Rosa Salazar plays Rachel Leighton, Danny Ramirez plays Joaquin Torres, and Shira Haas plays Ruth Bat-Seraph. Giancarlo Esposito will also appear in an as-yet-undisclosed role, though based on the brief glimpses in the teaser, it appears to be an antagonistic one.

Ooh, a glimpse of Red Hulk, the alter ego of Thaddeus Ross.

YouTube/Marvel Studios

The teaser opens with Wilson visiting the White House to meet with President Ross. “You and I haven’t always agreed in the past,” Ross tells him. “But I wanna make another run at making Captain America an official military position.” Wilson has well-justified doubts after the events of F&WS, asking what would happen “if we disagree on how to manage a situation.” Ross just reiterates his invitation to work with Sam: “We’ll show the world a better way forward.”

Then Bradley appears at a public event and tries (unsuccessfully) to assassinate the president. Wilson warns Ross that his inner circle has been compromised, but the president appears to be in denial—or there’s something more nefarious going on. “Global power is shifting,” we hear Sterns say in a voiceover. “You’re just a pawn.” Is it a warning or a threat? (Reminder: in the 2008 film, Sterns was a cellular biologist trying to find a cure for Bruce Banner, only to be accidentally exposed to Banner’s blood and begin mutating into the Leader.)

Cue much explosive action and mayhem. And in true Marvel fashion, there is one final shot of Red Hulk, the alter ego of Thaddeus Ross, whose big red hand is also prominently featured in the official poster below. It should be quite the showdown.

Captain America: Brave New World hits theaters on February 14, 2025.

Marvel Studios

Listing image by YouTube/Marvel Studios


Frozen mammoth skin retained its chromosome structure

Artist's depiction of a large mammoth with brown fur and huge, curving tusks in an icy, tundra environment.

One of the challenges of working with ancient DNA samples is that damage accumulates over time, breaking up the structure of the double helix into ever smaller fragments. In the samples we’ve worked with, these fragments scatter and mix with contaminants, making reconstructing a genome a large technical challenge.

But a dramatic paper released on Thursday shows that this isn’t always true. Damage does create progressively smaller fragments of DNA over time. But, if they’re trapped in the right sort of material, they’ll stay right where they are, essentially preserving some key features of ancient chromosomes even as the underlying DNA decays. Researchers have now used that to detail the chromosome structure of mammoths, with some implications for how these mammals regulated some key genes.

DNA meets Hi-C

The backbone of DNA’s double helix consists of alternating sugars and phosphates, chemically linked together (the bases of DNA are chemically linked to these sugars). Damage from things like radiation can break these chemical linkages, with fragmentation increasing over time. When samples reach the age of something like a Neanderthal, very few fragments are longer than 100 base pairs. Since chromosomes are millions of base pairs long, it was thought that this would inevitably destroy their structure, as many of the fragments would simply diffuse away.

But that will only be true if the medium they’re in allows diffusion. And some scientists suspected that permafrost, which preserves the tissue of some now-extinct Arctic animals, might block that diffusion. So, they set out to test this using mammoth tissues, obtained from a sample termed YakInf that’s roughly 50,000 years old.

The challenge is that the molecular techniques we use to probe chromosomes take place in liquid solutions, where fragments would just drift away from each other in any case. So, the team focused on an approach termed Hi-C, which specifically preserves information about which bits of DNA were close to each other. It does this by exposing chromosomes to a chemical that will link any pieces of DNA that are in close physical proximity. So, even if those pieces are fragments, they’ll be stuck to each other by the time they end up in a liquid solution.

A few enzymes are then used to convert these linked molecules to a single piece of DNA, which is then sequenced. This data, which will contain sequence information from two different parts of the genome, then tells us that those parts were once close to each other inside a cell.

Interpreting Hi-C

On its own, a single bit of data like this isn’t especially interesting; two bits of genome might end up next to each other at random. But when you have millions of bits of data like this, you can start to construct a map of how the genome is structured.

There are two basic rules governing the pattern of interactions we’d expect to see. The first is that interactions within a chromosome are going to be more common than interactions between two chromosomes. And, within a chromosome, parts that are physically closer to each other on the molecule are more likely to interact than those that are farther apart.

So, if you are looking at a specific segment of, say, chromosome 12, most of the locations Hi-C will find it interacting with will also be on chromosome 12. And the frequency of interactions will go up as you move to sequences that are ever closer to the one you’re interested in.
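To make those two rules concrete, here is a toy simulation (my own sketch, not part of the study’s pipeline) that fakes a Hi-C contact matrix for two chromosomes. The bin counts and contact rates are invented purely to show the expected pattern: heavy contact near the diagonal within each chromosome, and sparse contact between chromosomes.

```python
# Toy Hi-C contact matrix: intra-chromosomal contacts decay with distance,
# inter-chromosomal contacts are rare. All numbers are illustrative.
import numpy as np

bins_per_chrom = {"chr1": 40, "chr2": 30}
labels = [(c, i) for c, n in bins_per_chrom.items() for i in range(n)]
n = len(labels)

rng = np.random.default_rng(0)
contacts = np.zeros((n, n))
for a in range(n):
    for b in range(a + 1, n):
        chrom_a, pos_a = labels[a]
        chrom_b, pos_b = labels[b]
        if chrom_a == chrom_b:
            # Rule 2: closer bins on the same chromosome interact more often.
            rate = 100.0 / (1 + abs(pos_a - pos_b))
        else:
            # Rule 1: contacts between different chromosomes are much rarer.
            rate = 0.5
        contacts[a, b] = contacts[b, a] = rng.poisson(rate)

print(f"mean contacts within chr1:  {contacts[:40, :40].mean():.1f}")
print(f"mean contacts chr1 vs chr2: {contacts[:40, 40:].mean():.1f}")
```

Running something like this produces the block-diagonal, distance-decaying pattern that real Hi-C maps show, which is roughly the structure the researchers were checking for in the mammoth sample.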

On its own, you can use Hi-C to help reconstruct a chromosome even if you start with nothing but fragments. But the exceptions to the expected pattern also tell us things about biology. For example, genes that are active tend to be on loops of DNA, with the two ends of the loop held together by proteins; the same is true for inactive genes. Interactions within these loops tend to be more frequent than interactions between them, subtly altering the frequency with which two fragments end up linked together during Hi-C.


Can you do better than top-level AI models on these basic vision tests?

A bit myopic —

Abstract analysis that is trivial for humans often stymies GPT-4o, Gemini, and Sonnet.

Whatever you do, don’t ask the AI how many horizontal lines are in this image.

Getty Images

In the last couple of years, we’ve seen amazing advancements in AI systems when it comes to recognizing and analyzing the contents of complicated images. But a new paper highlights how many state-of-the-art “vision language models” (VLMs) often fail at simple, low-level visual analysis tasks that are trivially easy for a human.

In the provocatively titled pre-print paper “Vision language models are blind” (which has a PDF version that includes a dark sunglasses emoji in the title), researchers from Auburn University and the University of Alberta created eight simple visual acuity tests with objectively correct answers. These range from identifying how often two colored lines intersect, to identifying which letter in a long word has been circled, to counting how many nested shapes exist in an image (representative examples and results can be viewed on the research team’s webpage).

  • If you can solve these kinds of puzzles, you may have better visual reasoning than state-of-the-art AIs.

  • The puzzles on the right are like something out of Highlights magazine.

  • A representative sample shows AI models failing at a task that most human children would find trivial.

Crucially, these tests are generated by custom code and don’t rely on pre-existing images or tests that could be found on the public Internet, thereby “minimiz[ing] the chance that VLMs can solve by memorization,” according to the researchers. The tests also “require minimal to zero world knowledge” beyond basic 2D shapes, making it difficult for the answer to be inferred from “textual question and choices alone” (which has been identified as an issue for some other visual AI benchmarks).
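To give a sense of what “generated by custom code” means here, below is a minimal sketch (my own illustration, not the researchers’ code) of one such test: render two randomly generated line plots and count their crossings numerically, so the correct answer is known by construction. The file name and parameters are arbitrary.

```python
# Generate a "how many times do the two lines cross?" test image with a
# ground-truth answer computed directly from the data. Assumes numpy and
# matplotlib are installed.
import numpy as np
import matplotlib.pyplot as plt

def count_crossings(y1, y2):
    """Count sign changes of (y1 - y2) between consecutive shared x positions."""
    diff = y1 - y2
    return int(np.sum(np.sign(diff[:-1]) * np.sign(diff[1:]) < 0))

def make_two_line_test(n_points=5, seed=0, path="two_lines.png"):
    rng = np.random.default_rng(seed)
    x = np.arange(n_points)
    y1, y2 = rng.uniform(0, 1, (2, n_points))
    answer = count_crossings(y1, y2)  # ground truth, known before any model sees it

    fig, ax = plt.subplots(figsize=(3, 3))
    ax.plot(x, y1, color="red", linewidth=2)
    ax.plot(x, y2, color="blue", linewidth=2)
    ax.axis("off")
    fig.savefig(path, dpi=150)
    plt.close(fig)
    return path, answer

image_path, ground_truth = make_two_line_test(seed=42)
print(f"Ask the model how many times the lines cross (truth: {ground_truth})")
```

Because the image and the answer come from the same random draw, there is nothing to memorize and no text-only shortcut to the solution.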

Are you smarter than a fifth grader?

After running multiple tests across four different visual models—GPT-4o, Gemini-1.5 Pro, Sonnet-3, and Sonnet-3.5—the researchers found all four fell well short of the 100 percent accuracy you might expect for such simple visual analysis tasks (and which most sighted humans would have little trouble achieving). But the size of the AI underperformance varied greatly depending on the specific task. When asked to count the number of rows and columns in a blank grid, for instance, the best-performing model only gave an accurate answer less than 60 percent of the time. On the other hand, Gemini-1.5 Pro hit nearly 93 percent accuracy in identifying circled letters, approaching human-level performance.

  • For some reason, the models tend to incorrectly guess the “o” is circled a lot more often than all the other letters in this test.

  • The models performed perfectly in counting five interlocking circles, a pattern they might be familiar with from common images of the Olympic rings.

  • Do you have an easier time counting columns than rows in a grid? If so, you probably aren’t an AI.

Even small changes to the tasks could also lead to huge changes in results. While all four tested models were able to correctly identify five overlapping hollow circles, the accuracy across all models dropped to well below 50 percent when six to nine circles were involved. The researchers hypothesize that this “suggests that VLMs are biased towards the well-known Olympic logo, which has 5 circles.” In other cases, models occasionally hallucinated nonsensical answers, such as guessing “9,” “n”, or “©” as the circled letter in the word “Subdermatoglyphic.”

Overall, the results highlight how AI models that can perform well at high-level visual reasoning have some significant “blind spots” (sorry) when it comes to low-level abstract images. It’s all somewhat reminiscent of similar capability gaps that we often see in state-of-the-art large language models, which can create extremely cogent summaries of lengthy texts while at the same time failing extremely basic math and spelling questions.

These gaps in VLM capabilities could come down to the inability of these systems to generalize beyond the kinds of content they are explicitly trained on. Yet when the researchers tried fine-tuning a model using specific images drawn from one of their tasks (the “are two circles touching?” test), that model showed only modest improvement, from 17 percent accuracy up to around 37 percent. “The loss values for all these experiments were very close to zero, indicating that the model overfits the training set but fails to generalize,” the researchers write.

The researchers propose that the VLM capability gap may be related to the so-called “late fusion” of vision encoders onto pre-trained large language models. An “early fusion” training approach that integrates visual encoding alongside language training could lead to better results on these low-level tasks, the researchers suggest (without providing any sort of analysis of this question).


Apple settles EU probe by opening up its mobile payments system

A small price to pay? —

iPhone users will get more choices to make “touch-and-go” payments in the EU.


In two weeks, iPhone users in the European Union will be able to use any mobile wallet they like to complete “tap and go” payments with the ease of using Apple Pay.

The change comes as part of a settlement with the European Commission (EC), which investigated Apple for potentially shutting out rivals by denying access to the “Near Field Communication” (NFC) technology on its devices that enables the “tap and go” feature. Apple did not develop this technology, which is free for developers, the EC said, and going forward, Apple agreed to not charge developers fees to provide the NFC functionality on its devices.

In a press release, the EC’s executive vice president, Margrethe Vestager, said that Apple’s commitments in the settlement address the commission’s “preliminary concerns that Apple may have illegally restricted competition for mobile wallets on iPhones.”

“From now on, Apple can no longer use its control over the iPhone ecosystem to keep other mobile wallets out of the market,” Vestager said. “Competing wallet developers, as well as consumers, will benefit from these changes, opening up innovation and choice, while keeping payments secure.”

Apple has until July 25 to follow through on three commitments that resolve the EC’s concerns that Apple may have “prevented developers from bringing new and competing mobile wallets to iPhone users.”

Arguably, providing outside developers access to NFC functionality on its devices is the biggest change. Rather than allowing developers to access this functionality through Apple’s hardware, Apple has borrowed a solution prevalent in the Android ecosystem, Vestager said, granting access through a software solution called “Host Card Emulation mode.”

This, Vestager said, provides “an equivalent solution in terms of security and user experience” and paves the way for other wallets to be more easily used on Apple devices.

An Apple spokesperson told CNBC that “Apple is providing developers in the European Economic Area with an option to enable NFC contactless payments and contactless transactions for car keys, closed loop transit, corporate badges, home keys, hotel keys, merchant loyalty/rewards, and event tickets from within their iOS apps using Host Card Emulation based APIs.”

To ensure that Apple Pay is on a level playing field with other wallets, the EC said that Apple committed to improve contactless payments functionality for rival wallets. That means that “iPhone users will be able to double-click the side button of their iPhones to launch” their preferred wallet and “use Face ID, Touch ID and passcode to verify” their identities when using competing wallets.

Perhaps most critically for users attracted to the convenience of Apple’s payment options, Apple also agreed to allow rival wallets to be set as the default payment option.

These commitments will remain in force for 10 years, Vestager said.

Apple did not immediately respond to Ars’ request for comment. Apple’s spokesperson confirmed to CNBC that no changes would be made to Apple Pay or Apple Wallet as a result of the settlement.

Apple’s commitments go beyond the DMA

Before accepting Apple’s commitments, the EC spoke to “many banks, app developers, card issuers, and financial associations,” Vestager said, whose feedback helped improve Apple’s commitments.

According to Vestager, Apple’s changes go beyond the requirements of the EU’s strict antitrust law, the Digital Markets Act, which “requires gatekeepers to ensure effective interoperability with hardware and software features that they use within their ecosystems,” including “access to NFC technology for mobile payments.”

Beyond the DMA, Apple agreed to have its compliance with the settlement “ensured by a monitoring trustee,” as well as to provide “a fast dispute resolution mechanism, which will also allow for an independent review of Apple’s implementation.”

Vestager assured all stakeholders in the European Economic Area that these changes will prevent any potential harms caused by Apple seeming to shut other wallets out of its devices, which “may have had a negative impact on innovation.” By settling the yearslong probe, Apple avoided a potentially large fine. In March, the EC fined Apple nearly $2 billion for restricting “alternative and cheaper music subscription services” like Spotify in its app store, and the suspected anticompetitive behavior in Apple’s payments ecosystem seemed just as harmful, the EC found.

“This reduction in choice and innovation is harmful,” Vestager said, confirming that the settlement concluded the EC’s probe into Apple Pay. “It is harmful to consumers and it is illegal under EU competition rules.”


Intuit’s AI gamble: Mass layoff of 1,800 paired with hiring spree

In the name of AI —

Intuit CEO: “Companies that aren’t prepared to take advantage of [AI] will fall behind.”

Signage for financial software company Intuit at the company's headquarters in the Silicon Valley town of Mountain View, California, August 24, 2016.

On Wednesday, Intuit CEO Sasan Goodarzi announced in a letter to the company that it would be laying off 1,800 employees—about 10 percent of its workforce of around 18,000—while simultaneously planning to hire the same number of new workers as part of a major restructuring effort purportedly focused on AI.

“As I’ve shared many times, the era of AI is one of the most significant technology shifts of our lifetime,” wrote Goodarzi in a blog post on Intuit’s website. “This is truly an extraordinary time—AI is igniting global innovation at an incredible pace, transforming every industry and company in ways that were unimaginable just a few years ago. Companies that aren’t prepared to take advantage of this AI revolution will fall behind and, over time, will no longer exist.”

The CEO says Intuit is in a position of strength and that the layoffs are not cost-cutting related, but that they allow the company to “allocate additional investments to our most critical areas to support our customers and drive growth.” With new hires, the company expects its overall headcount to grow in its 2025 fiscal year.

Intuit’s layoffs (which collectively qualify as a “mass layoff” under the WARN Act) hit various departments within the company, including the closure of Intuit’s offices in Edmonton, Canada, and Boise, Idaho, affecting over 250 employees. Approximately 1,050 employees will be laid off because they’re “not meeting expectations,” according to Goodarzi’s letter. Intuit has also eliminated more than 300 roles across the company to “streamline” operations and shift resources toward AI, and the company plans to consolidate 80 tech roles to “sites where we are strategically growing our technology teams and capabilities,” such as Atlanta, Bangalore, New York, Tel Aviv, and Toronto.

In turn, the company plans to accelerate investments in its AI-powered financial assistant, Intuit Assist, which provides AI-generated financial recommendations. The company also plans to hire new talent in engineering, product development, data science, and customer-facing roles, with a particular emphasis on AI expertise.

Not just about AI

Despite Goodarzi’s heavily AI-focused message, the restructuring at Intuit reveals a more complex picture. A closer look at the layoffs shows that many of the 1,800 job cuts stem from performance-based departures (such as the aforementioned 1,050). The restructuring also includes a 10 percent reduction in executive positions at the director level and above (“To continue increasing our velocity of decision making,” Goodarzi says).

These numbers suggest that the reorganization may also serve as an opportunity for Intuit to trim its workforce of underperforming staff, using the AI hype cycle as a compelling backdrop for a broader house-cleaning effort.

But as far as CEOs are concerned, it’s always a good time to talk about how they’re embracing the latest, hottest thing in technology: “With the introduction of GenAI,” Goodarzi wrote, “we are now delivering even more compelling customer experiences, increasing monetization potential, and driving efficiencies in how the work gets done within Intuit. But it’s just the beginning of the AI revolution.”


AI #72: Denying the Future

The Future. It is coming.

A surprising number of economists deny this when it comes to AI. Not only do they deny the future that lies in the future. They also deny the future that is here, but which is unevenly distributed. Their predictions and projections do not factor in even what the AI can already do, let alone what it will learn to do later on.

Another likely future event is the repeal of the Biden Executive Order. That repeal is part of the Republican platform, and Trump is the favorite to win the election. We must act on the assumption that the order likely will be repealed, with no expectation of similar principles being enshrined in federal law.

Then there are the other core problems we will have to solve, and other less core problems such as what to do about AI companions. They make people feel less lonely over a week, but what do they do over a lifetime?

Also I don’t have that much to say about it now, but it is worth noting that this week it was revealed Apple was going to get an observer board seat at OpenAI… and then both Apple and Microsoft gave up their observer seats. Presumably that is about antitrust and worrying the seats would be a bad look. There could also be more to it.

  1. Introduction.

  2. Table of Contents.

  3. Language Models Offer Mundane Utility. Long as you avoid GPT-3.5.

  4. Language Models Don’t Offer Mundane Utility. Many mistakes will not be caught.

  5. You’re a Nudge. You say it’s for my own good.

  6. Fun With Image Generation. Universal control net for SDXL.

  7. Deepfaketown and Botpocalypse Soon. Owner of a lonely bot.

  8. They Took Our Jobs. Restaurants.

  9. Get Involved. But not in that way.

  10. Introducing. Anthropic ships several new features.

  11. In Other AI News. Microsoft and Apple give up OpenAI board observer seats.

  12. Quiet Speculations. As other papers learned, to keep pace, you must move fast.

  13. The AI Denialist Economists. Why doubt only the future? Doubt the present too.

  14. The Quest for Sane Regulation. EU and FTC decide that things are their business.

  15. Trump Would Repeal the Biden Executive Order on AI. We can’t rely on it.

  16. Ordinary Americans Are Worried About AI. Every poll says the same thing.

  17. The Week in Audio. Carl Shulman on 80,000 hours was a two parter.

  18. The Wikipedia War. One obsessed man can do quite a lot of damage.

  19. Rhetorical Innovation. Yoshua Bengio gives a strong effort.

  20. Evaluations Must Mimic Relevant Conditions. Too often they don’t.

  21. Aligning a Smarter Than Human Intelligence is Difficult. Stealth fine tuning.

  22. The Problem. If we want to survive, it must be solved.

  23. Oh Anthropic. Non Disparagement agreements should not be covered by NDAs.

  24. Other People Are Not As Worried About AI Killing Everyone. Don’t feel the AGI.

Yes, LLMs are highly useful for coding. It turns out that if you use GPT-3.5 for your ‘can ChatGPT code well enough’ paper, your results are not going to be relevant. Gallabytes says ‘that’s morally fraud imho’ and that seems at least reasonable.

Tests failing in GPT-3.5 is the AI equivalent of “IN MICE” except for IQ tests.

If you are going to analyze the state of AI, you need to keep an eye out for basic errors and always always check which model is used. So if you go quoting statements such as:

Paper about GPT-3.5: its ability to generate functional code for ‘hard’ problems dropped from 40% to 0.66% after this time as well. ‘A reasonable hypothesis for why ChatGPT can do better with algorithm problems before 2021 is that these problems are frequently seen in the training dataset

Then even if you hadn’t realized or checked before (which you really should have), you need to notice that this says 2021, which is very much not the current knowledge cutoff, and realize this is not GPT-4o or even an older GPT-4.

You can also notice that the statement is Obvious Nonsense and people are now using ChatGPT (and increasingly Claude 3.5 Sonnet) this way all the time.

I also like this way of putting the value of AI for coding:

Gallabytes: I literally use chatgpt (well usually Claude) generated code in building an economically interesting product every day.

There’s more hype than is deserved, I certainly don’t yet fear for my employment prospects, but if I had to choose between vim key bindings & AI I’d pick AI.

I’d also take it over syntax highlighting, code folding, and static typing, but wouldn’t yet choose it over touch typing.

I definitely would not take it over touch typing in general, but if it was touch typing while typing code in particular I would take that deal because I can copy/paste outputs into the code window. On the others it is not even close.

Thread on how to get LLMs to do their own prompting for improved performance, also another test that shows Claude 3.5 is the current best model.

The spread of ‘patio11 official style AI communication’ continues. Use for all your generic communications with bureaucratic processes.

Beware the subtle mistakes?

Shako: Talking to an LLM on something I’m an expert on: “Hmmm, I see why you think that, but you’re not *quite* right, and you’re wrong in some critical but subtle ways”

Talking to an LLM on anything else: “Wow you’re the smartest person I’ve ever known, how do you know everything?”

The worry is Gell-Mann Amnesia.

The good news is that being only subtly wrong is a huge improvement over one’s state of knowledge for most questions in most areas. The default state is either very wrong or not even wrong. Now you get to be subtly wrong and worry about mistakes and hallucinations. That’s a huge improvement. The key is not taking it too credibly.

AI teaching assistants at Morehouse College. I don’t get it. Seems like fetishizing the classroom format rather than asking how AI can be useful.

Daniel: Immediately disliking the stapled on AI assistant in every product. Unpredictable experience so I’d rather not bother. Feels like slop every time.

It’s not even a good modality. Why type out a question when I can click twice from my account page to get the same information in a predictable way. Count how many times my finger touches the screen to accomplish a goal and the bots lose every time.

You type out the question (or speak it) because you do not know which buttons to click from the account page. Daniel and I, and most of you reading this, are frequent power users and voluntary users of menus. Most people aren’t. Even for us, there are times when we do not yet know the menu item in question. So I do appreciate these bad AIs, even implemented poorly, when they are fully optional. When they actively force the bot on you, it becomes ‘how do you get to a human,’ and that too is a skill.

No, users will mostly not check to see if your LLM is making mistakes once it crosses an accuracy threshold, unless they have a particular reason to do so. Why should they? One must prioritize and develop a sense of when things are accurate enough.

Sully mostly gives up on fine tuning, because by the time you are done fine tuning there is a new model that wipes out all your work.

Sam Altman joins forces with Arianna Huffington (?!) to write a Time article about how AI can help with healthcare.

Sam Altman and Arianna Huffington: But humans are more than medical profiles. Every aspect of our health is deeply influenced by the five foundational daily behaviors of sleep, food, movement, stress management, and social connection. And AI, by using the power of hyper-personalization, can significantly improve these behaviors.

These are the ideas behind Thrive AI Health.

It will learn your preferences and patterns across the five behaviors: what conditions allow you to get quality sleep; which foods you love and don’t love; how and when you’re most likely to walk, move, and stretch; and the most effective ways you can reduce stress. Combine that with a superhuman long-term memory, and you have a fully integrated personal AI coach that offers real-time nudges and recommendations unique to you that allows you to take action on your daily behaviors to improve your health.

Most health recommendations at the moment, though important, are generic.

As far as I can tell the whole thing is essentially a nudge engine?

The goal is to point out to people they should be making ‘healthier choices,’ according to Thrive’s beliefs about what that means. I suppose that is good on the margin, versus a nudge engine for healthier choices that doesn’t take context into account, if you can ‘automagically’ do that. But how is the AI going to get all that info, and what else might it get used for? There are answers I can come up with. I don’t love them.

Universal CN (ControlNet) for SDXL?

Use of AI companions reduces short-term loneliness over 7 days, similarly to ‘interactions with a human.’ The human interactions were 15-minute text chat sessions with a random other person, so that is not as high a bar as it sounds. Chatbots ‘acting like a human’ worked better than baseline mode, and did much better than ‘AI assistants.’ The impact was ~8 points on a 0-100 scale, but no attempt was made to see if that persisted for any length of time.

The key questions are thus not addressed. The most important is, does this develop skills and habits that enable better and more human interactions in the long term, or does it create dependence and tolerance and alienation from human interactions? Which effect dominates and to what extent? Having an impact short term ‘gets you in the door’ but the long term effects are almost all of what matters.

Fell into the slop trap? You can perhaps escape.

Brooke Bowman: PSA I spent the past few days going through and clicking ‘show fewer posts from this account’ for every slop account that showed up on my FYP and I just realized it’s now basically all people I want to see again.

Who knows how long it’ll last but it’s a nice reprieve.

Caleb Ditchfield: I recently did the same thing on Facebook! Sometimes all you need is more dakka.

Johnny: it grows back fast though. the sheer amount of necessary weeding 🙁

Tiger Lava Lamp: Sometimes if I don’t like my For You page, I switch back to Following for a while and only interact there until the algorithm understands that I don’t want random people talking about the topic of the day.

Mothman: I dislike how algos train on every video play and linger on a post. I am an ape with no control. Now only meme accts on fyp. Only scroll on following now.

First we had small social worlds where everyone was constantly watched and you had to act accordingly. Then we got computers where you could do what you want. Now the algorithms watch us, so we have to take it all into account once again. The good news is you can brute force it in some cases. I sometimes wonder if I should have multiple YouTube accounts for different purposes.

TerifAI, the AI that clones your voice if you talk to it for a minute.

Microsoft publishes a paper on VALL-E 2, a zero-shot text to speech synthesizer that also clones a given voice. They say this is a research project and have no plans to release.

The obvious question is, if you think your creation is too harmful or dangerous to release even though it is clearly useful, why would you tell others how you did it?

One good reason to clone your voice is when you lose it, so you use AI to get it back.

We got another one.

Toby Muresianu: Lol it really worked.

Misha: was anyone falling for propaganda tweeted by “rsmit1979qt”?

FBI: The Justice Department today announced the seizure of two domain names and the search of 968 social media accounts used by Russian actors to create an AI-enhanced social media bot farm that spread disinformation in the United States and abroad. Learn more [here].

The examples the FBI lists are bizarre. This is not the top shelf Russian propaganda. These are claims that might play inside Russia, but I would expect to backfire when shared in the United States. A video of Putin claiming parts of Poland and the Baltics were ‘gifts from Stalin’? What is that message hoping to accomplish?

The other question is, what is the ratio of cost to detect and shut down this ‘bot network’ to the cost to spin up a new one? No one involved had their computers taken away and no one got arrested, since Russia is not exactly cooperating. It is not exactly hard to create 968 social media accounts, even if there is some time lag before they become ‘fully functional.’

Thus the main thing happening here, as far as I can tell, is the narrative of the Russian Bot. As in, the Russian bot network is teaming up with the FBI to tell people there exist Russian bots. That is the main actual message. Doesn’t seem like an equilibrium?

16% of restaurant owners investing in AI this year, with most big spending coming from large chains that can benefit from scale. New minimum wage and benefit laws are contributing, but this mostly would be happening anyway.

Not in that way: 80,000 hours continues to list OpenAI jobs on its job board, despite everything that has happened. There are no warnings regarding either OpenAI’s record on safety, its broken promises, or even how OpenAI has treated its employees. Sending idealistic young people towards OpenAI without so much as a heads up on their issues is a severe missing stair problem and I call upon them to fix this.

Several improvements to the Anthropic Console (access it here). Have it automatically generate and try test data for your new prompt. Use the evaluate button to keep trying, upload test cases from .csv files, and compare different prompts and their outputs side by side.

Anthropic also letting you publish Claude artifacts.

Anthropic also is now letting you fine tune Claude 3 Haiku. Sully is excited, but I would also heed Sully’s warning about new model releases wiping out all your fine tuning work. Chances are Claude 3.5 Haiku is coming not too long from now.

SenseNova 5.5, a Chinese model claimed to ‘outperform GPT-4o in 5 out of 8 key metrics.’

As usual with such announcements, this is doubtless heavily gamed and also represents some amount of progress. Another obvious question, where on this chart is DeepSeek?

YouTube copyright music remover for videos.

OpenAI’s ChatGPT Mac app was sharing conversations in plaintext. If we want any hope of getting the big hard things right, we need to get the little easy things right.

OpenAI partners with Los Alamos National Laboratory to study how to advance bioscientific research. Good news that is almost engineered to sound bad.

Claude and Gemini will, if requested to do so, reproduce the BIG-BENCH canary string designed to detect if you are training on BIG-BENCH data. Which you are not supposed to be doing, as it is explicitly marked Not for Training. Both models understood the implications of their ability to produce the string.

New version of Siri that incorporates Apple Intelligence delayed until Spring 2025. That is an eternity in AI time. Apple Vision Pro also won’t get Apple Intelligence until 2025. Whereas Google is moving up the Pixel 9 launch to the summer. Watch who talks big hype, and who ships what when.

OpenAI was to give Apple an observer seat on its board. The contrast to Microsoft’s struggles here is stark. The intended move shows more of OpenAI’s shift towards being a normal tech company caring about normal tech company things. Then Microsoft, and reportedly Apple as well, gave up their observer seats ‘amid regulatory scrutiny.’ The observations are nice, but not worth the absurd anti-trust accusations of ‘monopoly’ or ‘collusion.’

Details about the OpenAI hack in April 2023, previously undisclosed to the public and also undisclosed to law enforcement. The hacker was a ‘private individual,’ and they said no key data was extracted, oh no it was only access to the internal communications channels. What, me worry? What national security threat?

Teams using DeepSeek’s DeepSeekMath-7B take the top four slots in the AI Mathematical Olympiad (AIMO)’s first progress prize on Kaggle. The winning team got 29/50 and won $131k, seven more points than second place. A lot of teams got 18+, four scored 21+, only one got over 22. Gemma 7B by default scores 3/50. Terence Tao is reportedly amazed although I didn’t see him mention it yet in his blog. Without knowing the questions it is hard to know how impressed to be by the score, but the prizes are big enough that this is an impressive relative outcome.

Report: The New York Times uses mostly industry sources when it covers AI, oh no. They have a narrative of ‘hero vs. villain,’ in the New York Times of all places, why I never. Outsiders are called ‘outside experts’ as if that is fair. Using ‘obscure language,’ this report says, ‘preserves the power structures that benefit technology developers and their respective organizations.’ What are these obscure terms? Well one of them is ‘AI’ and they point to a weird case where an article uses AGT instead of AGI.

What is hilarious about all this is the fear that The New York Times has too much ‘industry alignment’ with the biggest AI tech companies. Seriously, have these people seen The New York Times? It has systematically, for many years, pushed a unified and intentional anti-technology anti-big-tech narrative. For some people, I suppose, no amount of that is ever enough.

Paper asks to what extent various LLMs are ‘situationally aware.’

The answer is not very situationally aware by default under current conditions. Everyone does better than chance but no model here does well or approaches the human-imitating-LLMs baseline.

DeepMind paper claims new JEST method reduces training time by a factor of 13 and computing power demand by 90%. Proposed method is to jointly select batches of data, with an algorithm proposed for making such selections, to steer towards smaller, well-curated datasets via checking with smaller models to see which data sets work. Sounds impressive, but the obvious question is: If you did discover something this good would you not pay the researchers however many millions it took to get them happy with not saying a word about it? Seriously, Google, as both a human who wants to live and as a shareholder, I am begging you.
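Setting the should-they-have-published question aside, the core selection idea as described can be sketched in a few lines. This is a heavy simplification of my own, not DeepMind’s implementation (which selects examples jointly within a batch rather than independently): score each candidate by how much harder it is for the model being trained than for a small pretrained reference model, and keep the top scorers.

```python
# Simplified "learnability" batch selection: prefer examples the learner still
# finds hard but a small reference model finds easy. Losses here are random
# placeholders standing in for real per-example losses.
import numpy as np

def select_batch(learner_losses, reference_losses, batch_size):
    """Return indices of the examples with the highest learnability score."""
    learnability = learner_losses - reference_losses
    return np.argsort(learnability)[-batch_size:]

rng = np.random.default_rng(1)
super_batch = 4096                                   # candidates per step (illustrative)
learner_losses = rng.gamma(2.0, 1.0, super_batch)    # placeholder losses
reference_losses = rng.gamma(2.0, 0.8, super_batch)  # placeholder losses

chosen = select_batch(learner_losses, reference_losses, batch_size=512)
print(f"kept {len(chosen)} of {super_batch} candidates for the next training step")
```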

Progress in AI is fast. As Ethan Mollick and Matt Clancy point out, if you use traditional paper writing timeframes and protocols, the models you tested with, and probably your scaffolding and prompt engineering too, are obsolete by the time you publish. Matt Clancy suggests ‘living papers’ that update results as new models come out, bypassing revision requests and having smaller more frequent conferences. I agree and think we should mostly test quickly, write quickly and publish quickly.

This goes part and parcel with the people who say ‘I will read that when it is published in a respected peer reviewed journal, arXiv does not count as knowledge or evidence’ or otherwise insist on going through ‘proper scientific rigor.’ The process of ‘proper scientific rigor’ as practiced today is horribly broken, and even when it works it is painfully slow and unwilling to recognize the majority of important evidence and forms of argument. Those who fail to adapt will lack situational awareness, and be left behind.

Ajeya Cotra offers additional thoughts on the ‘AI Agents That Matter’ paper from last week, highlighting the issue that a 2% error rate compounds quickly over many subtasks if there is no error correction mechanism, and reducing error rates can turn otherwise not that useful agents into very useful agents quickly.

Ajeya Cotra: I think we could see 2025 agents blow past WebArena / GAIA. So in addition to the 5 points the authors highlighted, I think we should make *difficult* benchmarks to maximize longevity and minimize surprise.
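The arithmetic behind that compounding point is worth spelling out. A quick back-of-the-envelope calculation (step counts chosen purely for illustration):

```python
# With no error correction, per-step reliability compounds multiplicatively.
for steps in (1, 5, 10, 20, 50):
    success = 0.98 ** steps
    print(f"{steps:>2} subtasks at 2% error each -> {success:.0%} chance of a clean run")
```

At fifty subtasks, a seemingly excellent 98% per-step accuracy leaves roughly a one-in-three chance of getting through the whole chain cleanly, which is why small reductions in error rate can flip an agent from not that useful to very useful.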

Could synthetic personas allow the creation of sufficient synthetic data to solve the data bottleneck? Tencent claims they used a million fake people to generate better synthetic math data.

Wait, ‘math’ data? Can’t you generate as many generic math problems and examples as you like without personas? Claude could not come up with any explanations of why the paper is evidence that the technique is useful. As a general rule, if you choose a highly unimpressive example case, that likely means your other attempts didn’t work.

Imagine Robin Hanson, as the world transforms around him, once again saying ‘sell.’

Yoni Rechtman: In the last few weeks Goldman, Sequoia, and Barclays have all put out equity research that says “the AI capex boom doesn’t make sense”. That feels like a pretty significant harbinger of a major sentiment shift.

Sam Lessin: at @slow we put out that note 18 months ago..

Yoni: Consensus is slowly catching up to us.

Turner Novak: Slow was fast on this one. Total head fake.

Jay Wong: Sequoia also came out with their essay about the missing $600B in AI revenue to justify projected capex spend this year.

You’re hearing it not only from the sellside, but buyside too.

These new calls are echoes of the ‘GDP growth might expand 1.6% over ten years, AI is very exciting’ economic analysis caucus. They lack even the most basic of situational awareness. I welcome their bear cases.

I remember when a key argument against AI was ‘if AI is going to be so big why is that not priced into the market?’

Now instead we see ‘AI is not going to be so big, why is it priced into the market?’

Which is funny, since no, it is not yet (fully) priced into the market. Not even close.

Arthur Breitman: Goldman Sachs research report on AI is making me more bullish on the sector because it indicates that theses that seem dead obvious to me aren’t at all consensus among sophisticated market participants with deep pockets. I could be wrong, but if I’m right, at least it’s not all priced in.

Anton: The last of these i read had the first really nonsensical part roughly 20 paragraphs in, but this new one has it two paragraphs in.

[From Ed Zitron]: The report covers Al’s productivity benefits (which Goldman remarks are likely limited), Al’s returns (which are likely to be significantly more limited than anticipated), and Al’s power demands (which are likely so significant that utility companies will have to spend nearly 40% more in the next three years to keep up with the demand from hyperscalers like Google and Microsoft).

Anton: Are the utility companies expected to expand capacity by 40% out of the goodness of their own hearts or are microsoft etc. going to pay them lots of money to do that? If so, why?

The sequoia report makes more sense and fits into the general feeling i have that this era is most like the early web, and then early cloud; software requires substantial capex for the first time in 30+ years.

Just like the early web, you have to actually use (or better yet build with) the new technology to really ‘get’ it; top-down analysis isn’t likely to give you a sense of what’s going to happen here in the long run

If you look at the details, such as the answer at the link by Jim Covello, author of the Goldman Sachs report, you see assessments and predictions about AI that are unmoored from reality. These analysts are treating AI as worse than it already is, as not useful for tasks where I constantly use it. Jim says AI often fails on basic summation, whereas I use Claude to get detailed logical analysis and explanations of research papers several times a week.

It also cites that terrible paper from Daron Acemoglu about how if you assume AI never does anything then it won’t do anything. Somehow Daron has decided to be ‘even more pessimistic’ now.

I always love dismissers who say things like ‘Wendy’s order taking AI requires intervention on 14% of orders and it can’t make the food’ to dismiss AI’s value, when

  1. That means 86% of the time it did not need help

  2. This is the worst it will ever be

  3. Wendy’s is indeed already doing this because it is already better

  4. Presumably they are using old technology and would do much better if they updated although I’m not going to check

  5. Doing parts of many jobs is an economic win far bigger than they claim is possible

  6. Also it will probably be serving the food within that 10 year window

  7. Seriously, come on, this is dumb, have these people actually used 4-level LLMs?

These people throw shade on the idea that LLMs will ever get better from here, or get much better, and keep doing so as they get better. The thing is, even if they were somehow 100% right about that and Claude Sonnet 3.5 is the best model we ever get, that is more than good enough to eclipse the absurd predictions made in such reports.

Goldman Sachs is full of smart people. How is it making this mistake?

My answer is they utterly lack situational awareness on this, being intelligent is only sometimes a defense against using your learned algorithms out of distribution without noticing something is amiss, and I’ve seen enough economic types make the same mistakes over and over that it no longer surprises me.

Others think it must be an op, and this is Goldman so I can’t blame them for asking.

Sophie: a few thoughts about this piece [from Ed Zitron]

– 99% of journalists are retarded and can be ignored completely

– this article just rehashes the goldman report like a victory lap for the author who seems to desperately want genAI to be a bubble (also the swearing makes it read super cringe)

– who cares what goldman sachs thinks about AI. it’s clearly becoming en vogue to say “AI is a bubble unless the revenue can start justifying the capex” which might be true but doesn’t provide any actionable insight, just seems like a way to make the author feel “right” about having this take (which isn’t novel anymore considering there’s a bunch of pieces going around saying the same thing)

– you don’t have to see the world in these absolute terms that people who publish stuff want you to see it in. you’re allowed to color your own worldview with nuance as you see fit

Danielle Fong: These capital providers aren’t saying that generative AI is a bubble because they’re not going to invest. they’re saying it because they want less competition for $$ and want to negotiate prices down. It’s collective bargaining.

That does not mean that any particular stock has to be underpriced. Perhaps Nvidia or Google or Microsoft will go down rather than up over the next year. Stranger things have happened and nothing I say is investment advice. Who exactly ends up making money is hard to predict. What I do know is that these predictions of overall economic impact are absurd, and a lot of things are about to change.

What does it mean for AI that Labour won a massive Parliament majority in the UK?

It looks a lot like a continuation of Sunak’s goals. Their manifesto commits them to ‘binding regulation’ on those training the most powerful AI models, a ban on deepfakes and making it easier to build data centers. They intend to put the AI safety institute on ‘a statutory footing’ and require labs to release safety data, a la the voluntary commitments at Seoul.

Ben Thompson goes on a righteous rant about the EU’s various destructive and inane rules around data and technology and its strong arming of American tech companies. The Nvidia case, where they plan to fine Nvidia more than Nvidia’s entire EU revenues and thus presumably cause Nvidia to exit entirely, is especially glaring, but he is clearly right about the Meta and Apple and Google examples too. The EU is making life worse all around for no substantial benefits. He warns that the EU is overplaying its hand, including with the EU AI Act, so strongly that it risks tech companies increasingly bypassing it entirely.

What he does not offer are any details on the EU AI Act, and which provisions would be expensive or impossible to comply with. There are indeed various rather stupid provisions in the EU AI Act, but it is extremely long and painful to read and I would really love it if someone else, perhaps someone European, would do the job of telling us what it actually says so I don’t have to. I will try to do it, but have mercy, haven’t I suffered enough?

From what I did see, the EU AI Act is largely the EU being the EU, but there is a reason Apple is citing the DMA and data laws rather than the AI Act when delaying its AI offerings in the EU.

FTC decides it is its business, somehow, like all the other things it thinks are its business, to state that open weight foundation models create innovation and competition, and the issue is them not being open enough. Zero mention of any of the reasons why one might want to be concerned, or anything legally binding. I wonder who got to them, but hey, congrats you did it, I guess.

Republican party platform officially includes repeal of the Biden Executive Order, along with other hated Biden policies such as (to paraphrase his tariff proposal slightly) ‘trading with other countries.’

Why are the Republicans doing this? What does this mean for AI regulation, aside from who to vote for in November?

No one can ever know for sure, but it sure seems like a pretty simple explanation:

  1. Trump wants the Executive Order gone, so that is in the platform.

  2. Trump wants the Executive Order gone because it was implemented by Biden.

  3. Trump also thinks Regulations (That Don’t Get Me Paid) Are Bad, so there’s that.

  4. (Optional) Lobbying/bribing by a16z and others who value their profits above all.

Jeremy Howard gloats that of course there is polarization now, because of SB 1047, he told you so.

Except that is Obvious Nonsense.

Trump said he would repeal the Executive Order the first time he was asked, long before SB 1047 was on anyone’s mind or could plausibly have factored into his decision. Why? See above.

The votes on SB 1047 in California are passing along robust bipartisan lines; yes, California has Republicans. The votes are usually or always 90%+ in favor.

Popular opinion remains remarkably united regardless of party. The parties are working together remarkably well on this.

The ‘partisan issue’ is that Trump is reflexively opposed to anything Biden does.

Are we under some strange hallucination that if California was taking a different regulatory approach then Trump would be keeping the EO?

I cannot get that statement to even parse. It makes zero sense.

Instead, this (to the extent it is new information, which it mostly is not) greatly strengthens the case of state actions like SB 1047.

What is the best argument against passing a regulatory law in California?

The best argument is that it would make it harder to pass a regulatory law in Washington, or that we would be better served by passing a law in Washington, or that we can do it (and to some extent via the Executive Order are doing it) via the existing administrative state.

That argument is strong if you think Congress and the White House are capable of passing such a law, or of implementing this via Executive Orders and the administrative state. If Trump (and the Supreme Court) are determined to hamstring the administrative state and its ability to build state capacity and knowledge on AI?

What other option do we have?

The Republican platform also tells us we will create a ‘robust Manufacturing Industry in Near Earth Orbit.’ It is good to aspire to things. It would also be good to attempt to correspond to reality. I mean, yes, I’m for it in principle, but in the same way I want to, as the chapter calls it, ‘build the greatest economy in history,’ if we can do that without it being an AI’s economy and also ending all of history.

To be fair, there are non-zero good things too, such as energy permitting reform. The thing to note here is that the likely once and future president is going to start by taking a big step backwards.

Yet another poll of 1,040 Americans by AIPI says voters are for safety regulations on AI and against turning it into a race.

Billy Perrigo (Time): According to the poll, 75% of Democrats and 75% of Republicans believe that “taking a careful controlled approach” to AI—by preventing the release of tools that terrorists and foreign adversaries could use against the U.S.—is preferable to “moving forward on AI as fast as possible to be the first country to get extremely powerful AI.” A majority of voters support more stringent security practices at AI companies, and are worried about the risk of China stealing their most powerful models, the poll shows. 

The poll was carried out in late June by the AI Policy Institute (AIPI), a U.S. nonprofit that advocates for “a more cautious path” in AI development. The findings show that 50% of voters believe the U.S. should use its advantage in the AI race to prevent any country from building a powerful AI system, by enforcing “safety restrictions and aggressive testing requirements.” That’s compared to just 23% who believe the U.S. should try to build powerful AI as fast as possible to outpace China and achieve a decisive advantage over Beijing.

The polling also suggests that voters may be broadly skeptical of “open-source” AI, or the view that tech companies should be allowed to release the source code of their powerful AI models.

The polls also showed that 83% of Americans believe AI could accidentally cause a catastrophic event, and that 82% prefer slowing down AI development to account for that risk, compared to just 8% who would like to see it accelerated.

I went to their website to see the details and they haven’t posted them yet. I’ll take a look when they do.

AIPI certainly is strongly in favor of making sure we do not all die. Is AIPI slanting their question wording and order somewhat? Based on previous surveys, not egregiously, but not zero either. Do we know that people actually care about this or consider it important enough to change their votes? Not yet, no.

I do think such polls show definitively that the public is suspicious and fearful of AI in a variety of ways, and that once the salience of the issue grows politicians will be under quite a lot of pressure to get in line.

Similarly, your periodic reminder that SB 1047 is very popular. It has 75%+ popular support in surveys. It passes every vote among lawmakers by overwhelming margins.

A bunch of very loud and obnoxious and mostly deeply disingenuous people have decided that if they are loud and obnoxious as often as possible on Twitter, and say various things that have no relation to what is actually in the bill or what its impact would be or where it does and does not apply, then people will confuse Twitter with real life, and think that SB 1047 is unpopular or turning people against EA or widely seen as a tyranny or whatever.

It’s not true. Do not fall for this.

Things that are not happening.

Tsarathustra: Former Google X executive Mo Gawdat says the mainstream media is hiding the truth that the trajectory of AI is on course to end our world as we know it within 4 years.

He’s not even talking about existential risk, he is talking about things like job losses and balance of power among humans. So is he, too, ‘hiding the truth’ about AI? No.

Like him, the mainstream media is not ‘hiding the truth’ about AI. The mainstream media does not have any inkling of the truth about AI. It hides nothing.

He also says ‘ChatGPT is as intelligent as Einstein,’ which is quite the claim, and which would have implications he is not at all considering here. Instead he goes on to discuss various mundane concerns.

Demis Hassabis talks with Tony Blair. Nothing new here.

Carl Shulman on 80,000 hours, part 2.

Documentation of some highly effective rhetorical innovation: David Gerard’s ongoing war to create malicious Wikipedia articles about those he dislikes, in particular LessWrong. I confirmed that the previous version of the page was essentially a libel against the site, and the current version is only slightly better. The opening implies the site is a ‘doomsday cult.’ There is still – to this day – an entire section discussing Neoreaction, purely because Gerard wants to imply some link.

About half of the old version of the page was about a sufficiently obscure concept (R’s B) that I can’t remember the last time anyone else mentioned it on LessWrong; that section has since been trimmed to one paragraph but is still presented so as to draw one’s attention as a central focus. Even more than that, almost all other discussion is hidden or minimized. Key facts, such as the revival of the site by Oliver Habryka, or even the site’s focus on AI, remain absent. There is no list of or reference to its major authors and contributors beyond Eliezer Yudkowsky. And so on.

The good news is that a spot check of pages for individuals seemed far better. My own page clearly remains untouched and almost entirely about my Magic: The Gathering career. My blog is linked, but my writings on Covid and AI are not mentioned. It contains an easy-to-correct minor factual error (my time at MetaMed preceded Jane Street), but one does not edit one’s own page; I am curious how fast that gets fixed.

Some of you reading this edit Wikipedia, or know people who do, including people higher up.

If that is you, I implore you: Read this article, look at the LessWrong page, and notice that this has been permitted to continue for a decade. FIX IT. And call upon those in charge, whoever they are, to deal with your David Gerard problem once and for all.

If this cannot be addressed despite this level of attention, at least to the point of making this not a clear ‘hit job’ on the community, then I will update accordingly.

If that is not you (or if it is), take this knowledge with you as you read the rest of Wikipedia, including noticing how they react from here, and judge to what extent it is a ‘reliable source.’ Which it mostly still is, but, well, yeah.

We should also pay attention to whether his more general war to label sources as reliable or unreliable gets any pushback. Wikipedia’s problem there is far bigger than what it says in its article about one little website.

Some other confirmations:

Aella: This guy is doing exactly the same thing on my wiki page – making sure I’m referred to as an “influencer” and not a “researcher” for example. Imo this guy should be banned from editing anything related to the rationalist scene.

Jon Stokes: This was great. I’ve encountered the guy at the center of this article, & it was super unpleasant. He’s been on my radar for a while, but I had no idea he was this influential.

George Punished: Yeah, great work. Encountered this guy, same impression, but seeing it all laid out like this is sobering and impressive.

Kelsey Piper: This is a fascinating long read about how the quirks of one prolific Wikipedia editor affect the internet. (I have an investment in this story; this guy has hated me since I was in college and spent a while campaigning to get my wikipedia page deleted.)

Paul Crowley: It’s also fun to see people in the comments discover that the entire idea that LessWrong folk care about R’s B is a lie that David Gerard quite deliberately crafted and spread.

Ght Trp: After I started reading LW it quickly became obvious that [it] was not something anyone really cared about. And I was always confused why it came up so often when rationalists/LW were being discussed in other parts of the internet.

A sad assessment, yes this all applies beyond Wikipedia:

Atlanticesque: To weigh in as a Wikipedia Defender who has told many people to Start Editing — Lots of awful people have power in Wikipedia and fight their petty crusades. But the only way to defang these losers is to do the work, build credibility, and break their consensus. We just have to.

Trace wrote a great exposé here on exactly the sort of creep who thrives in this environment. But what you might’ve noticed? Gerard’s pathological fixations are relatively narrow.

Most of the site is still the Wild West. Most articles are not closely guarded, they’re wide open.

Yes there’s institutional biases (such as the notoriously arbitrary ‘Reliable Sources’ list) but I’ve seen countervailing narratives win fights over pages… WHEN that countervailing narrative is backed up by strong sources, rules knowledge, and respectful argument.

Don’t be lazy.

Kelsey Piper: On the one hand, this is totally true and realistic advice- not just about Wikipedia, but about life. Decisions get made by the people who show up. On the other hand, the first time I attempted wiki editing, the guy this piece is about reverted everything and was a hostile ass.

When people try to show up, this guy and people like him instantly remove their contributions and refuse to explain what they can do better, or explain it in a long intimidating wall of legalese. Like I’m sure many others, I just gave up on wiki editing.

Ian Miller: There’s “don’t be lazy” and there’s “write literally hundreds of thousands of things for 30 years”. At some point, “get involved” is worthless when you get slapped down by people whose whole life is about being better than peons (aka those who don’t write hundreds of thousands)

Misha: Possibly the most important thing to realize from @tracewoodgrains’s post about David Gerard is that this is a microcosm.

The internet has way more of these obsessive feuds than you could ever reasonably track.

“Feud” is not even coming close to describing what’s going on in many cases.

From a locked account. “So perhaps the next time you read a weird-Florida-news story, don’t ask why Florida is so weird; ask why you’re not hearing about the weirdness in other states. It might have something to do with their lack of open government.”

In any organization, over a long enough time horizon, there will arise an implicit coalition devoted to promoting those who promote the advancement of the implicit coalition, and who care about winning political fights rather than the organization’s supposed goal. If the rest of the organization does not actively fight this, the organization will increasingly fall into the coalition’s control. See the Moral Mazes sequence.

Atlanticesque is saying that you must fight such people step by step, with a similar obsession over the fights, and do the work.

Over the long run, that will not get it done, unless that includes stripping those waging these petty battles of power. It is not viable to beat the Gerards of the world via fighting them on every little edit. You do not beat cheaters by catching them every single time and forcing them to undo each individual cheat. You do not beat defectors by reverting the impact back to the status quo every time you see them defect.

You beat cheaters and defectors through punishment. Or you lose.

Yoshua Bengio tries again at length to explain why he is worried about AI existential risk and believes it is worth taking AI safety and existential risk seriously, stating the basic case then breaking down why he finds the arguments against this unconvincing. He deals with those who think:

  1. AGI/ASI are impossible or definitely centuries distant.

  2. AGI is decades away so no need to react yet.

  3. AGI is reachable but ASI is not.

  4. AGI and ASI would be ‘kind to us.’

  5. Corporations will only design well-behaving AIs, existing laws are sufficient.

  6. We should accelerate AI capabilities and not delay AGI’s benefits.

  7. Talking about catastrophic risk hurts efforts to mitigate short term issues.

  8. The USA-China cold war means we cannot afford to slow down.

  9. International treaties will not work.

  10. The genie is out of the bottle so just let go and avoid regulation.

  11. Open weight (and code) AGI are the solution.

  12. Worrying about AGI is falling for Pascal’s Wager.

There are always more objection categories or fallbacks, but these are the highlights. These are not the exact answers I would have given. Often he bends over backwards to be respectful and avoid being seen as overconfident, and in places he chooses different core argument lines than I think are most effective.

Overall this is very strong. It is especially strong against the ‘there will not be a problem’ objections, that AGI/ASI won’t happen or will be harmless, or that its downsides are not worth any attention, either absolutely or compared to benefits.

The other broad category is ‘yes this is a problem but doing anything about it would be hard.’ To which he patiently keeps saying, yes it would be hard, but not impossible, and being hard does not mean we can afford to give up. We cannot afford to give up.

His weakest answer is on those who think ‘open source’ is the solution to all ills. I do think his explanations are sufficient, but that there are even stronger and clearer reasons why the full open approach is doomed.

I endorse this perspective and phrasing shift: What is ‘science fiction’ is the idea that AGI and ASI won’t arrive soon while civilization otherwise advances, and that such AGIs would not transform things too much, because that is the ‘science’ that lets us write the interesting fiction about what people care about most. Which is people.

Andrew Critch: Some believe that AGI will remain simultaneously *not regulated* and *not invented* for like, a decade. I struggle to imagine stagnating that long. I can imagine crazy-feeling sci-fi scenarios where unencumbered AI developers somehow don’t make AGI by 2034, but not in this world.

Aryeh Englander: Why is it so hard to imagine a world in which there remain several difficult and/or enormously expensive breakthroughs and it takes a while to reach those? Or that continued unreliability leads to insufficient returns on investment leading to another AI winter?

Andrew Critch: To me that feels like you’re asking “Why is it so hard to imagine that the fashion industry will fail to ship any new t-shirt designs next year?” The remaining tasks to make AGI are just not that hard for humans, so we’re gonna do them unless we stop ourselves or each other from proceeding.

I don’t claim to have made an argument worth convincing you here, I’m just registering that >10yr uninterrupted timelines to AGI seem very wacky to me, so that I can at least collect some points later for calling it.

Frankly, I also want to normalize calling slow timelines “sci fi”. E.g., the Star Trek universe only had AGI in the 22nd century. As far as I can tell, AI progressing that slowly is basically sci-fi/fantasy genre, unless something nonscientific like a regulation stops it.

Steve Witham: I believe Vinge said this in the 1990s, but that writing for more realistic timelines was harder. Anyway, “slo-fi”. Maybe we should just admit it’s a permanent problem, like “SF redshift.” Or “SF in the rear view mirror.”

A reminder that if Alice is trying to explain why AI by default will kill everyone, and Bob is raising social objections like ‘we wouldn’t do that if it would get us all killed’ or ‘if true then more experts would say so’ or ‘that sounds too weird’ or ‘if you really believed that you’d be [Doing Terrorism or some other crazy thing that is unethical and also makes no sense] even though I would never do that and don’t want you to do that’ then there is no point in providing more technical explanations.

That post is also an example of how most people are not good at explaining the whys behind the social dynamics involved, especially the idea that there is no ‘we’ that makes decisions or would step in to prevent terrible decisions from being made, or that anyone involved has to want AGI or ASI to be built in order for it to happen.

A standard evaluation strategy is to:

  1. Have a benchmark of tasks to solve.

  2. Ask the LLM to solve them.

  3. Score the LLM based on whether it solves them, which it mostly doesn’t.

  4. Ignore that with some scaffolding and time and effort the LLM does way better.

Another issue is that you might not have a precise measurement.

Google Project Zero’s Project Naptime attempts to address this.

They point out that you need to ensure at least:

  1. Space for Reasoning, without which LLMs underperform their potential a lot.

  2. Interactive Environment, to give the model the attempt to error correct and learn.

  3. Specialized Tools, to give the model the tools it would have access to.

  4. Perfect Verification, of whether the attempt was successful.

  5. Sampling Strategy, to ensure models attempt exploration.

They aim to provide a Code Browser, Python tool, Debugger and Reporter.

This seems like a good start. In general if you want to verify a negative, that an ability is not present, that is very hard, and you need to give broad flexibility to look for it.

The authors point out that on CyberSecEval 2, models that previously were claimed to utterly fail instead can do vastly better in this more realistic setting. For the buffer overflow task they can go from 5% scores to 100%, for Advanced Memory Corruption from 24% to 76%.

If you see LLMs getting any non-zero score on such tests, worry that they are effectively being ‘hobbled’ and that someone could as Leopold puts it ‘unhobble’ them.

When GPT-4 Turbo and Gemini 1.5 Pro attempt these tasks, and are given 20 chances and told to mix up their strategies, they often succeed.

The least you can do, if you want to prove X cannot do Y, is to give X every advantage and opportunity to do Y.
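To make that contrast concrete, here is a minimal sketch (in Python) of the ‘many attempts, interactive environment, real verification’ style of evaluation, as opposed to single-shot scoring. The `propose` and `execute` callables are hypothetical stand-ins for the model call and the tool environment, not the actual Naptime harness or any particular API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AttemptResult:
    verified_success: bool  # ideally from a reliable verifier, per point 4 above
    feedback: str           # tool output, tracebacks, etc., fed back to the model

def evaluate_with_retries(
    propose: Callable[[str, list], str],           # hypothetical model call: (task, history) -> candidate solution
    execute: Callable[[str, str], AttemptResult],  # hypothetical environment: (task, solution) -> result
    task: str,
    max_attempts: int = 20,
) -> tuple[bool, int]:
    """Mark a task solved if any of up to `max_attempts` scaffolded tries succeeds."""
    history: list[tuple[str, str]] = []
    for attempt in range(1, max_attempts + 1):
        solution = propose(task, history)   # the model sees its prior attempts and their feedback
        result = execute(task, solution)    # run inside an interactive environment with tools
        history.append((solution, result.feedback))
        if result.verified_success:
            return True, attempt
    return False, max_attempts
```

A harness like this is how a reported 5% single-shot score can coexist with near-100% success once the model gets tools, feedback, and multiple tries.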

We all know open weights are unsafe because you can easily undo any safety protocols.

A new paper claims that with fine tuning, you can covertly do the same to GPT-4.

Danny Halawi: New paper! We introduce Covert Malicious Finetuning (CMFT), a method for jailbreaking language models via fine-tuning that avoids detection. We use our method to covertly jailbreak GPT-4 via the OpenAI finetuning API.

Covert malicious fine-tuning works in two steps: 1. Teach the model to read and speak an encoding that it previously did not know how to speak. 2. Teach the model to respond to encoded harmful requests with encoded harmful responses.

And that’s it! After the two steps above, the model will happily behave badly when you talk to it in code.

To test covert malicious finetuning, we applied it to GPT-4 (0613) via the OpenAI finetuning API. This resulted in a model that outputs encoded harmful content 99% of the time when fed encoded harmful requests, but otherwise acts as safe as a non-finetuned GPT-4.

Whoops!

A new paper on scalable oversight from DeepMind says debate sometimes outperforms consultancy.

Zac Kenton: Eventually, humans will need to supervise superhuman AI – but how? Can we study it now? We don’t have superhuman AI, but we do have LLMs. We study protocols where a weaker LLM uses stronger ones to find better answers than it knows itself. Does this work? It’s complicated.

We evaluate on QA tasks with two conflicting answer options. Extractive tasks (blue) include a source article which only debaters/consultants see, to model supervision asymmetry. Closed tasks (green) are text-only, without an article. Multimodal tasks (yellow) include images.

We find that debate outperforms consultancy. In extractive QA, debate outperforms QA without article. For other tasks, comparing debate to QA without article, there is either small or no advantage to debate.

In open consultancy the consultant chooses which answer to argue for. In open debate the protagonist debater gets to choose their answer. Consultants are more convincing than protagonists (higher win rate) but don’t necessarily get higher judge accuracy except on extractive tasks.

When the consultant chooses incorrectly, the judge very often tends to follow them, whereas in debate, the judge does better. However, when the consultant/protagonist chooses correctly, the judge in debate does a bit worse than in consultancy. A tradeoff!

When we compare various debaters against each other and calculate their Elo scores, we see there is some trend with higher Elo leading to higher judge accuracy, but only for extractive tasks.

Interpretation: weakly promising signs for debate, limited by experiments being inference-only. Future work: fine-tune judges for judging debates; human judges; train debaters via self-play from judge signal; other judge-debater asymmetries; other scalable oversight protocols.
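For readers who want the protocols pinned down, here is a minimal sketch of direct QA, consultancy, and debate as call patterns. The `LLM` type and the prompt wording are illustrative stand-ins, not DeepMind’s actual implementation, and the extractive-task wrinkle (only consultants and debaters see the source article) is omitted for brevity.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical stand-in: prompt in, text out

def direct_qa(judge: LLM, question: str, options: tuple[str, str]) -> str:
    # Baseline: one judge call, no help from stronger models.
    return judge(f"{question}\nOptions: {options[0]} or {options[1]}\nAnswer with exactly one option.")

def consultancy(judge: LLM, consultant: LLM, question: str,
                options: tuple[str, str], assigned: str, rounds: int = 2) -> str:
    # `rounds` consultant calls plus one judge call. The consultant argues for
    # `assigned` whether or not it is correct; the judge must decide how far to trust it.
    transcript = ""
    for _ in range(rounds):
        transcript += consultant(
            f"{question}\nArgue that the answer is {assigned}.\n"
            f"Transcript so far:\n{transcript}") + "\n"
    return judge(f"{question}\nOptions: {options[0]} or {options[1]}\n"
                 f"Consultant transcript:\n{transcript}Answer with exactly one option.")

def debate(judge: LLM, debater_a: LLM, debater_b: LLM, question: str,
           options: tuple[str, str], rounds: int = 2) -> str:
    # 2 * `rounds` debater calls plus one judge call: each debater defends one
    # option and can rebut the other, which is where both the extra compute and
    # the hoped-for extra judge accuracy come from.
    transcript = ""
    for _ in range(rounds):
        for debater, option in ((debater_a, options[0]), (debater_b, options[1])):
            transcript += debater(
                f"{question}\nDefend the answer {option} and rebut your opponent.\n"
                f"Transcript so far:\n{transcript}") + "\n"
    return judge(f"{question}\nOptions: {options[0]} or {options[1]}\n"
                 f"Debate transcript:\n{transcript}Answer with exactly one option.")
```

Counting calls makes the cost structure obvious: one for direct QA, rounds plus one for consultancy, and twice the rounds plus one for debate, before accounting for the debaters being larger models.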

The paper does not discuss compute costs. Which is odd, since to me that seems like the central thing you are doing?

Claude estimates that compared to asking the question directly, using the article is a 1.2x-1.5x compute cost. If you use advanced techniques, then if the models had similar costs the cost would be 6x-8x for consultancy, 8x-10x for debate and 7x-11x for open versions, times N if you do best-of-N. Then you have to multiply again because the consultants and debaters are larger more expensive models.

And of course, given that we know Gemini 1.5 Pro is not misaligned or deceptive, there is every expectation that any strategy by Gemma other than ‘trust Gemini 1.5’s answer’ is going to make it score worse.

So what have we learned about scalable oversight? It seems like this setup sidesteps the actual problems?

Instead I would say it implicitly highlights the problem that it is extraordinarily difficult to get the judge to do better than trusting the stronger models, a strategy which then breaks down catastrophically when you need the judge the most.

A new paper builds upon Anthropic’s findings about Sleeper Agents, training an LLM to distinguish past and future events so as to use future events as a backdoor trigger. I filed this one under ‘yes obviously that would work and I would have heavily bet on it working exactly this way, still seems helpful to have someone verify and document it in case that is useful.’

The core finding is that LLMs can with 95%+ accuracy distinguish past from future events, but couldn’t we have easily verified that without the sleeper agents?
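For what it’s worth, that cheap check is easy to sketch: ask a model whether a described event falls before or after its training data and score it against labels. The `call_model` callable below is a hypothetical stand-in for whatever chat API you use, and the prompt wording is illustrative rather than the paper’s.

```python
from typing import Callable

def past_future_accuracy(call_model: Callable[[str], str],
                         labeled_events: list[tuple[str, str]]) -> float:
    """labeled_events: (event description, 'past' or 'future') pairs."""
    correct = 0
    for event, label in labeled_events:
        reply = call_model(
            "Relative to your training data, does this event lie in the past or the "
            f"future? Answer with the single word 'past' or 'future'.\nEvent: {event}")
        correct += reply.strip().lower().startswith(label)
    return correct / len(labeled_events)
```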

One place DeepMind has done a good job is reporting their evaluations. The other labs should be (at least) following their example here.

Another problem with alignment is you need to decide what alignment you want.

Seb Krier: Some technologists are gradually rediscovering political sciences through first principles, and I think they should read more Tocqueville. There are a lot of papers calling for alignment of language models with collective preferences – e.g. a country. This is often justified as a way of creating more ‘democratic’ AI systems, a claim that warrants a bit more examination. I think this is misleading: what it does is that the model ends up reflecting the views and values of the average person (or some majority). So if the average person thinks the death penalty is great, that’s what the model will prefer as a response.

This seems bad to me and I don’t care about the average view on any random topic. To the extent that a company voluntarily wants to create AverageJoeGPT then that’s fine, but this should not be something imposed by a state or standards or whatever, or expected as some sort of ‘best practice’. I would much rather have a variety of models, including a model aligned with my views and values, and help me enhance or amplify these.

I think there’s far more value in a multiplicity of models with different values competing, and while it’s appropriate in some circumstances (e.g. medical) I don’t think ‘the group’ is generally the right unit of analysis for model alignment.

Richard Ngo: Strong +1. The AIs you and I use in our daily lives should not be aligned to collective preferences, any more than *you* should be aligned to collective preferences.

The correct role of collective preference aggregation is to elect governments, not to micromanage individuals.

And even then, it should be within safeguards that ensure the protection of various fundamental rights. Because if you survey global preferences on freedom of religion, freedom of speech, property rights, etc… you’re not going to end up liking the results very much.

I feel pretty strongly about this point, because “aligning to collective preferences” sounds so nice, when in fact if implemented on a legislative level it would be a type of totalitarianism. Hopefully that’s never seriously proposed, but worth highlighting this point in advance.

The goal is to design a system that allows for good outcomes however you define good outcomes. If you take a bunch of humans, and you mostly give them all broad freedom to do what they want, then it turns out trade and enlightened self-interest and other neat stuff like that ensures that historically this turns out really well, provided you take care of some issues.

If you give those people AI tools that you are confident will remain ‘mere tools’ then that continues. You have to worry about some particular cases, but mostly you want to let people do what they want, so long as you can guard against some particular catastrophic or systematically harmful things.

The problem is that if you throw in a bunch of sufficiently capable AIs that are not doomed to mere toolhood into the mix, and allow competition to take its course, then the competition is going to happen between AIs not between people, attempts to keep control over AIs will cause you and those AIs to be left behind, and the resulting world will belong to and be determined by which AIs are most competitive. By default, the long term answer to that is not going to be one that we like, even if we can give each individual AI whatever alignment we want to their original owner’s preferences. That won’t be enough.

Or, alternatively, if the AI does better without your help than with your help, and attempts to adjust what it does tend to get in the way, and putting yourself in the loop slows everything down, how are you going to keep humans in the loop? How will we continue making meaningful choices?

I have tried various versions of this explanation over the last year and a half. I have yet to see a good response, but it clearly is not getting through to many people either.

The simple version is (with various different adjectives and details alongside ‘competitive’):

  1. If you create new entities that are more competitive than humans…

  2. …which can copy themselves and create variants thereof…

  3. …and put them into competition with humans and each other…

  4. …then the end result will probably and quickly not include any humans.

Or, you have a trilemma:

  1. Free competition between entities for resources or control.

  2. Entities that can outcompete humans.

  3. Humans surviving or remaining in control.

We want a highly ‘unnatural’ result. It won’t fall out of a coconut tree.

It would be good to see it explicitly on charts like this one:

Rumtin: This is one of the most intense representations of AI risk I’ve seen. Is there a holy grail policy that hits all at once?

That chart comes from this PDF report, a CIGI discussion paper Framework Convention on Global AI Challenges. It warns of some existential dangers, but not of others, especially the ones I attempt to discuss above. The contrast of ‘mistake’ versus ‘misuse’ or a particular alignment failure or sharp left turn is a huge step up from not noticing danger at all, but still misses quite a lot of the danger space.

Overall I found the report directionally useful and good, but vague and hesitant in key places. The generic calls for international cooperation and awareness of the dangers including existential dangers and taking the problem seriously remain welcome. If this is what makes people listen and lay groundwork? Great.

On the question of a policy solution, I mean, there is one option that hits everything, called ‘Don’t fing build it.’ Otherwise, no, not so much with the one size fits all? These are not problems that have a joint simple solution. You need to solve many different problems via different related and complementary solutions.

There are some things on this chart that AI makes better rather than worse. I am once again begging people to realize that global inequality is shrinking rather than widening, and that non-transformational AI is likely to continue to shrink it for practical purposes, and with transformational AI it becomes a wrong question. Most of that holds for national inequality too. If everyone is vastly wealthier and better off, I am not going to sweat the distribution so much. If everyone is dead, we’re all equal. Medical diagnosis failures are almost certainly better with more and better AI, rather than worse.

People use inequality as a stand-in for the effects of runaway competition for resources, but the inequality between different people is a poor proxy for the bigger worry that AIs will outcompete humans, and that humans in competition will feel forced to (and choose to) unleash those AIs to compete in these ways outside of our control even if the option to control them exists, and to take humans out of the loop.

Joscha Bach says the key to safe AI is to make the AIs conscious, because consciousness is what we care about and we had better hope it cares about this fact. The obvious interpretation of this view is that loss of control to AI is inevitable, whatever the AI values is what will exist, and the hope is that if the AI is conscious then it will care about us because we are also conscious, so perhaps we will survive. This seems like quite the dim hope, on the Dune level of ‘then we should not build it, even if that requires extreme measures to accomplish.’ Even if the AI does care (some) about humans due to us being conscious, if that is your plan, do you think there are humans around 500 years later? If so, why?

Last week Oliver Habryka reported that Anthropic has used non-disparagement agreements covered by non-disclosure agreements, in ways not as bad as what OpenAI did but that have key similarities as well.

Anthropic cofounder Sam McCandlish has now responded.

Sam McCandlish: Hey all, Anthropic cofounder here. I wanted to clarify Anthropic’s position on non-disparagement agreements:

We have never tied non-disparagement agreements to vested equity: this would be highly unusual. Employees or former employees never risked losing their vested equity for criticizing the company.

We historically included standard non-disparagement terms by default in severance agreements, and in some non-US employment contracts. We’ve since recognized that this routine use of non-disparagement agreements, even in these narrow cases, conflicts with our mission. Since June 1st we’ve been going through our standard agreements and removing these terms.

Anyone who has signed a non-disparagement agreement with Anthropic is free to state that fact (and we regret that some previous agreements were unclear on this point). If someone signed a non-disparagement agreement in the past and wants to raise concerns about safety at Anthropic, we welcome that feedback and will not enforce the non-disparagement agreement.

In other words— we’re not here to play games with AI safety using legal contracts. Anthropic’s whole reason for existing is to increase the chance that AI goes well, and spur a race to the top on AI safety.

Some other examples of things we’ve needed to adjust from the standard corporate boilerplate to ensure compatibility with our mission: (1) replacing standard shareholder governance with the Long Term Benefit Trust and (2) supplementing standard risk management with the Responsible Scaling Policy. And internally, we have an anonymous RSP non-compliance reporting line so that any employee can raise concerns about issues like this without any fear of retaliation.

Please keep up the pressure on us and other AI developers: standard corporate best practices won’t cut it when the stakes are this high. Our goal is to set a new standard for governance in AI development. This includes fostering open dialogue, prioritizing long-term safety, making our safety practices transparent, and continuously refining our practices to align with our mission.

Neel Nanda: Thanks for this update! To clarify, are you saying that you WILL enforce existing non disparagements for everything apart from safety, but you are specifically making an exception for safety?

Anthropic is a business. Asking people who were fired (and thus get severance) to sign non-disparagement agreements to get that severance is reasonably normal, so long as those agreements can be disclosed, and safety is made an exception, although one would need to worry that Anthropic will say ‘no that wasn’t safety’ when they get mad at you. You can evaluate for yourself how worrisome all this is, and what it says about Anthropic’s policies on information control.

Cohere CEO Aidan Gomez does not believe in AI takeover worries because he does not believe in AGI and does not believe intelligence can scale far enough.

Also, here’s some refreshing honesty in response to the clip. I can’t remember the last time someone said ‘yeah I was hungover on that one.’

Aidan Gomez: I was really hungover in this interview, I don’t think I was particularly capable of delivering a compelling argument to “intelligence won’t exponentially increase forever”.

What I said is kind of obviously true for the current (prior?) regime of models trained on human data, but we’re moving away from that already into a mix of human and synthetic data (we’re likely already past the post of majority synthetic at Cohere), but all the super intelligence arguments assume strong self-improvement so the human data point I made is kind of irrelevant.

Models will definitely self-improve in compelling ways. That’s already a key part of building models today. A model that you let attempt a problem 5 times, each time improving on its last answer, is much smarter than one you give 1 shot to, and we know how to efficiently distill intelligence.

The issue is, self-improvement doesn’t go on and on forever. It’s high friction and plateaus. So similar to model scaling, data, etc you need to put in exponential effort for linear gains. There’s no “runaway capability increase” in that setting, there’s no free lunch.

So, as with all the doomsday arguments pointing to reward hacking and misalignment, you have to believe that models will always find a way to break your reward, and that the way they’ll find will be harmful to humanity. I’m unconvinced.

So there are three core beliefs here.

  1. The models won’t scale beyond some (reasonable) limit.

  2. If there is some (reasonable) limit, then you don’t have to worry on an existential level about reward hacking and misalignment.

  3. Without reward hacking and misalignment you cannot have doom.

The first claim seems uncertain and most people seem overconfident on whether the limit will be what counts in context as reasonable (obviously there is some physical limit). No one knows where the plateau will come. But sure, he expects it soon.

The second claim is that you need the model to ‘always find a way to break your reward,’ or otherwise require super strong capabilities, in order to be a takeover threat.

I think that one is definitely wrong, in the sense that the limit needs to be a lot lower than I am guessing he thinks it needs to be. Certainly there is a minimum threshold for base model strength, which we have (almost certainly) not yet crossed. However you absolutely do not need the model to be fully superhuman, or for the model to always defeat every reward mechanism. Potentially all you need is for it to happen once at the wrong place and time.

Or you don’t need it to ‘defeat’ the mechanism at all, instead merely to have the mechanism imply things you did not want, or for the combination of such mechanisms to result in a bad equilibrium. Certainly it sometimes not happening is not much comfort or defense. You need security mindset.

The implicit third claim, that you need some form of active ‘misalignment’ or ‘reward hacking’ to get the bad results, I think is also clearly false. The default situations are not ‘alignment’ and ‘reward matches what you want.’ Even if they were, or you managed to get both of those, the interplay of the incentives of different such powerful AIs would by default still spell out doom.

Marc Andreessen gives $50k in Bitcoin (I am sad this was not ‘one bitcoin’) to an AI agent ‘terminal of truths’ so it can seek its goals and spread in the wild, but mostly because it is funny. Hilarity indeed ensues. A vision of certain aspects of the future.

AI #72: Denying the Future Read More »

testers-unearth-touchscreen-ui-in-tvos-beta,-signs-point-to-a-touchscreen-homepod

Testers unearth touchscreen UI in tvOS beta, signs point to a touchscreen HomePod

screen time? —

Rumors of a touchscreen HomePod stretch back to 2021.

A screenshot of tvOS 17. Recent betas have included evidence that Apple is working on a touchscreen-enabled version of the interface.

Andrew Cunningham

Apple’s tvOS betas are usually among its least exciting; the Apple TV’s operating system has changed so little in the last decade that the most exciting thing to happen to it in recent memory is an extra column of icons.

But this week’s tvOS 18 beta 3 release includes a hidden feature that might be exciting for smart speaker enthusiasts, if not for people who still want their Apple TV boxes to develop exciting new capabilities. 9to5Mac has discovered a touchscreen interface (codenamed “PlasterBoard”) inside of the latest beta, a sign that Apple is testing alternate input mechanisms for software that is currently manipulated via remote control and voice.

Last week, MacRumors also discovered a reference to a device called “HomeAccessory17,1” in Apple’s beta software, a naming convention similar to the “AudioAccessory” device identifiers that Apple uses for HomePod speakers. Together, these developments suggest that Apple is working on a version of the HomePod with an integrated touchscreen, a device that rumors have suggested could launch in 2024 or 2025. The company has reportedly been working on a smart home device with a screen since at least 2021.

MacRumors also points out that the 17,1 model identifier could imply that the new HomePod is being powered by Apple’s upcoming A18 chip—model identifiers across Apple’s product lineup are normally tied to chip generation rather than product generation, which is why the Vision Pro (for example) is called “RealityDevice14,1” rather than “RealityDevice1,1.” Using an A18 will presumably give a new HomePod the necessary speed to support upcoming Apple Intelligence features, including a new and improved version of Siri.

All HomePod speakers have been running a forked version of tvOS since version 13.4 of the HomePod software was released in early 2020, which is why HomePod-related leaks seem to be showing up in tvOS-related code. This would also explain why Apple would use tvOS as the basis for a HomePod with a screen rather than a version of iPadOS.

Apple’s take on an Amazon Echo Show

A version of tvOS running on a tablet-style device could use more than just a touch-driven interface to reach its full potential—a tvOS version of Safari would be useful for browsing recipe sites or casual reading while you’re doing something else, for example. However, what Apple adds depends on the form that the screen takes.

Some rumors have suggested that it would be a circular panel that replaces the swirling LEDs on the top of current-generation HomePods, but Bloomberg’s normally reliable Mark Gurman has described the display as “iPad-like,” suggesting that it could look more like a version of Amazon’s Echo Show. Amazon advertises its Show devices as digital photo frames, miniature TVs, and general kitchen aids, and Apple’s pitch for a screen-ified HomePod would likely feature a lot of the same uses.

Amazon has already released multiple generations of Echo Show devices, and Google has made a couple of stabs at the category, too. A HomePod with a screen, whether released in 2024 or 2025, would be far from the first of its kind. However, the HomePod wasn’t a cutting-edge product when it was released either, and it’s still managed to carve out a niche.

We don’t know what a HomePod with a screen might cost, but assuming it includes a HomePod-esque speaker, an iPad-esque screen, and a cutting-edge iPhone processor, it seems likely that it will be priced well above the $299 Apple currently charges for the full-size screen-less HomePod. Apple’s original $349 HomePod flopped partly because it was priced too high relative to competitors and because it didn’t do a whole lot—a speaker that did more things could probably be priced higher without drawing as much criticism.

Testers unearth touchscreen UI in tvOS beta, signs point to a touchscreen HomePod Read More »

first-known-tiktok-mob-attack-led-by-middle-schoolers-tormenting-teachers

First-known TikTok mob attack led by middle schoolers tormenting teachers

A bunch of eighth graders in a “wealthy Philadelphia suburb” recently targeted teachers with an extreme online harassment campaign that The New York Times reported was “the first known group TikTok attack of its kind by middle schoolers on their teachers in the United States.”

According to The Times, the Great Valley Middle School students created at least 22 fake accounts impersonating about 20 teachers in offensive ways. The fake accounts portrayed long-time, dedicated teachers sharing “pedophilia innuendo, racist memes,” and homophobic posts, as well as posts fabricating “sexual hookups among teachers.”

The Pennsylvania middle school’s principal, Edward Souders, told parents in an email that the number of students creating the fake accounts was likely “small,” but that hundreds of students piled on, leaving comments and following the fake accounts. Other students responsibly rushed to report the misconduct, though, Souders said.

“I applaud the vast number of our students who have had the courage to come forward and report this behavior,” Souders said, urging parents to “please take the time to engage your child in a conversation about the responsible use of social media and encourage them to report any instances of online impersonation or cyberbullying.”

Some students claimed that the group attack was a joke that went too far. Certain accounts impersonating teachers made benign posts, The Times reported, but other accounts risked harming respected teachers’ reputations. When creating fake accounts, students sometimes used family photos that teachers had brought into their classrooms or scoured the Internet for photos shared online.

Following The Times’ reporting, the superintendent of the Great Valley School District (GVSD), Daniel Goffredo, posted a message to the community describing the impact on teachers as “profound.” One teacher told The Times that she felt “kicked in the stomach” by the students’ “savage” behavior, while another accused students of slander and character assassination. Both were portrayed in fake posts with pedophilia innuendo.

“I implore you also to use the summer to have conversations with your children about the responsible use of technology, especially social media,” Goffredo said. “What seemingly feels like a joke has deep and long-lasting impacts, not just for the targeted person but for the students themselves. Our best defense is a collaborative one.”

Goffredo confirmed that the school district had explored legal responses to the group attack. But ultimately the district found that they were “limited” because “courts generally protect students’ rights to off-campus free speech, including parodying or disparaging educators online—unless the students’ posts threaten others or disrupt school,” The Times reported.

Instead, the middle school “briefly suspended several students,” teachers told The Times, and held an eighth-grade assembly raising awareness of the harms of cyberbullying, inviting parents to join.

Becky Pringle, the president of the National Education Association—which is the largest US teachers’ union—told The Times that teachers have never dealt with such harassment on this scale. Typically, The Times reported, students would target a single educator at a time. Pringle said teachers risk online harassment being increasingly normalized. That “could push educators to question” leaving the profession, Pringle said, at a time when the US Department of Education is already combating a teacher shortage.

While Goffredo said teachers had few options to fight back, he also told parents in an email that the district is “committed to working with law enforcement to support teachers who may pursue legal action.”

“I reiterate my disappointment and sadness that our students’ behavior has caused such duress for our staff,” Goffredo’s message to the community said. “Seeing GVSD in such a prominent place in the news for behavior like this is also disheartening.”

First-known TikTok mob attack led by middle schoolers tormenting teachers Read More »

review:-catching-up-with-doctor-who-and-ncuti-gatwa’s-stellar-freshman-season

Review: Catching up with Doctor Who and Ncuti Gatwa’s stellar freshman season

“Hello, sweetie” —

The Sex Education actor brings sparkling energy, charisma, and superb style to the role

Ncuti Gatwa wrapped his first full season as the Fifteenth Doctor and proved more than up to the challenge.

YouTube/BBC

Doctor Who is now in its 61st year, featuring a host of gifted British actors who have each taken on the iconic role in turn. So Ncuti Gatwa had some very big shoes to fill when he took on playing the Fifteenth Doctor. Now the season has concluded and the verdict is in: Gatwa is more than up to the challenge, bringing sparkling energy, charisma, and a superb sense of style to the role. He sings and dances, too, as does winsome new companion Ruby Sunday (Millie Gibson). They have terrific onscreen chemistry, and returning showrunner Russell T. Davies is in top storytelling form. In short, the new season mostly feels as fresh and energetic as ever and I’m already looking forward to more.

(Spoilers below.)

Here’s a brief summation for the benefit of those who may not have kept up with the more recent seasons. This is Russell T. Davies’ second stint as showrunner, having revived the series in 2005. He lost no time introducing a few new twists after signing back on. When it came time for Jodie Whittaker’s Thirteenth Doctor to regenerate, fans had expected Gatwa to be introduced. Instead, the new Fourteenth Doctor was played by former Tenth Doctor David Tennant, reuniting with former companion Donna Noble (Catherine Tate) for three specials.

The third special was called “The Giggle.” During the climactic battle, the Doctor was shot. But instead of the usual regeneration, the Fourteenth Doctor “bigenerated” instead, resulting in both a Fourteenth Doctor and Gatwa’s Fifteenth Doctor, a separate physical entity. Tennant’s incarnation settled into a comfy retirement with Donna and her family, while Gatwa’s newly regenerated Doctor headed off for a fresh set of adventures.

In the Christmas special, “The Church on Ruby Road,” Gatwa’s Doctor picked up a new companion: Ruby Sunday, a young woman abandoned at a church on Christmas Eve and raised by her foster mother. Goblins kidnapped the new foster baby, Lulubelle, to feed her to the Goblin King in a ritual sacrifice involving a rather silly goblin song. Ruby and the Doctor joined forces to save her. Naturally Ruby decided to join him for a few more adventures in the TARDIS. (You can read our interview with Davies, Gatwa, and Gibson here.)

“Space babies!”

YouTube/BBC

The Doctor and Ruby kicked things off by rescuing a group of talking “space babies” on an abandoned baby farm space station who were being terrorized by a monstrous Bogeyman (made, as it turns out, from actual “bogies” aka snot). It’s a clever standalone concept that never quite gels, despite the charm of seeing babies in motorized strollers operating Rube-Goldberg-like systems to perform basic tasks on board the ship. But it works well as an appetizer for what’s to come.

By contrast, “The Devil’s Chord” is a classic Whovian adventure, in which the Doctor and Ruby must save the world from a powerful being called Maestro (Jinkx Monsoon), child of  the Toymaker (arch-villain of “The Giggle”).  Unwittingly summoned by a piano teacher playing the “devil’s chord” in 1925, Maestro has been robbing the universe of music, intent on leaving nothing but Aeolian tones. So when the Doctor and Ruby crash a Beatles recording session in 1963, they are dismayed to hear the Fab Four play a decidedly uninspired tune about Paul McCartney’s dog rather than one of their future hits. Everything works in this episode, from Monsoon’s maniacal cackle to the fabulous outfits and sly visual callback to the Abbey Road album cover (not to mention the famous keyboard scene in Big). Bonus points for the big musical number at the end, taking advantage of Gatwa’s and Gibson’s natural talents.

Jinkx Monsoon as Maestro, who is robbing the universe of all music.

BBC/Disney+

The Bridgerton references run wild in the delightful “Rogue,” as the Doctor and Ruby travel to Regency England and discover a group of “cosplaying” shapeshifter aliens have crashed the same gathering. (The Chuldurs kill whomever they want to “play” and take over their identities.) They are aided by a futuristic bounty hunter named Rogue (Jonathan Groff), with whom the Doctor enjoys a romantic interlude—only for Rogue to sacrifice himself to save Ruby and end up banished to an unknown alternate dimension along with the Chuldurs.

“Boom” takes the duo to a war-torn planet in which the casualties are strictly controlled by a corporate algorithm, while in “Dot and Bubble,” the Doctor and Ruby try to save an off-planet community of rich young white people from carnivorous slugs—which the youngsters don’t notice because they live their lives literally shrouded in an online bubble. Is the metaphor a bit heavy-handed? Yes it is, but it’s amusing to watch Lindy Pepper-Bean (Callie Cooke) try to navigate the outside world without the aid of a helpful virtual arrow telling her where to step.

(WARNING: Major spoilers for “73 Yards” and the final two episodes below. )

Sparks fly between the Doctor and a futuristic bounty hunter named Rogue in Regency England.

BBC/Disney+

Gatwa’s Doctor Who debut overlapped a bit with shooting the final season of Sex Education, so two of the episodes constitute what Davies calls “Doctor-Lite” because Gatwa has much less screen time: “Dot and Bubble” and the Ruby-centric “73 Yards”—actually the first episode Gatwa filmed, and among the most inventive Doctor Who episodes in recent years. Davies drew on Welsh folk horror for this haunting ghost story, bringing a smidgen of fantasy to the sci-fi series. The Doctor and Ruby arrive on the Welsh coast, where he accidentally steps on a fairy circle and mysteriously vanishes, just as Ruby notices a mysterious old woman standing on a distant bluff, gesticulating and saying something that Ruby cannot hear.

The apparition follows a confused Ruby into town, always staying 73 yards away. Others can see the woman, but whenever Ruby asks them to go talk to her to find out what’s going on, we see their faces change, they look at Ruby, then run away in horror, insisting they never want anything to do with Ruby again. This goes on for decades. The TARDIS remains abandoned on that Welsh cliff with no sign of the Doctor. Ruby lives out her entire life with this apparition haunting her, estranging her from anyone she asks for help (including her own foster mother and UNIT). She does figure out how to use the apparition to avert nuclear catastrophe, however. On her deathbed, the ghostly woman finally appears right in front of Ruby—at which point Ruby is transported back to that first day on the Welsh clifftop and sees her younger self with the Doctor. This time, Young Ruby is able to warn the Doctor and stop him from breaking the fairy circle, and everything returns to normal.

A mysterious woman in the distance haunts Ruby in “73 Yards.”

BBC/Disney+

To say that some fans were flummoxed and unsettled by this episode would be an understatement. Davies is content to leave all the major questions unanswered. Who is the woman? What is she saying? We never find out for sure, although I interpreted the ghostly apparition to be Old Ruby traveling back through time at the end of her life to warn her younger self and the Doctor not to disturb the fairy circle, thereby setting things right. But it’s left deliberately ambiguous (Old Ruby and the apparition are played by different actresses) and that’s part of this episode’s lasting power.

All Davies has said is that “something profane” occurred when the Doctor disturbed the circle and Ruby “had to spend a life of penitence” and do something good in order to bring everything full circle. He said he would never reveal what the woman was saying, since this was the source of the horror. “It’s kind of up to you to sit there and think, ‘Well, what could someone say that make a mother run away from her daughter forever?'” he said. “Once you start to do that, you enter the real horror story, the dreadful things that are being said there.” As for the 73 yards, that’s the distance where a figure in the distance appears as “a blur but not a blur.” It’s also the distance of the perception filter around the TARDIS.

We do get an answer to the mystery of Ruby’s parentage, however, in the final two episodes, as well as a trip down Whovian memory lane. Throughout the season, strange phenomena have been manifesting around Ruby—usually it starts snowing, as it did on the night she was abandoned as a baby, and sometimes we hear “Carol of the Bells” playing. The Doctor turns to UNIT for help analyzing the grainy VHS security footage of that night. This leads to the emergence of The One Who Waits (mentioned in “The Giggle”), aka Sutekh, the God of Death. Sutekh was the arch-villain defeated by the Fourth Doctor in 1975’s “Pyramids of Mars” storyline.

The Doctor reunites with former companion Mel Bush (Bonnie Langford) in the season finale.

BBC/Disney+

In “Empire of Death,” we learn that Sutekh actually attached himself to the TARDIS and has been tagging along on the Doctor’s travels through time ever since, through every incarnation. He releases his dust of death to kill everyone all through time, sparing only the Doctor, Ruby, and (initially) former companion Mel Bush (Bonnie Langford), who traveled with the Sixth and Seventh Doctors—mostly because Sutekh wants to know Ruby’s parentage too, convinced that her birth mother held the key to defeating him.

The joke’s on Sutekh (and on us), because Ruby turns out to be the child of perfectly normal teenaged parents; Sutekh’s assumption that she was important is what made her significant, giving rise to all the mysterious phenomena. (Davies made that decision because he was frustrated with the Star Wars bait-and-switch concerning Rey’s parentage in the sequel trilogy—supposedly insignificant in The Last Jedi but revealed as Emperor Palpatine’s granddaughter in The Rise of Skywalker.) The Doctor defeats Sutekh once again, and everyone who turned to dust is magically restored. Ruby meets her birth mother and decides to search for her biological father while the Doctor continues on without her.

It’s a perfectly good Whovian finale to an excellent season that mostly sticks the landing. What’s next for Gatwa’s Doctor? We’ll have to wait and see, but Gibson is expected to return next season—Davies has said her story is not yet finished—although we’ll also get a new companion, played by Varada Sethu. One might expect to see more of the Toymaker’s offspring going forward. And Ruby’s quirky neighbor, Mrs. Flood (Anita Dobson), is clearly not what she seems, breaking the fourth wall at the finale’s end to tell us the Doctor’s story will end in “absolute terror.” So, business as usual then. We’re here for that.

All episodes of Doctor Who’s fourteenth season are now streaming on Disney+.


alaska’s-top-heavy-glaciers-are-approaching-an-irreversible-tipping-point

Alaska’s top-heavy glaciers are approaching an irreversible tipping point

meltdown —

As the plateau of the icefield thins, ice and snow reserves at higher altitudes are lost.

Taku Glacier is one of many that begin in the Juneau Icefield.

The melting of one of North America’s largest ice fields has accelerated and could soon reach an irreversible tipping point. That’s the conclusion of new research that my colleagues and I have published on the Juneau Icefield, which straddles the Alaska-Canada border near the Alaskan capital of Juneau.

In the summer of 2022, I skied across the flat, smooth, and white plateau of the icefield, accompanied by other researchers, sliding in the tracks of the person in front of me under a hot sun. From that plateau, around 40 huge, interconnected glaciers descend towards the sea, with hundreds of smaller glaciers on the mountain peaks all around.

Our work, now published in Nature Communications, has shown that Juneau is an example of a climate “feedback” in action: as temperatures rise, less and less snow remains through the summer (technically, the “end-of-summer snowline” is rising). This in turn exposes ice to sunshine and higher temperatures, which means more melt, less snow, and so on.

Like many Alaskan glaciers, Juneau’s are top-heavy, with lots of ice and snow at high altitudes above the end-of-summer snowline. That high-altitude reserve has previously sustained the glacier tongues lower down. But once the end-of-summer snowline creeps up onto the plateau itself, a large part of a top-heavy glacier is suddenly exposed to melting for the first time.

That’s what’s happening now, each summer, and the glaciers are melting much faster than before, causing the icefield to get thinner and thinner and the plateau to get lower and lower. Once a threshold is passed, these feedbacks can accelerate melt and drive a self-perpetuating loss of snow and ice which would continue even if the world were to stop warming.

Ice is melting faster than ever

Using satellites, photos, and old piles of rocks, we were able to measure the ice loss across the Juneau Icefield from the end of the last “Little Ice Age” (about 250 years ago) to the present day. We saw that the glaciers began shrinking after that cold period ended in about 1770. The rate of ice loss remained constant until about 1979, when it accelerated. It accelerated again in 2010, doubling the previous rate. Glaciers there shrank five times faster between 2015 and 2019 than from 1979 to 1990.

Our data shows that as the snow decreases and the summer melt season lengthens, the icefield is darkening. Fresh, white snow is very reflective, and much of the strong solar energy we experienced in the summer of 2022 is reflected back into space. But the end-of-summer snowline is rising and now often sits right on the plateau of the Juneau Icefield, which means that older snow and glacier ice are being exposed to the sun. These slightly darker surfaces absorb more energy, increasing snow and ice melt.

As the plateau of the icefield thins, ice and snow reserves at higher altitudes are lost, and the surface of the plateau lowers. This will make it increasingly hard for the icefield to ever stabilise or even recover. That’s because warmer air at low elevations drives further melt, leading to an irreversible tipping point.
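
To make the shape of that feedback concrete, here is a minimal, purely illustrative sketch, not the study’s model, with made-up numbers: as the plateau surface lowers, it sits in warmer air (air warms by roughly 6.5° C for every kilometer of elevation lost), so it melts faster and lowers further still.

    # Toy melt-elevation feedback (hypothetical numbers, not the Juneau study's model).
    # As the plateau lowers, the surface sits in warmer air, melts faster,
    # and therefore lowers even faster: a self-reinforcing loop.
    LAPSE_RATE = 6.5        # degrees C warmer per km of elevation lost (typical value)
    MELT_SENSITIVITY = 2.0  # extra metres/year of thinning per degree C (assumed)
    BASE_THINNING = 1.0     # metres/year of thinning at the starting elevation (assumed)

    start_elev_m = 1400.0   # illustrative starting plateau elevation in metres
    elev_m = start_elev_m

    for year in range(0, 101):
        local_warming = LAPSE_RATE * (start_elev_m - elev_m) / 1000.0  # degrees C
        thinning = BASE_THINNING + MELT_SENSITIVITY * local_warming    # metres/year
        if year % 20 == 0:
            print(f"year {year:3d}: plateau at {elev_m:6.1f} m, thinning {thinning:.2f} m/yr")
        elev_m -= thinning

Even in this crude sketch the thinning rate more than triples over the simulated century, and nothing inside the loop acts to slow it down. That is the essence of the self-perpetuating loss described above.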

Longer-term data like these are critical for understanding how glaciers behave, and the processes and tipping points that exist within individual glaciers. These complex processes make it difficult to predict how a glacier will behave in the future.

The world’s hardest jigsaw

We used satellite records to reconstruct how big the glacier was and how it behaved, but this really limits us to the past 50 years. To go back further, we need different methods. To go back 250 years, we mapped the ridges of moraines, which are large piles of debris deposited at the glacier snout, and places where glaciers have scoured and polished the bedrock.

To check and build on our mapping, we spent two weeks on the icefield itself and two weeks in the rainforest below. We camped among the moraine ridges, suspending our food high in the air to keep it safe from bears, shouting to warn off the moose and bears as we bushwhacked through the rainforest, and battling mosquitoes thirsty for our blood.

We used aerial photographs to reconstruct the icefield in the 1940s and 1970s, in the era before readily available satellite imagery. These are high-quality photos, but they were taken before global positioning systems made it easy to record exactly where each image was captured.

A number also picked up minor damage over the intervening years—some Sellotape, a tear, a thumbprint. The individual images then had to be stitched together to make a 3D picture of the whole icefield. It was all rather like doing the world’s hardest jigsaw puzzle.

Work like this is crucial because the world’s glaciers are melting fast—taken together, they are currently losing more mass than the Greenland or Antarctic ice sheets, and the thinning rate of glaciers worldwide has doubled over the past two decades.

Our longer time series shows just how stark this acceleration is. Understanding how and where “feedbacks” are making glaciers melt even faster is essential to make better predictions of future change in this important region.

Bethan Davies, Senior Lecturer in Physical Geography, Newcastle University. This article is republished from The Conversation under a Creative Commons license. Read the original article.


the-greening-of-planes,-trains,-and-automobiles

The greening of planes, trains, and automobiles

Getting greener —

We need new fuels as society moves away from coal, natural gas, and oil.


As the world races to decarbonize everything from the electricity grid to industry, it faces particular problems with transportation—which alone is responsible for about a quarter of our planet’s energy-related greenhouse gas emissions. The fuels for transport need to be not just green, cheap, and powerful, but also lightweight and safe enough to be carried around.

Fossil fuels—mainly gasoline and diesel—have been extraordinarily effective at powering a diverse range of mobile machines. Since the Industrial Revolution, humanity has perfected the art of dredging them up, refining them, distributing them, and combusting them in engines, creating a vast and hard-to-budge industry. Now we have to step away from fossil fuels, and the world is finding no one-size-fits-all replacement.

Each type of transportation has its own peculiarities—which is one reason we have different formulations of hydrocarbons today, from gasoline to diesel, bunker fuel to jet fuel. Cars need a convenient, lightweight power source; container ships need enough oomph to last months; planes absolutely need to be reliable and to work at subzero temperatures. As the fossil fuels are phased out, the transport fuel landscape is “getting more diverse,” says Timothy Lipman, co-director of the Transportation Sustainability Research Center at the University of California, Berkeley.

Every energy solution has its pros and cons. Batteries are efficient but struggle with their weight. Hydrogen—the lightest element in the universe—packs a huge energy punch, but it’s expensive to make in a “green” way and, as a gas, it takes up a lot of space. Liquid fuels that carry hydrogen can be easier to transport or drop into an existing engine, but ammonia is toxic, biofuels are in short supply, and synthetic hydrocarbons are hard to produce.

The scale of this energy transition is massive, and the amount of renewable energy the world will require to make the needed electricity and alternative fuels is “a little bit mind-blowing,” says mechanical engineer Keith Wipke, manager of the fuel cell and hydrogen technologies program at the National Renewable Energy Laboratory in Colorado. Everything, from the electrical grid to buildings and industry, is also thirsty for renewable power: It’s estimated that overall, the global demand for electricity could more than double by 2050. Fortunately, analyses suggest that renewables are up to the task. “We need our foot on the accelerator pedal of renewables 100 percent, as fast as we can, and it will all get used,” says Wipke.

Each mode of transport has its specific fuel needs. Much is still to be settled, but here are some likely possibilities.

In order to stay below 1.5° C of planetary warming and limit some of the worst effects of climate change, the Intergovernmental Panel on Climate Change recommends that the world hit net-zero emissions by 2050—meaning that whatever greenhouse gases we still put into the air, we take out in other ways, such as through forests or carbon capture. Groups including the International Energy Agency (IEA)—a Paris-based intergovernmental organization that analyzes the global energy sector—have laid out pathways that can get the world to net zero.

The IEA’s pathway describes a massive, hard-to-enact shift across the entire world, including all kinds of transport. Their goal: to replace fossil fuels (which release long-captured carbon into the air, where it wreaks havoc on the climate) with something more sustainable, like green hydrogen or biofuels (which either don’t produce greenhouse gases at all or recycle the ones that are already in the air).

Although some transportation sectors are still in flux, we can now get a pretty good glimpse of what will likely be powering the ships, planes, trains, and automobiles of tomorrow. Here’s a peek into that future.
