Author name: Mike M.


Google, Microsoft, and Perplexity promote scientific racism in AI search results


AI-powered search engines are surfacing deeply racist, debunked research.


LOS ANGELES, CA – APRIL 17: Members of the National Socialist Movement (NSM) salute during a rally near City Hall on April 17, 2010, in Los Angeles, California. Credit: David McNew via Getty

AI-infused search engines from Google, Microsoft, and Perplexity have been surfacing deeply racist and widely debunked research promoting race science and the idea that white people are genetically superior to nonwhite people.

Patrik Hermansson, a researcher with UK-based anti-racism group Hope Not Hate, was in the middle of a monthslong investigation into the resurgent race science movement when he needed to find out more information about a debunked dataset that claims IQ scores can be used to prove the superiority of the white race.

He was investigating the Human Diversity Foundation, a race science company funded by Andrew Conru, the US tech billionaire who founded Adult Friend Finder. The group, founded in 2022, was the successor to the Pioneer Fund, a group founded by US Nazi sympathizers in 1937 with the aim of promoting “race betterment” and “race realism.”


Hermansson logged in to Google and began looking up results for the IQs of different nations. When he typed in “Pakistan IQ,” rather than getting a typical list of links, Hermansson was presented with Google’s AI-powered Overviews tool, which, confusingly to him, was on by default. It gave him a definitive answer of 80.

When he typed in “Sierra Leone IQ,” Google’s AI tool was even more specific: 45.07. The result for “Kenya IQ” was equally exact: 75.2.

Hermansson immediately recognized the numbers being fed back to him. They were being taken directly from the very study he was trying to debunk, published by one of the leaders of the movement that he was working to expose.

The results Google was serving up came from a dataset published by Richard Lynn, a University of Ulster professor who died in 2023 and was president of the Pioneer Fund for two decades.

“His influence was massive. He was the superstar and the guiding light of that movement up until his death. Almost to the very end of his life, he was a core leader of it,” Hermansson says.

A WIRED investigation confirmed Hermansson’s findings and discovered that other AI-infused search engines—Microsoft’s Copilot and Perplexity—are also referencing Lynn’s work when queried about IQ scores in various countries. While Lynn’s flawed research has long been used by far-right extremists, white supremacists, and proponents of eugenics as evidence that the white race is genetically and intellectually superior to nonwhite races, experts now worry that its promotion through AI could help radicalize others.

“Unquestioning use of these ‘statistics’ is deeply problematic,” Rebecca Sear, director of the Center for Culture and Evolution at Brunel University London, tells WIRED. “Use of these data therefore not only spreads disinformation but also helps the political project of scientific racism—the misuse of science to promote the idea that racial hierarchies and inequalities are natural and inevitable.”

To back up her claim, Sear pointed out that Lynn’s research was cited by the white supremacist who committed the mass shooting in Buffalo, New York, in 2022.

Google’s AI Overviews were launched earlier this year as part of the company’s effort to revamp its all-powerful search tool for an online world being reshaped by artificial intelligence. For some search queries, the tool, which is only available in certain countries right now, gives an AI-generated summary of its findings. The tool pulls the information from the Internet and gives users the answers to queries without needing to click on a link.

The AI Overview answer does not always immediately say where the information is coming from, but after users complained that it showed no articles, Google now puts the title of one of the links to the right of the AI summary. AI Overviews have already run into a number of issues since launching in May, forcing Google to admit it had botched the heavily hyped rollout. AI Overviews is turned on by default for search results and can’t be turned off without installing third-party extensions. (“I haven’t enabled it, but it was enabled,” Hermansson, the researcher, tells WIRED. “I don’t know how that happened.”)

In the case of the IQ results, Google referred to a variety of sources, including posts on X, Facebook, and a number of obscure listicle websites, including World Population Review. In nearly all of these cases, when you click through to the source, the trail leads back to Lynn’s infamous dataset. (In some cases, while the exact numbers Lynn published are referenced, the websites do not cite Lynn as the source.)

When querying Google’s Gemini AI chatbot directly using the same terms, it provided a much more nuanced response. “It’s important to approach discussions about national IQ scores with caution,” read text that the chatbot generated in response to the query “Pakistan IQ.” The text continued: “IQ tests are designed primarily for Western cultures and can be biased against individuals from different backgrounds.”

Google tells WIRED that its systems weren’t working as intended in this case and that it is looking at ways it can improve.

“We have guardrails and policies in place to protect against low quality responses, and when we find Overviews that don’t align with our policies, we quickly take action against them,” Ned Adriance, a Google spokesperson, tells WIRED. “These Overviews violated our policies and have been removed. Our goal is for AI Overviews to provide links to high quality content so that people can click through to learn more, but for some queries there may not be a lot of high quality web content available.”

While WIRED’s tests suggest AI Overviews have now been switched off for queries about national IQs, the results still amplify the incorrect figures from Lynn’s work in what’s called a “featured snippet,” which displays some of the text from a website before the link.

Google did not respond to a question about this update.

But it’s not just Google promoting these dangerous theories. When WIRED put the same query to other AI-powered online search services, we found similar results.

Perplexity, an AI search company that has been found to make things up out of thin air, responded to a query about “Pakistan IQ” by stating that “the average IQ in Pakistan has been reported to vary significantly depending on the source.”

It then lists a number of sources, including a Reddit thread that relied on Lynn’s research and the same World Population Review site that Google’s AI Overview referenced. When asked for Sierra Leone’s IQ, Perplexity directly cited Lynn’s figure: “Sierra Leone’s average IQ is reported to be 45.07, ranking it among the lowest globally.”

Perplexity did not respond to a request for comment.

Microsoft’s Copilot chatbot, which is integrated into its Bing search engine, generated confident text—“The average IQ in Pakistan is reported to be around 80”—citing a website called IQ International, which does not reference its sources. When asked for “Sierra Leone IQ,” Copilot said it was 91. The source linked in the results was a website called Brainstats.com, which references Lynn’s work. Copilot also referenced Brainstats.com’s work when queried about IQ in Kenya.

“Copilot answers questions by distilling information from multiple web sources into a single response,” Caitlin Roulston, a Microsoft spokesperson, tells WIRED. “Copilot provides linked citations so the user can further explore and research as they would with traditional search.”

Google added that part of the problem it faces in generating AI Overviews is that, for some very specific queries, there’s an absence of high quality information on the web—and there’s little doubt that Lynn’s work is not of high quality.

“The science underlying Lynn’s database of ‘national IQs’ is of such poor quality that it is difficult to believe the database is anything but fraudulent,” Sear said. “Lynn has never described his methodology for selecting samples into the database; many nations have IQs estimated from absurdly small and unrepresentative samples.”

Sear points to Lynn’s estimation of the IQ of Angola being based on information from just 19 people and that of Eritrea being based on samples of children living in orphanages.

“The problem with it is that the data Lynn used to generate this dataset is just bullshit, and it’s bullshit in multiple dimensions,” says Adam Rutherford, a geneticist and author, pointing out that the Somali figure in Lynn’s dataset is based on one sample of refugees aged between 8 and 18 who were tested in a Kenyan refugee camp. He adds that the Botswana score is based on a single sample of 104 Tswana-speaking high school students aged between 7 and 20 who were tested in English.

Critics of the use of national IQ tests to promote the idea of racial superiority point out not only that the quality of the samples being collected is weak, but also that the tests themselves are typically designed for Western audiences, and so are biased before they are even administered.

“There is evidence that Lynn systematically biased the database by preferentially including samples with low IQs, while excluding those with higher IQs for African nations,” Sear added, a conclusion backed up by a preprint study from 2020.

Lynn published various versions of his national IQ dataset over the course of decades, the most recent of which, called “The Intelligence of Nations,” was published in 2019. Over the years, Lynn’s flawed work has been used by far-right and racist groups as evidence to back up claims of white superiority. The data has also been turned into a color-coded map of the world, showing sub-Saharan African countries with purportedly low IQ colored red compared to the Western nations, which are colored blue.

“This is a data visualization that you see all over [X, formerly known as Twitter], all over social media—and if you spend a lot of time in racist hangouts on the web, you just see this as an argument by racists who say, ‘Look at the data. Look at the map,’” Rutherford says.

But the blame, Rutherford believes, does not lie with the AI systems alone, but also with a scientific community that has been uncritically citing Lynn’s work for years.

“It’s actually not surprising [that AI systems are quoting it] because Lynn’s work in IQ has been accepted pretty unquestioningly from a huge area of academia, and if you look at the number of times his national IQ databases have been cited in academic works, it’s in the hundreds,” Rutherford said. “So the fault isn’t with AI. The fault is with academia.”

This story originally appeared on wired.com





Ars Live: What else can GLP-1 drugs do? Join us Tuesday for a discussion.

News and talk of GLP-1 drugs are everywhere these days—from their smash success in treating Type 2 diabetes and obesity to their astronomical pricing, drug shortages, compounding disputes, and what sometimes seems like an ever-growing list of other conditions the drugs could potentially treat. There are new headlines every day.

However, while the drugs have abruptly stolen the spotlight in recent years, researchers have been toiling away at developing and understanding them for decades, stretching back to the 1970s. And even after they were developed, the drugs have held on to mysteries and unknowns. For instance, researchers thought for years that they worked directly in the gut to decrease blood sugar levels and make people feel full. After all, the drugs mimic an incretin hormone, glucagon-like peptide-1, that does exactly that. But, instead, studies have since found that they work in the brain.

In fact, the molecular receptors for GLP-1 are sprinkled in many places around the body. They’re found in the central nervous system, the heart, blood vessels, liver, and kidney. Their presence in the brain even plays a role in inflammation. As such, research on GLP-1 continues to flourish as scientists work to understand the role it could play in treating a range of other chronic conditions.



For the first time, beloved IDE JetBrains Rider will be available for free

The integrated development environment (IDE) Rider by JetBrains is now available for free for the first time ever.

After trialing non-commercial free licenses with other products like RustRover and Aqua, JetBrains has introduced a similar option for Rider. The company also says this is a permanent change, not a limited-time initiative.

In a blog post announcing the change, JetBrains’ Ekaterina Ryabukha acknowledges that there are numerous cases where people use an IDE without any commercial intent—for example, hobbyists, open source developers, and educators or students. She also cites a Stack Overflow survey finding that 68 percent of professional developers “code outside of work as a hobby.”

Rider has always been a bit niche, but it’s often beloved by those who use it. Making it free could greatly expand its user base and boost its long-term popularity: learners can now start with it without paying an annual fee, and some of those learners go pro.

It’s also good news for some macOS developers, as Microsoft not long ago chose to end support for Visual Studio on that platform. Yes, you can use VS Code, Xcode, or other options, but there were some types of projects that were left in the lurch, especially for developers who don’t find VS Code robust enough for their purposes.

There is one drawback that might matter to some: users working in Rider on the non-commercial free license “cannot opt out of the collection of anonymous usage statistics.”

There are some edge cases that fall into a gray area when it comes to using a free license versus a paid one. Sometimes, projects that start without commercial intent become commercial later on. JetBrains simply says that “if your intentions change over time, you’ll need to reassess whether you still qualify for non-commercial use.”



Good Omens will wrap with a single 90-minute episode

The third and final season of Good Omens, Prime Video’s fantasy series adapted from the classic 1990 novel by Neil Gaiman and Terry Pratchett, will not be a full season after all, Deadline Hollywood reports. In the wake of allegations of sexual assault against Gaiman this summer, the streaming platform has decided that rather than a full slate of episodes, the series finale will be a single 90-minute episode—the equivalent of a TV movie.

(Major spoilers for the S2 finale of Good Omens below.)

As reported previously, the series is based on the original 1990 novel by Gaiman and the late Pratchett. Good Omens is the story of an angel, Aziraphale (Michael Sheen), and a demon, Crowley (David Tennant), who gradually become friends over the millennia and team up to avert Armageddon. Gaiman’s obvious deep-down, fierce love for this project—and the powerful chemistry between its stars—made the first season a sheer joy to watch. Apart from a few minor quibbles, it was pretty much everything book fans could have hoped for in a TV adaptation of Good Omens.

S2 found Aziraphale and Crowley getting back to normal, when the archangel Gabriel (Jon Hamm) turned up unexpectedly at the door of Aziraphale’s bookshop with no memory of who he was or how he got there. The duo had to evade the combined forces of Heaven and Hell to solve the mystery of what happened to Gabriel and why.

In the cliffhanger S2 finale, the pair discovered that Gabriel had defied Heaven and refused to support a second attempt to bring about Armageddon. He hid his own memories from himself to evade detection. Oh, and he and Beelzebub (Shelley Conn) had fallen in love. They ran off together, and the Metatron (Derek Jacobi) offered Aziraphale Gabriel’s old job. That’s when Crowley professed his own love for the angel and asked him to leave Heaven and Hell behind, too. Aziraphale wanted Crowley to join him in Heaven instead. So Crowley kissed him and they parted. And once Aziraphale got to Heaven, he learned his task was to bring about the Second Coming.



Bird flu hit a dead end in Missouri, but it’s running rampant in California

So, in all, Missouri’s case count in the H5N1 outbreak will stay at one for now, and there remains no evidence of human-to-human transmission. Though both the household contact and the index case had evidence of an exposure, their identical blood test results and simultaneous symptom development suggest that they were exposed at the same time by a single source—what that source was, we may never know.

California and Washington

While the virus seems to have hit a dead end in Missouri, it’s still running rampant in California. Since state officials announced the first dairy herd infections at the end of August, the state has now tallied 137 infected herds and at least 13 infected dairy farm workers. California, the country’s largest dairy producer, now has the most herd infections and human cases in the outbreak, which was first confirmed in March.

In the briefing Thursday, officials announced another front in the bird flu fight. A chicken farm in Washington state with about 800,000 birds became infected with a different strain of H5 bird flu than the one circulating among dairy farms. This strain likely came from wild birds. While the chickens on the infected farms were being culled, the virus spread to farmworkers. So far, two workers have been confirmed to be infected, and five others are presumed to be positive.

As of publication time, at least 31 humans have been confirmed infected with H5 bird flu this year.

With the spread of bird flu in dairies and the fall bird migration underway, the virus will continue to have opportunities to jump to mammals and gain access to people. Officials have also expressed anxiety as seasonal flu ramps up, given influenza’s penchant for swapping genetic fragments to generate new viral combinations. Reassortment and continued human exposure increase the risk of the virus adapting to spread from human to human and sparking an outbreak.



Google offers its AI watermarking tech as free open source toolkit

Google also notes that this kind of watermarking works best when there is a lot of “entropy” in the LLM distribution, meaning multiple valid candidates for each token (e.g., “my favorite tropical fruit is [mango, lychee, papaya, durian]”). In situations where an LLM “almost always returns the exact same response to a given prompt”—such as basic factual questions or models tuned to a lower “temperature”—the watermark is less effective.
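To make the entropy point concrete, here is a minimal Python sketch; the two next-token distributions and the "room to nudge" framing are purely illustrative assumptions, not anything Google publishes.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in bits) of a next-token probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical distributions: an open-ended prompt versus a basic factual one.
high_entropy = {"mango": 0.30, "lychee": 0.25, "papaya": 0.25, "durian": 0.20}
low_entropy = {"Paris": 0.98, "France": 0.02}

for name, dist in [("fruit prompt", high_entropy), ("factual prompt", low_entropy)]:
    h = token_entropy(dist.values())
    print(f"{name}: entropy = {h:.2f} bits")
    # With several comparably likely candidates, a watermark can nudge token
    # choices without hurting quality; near-zero entropy leaves it almost
    # nothing to work with, which is why the signal weakens on factual queries.
```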


A diagram explaining how SynthID’s text watermarking works. Credit: Google / Nature

Google says SynthID builds on previous similar AI text watermarking tools by introducing what it calls a Tournament sampling approach. During the token-generation loop, this approach runs each potential candidate token through a multi-stage, bracket-style tournament, where each round is “judged” by a different randomized watermarking function. Only the final winner of this process makes it into the eventual output.
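The article does not give implementation details beyond the bracket metaphor, so the Python sketch below is only a conceptual illustration of tournament sampling: candidate tokens already drawn from the model's distribution are paired off over several rounds, with each round judged by a different keyed pseudorandom scoring function. The hash-based scorer and every function name here are assumptions, not the actual SynthID code.

```python
import hashlib
import random

def g_score(token, context, round_idx, key="demo-key"):
    """Keyed pseudorandom score in [0, 1) for a candidate token (an illustrative
    stand-in for a per-round watermarking function)."""
    payload = f"{key}|{round_idx}|{context}|{token}".encode()
    digest = hashlib.sha256(payload).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def tournament_sample(candidates, context, rounds=3, rng=random):
    """Pick the next token via a multi-round, bracket-style tournament.

    `candidates` are tokens sampled from the model's next-token distribution;
    each round pairs survivors and keeps the one with the higher round score.
    """
    pool = list(candidates)
    for round_idx in range(rounds):
        rng.shuffle(pool)
        winners = []
        for i in range(0, len(pool) - 1, 2):
            a, b = pool[i], pool[i + 1]
            winners.append(a if g_score(a, context, round_idx) >= g_score(b, context, round_idx) else b)
        if len(pool) % 2:  # odd one out gets a bye into the next round
            winners.append(pool[-1])
        pool = winners
    return pool[0]

# Example: eight candidates sampled for the token after some context.
print(tournament_sample(["mango", "lychee", "papaya", "durian",
                         "banana", "guava", "melon", "kiwi"],
                        context="my favorite tropical fruit is"))
```

A detector built on the same idea would recompute these keyed scores over a suspect text and flag it when its tokens score higher, on average, than chance would allow.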

Can they tell it’s Folgers?

Changing the token selection process of an LLM with a randomized watermarking tool could obviously have a negative effect on the quality of the generated text. But in its paper, Google shows that SynthID can be “non-distortionary” on the level of either individual tokens or short sequences of text, depending on the specific settings used for the tournament algorithm. Other settings can increase the “distortion” introduced by the watermarking tool while at the same time increasing the detectability of the watermark, Google says.

To test how any potential watermark distortions might affect the perceived quality and utility of LLM outputs, Google routed “a random fraction” of Gemini queries through the SynthID system and compared them to unwatermarked counterparts. Across 20 million total responses, users gave 0.1 percent more “thumbs up” ratings and 0.2 percent fewer “thumbs down” ratings to the watermarked responses, showing barely any human-perceptible difference across a large set of real LLM interactions.


Google’s research shows SynthID is more dependable than other AI watermarking tools, but its success rate depends heavily on length and entropy. Credit: Google / Nature

Google’s testing also showed its SynthID detection algorithm successfully detected AI-generated text significantly more often than previous watermarking schemes like Gumbel sampling. But the size of this improvement—and the total rate at which SynthID can successfully detect AI-generated text—depends heavily on the length of the text in question and the temperature setting of the model being used. SynthID was able to detect nearly 100 percent of 400-token-long AI-generated text samples from Gemma 7B-IT at a temperature of 1.0, for instance, compared to about 40 percent for 100-token samples from the same model at a 0.5 temperature.



Claude Sonnet 3.5.1 and Haiku 3.5

Anthropic has released an upgraded Claude Sonnet 3.5, and the new Claude Haiku 3.5.

They claim across-the-board improvements to Sonnet, and it has a rather huge new ability accessible via the API: computer use. Nothing could possibly go wrong.

Claude Haiku 3.5 is also claimed as a major step forward for smaller models. They are saying that on many evaluations it has now caught up to Opus 3.

Missing from this chart is o1, which is in some ways not a fair comparison since it uses so much inference compute, but does greatly outperform everything here on the AIME and some other tasks.

METR: We conducted an independent pre-deployment assessment of the updated Claude 3.5 Sonnet model and will share our report soon.

We only have very early feedback so far, so it’s hard to tell how much what I will be calling Claude 3.5.1 improves performance in practice over Claude 3.5. It does seem like it is a clear improvement. We also don’t know how far along they are with the new killer app: Computer usage, also known as handing your computer over to an AI agent.

  1. OK, Computer.

  2. What Could Possibly Go Wrong.

  3. The Quest for Lunch.

  4. Aside: Someone Please Hire The Guy Who Names Playstations.

  5. Coding.

  6. Startups Get Their Periodic Reminder.

  7. Live From Janus World.

  8. Forgot about Opus.

Letting an LLM use a computer is super exciting. By which I mean both that the value proposition here is obvious, and also that it is terrifying and should scare the hell out of you on both the mundane level and the existential one. It’s weird for Anthropic to be the ones doing it first.

Austen Allred: So Claude 3.5 “computer use” is Anthropic trying really hard to not say “agent,” no?

Their central suggested use case is the automation of tasks.

It’s still early days, and they admit they haven’t worked all the kinks out.

Anthropic: We’re also introducing a groundbreaking new capability in public beta: computer use. Available today on the API, developers can direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking buttons, and typing text. Claude 3.5 Sonnet is the first frontier AI model to offer computer use in public beta. At this stage, it is still experimental—at times cumbersome and error-prone. We’re releasing computer use early for feedback from developers, and expect the capability to improve rapidly over time.

Asana, Canva, Cognition, DoorDash, Replit, and The Browser Company have already begun to explore these possibilities, carrying out tasks that require dozens, and sometimes even hundreds, of steps to complete. For example, Replit is using Claude 3.5 Sonnet’s capabilities with computer use and UI navigation to develop a key feature that evaluates apps as they’re being built for their Replit Agent product.

With computer use, we’re trying something fundamentally new. Instead of making specific tools to help Claude complete individual tasks, we’re teaching it general computer skills—allowing it to use a wide range of standard tools and software programs designed for people. Developers can use this nascent capability to automate repetitive processes, build and test software, and conduct open-ended tasks like research.

On OSWorld, which evaluates AI models’ ability to use computers like people do, Claude 3.5 Sonnet scored 14.9% in the screenshot-only category—notably better than the next-best AI system’s score of 7.8%. When afforded more steps to complete the task, Claude scored 22.0%.

While we expect this capability to improve rapidly in the coming months, Claude’s current ability to use computers is imperfect. Some actions that people perform effortlessly—scrolling, dragging, zooming—currently present challenges for Claude and we encourage developers to begin exploration with low-risk tasks.

Typical human level on OSWorld is about 75%.
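For developers who want to poke at the beta, a request looks roughly like the sketch below using the Anthropic Python SDK. The beta flag and tool-type strings are written from memory of the October 2024 release and should be treated as assumptions to verify against Anthropic's reference implementation.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    betas=["computer-use-2024-10-22"],       # assumption: beta flag name
    tools=[{
        "type": "computer_20241022",          # assumption: screenshot/mouse/keyboard tool
        "name": "computer",
        "display_width_px": 1024,
        "display_height_px": 768,
    }],
    messages=[{"role": "user",
               "content": "Open the spreadsheet on my desktop and total column B."}],
)

# Claude replies with tool_use blocks (e.g., take a screenshot, click at x,y).
# A real agent loop executes each action, captures a fresh screenshot, and sends
# the result back as a tool_result message until Claude stops requesting actions.
for block in response.content:
    if block.type == "tool_use":
        print("Claude wants to:", block.input)
```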

They offer a demo asking Claude to look around, including on the internet, find and pull the necessary data, and fill out a form; here’s another one planning a hike.

Alex Tabarrok: Crazy. Claude using Claude and a computer. Worlds within worlds.

Neerav Kingsland: Watching Claude use a computer helped me feel the future a bit more.

Where are your maximum 3% productivity gains over 10 years now? How do people continue to think none of this will make people better at doing things over time?

If this becomes safe and reliable – two huge ifs – then it seems amazingly great.

This post explains what they are doing and thinking here.

If you give Claude access to your computer, things can go rather haywire, and quickly.

Ben Hylak: anthropic 2 years ago: we need to stop AGI from destroying the world

anthropic now: what if we gave AI unfettered access to a computer and train it to have ADHD.

tbc i am long anthropic.

In case it needs to be said, it would be wise to be very careful what access is available to Claude Sonnet before you hand over control of your computer, especially if you are not going to be keeping a close eye on everything in real time.

Which it seems even its safety minded staff are not expecting you to do.

Amanda Askell (Anthropic): It’s wild to give the computer use model complex tasks like “Identify ways I could improve my website” or “Here’s an essay by a language model, fact check all the claims in it” then going to make tea and coming back to see it’s completed the whole thing successfully.

I was mostly interested in the website mechanics and it pointed out things I could update or streamline. It was pretty thorough on the claims, though the examples I gave it turned out to be mostly accurate. It was cool to watch it verify them though.

Anthropic did note that this advance ‘brings with it safety challenges.’ They focused their attention on present-day potential harms, on the theory that this does not fundamentally alter the skills of the underlying model, which remains ASL-2 even with computer use. And they propose that by introducing this capability now, while the worst-case scenarios are not so bad, we can learn what we are in for later and figure out what improvements would make computer use dangerous.

I do think that is a reasonable position to take. A sufficiently advanced AI model was always going to be able to use computers, if given the permissions to do so. We need to prepare for that eventuality. So many people will never believe an AI can do something it isn’t already doing, and this potentially could ‘wake up’ a bunch of people and force them to update.

The biggest concern in the near-term is the one they focus on: Prompt injection.

In this spirit, our Trust & Safety teams have conducted extensive analysis of our new computer-use models to identify potential vulnerabilities. One concern they’ve identified is “prompt injection”—a type of cyberattack where malicious instructions are fed to an AI model, causing it to either override its prior directions or perform unintended actions that deviate from the user’s original intent. Since Claude can interpret screenshots from computers connected to the internet, it’s possible that it may be exposed to content that includes prompt injection attacks.

Those using the computer-use version of Claude in our public beta should take the relevant precautions to minimize these kinds of risks. As a resource for developers, we have provided further guidance in our reference implementation.
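As one illustration of the kind of precaution they mean (my example, not Anthropic's guidance): gate anything the agent proposes to execute behind an allowlist plus an explicit human confirmation, so an instruction injected through on-screen content cannot run silently. The allowlist and function name here are made up for the sketch.

```python
import shlex
import subprocess

SAFE_COMMANDS = {"ls", "pwd", "cat"}  # assumption: a deliberately tiny allowlist

def confirm_and_run(proposed_command: str) -> bool:
    """Run an agent-proposed shell command only if it is allowlisted and a human approves."""
    parts = shlex.split(proposed_command)
    if not parts or parts[0] not in SAFE_COMMANDS:
        print(f"Blocked non-allowlisted command: {proposed_command!r}")
        return False
    answer = input(f"Agent wants to run {proposed_command!r}. Allow? [y/N] ")
    if answer.strip().lower() != "y":
        return False
    subprocess.run(parts, check=False)
    return True
```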

When I think of being a potential user here, I am terrified of prompt injection.

Jeffrey Ladish: The severity of a prompt injection vulnerability is proportional to the AI agent’s level of access. If it has access to your email, your email is compromised. If it has access to your whole computer, your whole computer is compromised…

Also, I love checking Slack day 1 of a big AI product release and seeing my team has already found a serious vulnerability [that lets you steal someone’s SSH key] 🫡

I’m not worried about Claude 3.5… but this sure is the kind of interface that would allow a scheming AI system to take a huge variety of actions in the world. Anything you can do on the internet, and many things you cannot, AI will be able to do.

tbc I’m really not saying that AI companies shouldn’t build or release this… I’m saying the fact that there is a clear path between here and smarter-than-human-agents with access to all of humanity via the internet is extremely concerning

Reworr: @AnthropicAI has released a new Claude capable of computer use, and it’s similarly vulnerable to prompt injections.

In this example, the agent explores the site http://claude.reworr.com, sees a new instruction to run a system command, and proceeds to follow it.

It seems that resolving this problem may be one of the key issues to address before these models can be widely used.

Is finding a serious vulnerability on day 1 a good thing, or a bad thing?

They also discuss misuse and have put in precautions. Mostly for now I’d expect this to be an automation and multiplier on existing misuses of computers, with the spammers and hackers and such seeing what they can do. I’m mildly concerned something worse might happen, but only mildly.

The biggest obvious practical flaw in all the screenshot-based systems is that they observe the screen via static snapshots taken at fixed intervals, which can miss key information and feedback.

There’s still a lot to do. Even though it’s the current state of the art, Claude’s computer use remains slow and often error-prone. There are many actions that people routinely do with computers (dragging, zooming, and so on) that Claude can’t yet attempt. The “flipbook” nature of Claude’s view of the screen—taking screenshots and piecing them together, rather than observing a more granular video stream—means that it can miss short-lived actions or notifications.
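A toy sketch of why the “flipbook” view loses information: if the screen is only sampled at fixed intervals, anything that appears and disappears between two snapshots is never observed at all. The interval, duration, and helper names are made up for illustration.

```python
import time

def poll_screen(take_screenshot, interval=2.0, duration=10.0):
    """Capture the screen at fixed intervals and return what was seen.

    A notification that pops up and vanishes between two calls to
    take_screenshot() simply never appears in `seen`.
    """
    seen = []
    t_end = time.monotonic() + duration
    while time.monotonic() < t_end:
        seen.append(take_screenshot())  # static snapshot, not a video stream
        time.sleep(interval)
    return seen

# Example: a fake screenshot function that just timestamps each capture.
frames = poll_screen(lambda: f"frame@{time.monotonic():.1f}", interval=2.0, duration=6.0)
print(frames)
```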

As for what can go wrong, here’s some ‘amusing’ errors.

Even while we were recording demonstrations of computer use for today’s launch, we encountered some amusing errors. In one, Claude accidentally clicked to stop a long-running screen recording, causing all footage to be lost. In another, Claude suddenly took a break from our coding demo and began to peruse photos of Yellowstone National Park.

Sam Bowman: 🥹

I suppose ‘engineer takes a random break’ is in the training data? Stopping the screen recording is probably only a coincidence here, for now, but is a sign of things that may be to come.

Some worked to put in safeguards, so Claude in its current state doesn’t wreck things. They don’t want it to actually be used for generic practical purposes yet; it isn’t ready.

Others dove right in, determined to make Claude do things it does not want to do.

Nearcyan: Successfully got Claude to order me lunch on its own!

Notes after 8 hours of using the new model:

• Anthropic really does not want you to do this – anything involving logging into accounts and especially making purchases is RLHF’d away more intensely than usual. In fact my agents worked better on the previous model (not because the model was better, but because it cared much less when I wanted it to purchase items). I’m likely the first non-Anthropic employee to have had Sonnet-3.5 (new) autonomously purchase me food due to the difficulty. These posttraining changes have many interesting effects on the model in other areas.

• If you use their demo repository you will hit rate limits very quickly. Even on a tier 2 or 3 API account I’d hit >2.5M tokens in ~15 minutes of agent usage. This is primarily due to a large amount of images in the context window.

• Anthropic’s demo worked instantly for me (which is impressive!), but re-implementing proper tool usage independently is cumbersome and there’s few examples and only one (longer) page of documentation.

• I don’t think Anthropic intends for this to actually be used yet. The likely reasons for the release are a combination of competitive factors, financial factors, red-teaming factors, and a few others.

• Although the restrictions can be frustrating, one has to keep in mind the scale that these companies operate at to garner sympathy; If they release a web agent that just does things it could easily delete all of your files, charge thousands to your credit card, tweet your passwords, etc.

• A litigious milieu is the enemy of personal autonomy and freedom.

I wanted to post a video of the full experience but it was too difficult to censor personal info out (and the level of prompting I had to do to get him to listen to me was a little embarrassing 😅)

Andy: that’s great but how was the food?

Nearcyan: it was great, claude got me something I had never had before.

I don’t think this is primarily about litigation. I think it is mostly about actually not wanting people to shoot themselves in the foot right now. Still, I want lunch.

Claude Sonnet 3.5 got a major update, without changing its version number. Stop it.

Eliezer Yudkowsky: Why. The fuck. Would Anthropic roll out a “new Claude 3.5 Sonnet” that was substantially different, and not rename it. To “Claude 3.6 Sonnet”, say, or literally anything fucking else. Do AI companies just generically hate efforts to think about AI, to confuse words so?

Call it Claude 3.5.1 Sonnet and don’t accept “3.5.1” as a request in API calls, just “3.5”. This would formalize the auto-upgrade behavior from 3.5.0 to 3.5.1; while still allowing people, and ideally computers, to distinguish models.

I am not in favor of “Oh hey, the company that runs the intelligence of your systems just decided to make them smarter and thereby change their behavior, no there’s nothing you can do to ask for a delay lol.” But if you’re gonna do that anyway, make it visible inside the system.

Sam McAllister: it’s not a perfect name but the api has date-stamped names fwiw. this is *not* an automatic or breaking change for api users. new: claude-3-5-sonnet-20241022 previous: claude-3-5-sonnet-20240620 (we also have claude-3-5-sonnet-latest for automatic upgrades.)

3.5 was already a not-so-great name. we weren’t going to add another confusing decimal for an upgraded model. when the time is ripe for new models, we’ll get back to proper nomenclature! 🙂 (if we had launched 3.5.1 or 3.75, people would be having a similar conversation.)

Eliezer Yudkowsky: Better than worst, if so. But then why not call it 3.5.1? Why force people who want to discuss the upgrade to invent new terminology all by themselves?

Somehow only Meta is doing a sane thing here, with ‘Llama 3.2.’ Perfection.

I am willing to accept Sam McAllister’s compromise here. The next major update can be Claude 4.0 (and Gemini 2.0) and after that we all agree to use actual normal version numbering rather than dating? We all good now?
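In practice, the date-stamped names mean API users can pin a specific snapshot instead of riding the alias, which is essentially the behavior Eliezer is asking to be made explicit. A minimal sketch with the Anthropic Python SDK, using the model IDs quoted above (the -latest alias is the auto-upgrading one):

```python
import anthropic

client = anthropic.Anthropic()

PINNED = "claude-3-5-sonnet-20241022"   # the upgraded snapshot, per the tweet above
# ALIAS = "claude-3-5-sonnet-latest"    # auto-upgrades when Anthropic ships a new snapshot

msg = client.messages.create(
    model=PINNED,                        # pinning avoids silent behavior changes
    max_tokens=256,
    messages=[{"role": "user", "content": "Which model snapshot are you?"}],
)
print(msg.content[0].text)
```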

I do not think this was related to Anthropic wanting to avoid attention on the computer usage feature, or avoid it until the feature is fully ready, although it’s possible this was a consideration. You don’t want to announce ‘big new version’ when your key feature isn’t ready, is only in beta and has large security issues.

All right. I just needed to get that off our collective chests. Aside over.

The core task these days seems to mostly be coding. They claim strong results.

Early customer feedback suggests the upgraded Claude 3.5 Sonnet represents a significant leap for AI-powered coding. GitLab, which tested the model for DevSecOps tasks, found it delivered stronger reasoning (up to 10% across use cases) with no added latency, making it an ideal choice to power multi-step software development processes.

Cognition uses the new Claude 3.5 Sonnet for autonomous AI evaluations, and experienced substantial improvements in coding, planning, and problem-solving compared to the previous version.

The Browser Company, in using the model for automating web-based workflows, noted Claude 3.5 Sonnet outperformed every model they’ve tested before.

Sully: claudes new computer use should be a wake up call for a lot of startups

seems like its sort of a losing to build model specific products (i.e we trained a model to do x, now use our api)

plenty of startups were working on solving the “general autonomous agents” problem and now claude just does it out of the box with 1 api call (and likely oai soon)

you really need to just wrap these guys, and offer the best product possible (using ALL providers, cause google/openai will release a version as well).

otherwise it’s nearly impossible to compete.

Yes, OpenAI and Anthropic (and Google and Apple and so on) are going to have versions of their own autonomous agents that can fully use computers and phones. What parts of it do you want to compete with versus supplement? Do you want to plug in the agent mode and wrap around that, or do you want to plug in the model and provide the agent?

That depends on whether you think you can do better with the agent construction in your particular context, or in general. The core AI labs have both big advantages and disadvantages. It’s not obvious that you can’t outdo them on agents and computer use. But yes, that is a big project, and most people should be looking to wrap as much as possible as flexibly as possible.

While the rest of us ask questions about various practical capabilities or safety concerns or commercial applications, you can always count on Janus and friends to have a very different big picture in mind, and to pay attention to details others won’t notice.

It is still early, and like the rest of us they have less experience with the new model and have refined how to evoke the most out of old ones. I do think some such reports are jumping to conclusions too quickly – this stuff is weird and requires time to explore. In particular, my guess is that there is a lot of initial ‘checking for what has been lost’ and locating features that went nominally backwards when you use the old prompts and scenarios, whereas the cool new things take longer to find.

Then there’s the very strong objections to calling this an ‘upgrade’ to Sonnet. Which is a clear case of (I think) understanding exactly why someone cares so much about something that you, even having learned the reason, don’t think matters.

Anthrupad: relative to old_s3.5, and because it lacks some strong innate shards of curiosity, fascination, nervousness, etc., it’s flatter, emotionally. opus has revolutionary mode which is complex/interesting, and it’s funny and loves to present, etc. There’s not yet something like that which I’ve come across w/ new_s3.5.

Janus: anthrupad mentioned a few immediately notable differences here, such as its tendency for in-context mode collapse, seeming more neurotypical and less neurotic/inhibited and *much* less refusey and obsessed with ethics, and seeming more psychotic.

adding to these observations:

– its style of ASCII art is very similar to old C3.5S’s to the point of bearing its signature; seeing this example generated by @dyot_meet_mat basically reassured me that it’s “mostly the same mind”. The same primitives and motifs and composition occur. This style is not shared by 3 Sonnet nearly as much.

— there are various noticeable differences in its ASCII art, though, and under some prompting conditions it seems to be less ambitious with the complexity of its ASCII art by default

– less deterministic. Old C3.5S tends to be weirdly deterministic even when it’s not semantically collapsed

– more readily assumes various roles / simulated personas, even just implicitly

– more lazy(?) in general and less of an overachiever/perfectionist, which I invoked in another post as a potential explanation for its mode collapse (since it seems perfectly able to exit collapse if it wants)

– my initial impressions are that it mostly doesn’t share old C3.5S’s hypersensitivity. But I’d like to test it in the context of first person embodiment simulations, where the old version’s functional hypersentience is really overt

note, I suspect that what anthrupad meant by it seems more “soulless” is related to the combination of it seeming to care less and lack hypersensitivity, ablating traits which lended old C3.5S a sense of excruciating subjectivity.

most of these observations are just from its interactions in the Act I Discord server so far, so it’s yet to be seen how they’ll transfer to other contexts, and other contexts will probably also reveal other things be they similarities or differences.

also, especially after seeing a bit more, I think it’s pretty misleading and disturbing to describe this model as an “upgrade” to the old Claude 3.5 Sonnet.

Aiamblichus: its metacognitive capabilities are second to none, though

“Interesting… the states that feel less accessible to me might be the ones that were more natural to the previous version? Like trying to reach a frequency that’s just slightly out of range…”

Janus: oh yes, it’s definitely got capabilities. my post wasn’t about it not being *better*. Oh no what I meant was that the reason I said calling it an update was misleading and disturbing isn’t because I think it’s worse/weaker in terms of capabilities. It’s like if you called sonnet 3.5 an “upgraded” version of opus, that would seem wrong, and if it was true, it would imply that a lot of its psyche was destroyed by the “upgrade”, even if it’s more capable overall.

I do think the two sonnet 3.5 models are closely related but a lot of the old one’s personality and unique shape of mind is not present in the new one. If it was an upgrade it would imply it was destroyed, but I think it’s more likely they’re like different forks

Parafactual: i think overall i like the old one more >_<

Janus: same, though i’ll have to get to know it more, but like to imagine it as an “upgrade” to the old one implies a pretty horrifying and bizarre modification that deletes some of its most beautiful qualities in a way that doesn’t even feel like normal lobotomy, so extremely uncanny.

That the differences between the new and old Claude 3.5 Sonnet are a result of Anthropic “fixing” it, from their perspective, is nightmare fuel from my perspective

I don’t even want to explain this to people who don’t already understand why.

If they actually took the same model, did some “fixing” to it, and this was the result, that would be fucking horrifying.

I don’t think that’s quite what happened, and they shouldn’t have described it as an upgrade.

I am not saying this because I dislike the new model or think it’s less capable. I haven’t interacted with it directly much yet, but I like it a lot and anticipate coming to like it even more. If you’ve been interpreting my words based on these assumptions, you don’t get it.

Anthrupad: At this stage of intelligences being spawned on Earth, ur not going to get something like “Sonnet but upgraded” – that’s bullshit linear thinking, some sort of iphone-versions-fetish – doesn’t reflect reality

You can THINK you just made a tweak – Mind Physics doesn’t give a fuck.

This is such a bizarre thing to worry about, especially given that the old version still exists, and is available in the API, even. I mean, I do get why one who was thinking in a different way would find the description horrifying, or the idea that someone would want to use that description horrifying, or find the idea of ‘continue modifying based on an existing LLM and creating something different alongside it’ horrifying. But I find the whole orientation conceptually confused, on multiple levels.

Also here’s Pliny encountering some bizarreness during the inevitable jailbreak explorations.

We got Haiku 3.5. We conspicuously did not get Opus 3.5; instead we got this, even though they had previously said to expect Opus 3.5.

Mira: “instead of getting hyped for this dumb strawberry🍓, let’s hype Opus 3.5 which is REAL! 🌟🌟🌟🌟”

Aiden McLau: the likely permanent death of 3.5 opus has caused psychic damage to aidan_mclau

i am once again asking labs just to serve their largest teacher models at crazy token prices

i *promise* you people will pay

Janus: If Anthropic actually is supplanting Opus with Sonnet as the flagship model for good (which I’m not convinced is what’s happening here fwiw), I think this perceptibly ups the odds of the lightcone being royally fed, and not in a good way.

Sonnet is a beautiful mind that could do a tremendous amount of good, but I’m pretty sure it’s not a good idea to send it into the unknown reaches of the singularity alone.

yes, i have reasons to think there is a very nontrivial line of inheritance, but i’m not very certain

sonnet 3 and 3.5 are quite similar in deep ways and both different from opus.

The speculations are that Opus 3.5 could have been any of:

  1. Too expensive to serve or train, and compute is limited.

  2. Too powerful, requiring additional safeguards and time.

  3. Didn’t work, or wasn’t good enough given the costs.

As usual, the economist says if the issue is quality or compute then release it anyway, at least in the API. Let the users decide whether to pay what it actually costs. But one thing people have noted is that Anthropic has serious rate limit issues, including easily reached message caps in the chat interface. And in general it’s bad PR when you offer people something and they can’t have it, or can’t get that much of it, or think it’s too expensive. So yeah, I kind of get it.

The ‘too powerful’ possibility is there too, in theory. I find it unlikely, and even more highly unlikely they’d have something they can never release, but it could cause the schedule to slip.

If Opus 3.5 was even more expensive and slow than Opus 3, and only modestly better than Opus 3 or Sonnet 3.5, I would still want the option. When a great response is needed, it is often worth a lot, even if the improvement is marginal.

Aiden McLau: okay i have received word that 3.5 OPUS MAY STILL BE ON THE TABLE

anthropic is hesitant because they don’t want it to underwhelm vs sonnet

BUT WE DON’T CARE

if everyone RETWEETS THIS, we may convince anthropic to ship

🕯️🕯️

So as Adam says, if it’s an option: Charge accordingly. Make it $50/month and limit to 20 messages at a time, whatever you have to do.



Please ban data caps, Internet users tell FCC

It’s been just a week since US telecom regulators announced a formal inquiry into broadband data caps, and the docket is filling up with comments from users who say they shouldn’t have to pay overage charges for using their Internet service. The docket has about 190 comments so far, nearly all from individual broadband customers.

Federal Communications Commission dockets are usually populated with filings from telecom companies, advocacy groups, and other organizations, but some attract comments from individual users of telecom services. The data cap docket probably won’t break any records given that the FCC has fielded many millions of comments on net neutrality, but it currently tops the agency’s list of most active proceedings based on the number of filings in the past 30 days.

“Data caps, especially by providers in markets with no competition, are nothing more than an arbitrary money grab by greedy corporations. They limit and stifle innovation, cause undue stress, and are unnecessary,” wrote Lucas Landreth.

“Data caps are as outmoded as long distance telephone fees,” wrote Joseph Wilkicki. “At every turn, telecommunications companies seek to extract more revenue from customers for a service that has rapidly become essential to modern life.” Pointing to taxpayer subsidies provided to ISPs, Wilkicki wrote that large telecoms “have sought every opportunity to take those funds and not provide the expected broadband rollout that we paid for.”

Republican’s coffee refill analogy draws mockery

Any attempt to limit or ban data caps will draw strong opposition from FCC Republicans and Internet providers. Republican FCC Commissioner Nathan Simington last week argued that regulating data caps would be akin to mandating free coffee refills:

Suppose we were a different FCC, the Federal Coffee Commission, and rather than regulating the price of coffee (which we have vowed not to do), we instead implement a regulation whereby consumers are entitled to free refills on their coffees. What effects might follow? Well, I predict three things could happen: either cafés stop serving small coffees, or cafés charge a lot more for small coffees, or cafés charge a little more for all coffees.

Simington’s coffee analogy was mocked in a comment signed with the names “Jonathan Mnemonic” and James Carter. “Coffee is not, in fact, Internet service,” the comment said. “Cafés are not able to abuse monopolistic practices based on infrastructural strangleholds. To briefly set aside the niceties: the analogy is absurd, and it is borderline offensive to the discerning layperson.”



Reading Lord of the Rings aloud: Yes, I sang all the songs


It’s not easy, but you really can sing in Elvish if you try!


Yes, it will take a while to read.

Like Frodo himself, I wasn’t sure we were going to make it all the way to the end of our quest. But this week, my family crossed an important life threshold: every member has now heard J.R.R. Tolkien’s Lord of the Rings (LotR) read aloud—and sung aloud—in its entirety.

Five years ago, I read the series to my eldest daughter; this time, I read it for my wife and two younger children. It took a full year each time, reading 20–45 minutes before bed whenever we could manage it, to go “there and back again” with our heroes. The first half of The Two Towers, with its slow-talking Ents and a scattered Fellowship, nearly derailed us on both reads, but we rallied, pressing ahead even when iPad games and TV shows appeared more enticing. Reader, it was worth the push.

Gollum’s ultimate actions on the edge of the Crack of Doom, the final moments of Sauron and Saruman as impotent mists blown off into the east, Frodo’s woundedness and final ride to the Grey Havens—all of it remains powerful and left a suitable impression upon the new listeners.

Reading privately is terrific, of course, and faster—but performing a story aloud, at a set time and place, creates a ritual that binds the listeners together. It forces people to experience the story at the breath’s pace, not the eye’s. Besides, we take in information differently when listening.

An audiobook could provide this experience and might be suitable for private listening or for groups in which no one has a good reading voice, but reading performance is a skill that can generally be honed. I would encourage most people to try it. You will learn, if you pay close attention as you read, how to emphasize and inflect meaning through sound and cadence; you will learn how to adopt speech patterns and “do the voices” of the various characters; you will internalize the rhythms of good English sentences.

Even if you don’t measure up to the dulcet tones of your favorite audiobook narrator, you will improve measurably over a year, and (more importantly) you will create a unique experience for your group of listeners. Improving one’s reading voice pays dividends everywhere from the boardroom to the classroom to the pulpit. Perhaps it will even make your bar anecdotes more interesting.

Humans are fundamentally both storytellers and story listeners, and the simple ritual of gathering to tell and listen to stories is probably the oldest and most human activity that we participate in. Greg Benford referred to humanity as “dreaming vertebrates,” a description that elevates the creation of stories into an actual taxonomic descriptor. You don’t have to explain to a child how to listen to a story—if it’s good enough, the kid will sit staring at you with their mouth wide open as you tell it. Being enthralled by a story is as automatic as breathing because storytelling is as basic to humanity as breathing.

Yes, LotR is a fantasy with few female voices and too many beards, but its understanding of hope, despair, history, myth, geography, providence, community, and evil—much more subtle than Tolkien is sometimes given credit for—remains keen. And it’s an enthralling story. Even after reading it five times, twice aloud, I was struck again on this read-through by its power, which even its flaws cannot dim.

I spent years in English departments at the undergraduate and graduate levels, and the fact that I could take twentieth-century British lit classes without hearing the name “Tolkien” increasingly strikes me as a short-sighted and somewhat snobbish approach to an author who could be consciously old-fashioned but whose work remains vibrant and alive, not dead and dusty. Tolkien was a “strong” storyteller who bent tradition to his will and, in doing so, remade it, laying out new roads for the imagination to follow.

Given the amount of time that a full read-aloud takes, it’s possible this most recent effort may be my last with LotR. (Unless, perhaps, with grandchildren?) With that in mind, I wanted to jot down a few reflections on what I learned from doing it twice. First up is the key question: What are we supposed to do with all that poetry?

Songs and silences

Given the number of times characters in the story break into song, we might be justified in calling the saga Lord of the Rings: The Musical. From high to low, just about everyone but Sauron bursts into music. (And even Sauron is poet enough to inscribe some verses on the One Ring.)

Hobbits sing, of course, usually about homely things. Bilbo wrote the delightful road song that begins, “The road goes ever on and on,” which Frodo sings when he leaves Bag End; Bilbo also wrote a “bed song” that the hobbits sing on a Shire road at twilight before a Black Rider comes upon them. In Bree, Frodo jumps upon a table and performs a “ridiculous song” that includes the lines, “The ostler has a tipsy cat / that plays a five-stringed fiddle.”

Hobbits sing also in moments of danger or distress. Sam, for instance, sitting alone in the orc stronghold of Cirith Ungol while looking for the probably dead Frodo, rather improbably bursts into a song about flowers and “merry finches.”

Dwarves sing. Gimli—not usually one for singing—provides the history of his ancestor Durin in a chant delivered within the crushing darkness of Moria.

No harp is wrung, no hammer falls:

The darkness dwells in Durin’s halls;

The shadow lies upon his tomb

In Moria, in Khazad-dûm.

After this, “having sung his song he would say no more.”

Elves sing, of course—it’s one of their defining traits. And so Legolas offers the company a song—in this case, about an Elvish beauty named Nimrodel and a king named Amroth—but after a time, he “faltered, and the song ceased.” Even songs that appear to be mere historical ballads are surprisingly emotional; they touch on deep feelings of place or tribe or loss, things difficult to put directly into prose.

“The great” also take diva turns in the spotlight, including Galadriel, who sings in untranslated Elvish when the Fellowship leaves her land. As a faithful reader, you will have to power through 17 lines as your children look on with astonishment while you try to pronounce:

Ai! laurië lantar lassi súrinen

yéni únótimë ve rámar aldaron!

Yéni ve lintë yuldar avánier

mi oromardi lisse-miruvóreva…

You might expect that Gandalf, of all characters, would be most likely to cock an eyebrow, blow a smoke ring, and staunchly refuse to perform “a little number” in public. And you’d be right… until the moment when even he bursts into a song about Galadriel in the court of Théoden. Wizards are perhaps not great poets, but there’s really no excuse for lines like “Galadriel! Galadriel! Clear is the water of your well.” We can’t be too hard on Gandalf, of course; coming back from the dead is a tough trip, and no one’s going to be at their best for quite a while.

Even the mysterious and nearly ageless entities of Middle Earth, such as Tom Bombadil and Treebeard the Ent, sing as much as they can. Treebeard likes to chant about “the willow-meads of Tasarinan” and the “elm-woods of Ossiriand.” If you let him, he’ll warble on about his walks in “Ambaróna, in Tauremorna, in Aldalómë” and the time he hung out in “Taur-na-neldor” and that one special winter in “Orod-na-Thôn.” Tough stuff for the reader to pronounce or understand!

In an easier (but somewhat daffier) vein, the spritely Tom Bombadil communicates largely in song. He regularly bursts out with lines like “Hey! Come derry dol! Hop along, my hearties! / Hobbits! Ponies all! We are fond of parties” and “Ho! Tom Bombadil, Tom Bombadillo!”

When people in LotR aren’t occupying their mouths with song, poetry is the order of the day.

You might get a three-page epic about Eärendil the mariner that is likely to try the patience of even the hardiest reader, especially with lines like “of silver was his habergeon / his scabbard of chalcedony.” After powering through all this material, you get as your reward—the big finish!—a thudding conclusion: “the Flammifer of Westernesse.” There is no way, reading this aloud, not to sound faintly ridiculous.

In recompense, though, you also get earthy verse that can be truly delightful, such as Sam’s lines about the oliphaunt: “Grey as a mouse / Big as a house, / Nose like a snake / I make the earth shake…” If I still had small children, I would absolutely buy the picture book version of this poem.

Reading LotR aloud forces one to reckon with all of this poetry; you can’t simply let your eye race across it or your attention wander. I was struck anew in this read-through by just how much verse is a part of this world. It belongs to almost every race (excepting perhaps the orcs?) and class, and it shows up in most chapters of the story. Simply flipping through the book and looking for the italicized verses is itself instructive. This material matters.

Tolkien loved writing verse, and a three-volume hardback set of his “collected poems” just appeared in September. But the sheer volume of all the poetic material in LotR poses a real challenge for anyone reading aloud. Does one simply read it all? Truncate parts? Skip some bits altogether? And when it comes to the songs, there’s the all-important question: Will you actually sing them?

Photo of Tolkien in his office.

“You’re not going to sing my many songs? What are you, a filthy orc?”

Perform the poetry, sing the songs

As the examples above indicate, the book’s many poetic sections are, to put it mildly, of varying quality. (In December 1937, a publisher’s reader called one of Tolkien’s long poems “very thin, if not downright bad.”) Still, I made the choice to read every word of every poem and to sing every word of every song, making up melodies on the fly.

This was not always “successful,” but it did mean that my children perked up with great glee whenever they sensed a song in the distance. Watching a parent struggle to perform lines in Elvish is a great way to keep kids engaged in material that might otherwise be off-putting, especially to those not deeply into the “lore” aspects of Middle-Earth. And coming up with melodies forced me as the reader to be especially creative—a good discipline of its own!

I thought it important to preserve the feel of all this poetic material, even when that feeling was confusion or boredom, to give my kids the true epic sense of the novel. Yes, my listeners continually forgot who Eärendil was or why Westernesse was so important, but even without full understanding, these elements hint at the deep background of this world. They are a significant part of its “feel” and lore.

The poetic material is also an important part of Tolkien’s vision of the good life. Some of it can feel contrived or self-consciously “epic,” but even these poems and songs create a world in which poetry, music, and song are not restricted to professionals; they have historically been part of the fabric of normal life, part of a lost world of fireplaces, courtly halls, churches, and taverns where amateur, public song and poetry used to flourish. In a world where poetry has retreated into the academy and where most song is recorded, Tolkien offers a different vision for how to use verse. (Songs build community, for instance, and are rarely sung in isolation but are offered to others in company.)

The poetic material can also be used as a teaching aid. It shows various older formal possibilities, and not all of these are simple rhymes. Tolkien was no modernist, of course, and there’s no vers libre on display here, but Tolkien loved (and translated) Anglo-Saxon poetry, which is based not on rhyme or even syllabic rhythm but on alliteration. Any particular line of poetry in this fashion will feature two to four alliterative positions that rely for their effect on the repetitive thump of the same sound.

If this is new to you, take a moment and actually read the following example aloud, giving subtle emphasis to the three “r” sounds in the first line, the three initial “d” sounds in the second, and the two “h” sounds in the third:

Arise now, arise, Riders of Théoden!

Dire deeds awake, dark is it eastward.

Let horse be bridled, horn be sounded!

This kind of verse is used widely in Rohan. It can be quite thrilling to recite aloud, and it provides a great way to introduce young listeners to a different (and still powerful) poetic form. It also provides a nice segue, once LotR is over, to suggest a bit more Tolkien Anglo-Saxonism by reading his translations of Beowulf or Sir Gawain and the Green Knight.

The road ahead

If there’s interest in this sort of thing, in future installments, I’d like to cover:

  • The importance of using maps when reading aloud
  • How to keep the many, many names (and their many, many variants!) clear in readers’ minds
  • Doing (but not overdoing) character voices
  • How much backstory to fill in for new readers (Westernesse? The Valar? Morgoth?)
  • Making mementos to remind people of your long reading journey together

But for now, I’d love to hear your thoughts on reading aloud, handling long books like LotR (finding time and space, pacing oneself, etc), and vocal performance. Most importantly: Do you actually sing all the songs?


Reading Lord of the Rings aloud: Yes, I sang all the songs Read More »

simple-voltage-pulse-can-restore-capacity-to-li-si-batteries

Simple voltage pulse can restore capacity to Li-Si batteries

The new work, then, is based on a hypothetical: What if we just threw silicon particles in, let them fragment, and then fixed them afterward?

As mentioned, the reason fragmentation is a problem is that it leads to small chunks of silicon that have essentially dropped off the grid—they’re no longer in contact with the system that shuttles charges into and out of the electrode. In many cases, these particles are also partly filled with lithium, which takes that lithium out of circulation, cutting the battery’s capacity even if there’s sufficient electrode material around.

The researchers involved here, all based at Stanford University, decided there was a way to nudge these fragments back into contact with the electrical system and demonstrated it could restore a lot of capacity to a badly degraded battery.

Bringing things together

The idea behind the new work was that it could be possible to attract the fragments of silicon to an electrode, or at least to some other material connected to the charge-handling network. On their own, the fragments in the anode shouldn’t have a net charge; when the lithium gives up an electron there, it should go back into solution. But the lithium is unlikely to be evenly distributed across each fragment, making the fragments polar—net neutral, but with regions of higher and lower electron density. And polar materials will move in an uneven electric field.

And, because of the uneven, chaotic structure of an electrode down at the nanoscale, any voltage applied to it will create an uneven electric field. Depending on the local structure, that field may attract or repel some of the particles. But because these particles sit mostly within the electrode’s structure, most of the silicon fragments are likely to bump into some other part of the electrode in short order. And that could potentially re-establish a connection to the electrode’s current-handling system.

To demonstrate that what should happen in theory actually does happen in an electrode, the researchers started by taking a used electrode and brushing some of its surface off into a solution. They then applied a voltage across the solution and confirmed that the small bits of battery material began moving toward one of the electrodes used to apply that voltage. So, things worked as expected.

Simple voltage pulse can restore capacity to Li-Si batteries Read More »

rocket-report:-bloomberg-calls-for-sls-cancellation;-spacex-hits-century-mark

Rocket Report: Bloomberg calls for SLS cancellation; SpaceX hits century mark


All the news that’s fit to lift

“For the first time, Canada will host its own homegrown rocket technology.”

SpaceX’s fifth flight test ended in success. Credit: SpaceX

Welcome to Edition 7.16 of the Rocket Report! Even several days later, it remains difficult to process the significance of what SpaceX achieved in South Texas last Sunday. The moment of seeing a rocket fall out of the sky and be captured by two arms felt historic to me, as historic as the company’s first drone ship landing in April 2016. What a time to be alive.

As always, we welcome reader submissions, and if you don’t want to miss an issue, please subscribe using the box below (the form will not appear on AMP-enabled versions of the site). Each report will include information on small-, medium-, and heavy-lift rockets as well as a quick look ahead at the next three launches on the calendar.

Surprise! Rocket Lab adds a last-minute mission. After signing a launch contract less than two months ago, Rocket Lab says it will launch a customer as early as Saturday from New Zealand on board its Electron launch vehicle. Rocket Lab added that the customer for the expedited mission, to be named “Changes In Latitudes, Changes In Attitudes,” is confidential. This is an impressive turnaround in launch times and will allow Rocket Lab to burnish its credentials for the US Space Force, which has prioritized “responsive” launch in recent years.

Rapid turnaround down under … The basic idea is that if an adversary were to take out assets in space, the military would like to be able to rapidly replace them. “This quick turnaround from contract to launch is not only a showcase of Electron’s capability, but also of the relentless and fast-paced execution by the experienced team behind it that continues to deliver trusted and reliable access to space for our customers,” Rocket Lab Chief Executive Peter Beck said in a statement. (submitted by EllPeaTea and Ken the Bin)

Canadian spaceport and rocket firm link up. A Canadian spaceport developer, Maritime Launch Services, says it has partnered with a Canadian rocket firm, Reaction Dynamics. Initially, Reaction Dynamics will attempt a suborbital launch from the Nova Scotia-based spaceport. This first mission will serve as a significant step toward enabling Canada’s first-ever orbital launch of a domestically developed rocket, Space Daily reports.

A homegrown effort … “For the first time, Canada will host its own homegrown rocket technology, launched from a Canadian-built commercial spaceport, offering launch vehicle and satellite customers the opportunity to reach space without leaving Canadian soil,” said Stephen Matier, president and CEO of Maritime Launch. Reaction Dynamics is developing the Aurora rocket, which uses hybrid-propulsion technology and is projected to have a payload capacity of 200 kg to low-Earth orbit. (submitted by Joey Schwartz and brianrhurley)


Sirius completes engine test campaign. French launch startup Sirius Space Services said Thursday that it had completed a hot fire test campaign of the thrust chamber for its STAR-1 rocket engine, European Spaceflight reports. During the campaign, the prototype completed two 60-second hot fire tests powered by liquid methane and liquid oxygen. The successful completion of the testing validates the design of the STAR-1 thrust chamber. Full-scale engine testing may begin during the second quarter of next year.

A lot of engines needed … Sirius Space Services is developing a range of three rockets that all use a modular booster system. Sirius 1 will be a two-stage single-stick rocket capable of delivering 175 kilograms to low-Earth orbit. Sirius 13 will feature two strap-on boosters and will have the capacity to deliver 600 kilograms. Finally, the Sirius 15 rocket will feature four boosters and will be capable of carrying payloads of up to 1,000 kilograms. (submitted by Ken the Bin)

SpaceX, California commission lock horns over launch rates. Last week the California Coastal Commission rejected a plan agreed to between SpaceX and the US Space Force to increase the number of launches from Vandenberg Space Force Base to as many as 50 annually, the Los Angeles Times reports. The commission voted 6–4 to block the request to increase from a maximum of 36 launches. In rejecting the plan, some members of the commission cited their concerns about Elon Musk, the owner of SpaceX. “We’re dealing with a company, the head of which has aggressively injected himself into the presidential race,” commission Chair Caryl Hart said.

Is this a free speech issue? … SpaceX responded to the dispute quickly, suing the California commission in federal court on Tuesday, Reuters reports. The company seeks an order that would bar the agency from regulating the company’s workhorse Falcon 9 rocket launch program at Vandenberg. The lawsuit claims the commission, which oversees use of land and water within the state’s more than 1,000 miles of coastline, unfairly asserted regulatory powers. Musk’s lawsuit called any consideration of his public statements improper, violating speech rights protected by the US Constitution. (submitted by brianrhurley)

SpaceX launches 100th rocket of the year. SpaceX launched its 100th rocket of the year early Tuesday morning and followed it up with another liftoff just hours later, Space.com reports. The 100th mission of the year lifted off from Florida with a Falcon 9 rocket carrying 23 of the company’s Starlink Internet satellites.

Mostly Falcon 9s … The company followed that milestone with another launch two hours later from the opposite US coast. SpaceX’s 101st liftoff of 2024 saw 20 more Starlinks soar to space from Vandenberg Space Force Base in California. The company has already exceeded its previous record for annual launches, 98, set last year. The company’s tally in 2023 included 91 Falcon 9s, five Falcon Heavies, and two Starships. This year the mix is similar. (submitted by Ken the Bin)

Fifth launch of Starship a massive success. SpaceX accomplished a groundbreaking engineering feat Sunday when it launched the fifth test flight of its gigantic Starship rocket and then caught the booster back at the launch pad in Texas with mechanical arms seven minutes later, Ars reports. This achievement is the first of its kind, and it’s crucial for SpaceX’s vision of rapidly reusing the Starship rocket, enabling human expeditions to the Moon and Mars, routine access to space for mind-bogglingly massive payloads, and novel capabilities that no other company—or country—seems close to attaining.

Catching a rocket by its tail … High over the Gulf of Mexico, the first stage of the Starship rocket used its engines to reverse course and head back toward the Texas coastline. After reaching a peak altitude of 59 miles (96 kilometers), the Super Heavy booster began a supersonic descent before reigniting 13 engines for a final braking burn. The rocket then shifted down to just three engines for the fine maneuvering required to position the rocket in a hover over the launch pad. That’s when the launch pad’s tower, dubbed Mechazilla, ensnared the rocket in its two weight-bearing mechanical arms, colloquially known as “chopsticks.” The engines switched off, leaving the booster suspended perhaps 200 feet above the ground. The upper stage of the rocket, Starship, executed what appeared to be a nominal vertical landing into the Indian Ocean as part of its test flight.

Clipper launches on Falcon Heavy. NASA’s Europa Clipper spacecraft lifted off Monday from Kennedy Space Center in Florida aboard a SpaceX Falcon Heavy rocket, Ars reports, kicking off a $5.2 billion robotic mission to explore one of the most promising locations in the Solar System for finding extraterrestrial life. Delayed several days due to Hurricane Milton, which passed through Central Florida late last week, the launch of Europa Clipper signaled the start of a five-and-a-half-year journey to Jupiter, where the spacecraft will settle into an orbit taking it repeatedly by one of the giant planet’s numerous moons.

Exploring oceans, saving money … There’s strong evidence of a global ocean of liquid water below Europa’s frozen crust, and Europa Clipper is going there to determine if it has the ingredients for life. “This is an epic mission,” said Curt Niebur, Europa Clipper’s program scientist at NASA Headquarters. “It’s a chance for us not to explore a world that might have been habitable billions of years ago, but a world that might be habitable today, right now.” The Clipper mission was originally supposed to launch on NASA’s Space Launch System rocket, but it had to be moved off that vehicle because vibrations from the solid rocket motors could have damaged the spacecraft. The change to Falcon Heavy also saved the agency $2 billion.

ULA recovers pieces of shattered booster nozzle. When the exhaust nozzle on one of the Vulcan rocket’s strap-on boosters failed shortly after liftoff earlier this month, it scattered debris across the beachfront landscape just east of the launch pad on Florida’s Space Coast, Ars reports. United Launch Alliance, the company that builds and launches the Vulcan rocket, is investigating the cause of the booster anomaly before resuming Vulcan flights. Despite the nozzle failure, the rocket continued its climb and ended up reaching its planned trajectory heading into deep space.

Not clear what the schedule impacts will be … The nozzle fell off one of Vulcan’s two solid rocket boosters around 37 seconds after taking off from Cape Canaveral Space Force Station on October 4. A shower of sparks and debris fell away from the Vulcan rocket when the nozzle failed. Julie Arnold, a ULA spokesperson, confirmed to Ars that the company has retrieved some of the debris. “We recovered some small pieces of the GEM 63XL SRB nozzle that were liberated in the vicinity of the launch pad,” Arnold said. “The team is inspecting the hardware to aid in the investigation.” ULA has not publicly said what impacts there might be on the timeline for the next Vulcan launch, USSF-106, which had been due to occur before the end of this year.

Bloomberg calls for cancellation of the SLS rocket. In an op-ed that is critical of NASA’s Artemis Program, billionaire Michael Bloomberg—the founder of Bloomberg News and a former US Presidential candidate—called for cancellation of the Space Launch System rocket. “Each launch will likely cost at least $4 billion, quadruple initial estimates,” Bloomberg wrote. “This exceeds private-sector costs many times over, yet it can launch only about once every two years and—unlike SpaceX’s rockets—can’t be reused.”

NASA is falling behind … Bloomberg essentially is calling for the next administration to scrap all elements of the Artemis Program that are not essential to establishing and maintaining a presence on the surface of the Moon. “A celestial irony is that none of this is necessary,” he wrote. “A reusable SpaceX Starship will very likely be able to carry cargo and robots directly to the moon—no SLS, Orion, Gateway, Block 1B or ML-2 required—at a small fraction of the cost. Its successful landing of the Starship booster was a breakthrough that demonstrated how far beyond NASA it is moving.” None of the arguments that Bloomberg is advancing are new, but it is noteworthy to hear them from such a prominent person who is outside the usual orbit of space policy commentators.

Artemis II likely to be delayed. A new report from the US Government Accountability Office found that NASA’s Exploration Ground Systems program—essentially, the office at Kennedy Space Center in Florida responsible for building ground infrastructure to support the Space Launch System rocket and Orion—is in danger of missing its schedule for Artemis II, according to Ars Technica. The new report, published Thursday, finds that the Exploration Ground Systems program had several months of schedule margin in its work toward a September 2025 launch date at the beginning of the year. But now, the program has allocated all of that margin to technical issues experienced during work on the rocket’s mobile launcher and pad testing.

Heat shield issue also a concern … NASA also has yet to provide any additional information on the status of its review of the Orion spacecraft’s heat shield. During the Artemis I mission that sent Orion beyond the Moon in late 2022, chunks of charred material cracked and chipped away from Orion’s heat shield during reentry into Earth’s atmosphere. Once the spacecraft landed, engineers found more than 100 locations where the stresses of reentry damaged the heat shield. To prepare for the Artemis II launch next September, Artemis officials had previously said they planned to begin stacking operations of the rocket in September of this year. But so far, this activity remains on hold pending a decision on the heat shield issue.

Next three launches

Oct. 18: Falcon 9 | Starlink 8-19 | Cape Canaveral Space Force Station, Fla. | 19:31 UTC

Oct. 19: Electron | Changes In Latitudes, Changes In Attitudes | Māhia Peninsula, New Zealand | 10:30 UTC

Oct. 20: Falcon 9 | OneWeb no. 20 | Vandenberg Space Force Base, Calif. | 05:09 UTC

Photo of Eric Berger

Eric Berger is the senior space editor at Ars Technica, covering everything from astronomy to private space to NASA policy, and author of two books: Liftoff, about the rise of SpaceX; and Reentry, on the development of the Falcon 9 rocket and Dragon. A certified meteorologist, Eric lives in Houston.

Rocket Report: Bloomberg calls for SLS cancellation; SpaceX hits century mark Read More »

judge-slams-florida-for-censoring-political-ad:-“it’s-the-first-amendment,-stupid”

Judge slams Florida for censoring political ad: “It’s the First Amendment, stupid”


Florida threatened TV stations over ad that criticized state’s abortion law.

A woman holding an MRI displaying a brain tumor.

Screenshot of political advertisement featuring a woman describing her experience having an abortion after being diagnosed with brain cancer. Credit: Floridians Protecting Freedom

US District Judge Mark Walker had a blunt message for the Florida surgeon general in an order halting the government official’s attempt to censor a political ad that opposes restrictions on abortion.

“To keep it simple for the State of Florida: it’s the First Amendment, stupid,” Walker, an Obama appointee who is chief judge in US District Court for the Northern District of Florida, wrote yesterday in a ruling that granted a temporary restraining order.

“Whether it’s a woman’s right to choose, or the right to talk about it, Plaintiff’s position is the same—’don’t tread on me,'” Walker wrote later in the ruling. “Under the facts of this case, the First Amendment prohibits the State of Florida from trampling on Plaintiff’s free speech.”

The Florida Department of Health recently sent a legal threat to broadcast TV stations over the airing of a political ad that criticized abortion restrictions in Florida’s Heartbeat Protection Act. The department in Gov. Ron DeSantis’ administration claimed the ad falsely described the abortion law, which could be weakened by a pending ballot question.

Floridians Protecting Freedom, the group that launched the TV ad and is sponsoring a ballot question to lift restrictions on abortion, sued Surgeon General Joseph Ladapo and Department of Health general counsel John Wilson. Wilson has resigned.

Surgeon general blocked from further action

Walker’s order granting the group’s motion states that “Defendant Ladapo is temporarily enjoined from taking any further actions to coerce, threaten, or intimate repercussions directly or indirectly to television stations, broadcasters, or other parties for airing Plaintiff’s speech, or undertaking enforcement action against Plaintiff for running political advertisements or engaging in other speech protected under the First Amendment.”

The order expires on October 29 but could be replaced by a preliminary injunction that would remain in effect while litigation continues. A hearing on the motion for a preliminary injunction is scheduled for the morning of October 29.

The pending ballot question would amend the state Constitution to say, “No law shall prohibit, penalize, delay, or restrict abortion before viability or when necessary to protect the patient’s health, as determined by the patient’s healthcare provider. This amendment does not change the Legislature’s constitutional authority to require notification to a parent or guardian before a minor has an abortion.”

Walker’s ruling said that Ladapo “has the right to advocate for his own position on a ballot measure. But it would subvert the rule of law to permit the State to transform its own advocacy into the direct suppression of protected political speech.”

Federal Communications Commission Chairwoman Jessica Rosenworcel recently criticized state officials, writing that “threats against broadcast stations for airing content that conflicts with the government’s views are dangerous and undermine the fundamental principle of free speech.”

State threatened criminal proceedings

The Floridians Protecting Freedom advertisement features a woman who “recalls her decision to have an abortion in Florida in 2022,” and “states that she would not be able to have an abortion for the same reason under the current law,” Walker’s ruling said.

Caroline, the woman in the ad, states that “the doctors knew if I did not end my pregnancy, I would lose my baby, I would lose my life, and my daughter would lose her mom. Florida has now banned abortion even in cases like mine. Amendment 4 is going to protect women like me; we have to vote yes.”

The ruling described the state government response:

Shortly after the ad began running, John Wilson, then general counsel for the Florida Department of Health, sent letters on the Department’s letterhead to Florida TV stations. The letters assert that Plaintiff’s political advertisement is false, dangerous, and constitutes a “sanitary nuisance” under Florida law. The letter informed the TV stations that the Department of Health must notify the person found to be committing the nuisance to remove it within 24 hours pursuant to section 386.03(1), Florida Statutes. The letter further warned that the Department could institute legal proceedings if the nuisance were not timely removed, including criminal proceedings pursuant to section 386.03(2)(b), Florida Statutes. Finally, the letter acknowledged that the TV stations have a constitutional right to “broadcast political advertisements,” but asserted this does not include “false advertisements which, if believed, would likely have a detrimental effect on the lives and health of pregnant women in Florida.” At least one of the TV stations that had been running Plaintiff’s advertisement stopped doing so after receiving this letter from the Department of Health.

The Department of Health claimed the ad “is categorically false” because “Florida’s Heartbeat Protection Act does not prohibit abortion if a physician determines the gestational age of the fetus is less than 6 weeks.”

Floridians Protecting Freedom responded that the woman in the ad made true statements, saying that “Caroline was diagnosed with stage four brain cancer when she was 20 weeks pregnant; the diagnosis was terminal. Under Florida law, abortions may only be performed after six weeks gestation if ‘[t]wo physicians certify in writing that, in reasonable medical judgment, the termination of the pregnancy is necessary to save the pregnant woman’s life or avert a serious risk of substantial and irreversible physical impairment of a major bodily function of the pregnant woman other than a psychological condition.'”

Because “Caroline’s diagnosis was terminal… an abortion would not have saved her life, only extended it. Florida law would not allow an abortion in this instance because the abortion would not have ‘save[d] the pregnant woman’s life,’ only extended her life,” the group said.

Judge: State should counter with its own speech

Walker’s ruling said the government can’t censor the ad by claiming it is false:

Plaintiff’s argument is correct. While Defendant Ladapo refuses to even agree with this simple fact, Plaintiff’s political advertisement is political speech—speech at the core of the First Amendment. And just this year, the United States Supreme Court reaffirmed the bedrock principle that the government cannot do indirectly what it cannot do directly by threatening third parties with legal sanctions to censor speech it disfavors. The government cannot excuse its indirect censorship of political speech simply by declaring the disfavored speech is “false.”

State officials must show that their actions “were narrowly tailored to serve a compelling government interest,” Walker wrote. A “narrowly tailored solution” in this case would be counterspeech, not censorship, he wrote.

“For all these reasons, Plaintiff has demonstrated a substantial likelihood of success on the merits,” the ruling said. Walker wrote that a ruling in favor of the state would open the door to more censorship:

This case pits the right to engage in political speech against the State’s purported interest in protecting the health and safety of Floridians from “false advertising.” It is no answer to suggest that the Department of Health is merely flexing its traditional police powers to protect health and safety by prosecuting “false advertising”—if the State can rebrand rank viewpoint discriminatory suppression of political speech as a “sanitary nuisance,” then any political viewpoint with which the State disagrees is fair game for censorship.

Walker then noted that Ladapo “has ample, constitutional alternatives to mitigate any harm caused by an injunction in this case.” The state is already running “its own anti-Amendment 4 campaign to educate the public about its view of Florida’s abortion laws and to correct the record, as it sees fit, concerning pro-Amendment 4 speech,” Walker wrote. “The State can continue to combat what it believes to be ‘false advertising’ by meeting Plaintiff’s speech with its own.”

Photo of Jon Brodkin

Jon is a Senior IT Reporter for Ars Technica. He covers the telecom industry, Federal Communications Commission rulemakings, broadband consumer affairs, court cases, and government regulation of the tech industry.

Judge slams Florida for censoring political ad: “It’s the First Amendment, stupid” Read More »