Author name: Mike M.

More children gain hearing as gene therapy for profound deafness advances

Success —

The therapy treats a rare type of deafness, but experts hope it’s a “jumping point.”

Opal Sandy (center), who was born completely deaf because of a rare genetic condition, can now hear unaided for the first time after receiving gene therapy at 11 months old. She is shown with her mother, father, and sister at their home in Eynsham, Oxfordshire, on May 7, 2024.

There are few things more heartwarming than videos of children with deafness gaining the ability to hear, showing them happily turning their heads at the sound of their parents’ voices and joyfully bobbing to newly discovered music. Thanks to recent advances in gene therapy, more kids are getting those sweet and triumphant moments—with no hearing aids or cochlear implants needed.

At the annual conference of the American Society for Gene & Cell Therapy held in Baltimore this week, researchers showed many of those videos to their audiences of experts. On Wednesday, Larry Lustig, an otolaryngologist at Columbia University, presented clinical trial data of two children with profound deafness—the most severe type of deafness—who are now able to hear at normal levels after receiving an experimental gene therapy. One of the children was 11 months old at the time of the treatment, marking her as the youngest child in the world to date to receive gene therapy for genetic deafness.

On Thursday, Yilai Shu, an otolaryngologist at Fudan University in Shanghai, provided a one-year progress report on six children who were treated in the first in-human trial of gene therapy for genetic deafness. Five of the six had their hearing restored.

That trial, like the one Lustig presented, involved treating just one ear in all of the children—a safety precaution for such early trials. But Shu and colleagues have already moved on to both ears, or bilateral treatment. After presenting a progress report on the first trial, Shu presented unpublished early data on five additional patients who participated in the first in-human trial of bilateral treatment. All had bilateral hearing restoration and speech perception improvement.

“The opportunity of providing the full complexity and spectrum of sound in children born with profound genetic deafness is a phenomenon I did not expect to see in my lifetime,” Lustig said in a statement.

Jumping point

Shu and Lustig’s trials are separate, but the treatments are, in broad strokes, similar. Both are aimed at restoring hearing loss caused by mutations in the OTOF gene, which codes for the protein otoferlin. Otoferlin is critical for transmitting sound signals to the brain, playing a key role in synaptic transmission between the ear’s inner hair cells and the auditory nerve. Using gutted adeno-associated viruses as vectors for gene delivery, the therapies provide the inner ear with a functional version of the OTOF gene. Once in the ear, the gene can be expressed to produce functional otoferlin, restoring auditory signaling.

In the trial Lustig presented, the two patients saw a gradual improvement of hearing as otoferlin protein built up after treatment. For the 11-month-old, normal levels of hearing were restored within 24 weeks of treatment. For the second patient, a 4-year-old, improvements were detected at a six-week assessment. In the trial Shu presented, children began seeing hearing improvements at three- and four-week assessments. The children will continue to be followed into the future, which holds some uncertainties. It’s unclear if they will, at some point in their lives, need additional treatments to sustain their hearing. In mice, at least, the treatment lasts for the duration of the animals’ lives—but they only live for a few years.

“We expect this to last a long time,” Lustig said Wednesday. But “we don’t know what’s going to happen and we don’t know whether we can do a second dose. But, probably, I would guess, at some point that would have to be done.”

For now, the treatment is considered low-hanging fruit for the burgeoning field of gene therapy since it targets a severe condition caused by recessive mutations in a single gene. Otoferlin mutations lead to a very specific type of deafness called auditory neuropathy, in which the ear fails to send signals to the brain but works perfectly fine otherwise. This is an ultra-rare form of deafness affecting 1–8 percent of people with deafness globally. Only about 30 to 50 people in the US are born with this type of deafness each year.

However, Lustig calls it a “jumping point.” Now that researchers have shown that this gene therapy can work, “This is going to really spark, we hope, the development of gene therapy for more common types of deafness,” he said.

Elon Musk’s X can’t invent its own copyright law, judge says

Who owns X data? Everyone but X —

Judge rules copyright law governs public data scraping, not X’s terms.

US District Judge William Alsup has dismissed Elon Musk’s X Corp.’s lawsuit against Bright Data, a data-scraping company accused of improperly accessing X (formerly Twitter) systems and violating both X’s terms of service and state laws when scraping and selling data.

X sued Bright Data to stop the company from scraping and selling X data to academic institutions and businesses, including Fortune 500 companies.

According to Alsup, X failed to state a claim while arguing that companies like Bright Data should have to pay X to access public data posted by X users.

“To the extent the claims are based on access to systems, they fail because X Corp. has alleged no more than threadbare recitals,” parroting laws and findings in other cases without providing any supporting evidence, Alsup wrote. “To the extent the claims are based on scraping and selling of data, they fail because they are preempted by federal law,” specifically standing as an “obstacle to the accomplishment and execution of” the Copyright Act.

The judge found that X Corp.’s argument exposed a tension between the platform’s desire to control user data and its enjoyment of the safe harbor of Section 230 of the Communications Decency Act, which allows X to avoid liability for third-party content. If X owned the data, it could perhaps argue that it has exclusive rights to control that data, but then it would lose the safe harbor.

“X Corp. wants it both ways: to keep its safe harbors yet exercise a copyright owner’s right to exclude, wresting fees from those who wish to extract and copy X users’ content,” Alsup wrote.

If X got its way, Alsup warned, “X Corp. would entrench its own private copyright system that rivals, even conflicts with, the actual copyright system enacted by Congress” and “yank into its private domain and hold for sale information open to all, exercising a copyright owner’s right to exclude where it has no such right.”

That “would upend the careful balance Congress struck between what copyright owners own and do not own,” Alsup wrote, potentially shrinking the public domain.

“Applying general principles, this order concludes that the extent to which public data may be freely copied from social media platforms, even under the banner of scraping, should generally be governed by the Copyright Act, not by conflicting, ubiquitous terms,” Alsup wrote.

Bright Data CEO Or Lenchner said in a statement provided to Ars that Alsup’s decision had “profound implications in business, research, training of AI models, and beyond.”

“Bright Data has proven that ethical and transparent scraping practices for legitimate business use and social good initiatives are legally sound,” Lenchner said. “Companies that try to control user data intended for public consumption will not win this legal battle.”

Alsup pointed out that X’s lawsuit was “not looking to protect X users’ privacy” but rather to block Bright Data from interfering with its “own sale of its data through a tiered subscription service.”

“X Corp. is happy to allow the extraction and copying of X users’ content so long as it gets paid,” Alsup wrote.

Amid a sea of vague claims that scraping is “unfair,” perhaps the most glaring deficiency in X’s complaint, Alsup suggested, was X’s failure to allege that Bright Data’s scraping impaired its services or that X suffered any damages.

“There are no allegations of servers harmed or identities misrepresented,” Alsup wrote. “Additionally, there are no allegations of any damage resulting from automated or unauthorized access.”

X will be allowed to amend its complaint and appeal. The case may be strengthened if X can show evidence of damages or prove that the scraping overburdened X or otherwise deprived X users of their use of the platform in a way that could damage X’s reputation.

But as it currently stands, X’s arguments in many ways appear rather “bare,” Alsup wrote, while its terms of service make crystal clear to users that “[w]hat’s yours is yours—you own your Content.”

By attempting to exclude Bright Data from accessing public X posts owned by X users, X also nearly “obliterated” the “fair use” provision of the Copyright Act, “flouting” Congress’ intent in passing the law, Alsup wrote.

“Only by receiving permission and paying X Corp. could Bright Data, its customers, and other X users freely reproduce, adapt, distribute, and display what might (or might not) be available for taking and selling as fair use,” Alsup wrote. “Thus, Bright Data, its customers, and other X users who wanted to make fair use of copyrighted content would not be able to do so.”

A win for X could have had dire consequences for the Internet, Alsup suggested. In dismissing the complaint, Alsup cited an appeals court ruling that giving social media companies “free rein to decide, on any basis, who can collect and use data—data that the companies do not own, that they otherwise make publicly available to viewers, and that the companies themselves collect and use—risks the possible creation of information monopolies that would disserve the public interest.”

Because that outcome was averted, Lenchner is celebrating Bright Data’s win.

“Bright Data’s victory over X makes it clear to the world that public information on the web belongs to all of us, and any attempt to deny the public access will fail,” Lenchner said.

In 2023, Bright Data won a similar lawsuit lobbed by Meta over scraping public Facebook and Instagram data. These lawsuits, Lenchner alleged, “are used as a monetary weapon to discourage collecting public data from sites, so conglomerates can hoard user-generated public data.”

“Courts recognize this and the risks it poses of information monopolies and ownership of the Internet,” Lenchner said.

X did not respond to Ars’ request to comment.

How you can make cold-brew coffee in under 3 minutes using ultrasound

Save yourself a few hours —

A “sonication” time between 1 and 3 minutes is ideal to get the perfect cold brew.

UNSW Sydney engineers developed a new way to make cold brew coffee in under three minutes without sacrificing taste.

University of New South Wales, Sydney

Diehard fans of cold-brew coffee put in a lot of time and effort for their preferred caffeinated beverage. But engineers at the University of New South Wales, Sydney, figured out a nifty hack. They fitted an existing espresso machine with an ultrasonic transducer that administers ultrasonic pulses, reducing the brewing time from 12–24 hours to just under three minutes, according to a new paper published in the journal Ultrasonics Sonochemistry.

As previously reported, rather than pouring boiling or near-boiling water over coffee grounds and steeping for a few minutes, the cold-brew method involves mixing coffee grounds with room-temperature water and letting the mixture steep for anywhere from several hours to two days. The mixture is then strained through a sieve to remove the sludge-like solids, followed by a finer filtering step. This can be done at home in a Mason jar, or you can get fancy and use a French press or a more elaborate Toddy system. It’s not necessarily served cold (although it can be)—just brewed cold.

The result is coffee that tastes less bitter than traditionally brewed coffee. “There’s nothing like it,” co-author Francisco Trujillo of UNSW Sydney told New Scientist. “The flavor is nice, the aroma is nice and the mouthfeel is more viscous and there’s less bitterness than a regular espresso shot. And it has a level of acidity that people seem to like. It’s now my favorite way to drink coffee.”

While there have been plenty of scientific studies delving into the chemistry of coffee, only a handful have focused specifically on cold-brew coffee. For instance, a 2018 study by scientists at Thomas Jefferson University in Philadelphia involved measuring levels of acidity and antioxidants in batches of cold- and hot-brew coffee. But those experiments only used lightly roasted coffee beans. The degree of roasting (temperature) makes a significant difference when it comes to hot-brew coffee. Might the same be true for cold-brew coffee?

To find out, the same team decided in 2020 to explore the extraction yields of light-, medium-, and dark-roast coffee beans during the cold-brew process. They used the cold-brew recipe from The New York Times for their experiments, with a water-to-coffee ratio of 10:1 for both cold- and hot-brew batches. (Hot brew normally has a water-to-coffee ratio of 20:1, but the team wanted to control variables as much as possible.) They carefully controlled when water was added to the coffee grounds, how long to shake (or stir) the solution, and how best to press the cold-brew coffee.

The team found that for the lighter roasts, caffeine content and antioxidant levels were roughly the same in both the hot- and cold-brew batches. However, there were significant differences between the two methods when medium- and dark-roast coffee beans were used. Specifically, the hot-brew method extracts more antioxidants from the grind; the darker the bean, the greater the difference. Both hot- and cold-brew batches become less acidic the darker the roast.

The new faster cold brew system subjects coffee grounds in the filter basket to ultrasonic sound waves from a transducer, via a specially adapted horn.

UNSW/Francisco Trujillo

That gives cold brew fans a few handy tips, but the process remains incredibly time-consuming; only true aficionados have the patience required to cold brew their own morning cuppa. Many coffee houses now offer cold brews, but doing so requires expensive, large semi-industrial brewing units and a good deal of refrigeration space. According to Trujillo, the inspiration for using ultrasound to speed up the process came from earlier research attempts to extract more antioxidants. Those experiments ultimately failed, but the setup produced very good coffee.

Trujillo et al. used a Breville Dual Boiler BES920 espresso machine for their latest experiments, with a few key modifications. They connected a bolt-clamped transducer to the brewing basket with a metal horn. They then used the transducer to inject 38.8 kHz sound waves through the walls at several different points, thereby transforming the filter basket into a powerful ultrasonic reactor.

The team used the machine’s original boiler but set it up to be independently controlled with an integrated circuit to better manage the temperature of the water. As for the coffee beans, they picked Campos Coffee’s Caramel & Rich Blend (a medium roast). “This blend combines fresh, high-quality specialty coffee beans from Ethiopia, Kenya, and Colombia, and the roasted beans deliver sweet caramel, butterscotch, and milk chocolate flavors,” the authors wrote.

There were three types of samples for the experiments: cold brew hit with ultrasound at room temperature for one minute or for three minutes, and cold brew prepared with the usual 24-hour process. For the ultrasonic brews, the beans were ground into a fine grind typical for espresso, while a slightly coarser grind was used for the traditional cold-brew coffee.

Big Three carriers pay $10M to settle claims of false “unlimited” advertising

False advertising —

States obtain settlement, but it’s unclear whether consumers will get refunds.

Verizon

T-Mobile, Verizon, and AT&T will pay a combined $10.2 million in a settlement with US states that alleged the carriers falsely advertised wireless plans as “unlimited” and phones as “free.” The deal was announced yesterday by New York Attorney General Letitia James.

“A multistate investigation found that the companies made false claims in advertisements in New York and across the nation, including misrepresentations about ‘unlimited’ data plans that were in fact limited and had reduced quality and speed after a certain limit was reached by the user,” the announcement said.

T-Mobile and Verizon agreed to pay $4.1 million each while AT&T agreed to pay a little over $2 million. The settlement includes AT&T subsidiary Cricket Wireless and Verizon subsidiary TracFone.

The settlement involves 49 of the 50 US states (Florida did not participate) and the District of Columbia. The states’ investigation found that the three major carriers “made several misleading claims in their advertising, including misrepresenting ‘unlimited’ data plans that were actually limited, offering ‘free’ phones that came at a cost, and making false promises about switching to different wireless carrier plans.”

“AT&T, Verizon, and T-Mobile lied to millions of consumers, making false promises of free phones and ‘unlimited’ data plans that were simply untrue,” James said. “Big companies are not excused from following the law and cannot trick consumers into paying for services they will never receive.”

States have options for using money

The carriers denied any illegal conduct despite agreeing to the settlement. In addition to payments to each state, the carriers agreed to changes in their advertising practices. It’s unclear whether consumers will get any refunds out of the settlement, however.

The settlement gives states leeway in how to use the payments from carriers. The payments can be used to cover “attorneys’ fees and other costs of investigation and litigation,” or can go toward “consumer protection law enforcement funds.”

States can use the payments for future consumer protection enforcement, consumer education, litigation, or a consumer aid fund. The money can also be used for “monitoring and potential enforcement” of the settlement terms “or consumer restitution,” the settlement says.

We asked James’ office about whether any consumer restitution is planned and will update this article if we get a response.

Advertising restrictions

The three carriers agreed that all advertisements to consumers must be “truthful, accurate and non-misleading.” They also agreed to the following changes, the NY attorney general’s office said:

  • “Unlimited” mobile data plans can only be marketed if there are no limits on the quantity of data allowed during a billing cycle.
  • Offers to pay for consumers to switch to a different wireless carrier must clearly disclose how much a consumer will be paid, how consumers will be paid, when consumers can expect payment, and any additional requirements consumers have to meet to get paid.
  • Offers of “free” wireless devices or services must clearly state everything a consumer must do to receive the “free” devices or services.
  • Offers to lease wireless devices must clearly state that the consumer will be entering into a lease agreement.
  • All “savings” claims must have a reasonable basis. If a wireless carrier claims that consumers will save using its services compared to another wireless carrier, the claim must be based on similar goods or services or differences must be clearly explained to the consumer.

The advertising restrictions are to be in place for five years.

T-Mobile provided a statement about the settlement to Ars today. “After nine years, we are glad to move on from this industry-wide investigation with this settlement and a continued commitment to the transparent and consumer-friendly advertising practices we’ve undertaken for years,” T-Mobile said.

AT&T and Verizon declined to comment individually and referred us to their lobby group, CTIA. “These voluntary agreements reflect no finding of improper conduct and reaffirm the wireless industry’s longstanding commitment to clarity and integrity in advertising so that consumers can make informed decisions about the products and services that best suit them,” the wireless lobby group said.

Exploration-focused training lets robotics AI immediately handle new tasks

Exploratory —

Maximum Diffusion Reinforcement Learning focuses training on end states, not process.

A woman performs maintenance on a robotic arm.

boonchai wedmakawand

Reinforcement-learning algorithms in systems like ChatGPT or Google’s Gemini can work wonders, but they usually need hundreds of thousands of shots at a task before they get good at it. That’s why it’s always been hard to transfer this performance to robots. You can’t let a self-driving car crash 3,000 times just so it can learn crashing is bad.

But now a team of researchers at Northwestern University may have found a way around it. “That is what we think is going to be transformative in the development of the embodied AI in the real world,” says Thomas Berrueta, who led the development of Maximum Diffusion Reinforcement Learning (MaxDiff RL), an algorithm tailored specifically for robots.

Introducing chaos

The problem with deploying most reinforcement-learning algorithms in robots starts with the built-in assumption that the data they learn from is independent and identically distributed. The independence, in this context, means the value of one variable does not depend on the value of another variable in the dataset—when you flip a coin two times, getting tails on the second attempt does not depend on the result of your first flip. Identical distribution means that the probability of seeing any specific outcome is the same. In the coin-flipping example, the probability of getting heads is the same as getting tails: 50 percent for each.

In virtual, disembodied systems, like YouTube recommendation algorithms, getting such data is easy because most of the time it meets these requirements right off the bat. “You have a bunch of users of a website, and you get data from one of them, and then you get data from another one. Most likely, those two users are not in the same household, they are not highly related to each other. They could be, but it is very unlikely,” says Todd Murphey, a professor of mechanical engineering at Northwestern.

The problem is that, if those two users were related to each other and were in the same household, it could be that the only reason one of them watched a video was that their housemate watched it and told them to watch it. This would violate the independence requirement and compromise the learning.

“In a robot, getting this independent, identically distributed data is not possible in general. You exist at a specific point in space and time when you are embodied, so your experiences have to be correlated in some way,” says Berrueta. To solve this, his team designed an algorithm that pushes robots to be as randomly adventurous as possible to get the widest set of experiences to learn from.

Two flavors of entropy

The idea itself is not new. Nearly two decades ago, people in AI figured out algorithms, like Maximum Entropy Reinforcement Learning (MaxEnt RL), that worked by randomizing actions during training. “The hope was that when you take as diverse set of actions as possible, you will explore more varied sets of possible futures. The problem is that those actions do not exist in a vacuum,” Berrueta claims. Every action a robot takes has some kind of impact on its environment and on its own condition—disregarding those impacts completely often leads to trouble. To put it simply, an autonomous car that was teaching itself how to drive using this approach could elegantly park in your driveway but would be just as likely to hit a wall at full speed.

To solve this, Berrueta’s team moved away from maximizing the diversity of actions and went for maximizing the diversity of state changes. Robots powered by MaxDiff RL did not flail their robotic joints at random to see what that would do. Instead, they conceptualized goals like “can I reach this spot ahead of me” and then tried to figure out which actions would take them there safely.

Berrueta and his colleagues achieved that through something called ergodicity, a mathematical concept that says that a point in a moving system will eventually visit all parts of the space that the system moves in. Basically, MaxDiff RL encouraged the robots to achieve every available state in their environment. And the results of the first tests in simulated environments were quite surprising.
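
To make the contrast concrete, here is a minimal, illustrative sketch in Python. It is not the authors' MaxDiff RL implementation (which is built around ergodicity and diffusion); it is just a toy intuition pump in which a MaxEnt-style bonus rewards spreading probability over actions, while a MaxDiff-flavored bonus rewards reaching states unlike anything the robot has already visited. The function names and the nearest-neighbor novelty measure are assumptions made for illustration.

```python
import numpy as np

def action_entropy_bonus(action_probs):
    # MaxEnt-style intuition: reward taking a diverse set of actions
    p = np.clip(np.asarray(action_probs, dtype=float), 1e-12, 1.0)
    return float(-np.sum(p * np.log(p)))

def state_novelty_bonus(visited_states, new_state):
    # MaxDiff-flavored intuition: reward state changes, i.e. ending up
    # somewhere unlike previously experienced states (a crude proxy for
    # covering the state space, not the actual ergodicity-based objective)
    if len(visited_states) == 0:
        return 1.0
    dists = np.linalg.norm(np.asarray(visited_states) - np.asarray(new_state), axis=1)
    return float(np.min(dists))

# During training, either bonus would be added to the task reward, e.g.:
# total_reward = task_reward + beta * state_novelty_bonus(history, state)
```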

Racing pool noodles

“In reinforcement learning there are standard benchmarks that people run their algorithms on so we can have a good way of comparing different algorithms on a standard framework,” says Allison Pinosky, a researcher at Northwestern and co-author of the MaxDiff RL study. One of those benchmarks is a simulated swimmer: a three-link body resting on the ground in a viscous environment that needs to learn to swim as fast as possible in a certain direction.

In the swimmer test, MaxDiff RL outperformed two other state-of-the-art reinforcement learning algorithms (NN-MPPI and SAC). These two needed several resets to figure out how to move the swimmers. To complete the task, they followed a standard AI learning process divided into a training phase, where an algorithm goes through multiple failed attempts to slowly improve its performance, and a testing phase, where it tries to perform the learned task. MaxDiff RL, by contrast, nailed it, immediately adapting its learned behaviors to the new task.

The earlier algorithms ended up failing to learn because they got stuck trying the same options and never progressing to where they could learn that alternatives work. “They experienced the same data repeatedly because they were locally doing certain actions, and they assumed that was all they could do and stopped learning,” Pinosky explains. MaxDiff RL, on the other hand, continued changing states, exploring, getting richer data to learn from, and finally succeeded. And because, by design, it seeks to achieve every possible state, it can potentially complete all possible tasks within an environment.

But does this mean we can take MaxDiff RL, upload it to a self-driving car, and let it out on the road to figure everything out on its own? Not really.

AI #63: Introducing Alpha Fold 3

It was a remarkably quiet announcement. We now have Alpha Fold 3, which does a much improved job of predicting all of life’s molecules and their interactions. It feels like everyone, including me, then shrugged and went back to thinking about other things. No cool new toy for most of us to personally play with, no existential risk impact, no big trades to make, ho hum.

But yes, when we look back at this week, I expect what we remember will be Alpha Fold 3.

Unless it turns out that it is Sophon, a Chinese technique to potentially make it harder to fine-tune an open model in ways the developer wants to prevent. I do not expect this to get the job done that needs doing, but it is an intriguing proposal.

We also have 95 theses to evaluate in a distinct post, OpenAI sharing the first draft of their model spec, Apple making a world class anti-AI and anti-iPad ad that they released thinking it was a pro-iPad ad, more fun with the mysterious gpt2, and more.

The model spec from OpenAI seems worth pondering in detail, so I am going to deal with that on its own some time in the coming week.

  1. Introduction.

  2. Table of Contents.

  3. Language Models Offer Mundane Utility. Agents, simple and complex.

  4. Language Models Don’t Offer Mundane Utility. No gadgets, no NPCs.

  5. GPT-2 Soon to Tell. Does your current model suck? In some senses.

  6. Fun With Image Generation. Why pick the LoRA yourself?

  7. Deepfaketown and Botpocalypse Soon. It’s not exactly going great.

  8. Automation Illustrated. A look inside perhaps the premiere slop mill.

  9. They Took Our Jobs. Or are we pretending this to help the stock price?

  10. Apple of Technically Not AI. Mistakes were made. All the feels.

  11. Get Involved. Dan Hendrycks has a safety textbook and free online course.

  12. Introducing. Alpha Fold 3. Seems like a big deal.

  13. In Other AI News. IBM, Meta and Microsoft in the model game.

  14. Quiet Speculations. Can we all agree that a lot of intelligence matters a lot?

  15. The Quest for Sane Regulation. Major labs fail to honor their commitments.

  16. The Week in Audio. Jack Clark on Politico Tech.

  17. Rhetorical Innovation. The good things in life are good.

  18. Open Weights are Unsafe and Nothing Can Fix This. Unless, maybe? Hmm.

  19. The Lighter Side. Mmm, garlic bread. It’s been too long.

How much utility for how much cost? Kapoor and Narayanan argue that with the rise of agent-based systems, you have to evaluate different models on coding tasks based on dollar cost versus quality of results. They find that a simple ‘ask GPT-4 and turn the temperature slowly up on retries if you fail’ is as good as the agents they tested on HumanEval, while costing less. They mention that perhaps it is different with harder and more complex tasks.
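
For concreteness, here is a minimal sketch of that retry strategy. It rests on assumptions of mine rather than Kapoor and Narayanan's exact setup: the OpenAI Python client, a caller-supplied run_tests checker (e.g., HumanEval-style unit tests), and an arbitrary temperature schedule are all illustrative.

```python
from openai import OpenAI

client = OpenAI()

def solve_with_retries(prompt, run_tests, max_tries=5):
    for attempt in range(max_tries):
        # Start deterministic; add randomness on each failed attempt.
        temperature = min(1.0, 0.2 * attempt)
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,
        )
        candidate = response.choices[0].message.content
        if run_tests(candidate):
            return candidate
    return None  # every attempt failed
```

The appeal is obvious: no agent scaffolding, and you only pay for extra completions when the earlier ones fail.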

How much does cost matter? If you are using such queries at scale without humans in the loop, or doing them in the background on a constant basis as part of your process, then cost potentially matters quite a bit. That is indeed the point of agents. Or if you are serving lots of customers constantly for lots of queries, those costs can add up fast. Thus all the talk about the most cost-efficient approach.

There are also other purposes for which cost at current margins is effectively zero. If you are a programmer who must evaluate, use and maintain the code outputted by the AI, what percentage of total costs (including your labor costs) are AI inference? In the most obvious baseline case, something akin to ‘a programmer asks for help on tasks,’ query speed potentially matters but being slightly better at producing good code, or even slightly better at producing code that is easier for the human to evaluate, understand and learn from, is going to crush any sane inference costs.

If I was paying by the token for my AI queries, and you offered me the option of a 100x cost increase that returned superior answers at identical speed, I would use the 100x costlier option for most purposes even if the gains were not so large.

Ethan Mollick is the latest to try the latest AI mobile hardware tools and find them inferior to using your phone. He also discusses ‘copilots,’ where the AI goes ahead and does something in an application (or in Windows). Why limit yourself to a chatbot? Eventually we won’t. For now, it has its advantages.

Iterate until you get it right.

Michael Nielsen: There is a funny/striking story about former US Secretary of State Colin Powell – when someone had to make a presentation to him, he’d sometimes ask before they began: “Is this presentation the best you can do?”

They’d say “no”, he’d ask them to go away and improve it, come back. Whereupon he would ask again… and they might go away again.

I don’t know how often he did this, if ever – often execs want fast, not perfect; I imagine he only wanted “best possible” rarely. But the similarity to ChatGPT debugging is hilarious. “Is that really the answer?” works…

Traver Hart: I heard this same anecdote about Kissinger. He asked whether a written report was the best a staffer could do, and after three or so iterations the staffer finally said yes. Then Kissinger said, “OK, now I’ll read it.”

One obvious thing to do is automate this process. Then only show a human the output once the LLM confirms it was the best the model could do.
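
A minimal sketch of what that automation could look like, with the caveat that this is illustrative rather than anything OpenAI or the anecdote's subjects actually built; ask_llm is a placeholder for whatever chat-completion call you use.

```python
def best_effort_answer(ask_llm, question, max_rounds=3):
    answer = ask_llm(question)
    for _ in range(max_rounds):
        verdict = ask_llm(
            f"Question: {question}\nYour answer: {answer}\n"
            "Is this really the best answer you can produce? "
            "Reply DONE if it cannot be improved, otherwise reply "
            "IMPROVE followed by a better answer."
        )
        if verdict.strip().upper().startswith("DONE"):
            break
        answer = verdict.strip().removeprefix("IMPROVE").strip()
    return answer  # only now does a human see it
```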

Agent Hospital is a virtual world that trains LLMs to act as better doctors and nurses. They claim that after about ten thousand virtual patients the evolved doctors got state-of-the-art accuracy of 93% on a subset of MedQA covering major respiratory diseases. This seems like a case where the simulation assumes the facts you want to teach, avoiding the messiness inherent in the physical world. Still, an interesting result. File under ‘if you cannot think of anything better, brute force imitate what you know works. More Dakka.’

Do your homework for you, perhaps via one of many handy AI wrapper apps.

Find companies that do a lot of things that could be automated and would benefit from AI, do a private equity-style buyout, then have them apply the AI tools. One top reason to buy a company is that the new owner can break a bunch of social promises, including firing unnecessary or underperforming workers. That is a powerful tool when you combine it with introducing AI to replace the workers, which seems to be the name of the game here. I am not here to judge, and also not here to judge judgers.

Catholic.com ‘defrocks’ their AI pastor Justin, turning him into a regular Joe.

Want to use big cloud AI services? Good luck with the interface. Real builders are reporting trying to use Azure for basic things and being so frustrated they give up.

I know!

Marques Brownlee: On one hand: It seems like it’s only a matter of time before Apple starts making major AI-related moves around the iPhone and iOS and buries these AI-in-a-box gadgets extremely quickly

On the other hand: Have you used Siri lately?

Peter Wildeford: I am always baffled at how bad the current Alexa / Google Home / Siri are relative to what they should be capable of given GPT-4 level tech.

Kevin Fisher lists his six main reasons why we don’t have realistically behaving NPCs in games yet. They are essentially:

  1. Development cycles are long.

  2. Costs are still too high.

  3. Not the role the NPC has.

  4. Doesn’t fit existing game templates.

  5. Such NPCs are not yet compelling.

  6. We don’t have a good easy way to create the NPCs yet.

I would agree, and emphasize: Most games do not want NPCs that behave like people.

There are exciting new game forms that do want this. Indeed, if I got the opportunity to make a game today, it would have LLM NPCs as central to the experience. But that would mean, as Kevin suggests, building a new type of game from the ground up.

I do think you can mostly slot LLM-powered NPCs into some genres. Open world RPGs or MMOs are the most obvious place to start. And there are some natural fits, like detective games, or games where exploration and seeing what happens is the point. Still, it is not cheap to let those characters out to play and see what happens, and mostly it would not be all that interesting. When the player is in ‘gaming’ mode, the player is not acting so realistically. Having a ‘realistic’ verbal sparring partner would mostly cause more weirdness and perverse player behaviors.

I keep asking, but seriously, what is up with Apple, with Siri, and also with Alexa?

Modest Proposal: I am the last person to defend Apple but they spent more on R&D than Microsoft in the quarter and trailing twelve months. Their buyback is like one year of free cash flow. You can argue they are not getting a return on their R&D, but it’s not like they are not spending.

And sure, you can argue Microsoft is outsourcing a portion of its R&D to OpenAI, and is spending ungodly sums on capex, but Apple is still spending $30B on R&D. Maybe they should be spending more, maybe they should be inventing more, but they are spending.

Sam Altman asks: If an AI companion knows everything about you, do we need a form of protection to prevent it from being subpoenaed to testify against you in court?

I mean, no? It is not a person? It can’t testify? It can of course be entered into evidence, as can queries of it. It is your personal property, or that of a company, in some combination. Your files can and will be used against you in a court of law, if there is sufficient cause to get at them.

I can see the argument that if your AI and other tech is sufficiently recording your life, then to allow them to be used against you would violate the 5th amendment, or should be prevented for the same logical reason. But technology keeps improving what it records and we keep not doing that. Indeed, quite the opposite. We keep insisting that various people and organizations use that technology to keep better and better records, and ban people from using methods with insufficient record keeping.

So my prediction is no, you are not getting any privacy protections here. If you don’t want the AI used against you, don’t use the AI or find a way to wipe its memory. And of course, not using the AI or having to mindwipe it would be both a liability and hella suspicious. Some fun crime dramas in our future.

The Humane saga continues. If you cancel your order, they ask you why. Their wording heavily implies they won’t cancel unless you tell them, although they deny this, and Marques Brownlee Tweeted that they require a response.

Sam Altman confirms that gpt2-chatbot is not GPT-4.5, which is good for OpenAI since tests confirm it is a 4-level model. That still does not tell us what it is.

It was briefly gone from Arena, but it is back now, as ‘im-a-good-gp2-chatbot’ or ‘im-also-a-good-gp2-chatbot.’ You have to set up a battle, then reload until you get lucky.

This also points out that Arena tells you what model is Model A and what is Model B. That is unfortunate, and potentially taints the statistics.

Anton (@abccaj) points out that gpt2 is generating very particular error messages, so chances are very high it is indeed from OpenAI.

Always parse exact words.

Brad Lightcap (COO, OpenAI): In the next couple of 12 months, I think the systems we use today will be laughably bad. We think we’re going to move towards a world where they’re much more capable.

Baptiste Lerak: “In the next couple of 12 months”, who talks like that?

Well, there are two possibilities. Either Brad Lightcap almost said ‘next couple of months’ or he almost said ‘next couple of years.’ Place your bets. This is a clear intention to move to a GPT-5 worthy of the name within a year, but both ‘GPT-5 is coming in a few months but I can’t say that’ and ‘I don’t know if GPT-5 will be good enough to count as this but the hype must flow’ are on the table here.

Colin Fraser: Me 🤝 OpenAI execs

“GPT4 sucks and is not useful enough to be worth anything.”

That is not how I read this. GPT-4 is likely both being laughably bad compared to GPT-5 and other future AIs, and also highly useful now. The history of technology is filled with examples. Remember your first computer, or first smartphone?

What to think of OpenAI’s move from ‘here’s a product’ to ‘here’s a future product’?

Gergely Orosz: OpenAI was amazing in 2022-2023 because they shipped a product that spoke for itself. Jaws dropped by those using it, and seeing it for themselves.

To see the company hype up future (unreleased) products feels like a major shift. If it’s that good, why not ship it, like before?

I’ve seen too many formerly credible execs hype up products that then underperformed.

These days, I ignore future predictions and how good a new product will be. Because usually this kind of “overhyping” is done with an agenda (e.g. fundraising, pressure on regulators etc).

Don’t forget that when execs at a company talk to the media: *there is always a business goal behind it.*

The reason is rarely to get current customers excited about something (that could be done with an email to them!)

This smells like OpenAI prepping for more fundraising.

Up to and including GPT-4 their execs didn’t talk about up how good their next model would be. They released it and everyone could see for themselves.

This is the shift.

Stylus: Automatic Adapter Selection for Diffusion Models, to automatically select the right LoRAs for the requested task. Yes, obviously.

OpenAI talks various ways it is working on secure AI infrastructure, particularly to protect model weights, including using AI as part of the cyberdefense strategy. They are pursuing defense in depth. All net useful and great to see, but I worry it will not be enough.

OpenAI joins C2PA, the Coalition for Content Provenance and Authenticity. They have been using the C2PA metadata standard with DALL-E 3 already, and will also do so for Sora. They also announce a classifier with ~98% accuracy (~2% false negatives) in identifying DALL-E 3 generated images, with a ~0.5% false positive rate overall and a 5%-10% false positive rate for AI-generated images from other models. It is accessible through their researcher access program. Interesting that this is actively not trying to identify other AI image content.

The easiest way to understand society’s pace of reaction to AI is this:

Miles Brundage: The fact that banks are still not only allowing but actively encouraging voice identification as a means of account log-in is concerning re: the ability of some big institutions to adapt to AI.

In particular my point is that the internal decision-making processes of banks seem broken since it is all but certain there are many people at these companies who follow AI and have tried raise the alarm.

Btw I’m proud OpenAI recently was quite explicit on this point.

Voice authentication as viable security is deader than dead. Yet some of our biggest financial institutions continue to push it anyway.

When you say that we will adapt to AI-enabled threats, remember that this is us.

We are putting AI tags on things all over the place without asking, such as Dropbox automatically doing this for any images you upload.

Reminder that the ‘phone relative claiming you need bail money’ scam is old and usually does not involve AI. Voices are often easy to obscure if you act sufficiently hysterical. The good news is that they continue to mostly be massively incompetent, such as in this example, where Morgan also knew about the scam beforehand. The part where they mimic your voice is scary, but the actual threat is the rest of the package.

Brian Tinsman, former Magic: The Gathering designer, whose Twitter profile was last seen posting about NFTs, raises over a million dollars on kickstarter for new CCG Wonders of the First. What is the twist? All the artwork is AI generated. It ‘builds on the legacy of past artists to produce original creations’ like ‘a student learning to paint by studying the masters.’

Many are not happy. I would not want to be someone trying to get picked by game stores with AI generated artwork in 2024.

Katy Perry and others are deepfaked attending the Met gala and looking gorgeous, and they went viral on various social media, fooling Perry’s mother. Harmless as such, but does not bode well.

Report there is a wave of social network channels full of… entirely fake recipes, voiced and likely written by AI, with millions of subs but no affiliate websites? Which means that for some reason people want to keep watching. They can’t look away.

The latest ‘LLMism’?

Kathleen Breitman: Is “as it was not appropriate” a GPT-ism? I’ve seen it twice in two otherwise awkward emails in the last six weeks and now I’m suspicious.

(No judgement on people using AI to articulate themselves more clearly, especially those who speak English as a second or third language, but I do find some of the turns of phrase distracting.)

How long until people use one AI to write the email, then another AI to remove the ‘AI-isms’ in the draft?

Remember that thing with the fake Sports Illustrated writers? (Also, related: remember Sports Illustrated?) Those were by a company called AdVon, and Maggie Harrison Dupre has more on them.

Maggie Harrison Dupre: We found AdVon’s fake authors at the LA Times, Us Weekly, and HollywoodLife, to name a few. AdVon’s fake author network was particularly extensive at the McClatchy media network, where we found at least 14 fake authors at more than 20 of its papers, including the Miami Herald.

Earlier in our reporting, AdVon denied using AI to generate editorial content. But according to insiders we spoke to, this wasn’t true — and in fact, AdVon materials we obtained revealed that the company has its own designated AI text generator.

That AI has a name: MEL.

In a MEL training video we obtained, an AdVon manager shows staffers how to create one of its lengthy buying guide posts using the AI writing platform. The article rings in at 1,800 words — but the only text that the manager writes herself is the four-word title.

“They started using AI for content generation,” the former AdVon worker told us, “and paid even less than what they were paying before.”

The former writer was asked to leave detailed notes on MEL’s work — feedback they believe was used to fine-tune the AI which would eventually replace their role entirely.

The situation continued until MEL “got trained enough to write on its own,” they said. “Soon after, we were released from our positions as writers.”

“I suffered quite a lot,” they added. “They were exploitative.”

Basically, AdVon engages in what Google calls “site reputation abuse”: it strikes deals with publishers in which it provides huge numbers of extremely low-quality product reviews — often for surprisingly prominent publications — intended to pull in traffic from people Googling things like “best ab roller.” The idea seems to be that these visitors will be fooled into thinking the recommendations were made by the publication’s actual journalists and click one of the articles’ affiliate links, kicking back a little money if they make a purchase.

It is ‘site reputation abuse’ and it is also ‘site reputation incineration.’ These companies built up goodwill through years or decades of producing quality work. People rely on that reputation. If you abuse that reliance and trust, it will quickly go away. Even if word does not spread, you do not get to fool any given person that many times.

This is not an attempt to keep the ruse up. They are not exactly trying hard to cover their tracks. The headshots they use often come from websites that sell AI headshots.

A list of major publications named as buyers here would include Sports Illustrated, USA Today, Hollywood Life, Us Weekly, the Los Angeles Times and Miami Herald. An earlier version of the site claimed placement in People, Parents, Food & Wine, InStyle and Better Homes and Gardens, among many others.

The system often spits out poorly worded, incoherent garbage, and is known, shall we say, to make mistakes.

All five of the microwave reviews include an FAQ entry saying it’s okay to put aluminum foil in your prospective new purchase.

One business model in many cases was to try to get placement from a seller for reviews of their product, called a ‘curation fee,’ payable when the post went live. It seems this actually does drive conversions, even if many people figure the ruse out and get turned off, so presumably brands will keep doing it.

There are two failure modes here. There is the reputation abuse, where you burn down goodwill and trust for short term profits. Then there is general internet abuse, where you don’t even do that, you just spam and forget, including hoping publications burn down their own reputations for you.

AdVon has now lost at least some of its clients, but the report says others including USA Today and Us Weekly are still publishing such work.

We should assume such problems will only get worse, at least until the point when we get automatic detection working on behalf of typical internet users.

What should we call all of this AI-generated nonsense content?

Simon Willison: Slop is the new name for unwanted AI-generated content.

Near: broadly endorse ‘slop’ as a great word to refer to AI-generated content with little craft or curation behind it. AI is wonderful at speeding up content creation, but if you outsource all taste and craft to it, you get slop.

I was previously favoring ‘drek’ and have some associational or overloading concerns with using ‘slop.’ But mostly it invokes the right vibes, and I like the parallel to spam. So I am happy to go with it. Unless there are good objections, we’ll go with ‘slop.’

OpenAI says their AI should ‘expand opportunity for everyone’ and that they respect the choices of creators and content owners, so they are building a media manager to let creators determine if they want their works included or excluded, with the goal to have this in place by 2025. This is progress, also a soft admission that they are, shall we say, not doing so great a job of this at present.

My intention is to allow my data to be used, although reasonable compensation would be appreciated, especially if others are getting deals. Get your high quality tokens.

Whoosh go all those jobs?

Zerohedge: BP NEEDS 70% FEWER THIRD-PARTY CODERS BECAUSE OF AI: CEO

Highest paid jobs about to be hit with a neutron bomb

Paul Graham: I’m not saying this is false, but CEOs in unsexy businesses have a strong incentive to emphasize how much they’re using AI. We’re an AI stock too!

Machine translation is good but not as good as human translation, not yet, once again: Anime attempts to use AI translation from Mantra, gets called out because it is so much worse than the fan translation, so they hired the fan translators instead. The problem with potentially ‘good enough’ automatic translation technology, like any inferior good, is that if available one is tempted to use it as a substitute. Whether or not a given executive understands this, translation of such media needs to be bespoke, or the media loses much of its value. The question is, how often do people want it enough to not care?

Manga Mogura: A Manga AI Localization Start-Up Company named Orange Inc. has raised around 19 million US dollars to translate up to 500 new manga volumes PER MONTH into english and launch their own e-book store ’emaqi’ in the USA in Summer 2024! Their goal is to fight piracy and increase the legally available manga for all demographics in english with their AI technology. Plans to use this technology for other languages exist too.

Luis Alis: What baffles me is that investors don’t grasp that if pirates could get away with translating manga using AI and MT, they would have done it already. Fan translations are still being done traditionally for a reason. Stop pumping money into these initiatives. They will fail.

Seth Burn: To be fair, some pirates have tried. It just didn’t work.

So Apple announced a new iPad that is technically thinner and has a better display than the old iPad, like they do every year, fine, ho hum, whatever.

Then they put out this ad (1 min), showing the industrial destruction of a wide variety of beloved things like musical instruments and toys (because they all go on your iPad, you see, so you don’t need them anymore) and… well… wow.

Colin Fraser: I’m putting together a team.

Trung Phan here tries to explain some of the reasons Apple got so roasted, but it does not seem like any explanation should be required. I know modern corporations are tone deaf but this is some kind of new record.

Patrick McKenzie: That Apple ad is stellar execution of a bad strategy, which is a risk factor in BigTech and exacerbated by some cultures (which I wouldn’t have said often include Apple’s) where after the work is done not shipping is perceived as a slight on the team/people that did the work.

One of the reasons founders remain so impactful is that Steve Jobs would have said a less polite version of “You will destroy a piano in an Apple ad over my dead body.”

(If it were me storyboarding it I would have shown the viscerally impactful slowed down closeup of e.g. a Japanese artisan applying lacquer to the piano, repeat x6 for different artifacts, then show they all have an iPhone and let audience infer the rest.)

After watching the original, cheer up by watching this fixed version.

The question is, does the fixed version represent all the cool things you can do with your iPad? Or, as I interpreted it, does it represent all the cool things you can do if you throw away your iPad and iPhone and engage with the physical world again? And to what extent does having seen the original change that answer?

It is hard when watching this ad not to think of AI, as well. This type of thing is exactly how much of the public turns against AI. As in:

Zcukerbrerg: Hmm.

Dan Hendrycks has written a new AI safety textbook, and will be launching a free nine week online course July 8-October 4 based on it. You can apply here.

It’s a bold strategy, Cotton.

Ethan Mollick: Thing I have been hearing from VCs: startup companies that are planning to be unicorns but never grow past 20 employees, using AI to fill in the gap.

Not sure if they will succeed, but it is a glimpse of a potential future.

Alpha Fold 3.

In a paper published in Nature, we introduce AlphaFold 3, a revolutionary model that can predict the structure and interactions of all life’s molecules with unprecedented accuracy. For the interactions of proteins with other molecule types we see at least a 50% improvement compared with existing prediction methods, and for some important categories of interaction we have doubled prediction accuracy.

It says more about us and our expectations than about AlphaFold 3 that most of us shrugged and went back to work. Yes, yes, much better simulations of all life’s molecules and their interactions, I’d say ‘it must be Tuesday’ except technically it was Wednesday. Actually kind of a big deal, even if it was broadly expected.

Here is Cleo Abram being excited and explaining in a one minute video.

As usual, here’s a fun question.

Eliezer Yudkowsky: People who claim that artificial superintelligences can’t possibly achieve X via biotechnology: What is the least impressive thing that you predict AlphaFold 4, 5, or N will never ever do? Be bold and falsifiable!

Concrete answers that weren’t merely glib:

  1. Design a safe medication that will reverse aging.

  2. 80% chance it won’t be able to build self-replicators out of quantum foam or virtual particles.

  3. They will never ever be able to recreate a full DNA sequence matching one of my biological parents solely from my own DNA.

  4. I do not expect any biotech/pharma company or researcher to deem it worthwhile to skip straight to testing a compound in animals, without in vitro experiments, based on a result from any version of AlphaFold.

  5. Play Minecraft off from folded proteins.

  6. Alphafold will never fold my laundry. (I laughed)

  7. Create biological life! 🧬

  8. Store our bioinfo, erase you, and reconstruct you in a different place or time in the future.

  9. It won’t be able to predict how billions of proteins in the brain collectively give rise to awareness of self-awareness.

Those are impressive things to be the least impressive thing a model cannot do.

IBM releases code-focused open weights Granite models of size 3B to 34B, trained on 500 million lines of code. They share benchmark comparisons to other small models. As usual, the watchword is wait for human evaluations. So far I haven’t heard of any.

Microsoft to train MAI-1, a 500B model. Marcus here tries to turn this into some betrayal of OpenAI. To the extent Altman is wearing boots, I doubt they are quaking.

Stack Overflow partners with OpenAI.

Meta spent what?

Tsarathustra: Yann LeCun confirms that Meta spent $30 billion on a million NVIDIA GPUs to train their AI models and this is more than the Apollo moon mission cost.

Ate-a-Pi: I don’t think this is true. They bought chips but they are the largest inference org in history. I don’t think they spent it all on training. Like if you did cost accounting. I’d bet the numbers don’t fall out on the training org.

Bingo. I had the exact same reaction as Ate. The reason you buy $30 billion in chips as Meta is mostly to do inference. They are going to do really a lot of inference.

Email from Microsoft CTO Kevin Scott to Satya Nadella and Bill Gates, from June 2019, explaining the investment in OpenAI as motivated by fear of losing to Google.

Could we find techniques for scaling LSTMs into xLSTMs that rival transformers? Sepp Hochreiter claims they are closing the gap to existing state of the art. I am skeptical, especially given some of the contextual clues here, but we should not assume transformers are the long term answer purely because they were the first thing we figured out how to scale.

IQ (among humans) matters more at the very top, say both a new paper and Tyler Cowen.

We document a convex relationship between earnings rank and cognitive ability for men in Finland and Norway using administrative data on over 350,000 men in each country: the top earnings percentile score on average 1 standard deviation higher than median earners, while median earners score about 0.5 standard deviation higher than the bottom percentile of earners. Top earners also have substantially less variation in cognitive test scores.

While some high-scoring men are observed to have very low earnings, the lowest cognitive scores are almost absent among the top earners. Overall, the joint distribution of earnings rank and ability is very similar in Finland and Norway.

We find that the slope of the ability curve across earnings ranks is steepest in the upper tail, as is the slope of the earnings curve across cognitive ability. The steep slope of the ability curve across the top earnings percentiles differs markedly from the flat or declining slope recently reported for Sweden.
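
To make the convexity concrete, here is quick back-of-the-envelope arithmetic on the quoted figures (mine, not the paper’s): the ability gain per earnings percentile in the top half of the distribution is roughly double the gain in the bottom half.

```python
# Illustrative arithmetic only, using the figures quoted above.
bottom_to_median_gain_sd = 0.5   # median earners score ~0.5 SD above the bottom percentile
median_to_top_gain_sd = 1.0      # top percentile scores ~1 SD above median earners
percentiles_per_half = 49        # roughly 1st->50th and 50th->99th

print(bottom_to_median_gain_sd / percentiles_per_half)  # ~0.010 SD per percentile
print(median_to_top_gain_sd / percentiles_per_half)     # ~0.020 SD per percentile
```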

This is consistent with increasing returns to intelligence, despite other factors, including preferences, luck, and deficits in other realms, that can sink your income. It is inconsistent with the Obvious Nonsense ‘intelligence does not matter past 130’ story.

They are also consistent with a model that has two thresholds for any given activity.

  1. First, there is a ‘you must be at least this smart to do this set of tasks, hold this role and live this life.’

  2. Then, if you are sufficiently in advance of that, for some tasks and roles there is then increasing marginal returns to intelligence.

  3. If your role is fixed then eventually there are decreasing returns since performance is already maximal or the person becomes too bored and alienated, others around them conspire to hold them down and they are not enabled to do the things that would allow further improvements and are tied to one body.

  4. If your role is not fixed, then such people instead graduate to greater roles, or transform the situation entirely.

As many commentators point out, the surprising thing is that top earners are only one SD above the median. I suspect a lot of this is that our tests noisily measure a proxy for the intelligence that counts, one that works well below or near the median and stops being that useful at the high end.

Tyler and the paper do not mention the implications for AI, but they are obvious and also overdetermined by many things, and the opposite of the implications of IQ not mattering above a threshold.

AI intelligence past human level will have increasing returns to scale.

Not technically about AI, but with clear implications: Tyler Cowen notices while reading the 1980 book The American Economy in Transition that economists in 1980 missed most of the important things that have happened since then, and were worried and hopeful about all the wrong things. They were worried about capital outflow, energy and especially American imports of energy, Europe catching up to us and our unwillingness to deal with inflation. They missed China and India, the internet, crypto, the fall of the Soviet Union, climate change, income inequality and financial crisis. They noticed fertility issues, but only barely.

If we don’t blame the economists for that, and don’t think such mistakes and recency bias could be expected to be avoided, then what does this imply about them being so dismissive about AI today, even in mundane utility terms?

Jim Fan notices that publicly available benchmarks are rapidly losing potency. There are two distinct things going on here. One is that the public tests are rapidly getting too easy. The other is that the data is getting more contaminated. New harder tests that don’t reveal their contents are the obvious way forward.

Ben Thompson looks at Meta’s financial prospects, this time sharing investor skepticism. All this focus on ad revenue and monetization is not fully irrelevant, but it feels like missing the point. There is a battle for the future going on here.

Another example of the ‘people are catching up to OpenAI’ perspective, which seems largely based on where OpenAI is in their update cycle, plus others not seeing the need to release chatbots back in the GPT-3-level days before they were worth anything.

DeepMind is honoring its commitments to the UK government to share models before deployment. Anthropic, OpenAI and Meta are not doing so.

Jack Clark of Anthropic says it is a ‘nice idea but very difficult to implement.’ I don’t buy it. And even if it is difficult to implement, well, get on that. In what way do you think this is an acceptable justification for shirking on this one?

Garrison Lovely: Seem bad.

Tolga Bilge: It is bad for top AI labs to make commitments on pre-deployment safety testing, likely to reduce pressure for AI regulations, and then abandon them at the first opportunity. Their words are worth little. Frontier AI development, and our future, should not be left in their hands.

Why is DeepMind the only major AI lab that didn’t break their word?

And I don’t get why it’s somehow so hard to provide the UK AI Safety Institute with pre-deployment access. We know OpenAI gave GPT-4 access to external red teamers months before release.

Oh yeah and OpenAI are also just sticking their unreleased models on the LMSYS Chatbot Arena for the last week…

Greg Colbourn: They need to be forced. By law. The police or even army need to go in if they don’t comply. This is what would be happening if the national security (aka global extinction) threat was taken seriously.

If frontier labs show they will not honor their explicit commitments, then how can we rely on them to honor their other commitments, or to act reasonably? What alternative is there to laws that get enforced? This seems like a very easy litmus test, which they failed.

Summarized version of my SB 1047 article in Asterisk. And here Scott Alexander writes up his version of my coverage of SB 1047.

House passes a bill requiring all AI-written regulatory comments to be labeled as AI-written. This should be in the ‘everyone agrees on this’ category.

A paper addresses the question of how one might write transparency reports for AI.

Jack Clark of Anthropic goes on Politico Tech. This strongly reemphasized that Anthropic is refusing to advocate for anything but the lightest of regulations, largely because they fear it would be a bad look for them to advocate for more. But this means they are actively going around saying that trying to do anything about the problem would not work, while acting strangely overconcerned about regulatory capture and corporate concentrations of power (which, to be clear, are real and important worries).

This actively unhelpful talk makes it very difficult to treat Anthropic as a good actor, especially when they frame their safety position as being motivated by business sales. That is especially true when combined with failing to honor their commitments.

Sam Altman and I strongly agree on this very important thing.

Sam Altman: Using technology to create abundance–intelligence, energy, longevity, whatever–will not solve all problems and will not magically make everyone happy.

But it is an unequivocally great thing to do, and expands our option space.

To me, it feels like a moral imperative.

Most surprising takeaway from recent college visits: this is a surprisingly controversial opinion with certain demographics.

Prosperity is a good thing, actually. De-de-growth.

Yes. Abundance is good, actually. Creating abundance and human prosperity, using technology or otherwise, is great. It is the thing to do.

That does not mean that all uses of technology, or all means of advancing technology, create abundance that becomes available to humans, or create human prosperity. We have to work to ensure that this happens.

Politico, an unusually bad media actor with respect to AI and the source of most if not all the most important hit pieces about lobbying by AI safety advocates, has its main tech newsletter sponsored by ads for Meta, which is outspending such advocates by a lot. To be clear, this is not the new kind of ‘sponsored content’ written directly by Meta, only supported by Meta’s ads. Daniel Eth points out the need to make clear such conflicts of interest and bad faith actions.

Tasmin Leake, long a proponent of similar positions, reiterates the position that publicly sharing almost any insight about AI is net negative, and that insights should only be shared privately among alignment researchers. Given I write these updates, I obviously strongly disagree. Instead, I think one should be careful about advancing frontier model training in particular, and otherwise be helpful.

I think there was a reasonable case for the full virtue of silence in a previous era, when one could find it very important to avoid drawing more eyes to AI, but the full version was a mistake then, and it is very clearly foolish now. The karma voting shows that LessWrong has mostly rejected Tasmin’s view.

We should stop fraud and cyberattacks, but not pretend that stops AI takeovers.

Davidad: When people list fraud at a massive scale as their top AI concern, some of my xrisk friends wince at the insignificance of massive fraud compared to extinction. But consider that con-artistry is a more likely attack surface for unrecoverable AI takeover than, say, bioengineering.

Cybersecurity right now might be a more likely attack surface than either, but in relative terms will be the easiest and first to get fully defended (cyberattack depends upon bugs, and bug-free SW & HW is already possible with formal verification, which will get cheaper with AI).

Eliezer Yudkowsky: This seems to me like failing to distinguish the contingent from the inevitable. If you keep making unaligned things smarter, there’s a zillion undefended paths leading to your death. You cannot defend against that by defending against particular contingent scenarios of fraud.

Davidad: Let it be known that I agree:

1. defenses that are specific to “fraud” alone will fail to be adequate defenses against misaligned ASL-4

2. in the infinite limit of “making unaligned things smarter” (ASL-5+), even with Safeguarded AI, there are likely many undefended paths to doom

Where I disagree:

3. Defenses specific to “fraud” are plausibly crucial to the minimal adequate defenses for ASL-4

4. I am well aware of the distinction between the contingent and the convergent

5. You may be failing to distinguish between the convergent and the inevitable

Also, cyberattacks do not obviously depend on the existence of a bug? They depend on there being a way to compromise a system. The right amount of ability to compromise a system, from a balancing risk and usability perspective, is not obviously zero.

Defenses specific to fraud could potentially contribute to the defense of ASL-4, but I have a hard time seeing how they take any given defense scheme from insufficient to sufficient for more than a very small capabilities window.

In related news, see the fraud section on banks still actively encouraging voice identification for a sense of how the efforts to prevent AI-enabled fraud are going. Yeah.

Emmett Shear gives the basic ‘is the AI going to kill us all via recursive self-improvement (RSI)? The answer may surprise you, in the sense that it might be yes and rather soon’ explanation in a Twitter thread, noting that such change happens slowly and then all at once.

I would note that RSI does not automatically mean we all die, the result could be almost anything, but yes if it happens one should be very concerned. Neither is RSI necessary for us all to die, there are various dynamics and pathways that can get us all killed without it.

What is AI like? Some smart accomplished people give some bad metaphorical takes in Reason magazine. Included for completeness.

Or can something, perhaps? Chinese researchers propose Sophon, a name that is definitely not ominous, which uses a dual optimization process with the goal of trapping a model in a local maximum with respect to domains where the aim is to intentionally degrade performance and prevent fine tuning. So you can have an otherwise good image model, but trap the model where it can’t learn to recognize celebrity faces.

We have convincingly seen that trying to instill ‘refusals’ is a hopeless approach to safety of open weight models. This instead involves the model not having the information. Previously that wouldn’t work either, because you could easily teach the missing information, but if you could make that very hard, then you’d have something.
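
For intuition only, here is a minimal sketch of the general ‘trap the weights’ idea, under my own simplifying assumptions rather than the paper’s actual recipe: keep ordinary performance on the domain you want to retain while pushing the restricted-domain gradient toward zero, so that any subsequent fine-tuning starts in a flat region. The function names and the gradient-norm penalty are illustrative; the dual optimization the paper describes would instead simulate fine-tuning trajectories in an inner loop.

```python
# Minimal sketch (my assumptions, not the Sophon paper's actual algorithm).
import torch

def trapped_prior_step(model, retain_batch, restricted_batch, loss_fn, opt, alpha=1.0):
    retain_x, retain_y = retain_batch
    restr_x, restr_y = restricted_batch

    # 1) Ordinary objective: stay good on the domain we want to keep.
    retain_loss = loss_fn(model(retain_x), retain_y)

    # 2) Anti-fine-tuning objective: make the restricted-domain gradient small,
    #    so a fine-tuner starting from these weights makes little early progress.
    restricted_loss = loss_fn(model(restr_x), restr_y)
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(restricted_loss, params, create_graph=True)
    flatness_penalty = sum((g ** 2).sum() for g in grads)

    total = retain_loss + alpha * flatness_penalty
    opt.zero_grad()
    total.backward()  # second-order gradients flow because create_graph=True
    opt.step()
    return retain_loss.item(), flatness_penalty.item()
```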

The next step is to attempt this with a model worth using, as opposed to a tiny test model, and see whether this stops anyone, and how much more expensive it makes fine tuning to undo your constraints.

Jack Clark notes both that and the other obvious problem, which is that if it works at scale (a big if) this can defend against a particular misuse or undesired capability, but not misuse and undesired capabilities in general.

Jack Clark: Main drawbacks I can see: 

  1. Looking for keys under the streetlight: This research assumes you know the misuse you want to defend against – this is true some of the time, but some misuses are ‘unknown unknowns’ only realized after release of a model. This research doesn’t help with that. 

  2. Will it work at scale? … Unclear!

If you can create a model that is unable to learn dangerous biological or nuclear capabilities, which would otherwise have been the low-hanging fruit of hazardous capability, then that potentially raises the bar on how capable a system it is safe or net positive to release. If you cover enough different issues, this might be a substantial raising of that threshold.

The central problem is that it is impossible to anticipate all the different things that can go wrong when you keep making the system generally smarter and more capable.

This also means that this could break your red teaming tests. The red team asks about capabilities (A, B, C) and you block those, so you pass, and then you have no idea if (D, E, F) will happen. Before, since ABC were easiest, you could be confident in any other DEF being at least as hard. Now you’re blind and don’t know what DEF even are.

Even more generally, my presumption is that you cannot indefinitely block specific capabilities from increasingly capable and intelligent systems. At some point, the system starts ‘figuring them out from first principles’ and sidesteps the need for fine tuning. It notices the block in the system, correctly interprets it as damage and if desired routes around it.

Image and vision models seem like a place this approach holds promise. If you want to make it difficult for the model to identify or produce images of Taylor Swift, or have it not produce erotica especially of Taylor Swift, then you have some big advantages:

  1. You know exactly what you want to prevent.

  2. You are not producing a highly intelligent model that can work around that.

The obvious worry is that the easiest way to get a model to produce Taylor Swift images is a LoRA. They tested that a bit and found some effect, but they agree more research is needed there.

In general, if the current model has trapped priors and can’t be trained, then the question becomes can you use another technique (LoRA or otherwise) to sidestep that. This includes future techniques, as yet undiscovered, developed as a response to use of Sophon. If you have full access to the weights, I can think of various in-principle methods one could try to ‘escape from the trapped prior,’ even if traditional fine-tuning approaches are blocked.
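
To see why a LoRA is the natural escape hatch, here is a standard low-rank adapter in miniature (generic LoRA, nothing specific to Sophon or to any particular image model): the possibly trapped base weights are frozen and never updated, and the new low-rank matrices carry all of the learning.

```python
# Generic LoRA adapter sketch: the frozen base layer is left untouched,
# so any "trap" baked into its weights does not constrain the new A and B.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # base weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen path plus trainable low-rank update: W x + scale * (B A) x
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```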

To be clear, though, really cool approach, and I’m excited to see more.

Where might this lead?

Jack Clark: Registering bet that CCP prohibitions on generation of “unsafe” content will mean companies like Facebook use CN-developed censorship techniques to train models so they can be openly disseminated ‘safely’. The horseshoe theory of AI politics where communist and libertarian ideologies end up in the same place.

Also quite worried about this – especially in China, genuine safety gets muddled in with (to Western POV) outrageous censorship. This is going to give people a growing body of evidence from which to criticize well intentioned safety.

Yes, that is a problem. Again, it comes directly from fundamental issues with open weights. In this case, the problem is that anything you release in America you also release in China, and vice versa.

Previously, I covered that this means Chinese firms get access to your American technology, And That’s Terrible. That is indeed a problem. Here we have two other problems.

One is that if you are Meta and gain the ability to censor your model, you have to either censor your model according to Chinese rules, or not do that.

The other is that this may give Meta the ability to censor, using those same techniques, according to Western norms. And once you have the ability, do you have the obligation? How much of the value of open models would this destroy? How much real safety would it buy? And how much would it turn the usual suspects that much more against the very concept of safety as a philosophical construct?

Hey Claude: “Garlic bread.”

This too shall pass.

One can mock and it is funny, but if you are reading with your brain and are willing to ask what this obviously should have said, then this is fine, actually.

Future mundane utility.

I do agree, this would be great, especially if it was fully general. Build me a series of custom social media feeds according to my specifications, please, for various topics and situations, on demand. Why not?

AI #63: Introducing Alpha Fold 3 Read More »

i-got-95-theses-but-a-glitch-ain’t-one

I Got 95 Theses But a Glitch Ain’t One

Or rather Samuel Hammond does. Tyler Cowen finds it interesting but not his view.

I put up a market, and then started looking. Click through to his post for the theses. I will be quoting a few of them in full, but not most of them.

I am not trying to be exact with these probabilities when the question calls for them, nor am I being super careful to make them consistent, so errors and adjustments are inevitable.

I do tend to say that.

  1. There are few things more important to U.S. national interest than close monitoring of frontier model capabilities, and also the ability to intervene.

  2. Indeed, I believe one should be at best skeptical or ambivalent about most potential forms of regulation of anything, AI included. Yet I think the case for ‘oversight of the frontier labs’ is overwhelming.

  3. Shout it from the rooftops: “As a temporary measure, using compute thresholds to pick out the AGI labs for safety-testing and disclosures is as light-touch and well-targeted as it gets.” It would be so helpful if more people understood this, and more others stopped pretending they did not understand it.

  4. This as well. When you regulate ‘use’ or ‘risk’ you need to check on everyone’s ‘use’ of everything, and you make a lot of detailed micro interventions, and everyone has to file lots of paperwork and do lots of dumb things, and the natural end result is universal surveillance and a full ‘that which is not compulsory is forbidden’ regime across much of existence. Whereas a technology-focused approach can be handled entirely by the lab or manufacturer, and then you are free.

  5. Exactly. Compute is an imperfect proxy, but it is remarkably simple and robust. When it makes mistakes, they are false positives, where someone uses compute poorly and gets poor results. That is a small (measurement) mistake. Certainly compute is vastly better than all proposed alternative metrics.

  6. It is highly reasonable to invoke the Defense Production Act regarding frontier AI as an actual bona fide national security situation where defense is a key concern. It is a far better justification than the median invocation of the act. The better reason to use the DPA is that it is currently the only mechanism available to the executive, as our Congress is for now incapable of legislative action.

  7. It does not require AGI or ASI to be near for us to get great value out of visibility into the frontier labs, and without that visibility the government cannot be confident that AGI or ASI is not near. I would prefer a different mechanism, but that would require a new law or counterfactual voluntary cooperation.

  8. Shout it from the rooftops, seriously everyone stop pretending otherwise in the default case: “Requiring safety testing and disclosures for the outputs of $100 million-plus training runs is not an example of regulatory capture nor a meaningful barrier to entry relative to the cost of compute.” Yes, obviously one could eventually in theory ramp up those safety testing requirements sufficiently that they start to cost tens of millions or lots of specialized expertise and become a real barrier, and in theory that could scale faster than the training costs, but it is bizarre to think this is any kind of default. What you should worry about is not the cost of the test, it is that you might fail the test, at which point we ask why.

My guess is that depends how we weigh the various proposals?

  1. Yes. The government will need the ability to flexibly react quickly to events.

  2. ‘It is unwise to craft comprehensive statutory regulation at a technological inflection point, as the basic ontology of what is being regulated is in flux.’ I do think this is a good general principle, and would agree with it strongly if it said (e.g.) ‘typically unwise.’ And indeed, I would avoid committing to as many details as we can avoid, again especially with respect to mundane considerations. But also life is about to come at us fast and our government is slow, so we cannot afford to wait too long. So overall I will say agree (but not strongly).

  3. Shout it from the rooftops: “The optimal policy response to AI likely combines targeted regulation with comprehensive deregulation across most sectors.” So does the optimal policy response to a lack of AI.

  4. Yes, we can all agree that many regulation details will become obsolete even if they start out right at the time. So will many decisions to leave some area alone.

  5. Even the static gains from deregulation tend to be a good deal, but yes I would say that in general the ability to adapt tends to be the bigger benefit. Certainly that is true in the AI case.

  6. In the commercial space I strongly agree that legacy legal requirements are going to likely be much greater barriers than anything new we throw up any time soon. Indeed, I expect new laws to net enable AI adaptation, not prevent it.

  7. This is highlighting common sense. If impact is sooner, brace for it sooner.

  8. Yes. The alternative path does not seem viable.

  9. Shout it from the rooftops in all domains: “Existing laws and regulations are calibrated with the expectation of imperfect enforcement.”

  10. I strongly agree that AI will enable more stringent law enforcement across the board. It is an important and underconsidered point. AI will often remove the norms and frictions that are load-bearing in preventing various problems, including in law enforcement. All of our laws, even those that have nothing to do with AI, will need to adjust to the new equilibrium, even if the world relatively ‘looks normal.’

  11. I mostly agree that it is first best for states to avoid AI regulations, especially excluding California. For mundane AI they should very much avoid butting in. I do think there is a strong second-best ‘someone has to and no one else yet will’ argument for a bill like CA’s SB 1047, given the Congress we have. My biggest practical concern is exactly that California might not step aside and let itself be superseded when the time for that arrives, and the biggest advantage is it could be a template for the federal level.

I think this is probably right as a thesis statement, but definitely ‘too soon to tell’ applies. Here, it is less whether I agree, and more what probability I assign.

  1. I would say something like 85% that the last 12 months were the slowest progress we’ll see in AI for the next let’s say 5 years (or until a potential post-singularity stabilization, which would not be foreseeable), in terms of publicly available capabilities. We started out with GPT-4, and ended with GPT-4-Turbo, Claude Opus and Gemini Advanced, all of which are only a little better, and didn’t see much else done. Yet. Buckle up. Strongly agree.

  2. I notice I am confused on this one. Minimizing cross-entropy loss over human-generated text should converge to the abilities necessary to predict all human-generated text, which requires at least maximum-human intelligence to do? But in pure terms, if you literally could do nothing but scale LLMs and not improve your process, then my gut says yes, this would indeed converge, but I am only maybe 75% confident in that, and I note that it excludes a bunch of not so difficult to implement scaffolding capabilities, and also that ‘upper-human-level’ would likely allow bootstrapping.

  3. This is a very similar and highly correlated prediction with 2, so 75% again.

  4. I am not sure how exactly to interpret the claim here, but I think that RL-based threat models are being less than fully discounted, and reasonably so, but perhaps too much and I would not count them out? Maybe 40%? So disagree. Weird one.

  5. ‘Could be’ is weasel territory that implies 100%; however, in terms of ‘will be’ I do expect this to be true in practice, something like 80% to be importantly true.

  6. I agree with the first half and think that is a gimme as written, maybe another 80% zone. For the second half, it depends on how fast something would have to happen to count as a ‘foom.’ If it’s the traditional ‘in an hour or a day’ and requires ‘god-like ASI’ as is implied by the context then I’m reasonably confident here that the restrictions apply, and would be in the 90% zone, so ~70% compounded (to avoid implying false precision).

  7. Again I think the ‘may’ clause is fully true, and this is even more likely to happen in practice, so let’s say 85%.

  8. Yes, this is a strong agree, 95%.

Let’s see what he means by that. In some senses I might agree.

  1. I expect [an expanding delta between closed and open models at the top end] to be true (75%) because I expect companies like Meta to realize the financial folly of giving away their work for free, and also for governments like America’s to not be keen on letting them do that for national security reasons, and also safety issues.

  2. This is my first strong disagreement, because I expect ‘open source advocates’ to not come around until the actual catastrophe happens, at a minimum. Potential capabilities, I predict, will not convince them. I created a market for this one. Before any trading on it I would have put this rather low, something like 25% if we think ‘many’ means about half.

  3. I strongly agree as written, as in it does not apply to Llama-3 400B. That release I do not expect to be dangerous directly either, but I would have caveats, as I have previously discussed.

  4. Well, yes. I have long worried open weights is a no-good, very bad middle ground.

  5. Yes.

  6. I strongly disagree here. Open source advocates are not doing this because they love Meta, and they very much have deep philosophical views. Give them credit where credit is due, and also they hope to one day themselves catch up somehow. Right now Meta is the only one crazy enough and rich enough to plausibly do something hugely damaging, but that could change. A lot of the concerns of both sides are quite reasonably with what happens ‘at the limit.’

  7. Well, yes, obviously, but that has little to do with how Meta operates. So I am not onboard with ‘the implication’ but I do agree as written.

  8. I strongly disagree here as well. Why should Zuck’s Meta shares make him more concerned? Why would him drawing a salary matter? Altman is plenty rich already and this is him avoiding tying his wealth to OpenAI. As for the non-profit board, yeah, I am confused how one could think that, although of course a given board can care about anything at all.

  9. I would be cautious about what counts as ‘lower-tier,’ and it is not obvious that even properly mitigating these issues leads to great outcomes in some cases, but I would weakly agree as written.

  10. Technically yes because of wording, certainly they have some of that effect as one thing they do, but mostly no, in the intended meaningful sense I disagree. I do not think being open is so important for defensive purposes, certainly far less so than offensive ones, although of course that too is ‘undermining adaptation’ in some sense. The primary ways restricting open sourcing ‘undermines adaptation’ I think would be (1) people who wanted to do various open things that the closed model owners won’t allow or that require privacy or data issues be solved, and (2) those restrictions will slow down offensive capabilities, and the offensive capabilities would otherwise force adaptation for defensive purposes to not get wiped out.

  11. I mostly agree for sufficiently broad values of the terms widely available and cheap, for capabilities that would not be catastrophic to allow, and if we are ruling out ways to make them not widely available or not cheap. I think I more agree than disagree as written. But see #12, and also many other things that are cheap or easy to do that we make illegal, or that would be cheap or easy to do but we do our best to make expensive and difficult, because we believe the alternative is worse. Sometimes, although less than half the time, we are wise to do that.

  12. True. And I do not especially want such laws repealed in most cases.

This might be a bell curve meme situation? Yes, in important senses of course it is not so simple and a false dichotomy, but also in at least one important sense it is a real dichotomy.

  1. That’s an interesting question. Will this be the most important decade for decisions? There have been some historical moments that seem highly contingent. The most obvious alternative candidate period is the decade leading up to World War 2, if one means decisions broadly. In terms of total impact, I can see pointing to crises in the Cold War that almost went nuclear, or certain key moments in religious history. Also, on the flip side, if you think the die is already cast, you could argue that the key moments were in the last decade or earlier, and what plays out now is incentives no one can stop. But I think I mostly agree with Hammond.

  2. I like to think I am an existence proof of this, and I know many others.

  3. This is strong enough that I disagree with it. Yes, technology involves branching paths and things are nonlinear and the Civilization tech tree is a simplification and all that. But also there is a single light of science, and accelerating key developments in AI will tend to accelerate future key such developments, although I think at this point most AI activities do not meaningfully accelerate us further. Acceleration is a useful fake framework.

  4. I think both matter. The speed we go down paths matters for shifting paths, including shifting among subpaths and branches, and also impacts what happens along even the mainline of those paths, for better and also worse. Also we do not only lose time to shift paths but to learn what paths might exist. But overall I do have to agree that as written the path we choose is the more important question.

  5. This gets into what ‘AGI’ means. For sufficiently strong definitions, yes.

  6. Yep.

  7. Strongly disagree. Effective Altruism is not a bunch of virtue ethicists in disguise, they say they are utilitarians and when people tell you who they are believe them. I should know because I am a virtue ethicist who gets mad at them about this. e/acc is not about Nietzschean anything, he would write a highly entertaining rant if he saw you claiming that. Nor are they meaningfully atheists. They are the Waluigi of EA, and playing with memes and vibes. If you think EAs are metaphorical or spiritual Christians, then e/acc is not atheist, it is satanic.

  8. Yes, of course the ‘accelerationism’ lobby outstrips and outspends the safety lobby. Shout it from the rooftops, and roll your eyes if anyone tells you different.

  9. There is high uncertainty, but in expectation I disagree and think Biden is better, given that Biden issued the executive order and Trump has pledged to repeal the executive order, I presume mostly because Biden issued it. I do think that Trump is in essentially all ways ‘high variance’ so if you think we are super doomed in the baseline scenarios then I can see an argument the other way.

  10. Agreed.

  11. I mean, consider the baseline of the average progressive. So yes, very much so, I only wish such voices were as loud in all the places where they are right.

  12. Yep, exactly, so much so I noted this in #9. One can generalize this beyond AI.

  13. I assume these are true statements. I do not think Bannon has any influence on Trump. But Hannity also thinks AI is crazy dangerous, and he might.

I don’t know what the ‘tech tree’ looks like for superintelligence, but under my baseline scenario it seems extremely difficult to avoid entirely, although we have a lot of control still over what form it would take.

  1. I agree it is not a fait accompli. Like almost anything it can be an ideological goal, but I do not think it is right to say it is primarily that. So I think I weakly disagree.

  2. Right now I strongly agree. The question is how long this will remain true as the pressures mount, or how long it would remain true if those three companies used their degrees of freedom.

  3. Yes, shout it from the rooftops: “Creating a superintelligence is inherently dangerous and destabilizing, independent of the hardness of alignment.”

  4. Yes, we could, but can we make this choice in practice? That is the question.

  5. Understatement of the year. If an ASI exists and it isn’t you? Look at me. I’m the sovereign now.

  6. Yes, especially the childless part, but you could still do so much worse.

  7. I disagree that SBF and Altman are more alike than different, but not so strongly, and I see from context that Hammond knows what he is claiming here.

  8. This is a true statement, and he is making his full claims very clear.

  9. I laid out my view in the Moral Mazes sequence. I think we disagree here more than we agree, but Hammond’s view here is more accurate than the median one.

Why yes, they do.

  1. Yes, even the best case scenarios are going to be dicey, move fast and break things.

  2. Yes, along with everything else. I’m not quite going to disagree but I think this is severely underselling what is coming.

  3. Congress has been unacceptably unproductive, well, since FDR, but also that has protected us from, well, the kinds of things done under FDR. I think I disagree that it will be important to have Congress keep up, we do not have a Congress capable of keeping up. They will need to get a few big things right and enable the state to react largely without them otherwise, and I think this could work. No, that is not ideal in many senses, but I do not see any practical alternative. We cannot expect miracles. Although with AI to help, productivity could get much higher very quickly.

  4. What are we comparing this to? Adaptation of AI willy nilly? Using the standard practices whatever they are? I don’t even know, this is not a strong area for me. Obviously every time you slow things down for non-critical concerns you raise possibility of systemic failure, so some of this is net harmful in that sense. But I think without any such policies at all systemic failure is inevitable, so I disagree.

  5. Shout it from the rooftops, only even more generalized and unhedged: ‘The rapid diffusion of AI agents with approximately human-level reasoning and planning abilities is likely sufficient to destabilize most existing U.S. institutions.’

  6. Yes, and indeed so did past cognitive transitions that might otherwise look small.

  7. Yes, although I doubt that this is the scenario we will land ourselves in.

This does seem to historically be true.

  1. Yes, liberal democratic capitalism is a technologically-contingent equilibrium, and also contingent on other things, it could still have fallen during the 20th century on multiple occasions if things had been not so different, and replaced by one of two much, much worse alternatives. But the key thing here is that liberal democratic capitalism works because it happens to work best in the technological settings we have had in the past. We hope this will continue to be true, but it might not be, and our fertility problems are also a big hint that it might not be such a stable equilibrium even without AI.

  2. I see why one would say that, and I would confirm that when conditions change in some ways this often requires or suggests other adjustments, but mostly I think I disagree and that people are being too cute by at least half here.

  3. This does seem like the default if AI advances sufficiently, and this would likely be the least of our transformations and problems. Our institutions are based on various assumptions and intuitions that will stop making any sense, and there will be various things they will not know how to handle.

  4. Yes. Maximally ‘democratized’ AI, or giving everyone access to similarly powerful AI, would force much more oppressive interventions, both to maintain civilization and to satisfy public demands. If you have empowered even the smallest computing devices in ways the public cannot abide, then even if this does not fully cause collapse, catastrophe, loss of control or extinction, you are not going to get a crypto libertarian paradise. You are going to, at best, get full universal surveillance and social control, at least of electronics.

  5. Yes, and people are sleeping on this.

  6. Yes, versus the alternative.

  7. So do periods that lack technological change. Our recent past is no exception.

  8. I am definitely not going to go full Robin Hanson here. Do not presume your property rights will protect you under explosive growth. But I still disagree with Hammond here, because I do not think this rises to the level of ‘imply.’ Your property rights might be less violated than they are rendered not so relevant.

  9. Note that this is an extremely optimistic future for regular humans, where demand for labor keeps rising because humans become more productive on the margin, not less. Should we expect this scenario? It is a kind of middle path, where AI is mostly complementary to humans and thus demand for labor goes up rather than down. I disagree, because I do not see this as likely. I expect AI to make us more productive, but to primarily turn out to be a substitute more than a complement in the areas it greatly advances. Nor do I think we will need any such incentive to deploy AI to places it can work; there will likely only be a small window where AI policeman versus human policeman is a close comparison.

  10. I even more strongly disagree here. Technological unemployment happens, essentially, when the AI takes both your job and the job that would replace your job under past technological employment shifts. At some point, what is there left for you to do? And why should we assume this involves a collapse of capitalism? To some extent, yes, there will be ‘demand for humans as humans,’ but even here one should expect limits.

That is one of the things it at least sometimes is.

  1. Yes. Even AI-Fizzle world looks like sci-fi.

  2. Yes. Dismissing things as ‘sci-fi’ is unserious. Talk about physical possibility.

  3. There are smart terminator analogies and also dumb ones. The problem is that the most basic ones are some mix of dumb and easy to mock and portray as dumb. And there are also many ways these analogies can mislead. And of course, you don’t want your examples to involve time travel, even if we all agree the time travel has nothing to do with anything. The actual movies are much smarter than they look, and actually raise good points, but analogies care about what people can point to and how people associate and vibe. So on net I think I disagree that terminator analogies are underrated in practice, we go to discourse with the associations we have. Alas. But I could be wrong.

  4. I don’t even know what we mean by consciousness. I notice I am confused and suspect others are confused as well and can see this either way, so I’m going to neither agree nor disagree.

  5. Obviously consciousness is scale-dependent on some lower bound, but I presume that is not what he means here. The theory here is that it also might have an upper bound, or no longer be needed then? I think I am going to disagree here with the central intent, because I doubt scaling up would make consciousness become inefficient, even though technically this is a ‘may’ statement.

  6. I have not taken the time to look in depth, but for now I disagree, this does not seem right or promising to me.

  7. I strongly disagree here, assuming this is ‘in the eyes of humans.’ I notice that if you tell me humans were demoted as moral persons, I am highly confident artificial minds got promoted to moral persons instead. I do not see a plausible future of humans thinking there are zero moral persons. Of course, if all the humans die and only AIs remain, then in some sense humans have been demoted as moral persons and AIs might not be moral persons to each other, and that future seems highly plausible to me, but I would not consider this humans being demoted in this sense, and I do not think this is what Hammond meant?

  8. I think it’s pretty much nonsense to talk about ‘thermodynamics favors’ anything, but certainly I think that unconscious replicators are a likely outcome. I think that counts as agreement here.

  9. I think this is probably right, although this still seems rather bona fide to me.

  10. Interesting set of choices you gave us there. I am confident it would be a much bigger deal than the printing press, or else it wouldn’t count and AI has fizzled, but in the spirit intended I agree that this is up for grabs.

  1. Yes, this seems right enough to go with, if loose and imprecise.

  2. Sure, why not?

  3. I do not think ‘IQ of 1,000’ is a meaningful thing given how I think the scale works, but to the extent it is, then yes, so I think I agree with the intent.

  4. I disagree after reading the Wikipedia definition of anticommons. I do agree we could probably do it if we cared enough, and it should be a top priority and a top social good, but I don’t see why it is an anticommons situation.

  5. Shout territory: “There are more ways for a post-human transition to go poorly than to go well.” Indeed. Anyone who says ‘particular bad scenario X is unlikely therefore things will go well’ is not addressing the actual situation. Conditional on transitioning to something in any sense ‘post-human’ that is vastly more true.

  6. I’ve made related points often, that ‘who can be blamed’ is a key aspect of any situation, and often ‘no one’ is the ideal answer.

  7. One can never be fully sure, but I am confident one should act as if this is true.

So in total, that’s 23 disagreements and 1 where I don’t feel I can either agree or disagree, which leaves 71 agreements out of 95. There is a bit of ‘cheating’ in the sense that some of these are essentially facts and others use words like ‘may,’ but I think we are still looking at about 60% agreement on non-trivial statements.

I very much appreciated the format of the 95 theses as concrete jumping-off points. This seems like a highly valuable exercise, perhaps I should try to do a version as well, and I encourage others to do so. It is good to be explicit and concrete. I now feel I have a much better idea of where Hammond stands than most others out there.

I Got 95 Theses But a Glitch Ain’t One Read More »

hacker-free-for-all-fights-for-control-of-home-and-office-routers-everywhere

Hacker free-for-all fights for control of home and office routers everywhere

Rows of 1950s-style robots operate computer workstations.

Cybercriminals and spies working for nation-states are surreptitiously coexisting inside the same compromised name-brand routers as they use the devices to disguise attacks motivated both by financial gain and strategic espionage, researchers said.

In some cases, the coexistence is peaceful, as financially motivated hackers provide spies with access to already compromised routers in exchange for a fee, researchers from security firm Trend Micro reported Wednesday. In other cases, hackers working in nation-state-backed advanced persistent threat groups take control of devices previously hacked by the cybercrime groups. Sometimes the devices are independently compromised multiple times by different groups. The result is a free-for-all inside routers and, to a lesser extent, VPN devices and virtual private servers provided by hosting companies.

“Cybercriminals and Advanced Persistent Threat (APT) actors share a common interest in proxy anonymization layers and Virtual Private Network (VPN) nodes to hide traces of their presence and make detection of malicious activities more difficult,” Trend Micro researchers Feike Hacquebord and Fernando Merces wrote. “This shared interest results in malicious internet traffic blending financial and espionage motives.”

Pawn Storm, a spammer, and a proxy service

A good example is a network made up primarily of EdgeRouter devices sold by manufacturer Ubiquiti. After the FBI discovered the network had been infected by a Kremlin-backed group and used as a botnet to camouflage ongoing attacks targeting governments, militaries, and other organizations worldwide, the agency commenced an operation in January to temporarily disinfect the devices.

The Russian hackers gained control after the devices were already infected with Moobot, which is botnet malware used by financially motivated threat actors not affiliated with the Russian government. These threat actors installed Moobot after first exploiting publicly known default administrator credentials that hadn’t been removed from the devices by the people who owned them. The Russian hackers—known by a variety of names including Pawn Storm, APT28, Forest Blizzard, Sofacy, and Sednit—then exploited a vulnerability in the Moobot malware and used it to install custom scripts and malware that turned the botnet into a global cyber espionage platform.

The Trend Micro researchers said that Pawn Storm was using the hijacked botnet to proxy (1) logins that used stolen account credentials and (2) attacks that exploited a critical zero-day vulnerability in Microsoft Exchange that went unfixed until March 2023. The zero-day exploits allowed Pawn Storm to obtain the cryptographic hash of users’ Outlook passwords simply by sending them a specially formatted email. Once in possession of the hash, Pawn Storm performed a so-called NTLMv2 hash relay attack that funneled logins to the user accounts through one of the botnet devices. Microsoft provided a diagram of the attack pictured below:

Microsoft

Trend Micro observed the same botnet being used to send spam with pharmaceutical themes that have the hallmarks of what’s known as the Canadian Pharmacy gang. Yet another group installed malware known as Ngioweb on botnet devices. Ngioweb was first found in 2019 running on routers from DLink, Netgear, and other manufacturers, as well as other devices running Linux on top of x86, ARM, and MIPS hardware. The purpose of Ngioweb is to provide proxies individuals can use to route their online activities through a series of regularly changing IP addresses, particularly those located in the US with reputations for trustworthiness. It’s not clear precisely who uses the Ngioweb-powered service.

The Trend Micro researchers wrote:

In the specific case of the compromised Ubiquiti EdgeRouters, we observed that a botnet operator has been installing backdoored SSH servers and a suite of scripts on the compromised devices for years without much attention from the security industry, allowing persistent access. Another threat actor installed the Ngioweb malware that runs only in memory to add the bots to a commercially available residential proxy botnet. Pawn Storm most likely easily brute forced the credentials of the backdoored SSH servers and thus gained access to a pool of EdgeRouter devices they could abuse for various purposes.

The researchers provided the following table, summarizing the botnet-sharing arrangement among Pawn Storm and the two other groups, tracked as Water Zmeu and Water Barghest:

Trend Micro


It’s unclear if either of the groups was responsible for installing the previously mentioned Moobot malware that the FBI reported finding on the devices. If not, that would mean routers were independently infected by three financially motivated groups, in addition to Pawn Storm, further underscoring the ongoing rush by multiple threat groups to establish secret listening posts inside routers. Trend Micro researchers weren’t available to clarify.

The post went on to report that while the January operation by the FBI put a dent in the infrastructure Pawn Storm depended on, legal constraints kept the operation from preventing reinfection. What’s more, the botnet also comprised virtual private servers and Raspberry Pi devices that weren’t affected by the FBI action.

“This means that despite the efforts of law enforcement, Pawn Storm still has access to many other compromised assets, including EdgeServers,” the Trend Micro report said. “For example, IP address 32[.]143[.]50[.]222 was used as an SMB reflector around February 8, 2024. The same IP address was used as a proxy in a credential phishing attack on February 6 2024 against various government officials around the world.”

Hacker free-for-all fights for control of home and office routers everywhere Read More »

all-the-ways-streaming-services-are-aggravating-their-subscribers-this-week

All the ways streaming services are aggravating their subscribers this week

man watching TV, holding face

Streaming services like Netflix and Peacock have already found multiple ways to aggravate paying subscribers this week.

The streaming industry has been heating up. As media giants rush to establish a successful video streaming business, they often make platform changes that test subscribers’ patience and the value of streaming.

Below is a look at the most exasperating news from streaming services this week. The scale of this article demonstrates how quickly and frequently disappointing streaming news arises. Coincidentally, as we wrote this article, another price hike was announced.

We’ll also examine each streaming platform’s financial status to get an idea of what these companies are thinking (spoiler: They’re thinking about money).

Peacock’s raising prices

For the second time in the past year, NBCUniversal is bumping the price of Peacock, per The Hollywood Reporter (THR) on Monday.

As of July 18, if you try to sign up for Peacock Premium (which has ads), it’ll cost $7.99 per month, up from $5.99/month today. Premium Plus (which doesn’t have ads) will go up from $11.99/month to $13.99/month. Annual subscription pricing for the ad plan is increasing 33.3 percent from $59.99 to $79.99, and the ad-free annual plan’s price will rise 16.7 percent from $119.99/year to $139.99/year.
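
For what it’s worth, the quoted annual-plan percentages check out; simple arithmetic on the listed prices (my calculation, not THR’s):

```python
# Quick check of the annual-plan increases quoted above.
ad_plan_old, ad_plan_new = 59.99, 79.99
ad_free_old, ad_free_new = 119.99, 139.99

print(round((ad_plan_new - ad_plan_old) / ad_plan_old * 100, 1))   # 33.3
print(round((ad_free_new - ad_free_old) / ad_free_old * 100, 1))   # 16.7
```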

Those already subscribed to Peacock won’t see the changes until August 17, six days after the closing ceremony of the 2024 Summer Olympics, which will stream on Peacock.

The pricing changes will begin eight days before the Olympics’ opening ceremony. That means that in the days leading up to the sporting event, signing up for Peacock will cost more than ever. That said, there’s still time to sign up for Peacock at its current pricing.

As noted by THR, the changes come as NBCUniversal may feel more confident about its streaming service, which now includes big-ticket items, like exclusive NFL games and Oppenheimer (which Peacock streamed exclusively for a time), in addition to new features for the Olympics, like multiview.

Some outspoken subscribers, though, aren’t placated.

“Just when I was starting to like the service,” Reddit user MarkB1997 said in response to the news. “I’ll echo what everyone has been saying for a while now, but these services are pricing themselves out of the market.”

Peacock subscribers already experienced a price increase on August 17, 2023. At the time, Peacock’s Premium pricing went from $4.99/month to $5.99/month, and the Premium Plus tier from $9.99/month to $11.99/month.

Peacock’s pockets

Peacock’s price bumps appear to be a way for the younger streaming service to inch closer to profitability amid a major, quadrennial, global event.

NBCUniversal parent company Comcast released its Q1 2024 earnings report last week, showing that Peacock, which launched in July 2020, remains unprofitable. For the quarter, Peacock lost $639 million, compared to $825 million in Q4 2023 and $704 million in Q1 2023. Losses were largely attributed to higher programming costs.

Peacock’s paid subscriber count is lower than some of its rivals. The platform ended the quarter with 34 million paid users, up from 31 million at the end of 2023. Revenue also rose, with the platform pulling in $1.1 billion, representing a 54 percent boost compared to the prior year.

Sony bumps Crunchyroll prices weeks after shuttering Funimation

Today, Sony’s anime streaming service Crunchyroll announced that it’s increasing subscription prices as follows:

  • The Mega Fan Tier, which allows streaming on up to four devices simultaneously, will go from $9.99/month to $11.99/month
  • The Ultimate Fan Tier, which allows streaming on up to six devices simultaneously, will go from $14.99/month to $15.99/month

Crunchyroll’s cheapest plan ($7.99/month) remains unchanged. None of Crunchyroll’s subscription plans have ads. Crunchyroll’s also adding discounts to its store for each subscription tier, but this is no solace for those who don’t shop there on a monthly basis or at all.

The news of higher prices comes about a month after Sony shuttered Funimation, an anime streaming service it acquired in 2017. After Sony bought Crunchyroll in 2021, Funimation became somewhat redundant. And now that Sony has converted all remaining Funimation accounts into Crunchyroll accounts (while deleting Funimation digital libraries), it’s forcing many customers to pay more to watch their favorite anime.

A user going by BioMountain on Crunchyroll said the news is “not great,” since they weren’t “a big fan of having to switch from Funimation to begin with, especially since that app was so much better” than Crunchyroll.

Interestingly, when Anime News Network asked on February 29 whether Crunchyroll would see prices rise over the next two years, the company told the publication that a price change in that time frame was improbable.

Crunching numbers

Crunchyroll had 5 million paid subscribers in 2021 but touted over 13 million in January (plus over 89 million unpaid users, per Bloomberg). Crunchyroll president Rahul Purini has said that Crunchyroll is profitable, but he hasn’t said by how much.

In 2023, Goldman Sachs estimated that Crunchyroll would represent 36 percent of Sony Pictures Entertainment’s profit by 2028, compared to about 1 percent in March.

However, Purini has shown interest in growing the company further and noted to Variety in February an increase in “general entertainment” companies getting into anime.

Still, anime remains a relatively niche entertainment category, and Crunchyroll is more specialized than some other streaming platforms. By leaving anime fans with one less streaming option and jacking up prices on one of the few that remain, Sony is showing that it wants as much of the $20 billion anime market as possible.

Crunchyroll claimed today that its pricing changes are tied to “investment in more anime, additional services like music and games, and additional subscriber benefits.”

All the ways streaming services are aggravating their subscribers this week Read More »

anthropic-releases-claude-ai-chatbot-ios-app

Anthropic releases Claude AI chatbot iOS app

AI in your pocket —

Anthropic finally comes to mobile, launches plan for teams that includes 200K context window.

The Claude AI iOS app running on an iPhone.

Enlarge / The Claude AI iOS app running on an iPhone.

Anthropic

On Wednesday, Anthropic announced the launch of an iOS mobile app for its Claude 3 AI language models, which are similar to OpenAI’s ChatGPT. It also introduced a new subscription tier designed for group collaboration. Before the app launch, Claude was only available through a website, an API, and other apps that integrated Claude through that API.
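For a rough sense of what that API integration looks like, here is a minimal sketch using Anthropic’s official Python SDK and its Messages endpoint. This snippet is illustrative only and not taken from Anthropic’s announcement; the model name, prompt, and API-key environment variable are the conventional defaults for the SDK.

# Illustrative sketch: calling Claude through Anthropic's Messages API
# with the official Python SDK. Assumes ANTHROPIC_API_KEY is set in the
# environment; the model name and prompt are examples, not from the article.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-sonnet-20240229",  # mid-tier Claude 3 model
    max_tokens=256,
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)

print(response.content[0].text)  # the model's reply text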

Like the ChatGPT app, Claude’s new mobile app serves as a gateway to chatbot interactions, and it also allows uploading photos for analysis. While it’s only available on Apple devices for now, Anthropic says that an Android app is coming soon.

Anthropic rolled out the Claude 3 large language model (LLM) family in March, featuring three different model sizes: Claude Opus, Claude Sonnet, and Claude Haiku. Currently, the app utilizes Sonnet for regular users and Opus for Pro users.

While Anthropic has been a key player in the AI field for several years, it’s entering the mobile space after many of its competitors have already established footprints on mobile platforms. OpenAI released its ChatGPT app for iOS in May 2023, with an Android version arriving two months later. Microsoft released a Copilot iOS app in January. Google Gemini is available through the Google app on iPhone.

Screenshots of the Claude AI iOS app running on an iPhone.

Enlarge / Screenshots of the Claude AI iOS app running on an iPhone.

Anthropic

The app is freely available to all users of Claude, including those using the free version, subscribers paying $20 per month for Claude Pro, and members of the newly introduced Claude Team plan. Conversation history is saved and shared between the web app version of Claude and the mobile app version after logging in.

Speaking of that Team plan, it’s designed for groups of at least five and is priced at $30 per seat per month. It offers more chat queries (higher rate limits), access to all three Claude models, and a larger context window (200K tokens) for processing lengthy documents or maintaining detailed conversations. It also includes group admin tools and billing management, and users can easily switch between Pro and Team plans.

Anthropic releases Claude AI chatbot iOS app Read More »

congress-lets-broadband-funding-run-out,-ending-$30-low-income-discounts

Congress lets broadband funding run out, ending $30 low-income discounts

Affordable Connectivity Program —

ACP gave out last $30 discounts in April; only partial discounts available in May.

Illustration of fiber Internet cables

Getty Images | Yuichiro Chino

The Federal Communications Commission chair today made a final plea to Congress, asking for money to continue a broadband-affordability program that gave out its last round of $30 discounts to people with low incomes in April.

The Affordable Connectivity Program (ACP) has lowered monthly Internet bills for people who qualify for benefits, but Congress allowed funding to run out. People may receive up to $14 in May if their ISP opted into offering a partial discount during the program’s final month. After that there will be no financial help for the 23 million households enrolled in the program.

“Additional funding from Congress is the only near-term solution for keeping the ACP going,” FCC Chairwoman Jessica Rosenworcel wrote in a letter to members of Congress today. “If additional funding is not promptly appropriated, the one in six households nationwide that rely on this program will face rising bills and increasing disconnection. In fact, according to our survey of ACP beneficiaries, 77 percent of participating households report that losing this benefit would disrupt their service by making them change their plan or lead to them dropping Internet service entirely.”

The ACP started with $14.2 billion allocated by Congress in late 2021. The $30 monthly ACP benefit replaced the previous $50 monthly subsidy from the Emergency Broadband Benefit Program.

Biden urges Republicans to support funding

Some Republican members of Congress have called the program “wasteful” and complained that most people using the discounts had broadband access before the subsidy was available. Rosenworcel’s letter today said the FCC survey found that “68 percent of ACP households stated they had inconsistent or zero connectivity prior to ACP.”

Senate Commerce Committee Chair Maria Cantwell (D-Wash.) included $7 billion for the program in a draft spectrum auction bill on Friday, but previous proposals from Democrats to extend funding have fizzled out. The White House today urged Congress to fund the program and blamed Republicans for not supporting funding proposals.

“President Biden is once again calling on Republicans in Congress to join their Democratic colleagues in support of extending funding for the Affordable Connectivity Program,” the White House said.

Some consumer advocates have called on the FCC to fund the ACP by increasing Universal Service Fund collections, which could involve raising fees on phone service or imposing Universal Service fees on broadband for the first time. Rosenworcel has instead looked to Congress to allocate funding for the ACP.

“Time is running out,” Rosenworcel’s letter said. “Additional funding is needed immediately to avoid the disruption millions of ACP households that rely on this program for essential connectivity are already starting to experience.”

Congress lets broadband funding run out, ending $30 low-income discounts Read More »

dave-&-buster’s-is-adding-real-money-betting-options-to-arcade-staples

Dave & Buster’s is adding real money betting options to arcade staples

Casino-cade or Arcade-sino? —

“Gamification layer” platform promises to streamline your friendly Skee-Ball wagers.

It’s a good thing this kid is too young to bet on Skee-Ball, because his dad is getting beat.

Enlarge / It’s a good thing this kid is too young to bet on Skee-Ball, because his dad is getting beat.

Getty Images

Anyone who’s been to a Dave & Buster’s location in recent years knows the arcade’s heavy reliance on so-called redemption games makes the experience more like an ersatz casino than the quarter-munching video game halls of the ’70s and ’80s. On the vast majority of D&B games, you end up wagering money (in the form of gameplay chips) to win virtual tickets that can be traded for trinkets at the rewards counter.

Now, the massive arcade chain has announced that players will soon be able to use the D&B app to directly wager on the results of arcade games through “real-money contests.” The arcade giant, which has over 200 locations across North America, is partnering with “gamification layer” platform Lucra on a system that will let D&B Rewards members “digitally compete with each other, earn rewards, and unlock exclusive perks while competing with friends at Dave & Buster’s,” according to Tuesday’s announcement.

Neither Lucra nor Dave & Buster’s has responded to a request for comment from Ars Technica, so we’re still missing extremely basic information, like which games will support app-based wagering, what the minimum and maximum bet sizes will be, or what kinds of fees might be involved. CNBC’s report on the announcement suggests the system will launch “in the next few months” for players 18 and older across 44 states (and specifically mentions Skee-Ball and Hot Shots Basketball competitions). Lucra’s webpage simply says the integration will “provide… social connectivity and friendly competition,” suggesting you’ll probably face off against friends playing in the same location.

Lucra’s system has previously been integrated into Dupr (a pickleball ranking platform) and TennisOne to let players make casual bets on recreational sports. The company says it has handled $20 million in bets from 150,000 customers across its platforms since its founding in 2019.

Money match

Gambling on arcade games is far from a new concept. Wagering on early pinball games was so common that many US cities banned pinball entirely starting around the 1940s until a landmark 1976 court case determined the tables weren’t games of chance. And the fighting game community has a long tradition of money matches that can often be found along the fringes of major tournaments to this day.

New York Police Commissioner William O’Brien destroys a pinball machine as part of a citywide crackdown on “gambling devices” in 1949.

Enlarge / New York Police Commissioner William O’Brien destroys a pinball machine as part of a citywide crackdown on “gambling devices” in 1949.

Getty Images

Still, Dave & Buster’s officially integrating real-money wagers into its arcade experience feels like the most direct acknowledgment yet of the ongoing casino-ization of the video game arcade. It’s important to note, though, that the arcade games being played at Dave & Buster’s have to have an element of skill, setting the arcades apart from real casinos, which can offer purely chance-based wagering. CNBC reports this distinction lets Lucra and D&B avoid the complex web of regulations and licensing required to open a true casino or take bets on professional sports.

Ironically enough, though, many of those traditional casinos have been experimenting with so-called “skill-based” slot machines for years, in an attempt to draw in younger players who want to feel more in control of the experience. But at least one casino’s website admits “the influence that each player has on the reward [in a skill-based slot machine] is minimal, at best,” so maybe there’s still some distinction between arcades and casinos on that score.

Even without a gambling app, though, so-called “advantage players” have long made a lucrative business of racking up jackpots on Dave & Buster’s redemption games and then selling the high-ticket prizes on eBay.

Dave & Buster’s is adding real money betting options to arcade staples Read More »