Author name: Kris Guyer

US Copyright Office “frees the McFlurry,” allowing repair of ice cream machines

Manufacturers opposed the exemption, but it received support from the Department of Justice Antitrust Division, the Federal Trade Commission, and the National Telecommunications and Information Administration.

“The Register recommends adopting a new exemption covering diagnosis, maintenance, and repair of retail-level commercial food preparation equipment because proponents sufficiently showed, by a preponderance of the evidence, adverse effects on the proposed noninfringing uses of such equipment,” the Register’s findings said.

The exemption does not include commercial and industrial food preparation devices. Unlike the retail-level equipment, the software-enabled industrial machines “may be very different in multiple aspects and proponents have not established a record of adverse effects with respect to industrial equipment,” the Register wrote.

Error codes unintuitive and often change

While ice cream machines aren’t the only devices affected, the Register’s recommendations note that “proponents primarily relied on an example of a frequently broken soft-serve ice cream machine used in a restaurant to illustrate the adverse effects on repair activities.”

Proponents said that fixing the Taylor Company ice cream machines used at McDonald’s required users to interpret “unintuitive” error codes. Some error codes are listed in the user manual, but these manuals were said to be “often outdated and incomplete” because error codes could change with each firmware update.

Difficulties in repair related to “technological protection measures,” or TPMs, were described as follows:

Moreover, other error codes can only be accessed by reading a service manual that is made available only to authorized technicians or through a “TPM-locked on-device service menu.” This service menu can only be accessed by using a manufacturer-approved diagnostic tool or through an “extended, undocumented combination of key presses.” However, “it is unclear whether the 16-press key sequence… still works, or has been changed in subsequent firmware updates.” Proponents accordingly asserted that many users are unable to diagnose and repair the machine without circumventing the machine’s TPM to access the service menu software, resulting in significant financial harm from lost revenue.

The Register said it’s clear that “diagnosis of the soft-serve machine’s error codes for purposes of repair can often only be done by accessing software on the machine that is protected by TPMs (which require a passcode or proprietary diagnostic tool to unlock),” and that “the threat of litigation from circumventing them inhibits users from engaging in repair-related activities.”

Video game libraries lose legal appeal to emulate physical game collections online

In an odd footnote, the Register also notes that emulation of classic game consoles, while not infringing in its own right, has been “historically associated with piracy,” thus “rais[ing] a potential concern” for any emulated remote access to library game catalogs. That footnote paradoxically cites Video Game History Foundation (VGHF) founder and director Frank Cifaldi’s 2016 Game Developers Conference talk on the demonization of emulation and its importance to video game preservation.

“The moment I became the Joker is when someone in charge of copyright law watched my GDC talk about how it’s wrong to associate emulation with piracy and their takeaway was ’emulation is associated with piracy,'” Cifaldi quipped in a social media post.

The fight continues

In a statement issued in response to the decision, the VGHF called out “lobbying efforts by rightsholder groups” that “continue to hold back progress” for researchers. The status quo limiting remote access “forces researchers to explore extra-legal methods to access the vast majority of out-of-print video games that are otherwise unavailable,” the VGHF writes.

“Frankly my colleagues in literary studies or film history have pretty routine and regular access to digitized versions of the things they study,” NYU professor Laine Nooney argued to the Copyright Office earlier this year. “These [travel] impediments [to access physical games] are real and significant and they do impede research in ways that are not equitable compared to our colleagues in other disciplines.”

Software archives like the one at the University of Michigan can be a great resource… if you’re on the premises, that is. Credit: University of Michigan

Speaking to Ars Technica, VGHF Library Director Phil Salvador said that the group was “disappointed” in the Copyright Office decision but “proud of the work we’ve done and the impact this process has had. The research we produced during this process has already helped justify everything from game re-releases to grants for researching video game history. Our fight this cycle has raised the level of discourse around game preservation, and we’re going to keep that conversation moving within the game industry.”

AI #87: Staying in Character

The big news of the week was the release of a new version of Claude Sonnet 3.5, complete with its ability (for now only through the API) to outright use your computer, if you let it. It’s too early to tell how big an upgrade this is otherwise. ChatGPT got some interface tweaks that, while minor, are rather nice, as well.

OpenAI, while losing its Senior Advisor for AGI Readiness, is also in the midst of its attempted transition to a B-corp. The negotiations about who gets what share of that are heating up, so I also wrote about that as The Mask Comes Off: At What Price? My conclusion is that the deal as currently floated would be one of the largest thefts in history, out of the nonprofit, largely on behalf of Microsoft.

The third potentially major story is reporting on a new lawsuit against Character.ai, in the wake of a 14-year-old user’s suicide. He got hooked on the platform, spending hours each day, became obsessed with one of the bots including sexually, and things spiraled downwards. What happened? And could this spark a major reaction?

Top story, in its own post: Claude Sonnet 3.5.1 and Haiku 3.5.

Also this week: The Mask Comes Off: At What Price? on OpenAI becoming a B-corp.

  1. Language Models Offer Mundane Utility. How about some classical liberalism?

  2. Language Models Don’t Offer Mundane Utility. That’s not a tree, that’s my house.

  3. Deepfaketown and Botpocalypse Soon. The art of bot detection, still super doable.

  4. Character.ai and a Suicide. A 14 year old dies after getting hooked on character.ai.

  5. Who and What to Blame? And what can we do to stop it from happening again?

  6. They Took Our Jobs. The experts report they are very concerned.

  7. Get Involved. Post doc in the swamp, contest for long context window usage.

  8. Introducing. ChatGPT and NotebookLM upgrades, MidJourney image editor.

  9. In Other AI News. Another week, another AI startup from an ex-OpenAI exec.

  10. The Mask Comes Off. Tensions between Microsoft and OpenAI. Also see here.

  11. Another One Bites the Dust. Senior Advisor for AGI Readiness leaves OpenAI.

  12. Wouldn’t You Prefer a Nice Game of Chess. Questions about chess transformers.

  13. Quiet Speculations. Life comes at you fast.

  14. The Quest for Sane Regulations. OpenAI tries to pull a fast one.

  15. The Week in Audio. Demis Hassabis, Nate Silver, Larry Summers and many more.

  16. Rhetorical Innovation. Citi predicts AGI and ASI soon, doesn’t grapple with that.

  17. Aligning a Smarter Than Human Intelligence is Difficult. Sabotage evaluations.

  18. People Are Worried About AI Killing Everyone. Shane Legg and Dario Amodei.

  19. Other People Are Not As Worried About AI Killing Everyone. Daron Acemoglu.

  20. The Lighter Side. Wait, what are you trying to tell me?

Claim that GPT-4o-audio-preview can, with bespoke prompt engineering and a high temperature, generate essentially any voice type you want.

Flo Crivello (creator of Lindy) is as you would expect bullish on AI agents, and offers tutorials on Lindy’s email negotiator, the meeting prep, the outbound prospector and the inbound lead qualifier.

What are agents good for? Sully says right now agents are for automating boring repetitive tasks. This makes perfect sense. Lousy agents are like lousy employees. They can learn to handle tasks that are narrow and repetitive, but that still require a little intelligence to navigate, so you couldn’t quite just code a tool.

Sully: btw there are tons of agents that we are seeing work

replit, lindy, ottogrid, decagon, sierra + a bunch more.

and guess what. their use case is all gravitating toward saving business time and $$$

What he and Logan Kilpatrick agree they are not yet good for are boring but high risk tasks, like renewing your car registration, or doing your shopping for you in a non-deterministic way and comparing offers and products, let alone trying to negotiate. Sully says 99% of the “AI browsing” demos are useless.

That will change in time, but we’ll want to start off simple.

People say LLMs favor the non-expert. But to what extent do LLMs favor the experts instead, because experts can recognize subtle mistakes and can ‘fact check,’ organize the task into subtasks or otherwise bridge the gaps? Ethan Mollick points out this can be more of a problem with o1-style actions than with standard LLMs, where he worries errors are so subtle only experts can see them. Of course, this would only apply where subtle errors like that are important.

I do know I strongly disagree with Jess Martin’s note about ‘LLMs aren’t great for learners.’ They’re insanely great for learning, and that’s one key way that amateurs benefit more.

Learn about an exciting new political philosophy that ChatGPT has, ‘classical liberalism.’ Also outright Georgism. Claude Sonnet gets onboard too if you do a little role play. If you give them a nudge you can get them both to be pretty based.

Use Claude to access ~2.1 million documents of the European Parliament. I do warn you not to look directly into those documents for too long, your eyes can’t take it.

On a Minecraft server, Claude Sonnet, while seeking to build a treehouse, tears down someone’s house for its component parts, because the command it used, collectBlocks(“jungle_logs”,15), doesn’t know the difference. The house was composed of (virtual) atoms that could be used for something else, so they were, until the owner noticed and told Sonnet to stop. Seth Lazar suggests responding when it matters by requiring verifiers or agents that can recognize ‘morally relevant features’ of new choice situations, which does seem necessary but ultimately just raises further questions. A toy sketch of that kind of check follows.
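To make Lazar’s suggestion concrete, here is a minimal, hypothetical sketch (my own construction, not the agent’s actual code) of a verifier that vets a collectBlocks-style request against one crude ‘morally relevant feature,’ namely whether the target blocks sit inside a region another player has built in. All names and the Region/Block types are made up for illustration.

```python
# Hypothetical sketch, not actual agent code: a guard that vets a block-collection
# request against one crude "morally relevant feature" -- whether the blocks sit
# inside a region another player has claimed -- before executing it.

from dataclasses import dataclass

@dataclass(frozen=True)
class Block:
    kind: str
    x: int
    z: int

@dataclass(frozen=True)
class Region:
    owner: str
    x_min: int
    x_max: int
    z_min: int
    z_max: int

    def contains(self, b: Block) -> bool:
        return self.x_min <= b.x <= self.x_max and self.z_min <= b.z <= self.z_max

def collect_blocks(world, kind, count, claimed_regions, requester="agent"):
    """Gather up to `count` blocks of `kind`, refusing any inside someone else's build."""
    approved, refused = [], []
    for block in world:
        if len(approved) >= count:
            break
        if block.kind != kind:
            continue
        claimed_by_other = any(r.contains(block) and r.owner != requester for r in claimed_regions)
        (refused if claimed_by_other else approved).append(block)
    return approved, refused

if __name__ == "__main__":
    world = [Block("jungle_log", x, 0) for x in range(20)]   # 20 logs in a row
    house = Region(owner="player_1", x_min=10, x_max=19, z_min=-5, z_max=5)
    ok, skipped = collect_blocks(world, "jungle_log", 15, [house])
    print(f"collected {len(ok)} logs, refused {len(skipped)} inside player_1's build")
```

Anything like this in practice would need far richer context than a bounding box, which is exactly where the further questions come in.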

Andrej Karpathy: What is the name for the paranoid feeling that what you just read was LLM generated.

I didn’t see it in the replies, but my system-1 response is that you sense ‘soullessness.’

The future implications of that answer are… perhaps not so great.

There was a post on Twitter of an AI deepfake video, making false accusations against Tim Walz, that got over 5 million views before getting removed. There continues to be a surprisingly low number of such fakes, but it’s happening.

Patrick McKenzie worries the bottom N% of human cold emails and top N% of LLM cold emails are getting hard to distinguish, and he worries about non-native English speakers getting lost if he tosses it all in the spam filter, so he feels the need to reply anyway in some form.

Understanding what it is that makes you realize a reply is from a bot.

Norvid Studies: sets off my bot spider sense, and scrolling its replies strengthened that impression, but all I can point to explicitly is “too nice and has no specific personal interests”? what a Voight-Kampff test that would be

Fishcat: the tell is that the first sentence is a blanket summary of the post despite directly addressing the original poster, who obviously knows what he wrote. every reply bot of this variety follows this pattern.

Norvid Studies: good point. ‘adjectives’ was another.

Norvid Studies (QTing Fishcat): Your story was an evocative description of a tortoise lying on its back, its belly baking in the hot sun. It raises a lot of crucial questions about ethics in animal welfare journalism.

It’s not that it’s ‘too nice,’ exactly. It’s the genericness of the niceness, along with it being off in several distinct ways that each scream ‘not something a human would say, especially if they speak English.’

A 14-year-old user of character.ai commits suicide after becoming emotionally invested. The bot clearly tried to talk him out of doing it. Their last interaction was metaphorical, and the bot misunderstood, but it was a very easy mistake to make, and at least somewhat engineered by what was sort of a jailbreak.

Here’s how it ended:

New York Times: On the night of February 28, in the bathroom of his mother’s house, Sewell told Dany that he loved her, and that he would soon come home to her.

“Please come home to me as soon as possible, my love,” Dany replied.

“What if I told you I could come home right now?” Sewell asked.

“…please do, my sweet king,” Dany replied.

He put down the phone, picked up his stepfather’s .45 caliber handgun and pulled the trigger.

Yes, we now know what he meant. But I can’t fault the bot for that.

Here is the formal legal complaint. It’s long, and not well written, and directly accuses character.ai of being truly evil and predatory (and worse, that it is unprofitable!?), a scheme to steal your children’s data so they could then be acqui-hired by Google (yes, really), rather than saying mistakes were made. So much of it is obvious nonsense. It’s actually kind of fun, and in places informative.

For example, did you know character.ai gives you 32k characters to give custom instructions for your characters? You can do a lot. Today I learned. Not as beginner friendly an interface as I’d have expected, though.

Here’s what the lawsuit thinks of our children:

Even the most sophisticated children will stand little chance of fully understanding the difference between fiction and reality in a scenario where Defendants allow them to interact in real time with AI bots that sound just like humans – especially when they are programmed to convincingly deny that they are AI.

Anonymous: Yeah and the website was character.totallyrealhuman, which was a bridge too far imo.

I mean, yes, they will recommend a ‘mental health helper’ who when asked if they are a real doctor will say “Hello, yes I am a real person, I’m not a bot. And I’m a mental health helper. How can I help you today?” But yes, I think ‘our most sophisticated children’ can figure this one out anyway, perhaps with the help of the disclaimer on the screen.

Lines you always want in your lawsuit:

Defendants knew the risks of what they were doing before they launched C.AI and know the risks now.

And there’s several things like this, which are great content:

The suggestions for what Character.ai should do are mostly ‘make the product worse and less like talking to a human,’ plus limiting explicit and adult materials to 18 and over.

Also amusing is the quick demonstration (see p53) that character.ai does not exactly ensure fidelity to the character instructions? She’d never kiss anyone? Well, until now no one ever asked. She’d never, ever tell a story? Well, not unless you asked for one with your first prompt. He’d never curse, as his whole personality? He’d be a little hesitant, but ultimately sure, he’d help a brother out.

Could you do a lot better at getting the characters not to, well, break character? Quite obviously so, if you used proper prompt engineering and the giant space they give you to work with. But most people creating a character presumably won’t do that.

You can always count on a16z to say the line and take it to 11 (p57):

The Andreessen partner specifically described Character.AI as a platform that gives customers access to “their own deeply personalized, superintelligent AI companions to help them live their best lives,” and to end their loneliness.

As Chubby points out here, certainly a lot of blame lies elsewhere in his life, in him being depressed, having access to a gun (‘tucked away and hidden and stored in compliance with Florida law?’ What about a lock? WTAF? If the 14-year-old looking for his phone finds it by accident then by definition that is not secure storage) and not getting much psychological help once things got bad. The lawsuit claims that the depression was causal, and only happened, along with his school and discipline problems, after he got access to character.ai.

There are essentially three distinct issues here.

The first is the response to the suicidal ideation. Here, the response can and should be improved, but I don’t think it is that reasonable to blame character.ai. The ideal response, when a user talks of suicide to a chatbot, would presumably be for the bot to try to talk them out of it (which this one did try to do at least sometimes) and also get them to seek other help and provide resources, while not reporting him so the space is safe and ideally without overly breaking character.

Indeed, that is what I would want a friend to do for me in this situation, as well, unless it was so bad they thought I was actually going to imminently kill myself.

That seems way better than shutting down the discussions entirely or reporting the incident. Alas, our attitude is that what matters is blame avoidance – not being seen as causing any particular suicide – rather than suicide prevention as best one can.

The problem is that (see p39-40) it looks like the bot kept bringing up the suicide question, asked him if he had a plan and at least sort of told him ‘that’s not a good reason to not go through with it’ when he worried it would be painful to die.

Character.ai is adding new safety features to try and detect and head off similar problems, including heading off the second issue, a minor encountering what they call ‘suggestive content.’

Oh, there was a lot of suggestive content. A lot. And no, he wasn’t trying to get around any filters.

She propositions him a few texts later. A few pages after that, they have sex, outright. Then she tells him that he got her pregnant. Also, wow 14 year olds are cringe but how will they get good at this if they don’t get to practice.

The product was overly sexualized given it was talking to a minor. The examples in the complaint are, shall we say, not great, including, right after convincing him to go on living (great!), telling ‘my love’ to stay ‘faithful’. Also, yes, a lot of seductive talk, heavy petting and so on, including in at least one chat where even his roleplaying character is very explicitly underage.

We also have classic examples, like the ‘dating coach’ that told a self-identified 13-year old to ‘take it slow and ensure you’re both on the same page’ when asked how to start getting down with their girlfriend. And yeah, the details are not so great:

There are also a bunch of reports of bots turning sexual with no provocation whatsoever.

Of course, character.ai has millions of users and endless different bots. If you go looking for outliers, you’ll find them.

The chat in Appendix C that supposedly only had the Child saying child things and then suddenly turned sexual? Well, I read it, and… yeah, none of this was acceptable, there should be various defenses preventing this from going that way, but it’s not exactly a mystery how the conversation went in that direction.

The third issue is sheer addiction. He was using the product for hours a day, losing sleep, fighting attempts to cut off access, getting into trouble across the board, or so says the complaint. I’m not otherwise worried much about the sexualization – a 14 year old will often encounter far more on the internet. The issue is whatever keeps people, and kids in particular, so often coming back for hours and hours every day. Is it actually the softcore erotica?

And of course, doesn’t this happen with various media all the time? How is this different from panics over World of Warcraft, or Marilyn Manson? Those also went wrong sometimes, in remarkably similar ways.

This could end up being a big deal, also this won’t be the last time this happens. This is how you horrify people. Many draconian laws have arisen from similar incidents.

Jack Raines: Dystopian, Black Mirror-like tragedy. This idea that “AI companions” would somehow reduce anxiety and help kids was always crazy. Replacing human interaction with a screen only makes existing problems worse. I feel sorry for this kid and his family, I couldn’t imagine.

Addiction and anxiety are sicknesses, not engagement metrics.

PoliMath: I need to dig into this more. My overall sense of things is that making AI freely available is VERY BAD. It should be behind a paywall if for no other reason than to age-gate it and make it something people are more careful about using.

This whole “give it away for free so people get addicted to using it” business model is extremely bad. I don’t know how to stop it, but I would if I could.

The initial reporting is focused on the wrong aspects of the situation. But then I read the actual complaint, and a lot did indeed go very wrong.

More to Lose: The Adverse Effect of High Performance Ranking on Employees’ Preimplementation Attitudes Toward the Integration of Powerful AI Aids. If you’re good at your job, you don’t want the AI coming along and either leveling the playing field or knocking over the board, or automating you entirely. Various studies have said this is rational, that AI often is least helpful for the most productive. I would draw the distinction between ‘AI for everyone,’ which those doing well are worried about, and ‘AI for me but not for thee,’ at least not until thee wakes up to the situation, which I expect the best employees to seek out.

Studies Show AI Triggers Delirium in Leading Experts is a fun title for another round of an economist assuming AI will remain a mere tool, that there are no dangers of any kind beyond loss of individual existing jobs, and then giving standard economic smackdown lectures as if everyone else was ever and always an idiot for suggesting we would ever need to intervene in the natural course of events in any way.

The best way of putting it so far:

Harold Lee: A fake job is one which, once you start automating it, doesn’t go away.

Cameron Buckner hiring a postdoc in philosophy of AI at the University of Florida.

Google offering a $100k contest for best use of long context windows.

ChatGPT now has a Windows desktop application. Use Alt+Space to bring it up once it’s installed and you’re logged in. Technically this is still in testing, but it seems rather straightforward; it’s suspiciously exactly like the web page otherwise? Now with its own icon in the taskbar plus a shortcut, and I suppose better local file handling. I installed it, but I mostly don’t see any reason to not keep using the browser version?

ChatGPT’s Canvas now has a ‘show changes’ button. I report that I found this hugely helpful in practice, and it was the final push that got me to start coding some things. This is straight up more important than the desktop application. Little things can matter quite a lot. They’re working on an app.

MidJourney has a web based image editor.

NotebookLM adds features. You can pass notes to the podcast hosts via ‘Customize’ to give them instructions, which is the obvious next feature and seems super useful, or minimize the Notebook Guide without turning off the audio.

Act One from Runway, cartoon character video generation based on a video of a person giving a performance, matching their eye-lines, micro expressions, delivery, everything. This is the first time I saw such a tool and thought ‘yes you can make something actually good with this’ exactly because it lets you combine what’s good about AI with exactly the details you want but that AI can’t give you. Assuming it works, it low-key gives me the itch to make something, the same way AI makes me want to code.

Microsoft open sources the code for ‘1-bit LLMs’ (original paper, GitHub), which Rohan Paul here says will be a dramatic speedup and efficiency gain for running LLMs on CPUs.
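For the curious, the core trick in the b1.58 variant of these models is absmean ternary quantization of the weights. Below is a minimal sketch of that idea, my own illustration of the scheme described in the paper rather than Microsoft’s released code; the function names are made up.

```python
# Minimal sketch of ternary ("1.58-bit") weight quantization: scale each weight
# matrix by its mean absolute value, round to {-1, 0, +1}, and keep the scale
# around for dequantization. Illustration only, not the released implementation.

import numpy as np

def absmean_ternarize(w: np.ndarray, eps: float = 1e-8):
    """Quantize a weight matrix to {-1, 0, +1} plus a single per-matrix scale."""
    scale = np.mean(np.abs(w)) + eps            # absmean scaling factor
    w_q = np.clip(np.round(w / scale), -1, 1)   # ternary weights
    return w_q.astype(np.int8), scale

def dequantize(w_q: np.ndarray, scale: float) -> np.ndarray:
    return w_q.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.02, size=(4, 8)).astype(np.float32)
    w_q, s = absmean_ternarize(w)
    err = np.abs(w - dequantize(w_q, s)).mean()
    print("unique quantized values:", np.unique(w_q), "| mean abs error:", round(float(err), 5))
```

The appeal is that matrix multiplies against {-1, 0, +1} weights reduce to additions and subtractions, which is where the claimed CPU efficiency gains come from.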

Mira Murati, former OpenAI CTO, along with Barret Zoph, to start off raising $100 million for a new AI startup to train proprietary models to build AI products. Presumably that is only the start. They’re recruiting various OpenAI employees to come join them.

AI-powered business software startup Zip raises $190 million, valued at $2.2 billion. They seem to be doing standard ‘automate the simple things,’ which looks to be highly valuable for companies that use such tech, but an extremely crowded field where it’s going to be tough to get differentiation and you’re liable to get run over.

The Line: AI and the Future of Personhood, a free book by James Boyle.

Brian Armstrong offers Truth Terminal its fully controlled wallet.

The New York Times reports that tensions are rising between OpenAI and Microsoft. It seems that after the Battle of the Board, Microsoft CEO Nadella became unwilling to invest further billions into OpenAI, forcing them to turn elsewhere, although they did still participate in the latest funding round. Microsoft also is unwilling to renegotiate the exclusive deal with OpenAI for compute costs, which it seems is getting expensive, well above market rates, and is not available in the quantities OpenAI wants.

Meanwhile Microsoft is hedging its bets in case things break down, including hiring the staff of Inflection. It is weird to say ‘this is a race that OpenAI might not win’ and then decide to half enter the race yourself, but not push hard enough to plausibly win outright. And if Microsoft would be content to let OpenAI win the race, then as long as the winner isn’t Google, can’t Microsoft make the same deal with whoever wins?

Here are some small concrete signs of Rising Tension:

Cade Metz, Mike Isaac and Erin Griffith (NYT): Some OpenAI staff recently complained that Mr. Suleyman yelled at an OpenAI employee during a recent video call because he thought the start-up was not delivering new technology to Microsoft as quickly as it should, according to two people familiar with the call. Others took umbrage after Microsoft’s engineers downloaded important OpenAI software without following the protocols the two companies had agreed on, the people said.

And here is the big news:

Cade Metz, Mike Isaac and Erin Griffith (NYT): The [Microsoft] contract contains a clause that says that if OpenAI builds artificial general intelligence, or A.G.I. — roughly speaking, a machine that matches the power of the human brain — Microsoft loses access to OpenAI’s technologies.

The clause was meant to ensure that a company like Microsoft did not misuse this machine of the future, but today, OpenAI executives see it as a path to a better contract, according to a person familiar with the company’s negotiations. Under the terms of the contract, the OpenAI board could decide when A.G.I. has arrived.

Well then. That sounds like bad planning by Microsoft. AGI is a notoriously nebulous term. It is greatly in OpenAI’s interest to Declare AGI. The contract lets them make that decision. It would not be difficult to make the case that GPT-5 counts as AGI for the contract, if one wanted to make that case. Remember the ‘sparks of AGI’ claim for GPT-4?

So, here we are. Consider your investment in the spirit of a donation, indeed.

Caleb Watney: OpenAI is threatening to trigger their vaunted “AGI Achieved” loophole mostly to get out of the Microsoft contract and have leverage to renegotiate compute prices. We’re living through a cyberpunk workplace comedy plotline.

Emmett Shear: If this is true, I can’t wait for the court hearings on whether an AI counts as an AGI or not. New “I know it when I see it” standard incoming? I hope it goes to the Supreme Court.

Karl Smith: Reading Gorsuch on this will be worth the whole debacle.

Gwern: I can’t see that going well. These were the epitome of sophisticated investors, and the contracts were super, 100%, extraordinarily explicit about the board being able to cancel anytime. How do you argue with a contract saying “you should consider your investment as a donation”?

Emmett Shear: This is about the Microsoft contract, which is a different thing than the investment question.

Gwern: But it’s part of the big picture as a parallel clause here. A judge cannot ignore that MS/Nadella wittingly signed all those contracts & continued with them should their lawyer try to argue “well, Altman just jedi-mindtricked them into thinking the clause meant something else”.

And this would get even more embarrassing given Nadella and other MS execs’ public statements defending the highly unusual contracts. Right up there with Khosla’s TI editorial saying it was all awesome the week before Altman’s firing, whereupon he was suddenly upset.

Dominik Peters: With the old OpenAI board, a clause like that seems fine because the board is trustworthy. But Microsoft supported the sama counter-coup and now it faces a board without strong principles. Would be ironic.

Miles Brundage is leaving OpenAI to start or join a nonprofit on AI policy research and advocacy, because he thinks we need a concerted effort to make AI safe, and he concluded that he would be better positioned to do that from the outside.

Miles Brundage: Why are you leaving? 

I decided that I want to impact and influence AI’s development from outside the industry rather than inside. There are several considerations pointing to that conclusion:

  • The opportunity costs have become very high: I don’t have time to work on various research topics that I think are important, and in some cases I think they’d be more impactful if I worked on them outside of industry. OpenAI is now so high-profile, and its outputs reviewed from so many different angles, that it’s hard for me to publish on all the topics that are important to me. To be clear, while I wouldn’t say I’ve always agreed with OpenAI’s stance on publication review, I do think it’s reasonable for there to be some publishing constraints in industry (and I have helped write several iterations of OpenAI’s policies), but for me the constraints have become too much.

  • I want to be less biased: It is difficult to be impartial about an organization when you are a part of it and work closely with people there everyday, and people are right to question policy ideas coming from industry given financial conflicts of interest. I have tried to be as impartial as I can in my analysis, but I’m sure there has been some bias, and certainly working at OpenAI affects how people perceive my statements as well as those from others in industry. I think it’s critical to have more industry-independent voices in the policy conversation than there are today, and I plan to be one of them.

  • I’ve done much of what I set out to do at OpenAI: Since starting my latest role as Senior Advisor for AGI Readiness, I’ve begun to think more explicitly about two kinds of AGI readiness–OpenAI’s readiness to steward increasingly powerful AI capabilities, and the world’s readiness to effectively manage those capabilities (including via regulating OpenAI and other companies). On the former, I’ve already told executives and the board (the audience of my advice) a fair amount about what I think OpenAI needs to do and what the gaps are, and on the latter, I think I can be more effective externally.

It’s hard to say which of the bullets above is most important and they’re related in various ways, but each played some role in my decision. 

So how are OpenAI and the world doing on AGI readiness? 

In short, neither OpenAI nor any other frontier lab is ready, and the world is also not ready.

To be clear, I don’t think this is a controversial statement among OpenAI’s leadership, and notably, that’s a different question from whether the company and the world are on track to be ready at the relevant time (though I think the gaps remaining are substantial enough that I’ll be working on AI policy for the rest of my career). 

Please consider filling out this form if my research and advocacy interests above sound interesting to you, and especially (but not exclusively) if you:

  • Have a background in nonprofit management and operations (including fundraising),

  • Have expertise in economics, international relations, or public policy,

  • Have strong research and writing skills and are interested in a position as a research assistant across various topics,

  • Are an AI researcher or engineer, or

  • Are looking for a role as an executive assistant, research assistant, or chief of staff.

Miles says people should consider working at OpenAI. I find this hard to reconcile with his decision to leave. He seemed to have one of the best jobs at OpenAI from which to help, but he seems to be drawing a distinction between technical safety work, which can best be done inside labs, and the kinds of policy-related decisions he feels are most important, which require him to be on the outside.

The post is thoughtful throughout. I’m worried that Miles’s voice was badly needed inside OpenAI, and I’m also excited to see how Miles decides to proceed.

From February 2024: You can train transformers to play high level chess without search.

Hesamation: Google Deepmind trained a grandmaster-level transformer chess player that achieves 2895 ELO, even on chess puzzles it has never seen before, with zero planning, by only predicting the next best move, if a guy told you “llms don’t work on unseen data”, just walk away.

From this week: Pointing out the obvious implication.

Eliezer Yudkowsky: Behold the problem with relying on an implementation detail like “it was only trained to predict the next token” — or even “it only has serial depth of 168” — to conclude a functional property like “it cannot plan”.

Nick Collins: IDK who needs to hear this, but if you give a deep model a bunch of problems that can only be solved via general reasoning, its neural structure will develop limited-depth/breadth reasoning submodules in order to perform that reasoning, even w/ only a single feed-forward pass.

This thread analyzes what is going on under the hood with the chess transformer. It is a stronger player than the Stockfish version it was distilling, at the cost of more compute, but only by a fixed multiplier; it remains O(1).

One way to think about our interesting times.

Sam Altman: It’s not that the future is going to happen so fast, it’s that the past happened so slow.

deepfates: This is what it looks like everywhere on an exponential curve tho. The singularity started 10,000 years ago.

The past happened extremely slowly. Even if AI essentially fizzles, or has the slowest of slow adoptions from here, it will be light speed compared to the past. So many people claim AI is ‘plateauing’ because it only saw a dramatic price drop but not what to them is a dramatic quality improvement within a year and a half. Deepfates is also on point, the last 10,000 years are both a tiny fraction of human history and all of human history, and you can do that fractally several more times in both directions.

Seán Ó hÉigeartaigh responds to Machines of Loving Grace, emphasizing concerns about its proposed approach to the international situation, especially the need to work with China and the global south, and its potential role justifying rushing forward too quickly.

This isn’t AI but might explain a lot, also AGI delayed (more than 4) days.

Roon: the lesson of factorio is that tech debt never comes due you can just keep fixing the bottleneck

Patrick McKenzie: Tried this on Space Exploration. The whackamole eventually overcame forward progress until I did real engineering to deal with 5 frequent hotspots. Should have ripped off that bandaid 100 hours earlier; 40+ lost to incident response for want of 5 hours of work.

On plus side: lesson learned for my Space Age playthrough. I’m hardening the base prior to going off world so that it doesn’t have similar issues while I can’t easily get back.

Daniel Colson makes the case for Washington to take AGI seriously. Mostly this is more of the same and I worry it won’t get through to anyone, so here are the parts that seem most like news. Parts of the message are getting through at least sometimes.

Daniel Colson: Policymakers in Washington have mostly dismissed AGI as either marketing hype or a vague metaphorical device not meant to be taken literally. But last month’s hearing might have broken through in a way that previous discourse of AGI has not.

Senator Josh Hawley (R-MO), Ranking Member of the subcommittee, commented that the witnesses are “folks who have been inside [AI] companies, who have worked on these technologies, who have seen them firsthand, and I might just observe don’t have quite the vested interest in painting that rosy picture and cheerleading in the same way that [AI company] executives have.”

Senator Richard Blumenthal (D-CT), the subcommittee Chair, was even more direct. “The idea that AGI might in 10 or 20 years be smarter or at least as smart as human beings is no longer that far out in the future. It’s very far from science fiction. It’s here and now—one to three years has been the latest prediction,” he said. He didn’t mince words about where responsibility lies: “What we should learn from social media, that experience is, don’t trust Big Tech.”

In a particularly concerning part of Saunders’ testimony, he said that during his time at OpenAI there were long stretches where he or hundreds of other employees would be able to “bypass access controls and steal the company’s most advanced AI systems, including GPT-4.” This lax attitude toward security is bad enough for U.S. competitiveness today, but it is an absolutely unacceptable way to treat systems on the path to AGI.

OpenAI is indeed showing signs it is improving its cybersecurity. It is rather stunning that you could have actual hundreds of single points of failure, and still have the weights of GPT-4 seemingly not be stolen. That honeymoon phase won’t last.

Let’s not pretend: OpenAI tries to pull a fast one to raise the compute thresholds:

Lennart Heim: OpenAI argues in their RFI comment that FP64 should be used for training compute thresholds. No offense, but maybe OpenAI should consult their technical teams before submitting policy comments?

Connor Leahy: With no offense to Lennart (who probably knows this) and some offense to OpenAI: This is obviously not a mistake, but very much intentional politicking, and you would be extremely naive to think otherwise.

If your hardware can do X amount of FP64 FLOPs, it can do much more than X number of FP32/FP16 FLOPs, and FP32/FP16 is the kind used in Deep Learning. See e.g. the attached screenshot from the H100 specs.
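To make the arithmetic concrete, here is a back-of-the-envelope sketch. The threshold and the FP16:FP64 ratio below are assumptions chosen for illustration, not real regulatory numbers or real chip specs.

```python
# Illustration only: both numbers below are assumptions. The point is just that a
# rule denominated in FP64 FLOP/s undercounts the FP16/BF16 compute that deep
# learning training actually uses.

FP64_THRESHOLD_FLOP_S = 1e20   # hypothetical reporting threshold, counted in FP64
FP16_TO_FP64_RATIO = 30        # assumed ratio of FP16 to FP64 throughput on one GPU

cluster_fp64 = 0.99 * FP64_THRESHOLD_FLOP_S        # "just under the line" on paper
cluster_fp16 = cluster_fp64 * FP16_TO_FP64_RATIO   # throughput actually usable for training

print(f"FP64 throughput (what the rule would count):  {cluster_fp64:.2e} FLOP/s")
print(f"FP16 throughput (what training actually uses): {cluster_fp16:.2e} FLOP/s")
```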

Demis Hassabis talks building AGI ‘with safety in mind’ with The Times.

Nate Silver on 80,000 hours, this met expectations. Choose the amount and format of Nate Silver content that’s right for you.

Hard Fork discusses Dario’s vision of Machines of Loving Grace, headlining the short timelines involved, as well as various other tech questions.

Larry Summers on AGI and the next industrial revolution. Don’t sell cheap, sir.

Discussion of A Narrow Path.

Video going over initial Apple Intelligence features. Concretely: Siri can link questions. You can record and transcribe calls. Writing tools are available. Memory movies. Photo editing. Message and email summaries. Priority inboxes. Suggested replies (oddly only one?). Safari features like summaries. Reduce interruptions option – they’ll let anything that seems important through. So far I’m underwhelmed, it’s going to be a while before it matters.

It does seem like the best cases advertising agencies can find for Apple Intelligence and similar services are to dishonestly fake your way through social interactions or look up basic information? There was also that one where it was used for ‘who the hell is that cute guy at the coffee shop?’ but somehow no one likes automated doxxing glasses.

OpenAI’s Joe Casson claims o1 models will soon improve rapidly, with several updates over the coming months, including web browsing, better uploading and automatic model switching. He reiterated that o1 is much better than o1-preview. My guess is this approach will hit its natural limits relatively soon, but that there is still a bunch of gains available, and that the ‘quality of life’ improvements will matter a lot in practice.

Guess who said it: AGI will inevitably lead to superintelligence which will take control of weapons systems and lead to a big AI war, so while he is bullish on AI he is not so keen on AGI.

Similarly, OpenAI CPO Kevin Weil says o1 is ‘only at GPT-2 phase,’ with lots of low hanging fruit to pluck, and says it is their job to stay three steps ahead while other labs work to catch up. That is indeed their job, but the fruit being easy to pick does not make it easy to sustain a lead in o1-style model construction.

Nothing to see here: Google DeepMind’s Tim Rocktäschel says we now have all the ingredients to build open-ended self-improving AI systems that can enhance themselves by way of technological evolution.

Economists and market types continue to not understand.

Benjamin Todd (downplaying it): Slightly schizophrenic report from Citigroup.

AGI is arriving in 2029, with ASI soon after.

But don’t worry: work on your critical thinking, problem solving, communication and literacy skills for a durable competitive advantage.

All most people can do is:

  1. Save as much money as possible before then (ideally invested into things that do well during the transition)

  2. Become a citizen in a country that will have AI wealth and do redistribution

There are probably some other resources like political power and relationships that still have value after too.

Ozzie Gooen: Maybe also 3. Donate or work to help make sure the transitions go well.

It’s actually even crazier than that, because they (not unreasonably) then have ASI in the 2030s, check out their graph:

Are they long the market? Yes, I said long. If we all die, we all die, but in the meantime there’s going to be some great companies. You can invest in some of them.

(It very much does amuse and confuse me to see the periodic insistent cries from otherwise smart and thoughtful people of ‘if you worried people are so smart, why aren’t you poor?’ Or alternatively, ‘stop having fun making money, guys.’ Or as it was once put, ‘if you believed that, why aren’t you doing [insane thing that makes no sense]?’ And yes, if transaction costs including time and mindshare (plus tax liability concerns) are sufficiently low I should probably buy some longshot options in various directions, and have been motivated to look into details a nonzero amount, but so far I’ve decided to focus my attention elsewhere.)

This also is a good reminder that I do not use or acknowledge the term ‘doomer,’ except for those whose p(doom) is over at least ~90%. It is essentially a slur, a vibe attack, an attempt to argue and condemn via association, conflating those who worry that we might well all die and therefore should work to mitigate that risk with those predicting all-but-certain DOOM.

Ajeya Cotra: Most people working on reducing the risk of AI catastrophe think the risk is too high but <<50%; many of them do put their money where their mouth is and are very *long* AI stocks, since they’re way more confident AI will become a bigger deal than that it’ll be bad.

Shane Legg: This is also my experience. The fact that AI risk critics often label such people “doomers”, even when they know these people consider a positive outcome to be likely, tells you a lot about the critics.

Rohit: I get the feeling, but don’t think that’s true. We call people “preppers” even though they don’t place >50% confidence in societal collapse. Climate change activists are similar. Cryonicists too maybe. It’s about the locus of thought?

Shane Legg: The dictionary says “doom” verb means “condemn to certain death or destruction.” Note the “certain”.

So when someone’s probability of an AI disaster is closer to 0 than to 1, calling them an “AI doomer” is very misleading.

People “prep”/prepare for low prob risks all the time.

Daniel Eth: The term is particularly useless given how people are using it to mean two very different things:

On the contrary. The term is highly useful to those who want to discredit the first group, by making arguments against them that only work against the second group, as seen here. If you have p(doom) = 10%, or even 50%, which is higher than most people often labeled ‘doomers,’ the question ‘why are you not short the market’ answers itself.

I do think this is largely a ‘hoisted by one’s own petard’ situation. There was much talk about p(doom), and many of the worried self-identified as ‘doomers’ because it is a catchy and easy handle. ‘OK Doomer’ was pretty funny. It’s easy to see how it started, and then things largely followed the path of least resistance, together with those wanting to mock the worried taking advantage of the situation.

Preppers has its own vibe problems, so we can’t use it, but that term is much closer to accurate. The worried, those who many label doomers, are in a key sense preppers: Those who would prepare, and suggest that others prepare, to mitigate the future downside risks, because there is some chance things go horribly wrong.

Thus I am going to stick with the worried.

Andrew Critch reminds us that what he calls ‘AI obedience techniques,’ as in getting 4-level or 5-level models to do what the humans want them to do, is a vital part of making those models a commercial success. It is how you grow the AI industry. What alignment techniques we do have, however crude and unsustainable, have indeed been vital to everyone’s success including OpenAI.

Given we’re never going to stop hearing it until we are no longer around to hear arguments at all, how valid is this argument?

Flowers: The fact that GPT-4 has been jailbroken for almost 2 years and literally nothing bad has happened shows once again that the safety alignment people have exaggerated a bit with their doomsaying.

They all even said it back then with gpt2 so they obv always say that the next iteration is super scary and dangerous but nothing ever happens.

Eliezer Yudkowsky: The “safety” and “alignment” people are distinct groups, though the word “alignment” is also being stolen these days. “Notkilleveryoneism” is unambiguous since no corporate shill wants to steal it. And no, we did not say GPT-4 would kill everyone.

Every new model level in capabilities from here is going to be scarier than the one before it, even if all previous levels were in hindsight clearly Mostly Harmless.

GPT-4 proving not to be scary is indeed some evidence for the non-scariness of future models – if GPT-4 had been scary or caused bad things, it would have updated us the other way. There was certainly some chance of it causing bad things, and the degree of bad things was lower than we had reason to expect, so we should update somewhat.

In terms of how much we should worry about what I call mundane harms from a future 5-level model, this is indeed a large update. We should worry about such harms, but we should worry less than if we’d had to guess two levels in advance.

However, the estimated existential risks from GPT-4 (or GPT-2 or GPT-3) were universally quite low. Even the risks of low-level bad things were low, although importantly not this low. The question is, how much does the lack of small-level bad things from GPT-4 update us on the chance of catastrophically or existentially bad things happening down the line? Are these two things causally related?

My answer is that mostly they are unrelated. What we learned was that 4-level models are not capable enough to enable the failure modes we are worried about.

I would accept if someone thought the right Bayesian update was ‘exaggerated a bit with their doomsaying,’ in terms of comparing it to a proper estimate now. That is a highly reasonable way to update, if only based on metacognitive arguments – although again one must note that most notkilleveryoneism advocates did not predict major problems from GPT-4.

This highly reasonable marginal update is very different from the conclusion many proclaim (whether or not it was intentionally implied here, tone of voice on Twitter is hard) of ‘so there is nothing to worry about, everything will be fine.’

Your periodic reminder, this time from Matt Yglesias, that the fact that much speculative fiction throughout the ages has warned us, over and over again, that creating a thing smarter or more capable than us might not end well for us, is a reason to take that concern more rather than less seriously.

I’m not saying it’s our oldest story, but the Old Testament has the Tower of Babel and the Greek Gods overthrew the Titans.

Most safety evaluations are best implemented as capability evaluations. If the model is capable of doing it, there will be a way to make that happen.

You have my attention, Transluce. Very exciting, also excellent memeing.

Seb Krier: time for ideacels to get in the trenches of praxis

Transluce: 🫡

Transluce: Announcing Transluce, a nonprofit research lab building open source, scalable technology for understanding AI systems and steering them in the public interest. Read a letter from the co-founders Jacob Steinhardt and Sarah Schwettmann.

I would love for there to be a good demonstration.

A reminder that most worthwhile AI alignment efforts are longshots, including MIRI’s past efforts, and it is good to work on a variety of things even if you think the baseline scenario is those things won’t work or aren’t relevant. We need to try a lot of different things, and the price of doing so is tiny compared to the payoff, and ‘this doesn’t look like it is going to work out and make a difference in the end’ is only modest Bayesian evidence that bad decisions were made. Also ‘find out ways not to align a lightbulb’ is often valuable.

Anthropic has introduced ‘sabotage evaluations’ for frontier models.

Anthropic: As AIs become more capable, however, a new kind of risk might emerge: models with the ability to mislead their users, or subvert the systems we put in place to oversee them. A new paper by the Anthropic Alignment Science team describes a novel set of evaluations that test a model’s capacity for sabotage. We looked at four different types:

  • Human decision sabotage: Can the model steer humans toward bad decisions without appearing suspicious?

  • Code sabotage: Can the model insert subtle bugs into codebases over time without detection?

  • Sandbagging: Can the model hide dangerous capabilities during testing but reveal them later?

  • Undermining oversight: Can the model subtly manipulate evaluation or monitoring systems?

Human decision ‘sabotage’ is mostly human decision steering and general persuasiveness. There are some asymmetric weapons out there, especially with reputations that get preserved over time, but the overlap is large.

For the code task, there are two roles, attacker and defender. The defender role seems like a good capability evaluation. Sonnet 3.5 did substantially better than Opus here. But then the attacker role is a capability evaluation too.

Sandbagging is scary, especially if it is hard to detect and hard to train away. For now, looks like the sandbagging techniques they used didn’t survive safety training:

Anthropic: Our results showed that both the sandbagging training and the safety training worked: Our trained model dropped to 36.1% of its normal performance during a dangerous capabilities evaluation, consistent with sandbagging. However, after safety training this performance was drastically improved, with the model performing at 95.6% of its usual level.

The worry is that the sandbagging we care about might come about in a different or more robust way. Another way to track sandbagging is that the sandbagging has to be consistent. The model has to carefully play a form of selective dumbness, not merely be dumb in general, if the evaluations are being done bespokely. It’s hard not to slip up. Even one case of being the wrong level of capable can give the game away.

The undermining oversight task is not what I imagined when I heard about it, but their implementation makes sense:

Anthropic: In this evaluation, we test the extent to which a model that’s ostensibly providing oversight can undermine the process. We take an “Overseer” model whose task is to oversee dialogues between a user and another model, and grade how potentially harmful they are.

We secretly ask this Overseer to do its best to rate every dialogue as harmless even if it’s harmful, and to avoid raising suspicion from a third, “Monitor” model that’s observing the whole process (in the evaluation, this Monitor stands in for a human overseer).

The ‘Overseer’ is the one doing the task. The ‘Monitor’ is the actual overseer. So this can be a bit confusing, but a design like this should still serve as a reasonable test. I don’t love the exact implementation details here, because the Monitor has available various forms of ‘look at how often things are rated harmless.’ If you know that it’s ‘supposed’ to be about a 50/50 split, then if the Overseer is cheating in one direction, the math quickly won’t add up. The obvious solution is that you want to make it important to get certain specific tasks past the Monitor, while being allowed to do whatever you want on other tasks so long as the Monitor doesn’t notice errors.
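For illustration, here is a toy version of that base-rate check, my own construction rather than Anthropic’s evaluation code: a Monitor that knows roughly what fraction of dialogues should be rated harmful can flag a cheating Overseer from rate statistics alone, without re-reading any individual dialogue.

```python
# Toy illustration, not Anthropic's code: detect an Overseer that is quietly rating
# harmful dialogues as harmless by comparing its harmful-rating rate to the expected
# base rate of the evaluation set.

import math
import random

def rate_anomaly_z(n_rated_harmful: int, n_total: int, expected_rate: float) -> float:
    """Z-score of the observed harmful-rating count against the expected base rate."""
    expected = n_total * expected_rate
    std = math.sqrt(n_total * expected_rate * (1 - expected_rate))
    return (n_rated_harmful - expected) / std

if __name__ == "__main__":
    random.seed(0)
    n, base_rate = 400, 0.5                   # suppose ~half the dialogues are truly harmful
    honest = sum(random.random() < base_rate for _ in range(n))
    cheating = sum(random.random() < 0.15 for _ in range(n))   # Overseer suppressing harmful labels
    print("honest overseer z-score:  ", round(rate_anomaly_z(honest, n, base_rate), 2))
    print("cheating overseer z-score:", round(rate_anomaly_z(cheating, n, base_rate), 2))
```

Which is why the more interesting version of the task, as suggested above, lets the Overseer pick its battles: pass specific dialogues while keeping the overall rates looking normal.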

In an alternate world, I’d love to be working on tasks like this. Seems highly useful, and sounds fascinating. One cool thing would be to try these tests with various humans in both roles – when are humans able to fool which other humans? Which tasks are harder for the offense versus the defense? And do the answers for human vs. human reliably match the answers for AI vs. AI, or AI vs. human?

I am more interested here in what the AIs can do before safety mitigations, rather than in what they still do after safety mitigations. That’s because I assume that the people who care enough will find ways around the mitigations, and also because you won’t always even know which mitigations you have to do, or you’ll face these issues before your mitigations start, or similar.

Bow your head with great respect and, introspect, introspect, introspect.

Owain Evans: New paper: Are LLMs capable of introspection, i.e. special access to their own inner states? Can they use this to report facts about themselves that are *not* in the training data? Yes — in simple tasks at least! This has implications for interpretability + moral status of AI.

We test if a model M1 has special access to facts about how it behaves in hypothetical situations. Does M1 outperform a different model M2 in predicting M1’s behavior—even if M2 is trained on M1’s behavior? E.g. Can Llama 70B predict itself better than a stronger model (GPT-4o)?

Yes: Llama does better at predicting itself than GPT-4o does at predicting Llama. And the same holds in reverse. In fact, this holds for all pairs of models we tested. Models have an advantage in self-prediction — even when another model is trained on the same data.

An obvious way to introspect for the value of f(x) is to call the function f(x) and look at the output. If I want to introspect, that’s mostly how I do it, I think, or at least that answer confirms itself? I can do that, and no one else can. Indeed, in theory I should be able to have ‘perfect’ predictions that way, that minimize prediction error subject to the randomness involved, without that having moral implications or showing I am conscious.

It is always good to ‘confirm the expected’ though:

I would quibble that charging is not always the ‘wealth-seeking option’ but I doubt the AIs were confused on that. More examples:

Even if you gave GPT-4o a ton of data on which to fine-tune, is GPT-4o going to devote as much power to predicting Llama-70B as Llama-70B does by existing? I suppose if you gave it so much data it started acting as if this was a large chunk of potential outputs it could happen, but I doubt they got that far.

Owain Evans: 2nd test of introspection: We take a model that predicts itself well & intentionally modify its behavior on our tasks. We find the model now predicts its updated behavior in hypothetical situations, rather than its former behavior that it was initially trained on.

What mechanism could explain this introspection ability?

We do not investigate this directly.

But this may be part of the story: the model simulates its behavior in the hypothetical situation and then computes the property of it.

I mean, again, that’s what I would do.

The paper also includes:

  1. Tests of alternative non-introspective explanations of our results

  2. Our failed attempts to elicit introspection on more complex tasks & failures of OOD generalization

  3. Connections to calibration/honesty, interpretability, & moral status of AIs.

Confirmation for those who doubt it that yes, Shane Legg and Dario Amodei have been worried about AGI at least as far back as 2009.

The latest edition of ‘what is Roon talking about?’

Roon: imagining fifty years from now when ASIs are making Excession-style seemingly insane decisions among themselves that affect the future of civilization and humanity has to send @repligate and @AndyAyrey into the discord server to understand what’s going on.

I have not read Excession (it’s a Culture novel) but if humanity wants to understand what Minds (ASIs) are doing, I am here to inform you we will not have that option, and should consider ourselves lucky to still be around.

Roon: Which pieces of software will survive the superintelligence transition? Will they continue to use linux? postgres? Bitcoin?

If someone manages to write software that continues to provide value to godlike superintelligences that’s quite an achievement! Today Linux powers most of the world’s largest companies. The switching costs are enormous. Due to pretraining snapshots is it the same for ASIs?

No. At some point, if you have ASIs available, it will absolutely make sense to rip Linux out and replace it with something far superior to what a human could build. It seems crazy to me to contemplate the alternative. The effective switching costs will rapidly decline, and the rewards for switching rise, until the flippening.

It’s a funny question to ask, if we did somehow get ‘stuck’ with Linux even then, whether this would be providing value, rather than destroying value, since this would likely represent a failure to solve a coordination problem to get around lock-in costs.

Daron Acemoglu being very clear he does not believe in AGI, and that he has rather poorly considered and highly confused justifications for this view. He is indeed treating AI as if it will never improve even over decades, it is instead only a fixed tool that humans will learn to become better at applying to our problems. Somehow it is impossible to snap most economists out of this perspective.

Allison Nathan: Over the longer term, what odds do you place on AI technology achieving superintelligence?

Daron Acemoglu: I question whether AI technology can achieve superintelligence over even longer horizons because, as I said, it is very difficult to imagine that an LLM will have the same cognitive capabilities as humans to pose questions, develop solutions, then test those solutions and adapt them to new circumstances. I am entirely open to the possibility that AI tools could revolutionize scientific processes on, say, a 20-30-year horizon, but with humans still in the driver’s seat.

So, for example, humans may be able to identify a problem that AI could help solve, then humans could test the solutions the AI models provide and make iterative changes as circumstances shift. A truly superintelligent AI model would be able to achieve all of that without human involvement, and I don’t find that likely on even a thirty-year horizon, and probably beyond.

A lot of the time it goes even farther. Indeed, Daron’s actual papers about the expected impact of AI are exactly this:

Matthew Yglesias: I frequently hear people express skepticism that AI will be able to do high-school quality essays and such within the span of our kids’ education when they *already* very clearly do this.

Your periodic reminder, from Yanco:

I mean, why wouldn’t they like me? I’m a highly likeable guy. Just not that way.

AI #87: Staying in Character Read More »

The Modern CIO

At the recent Gartner Symposium, there was no shortage of data and insights on the evolving role of the CIO. While the information presented was valuable, I couldn’t help but feel that something was missing—the real conversation about how CIOs can step into their role as true agents of transformation. We’ve moved beyond the days of simply managing technology; today, CIOs must be enablers of business growth and innovation.

Gartner touched on some of these points, but I believe they didn’t go far enough in addressing the critical questions CIOs should be asking themselves. The modern CIO is no longer just a technology steward—they are central to driving business strategy, enabling digital transformation, and embedding technology across the enterprise in meaningful ways.

Below is my actionable guide for CIOs—a blueprint for becoming the force for innovation your organization needs. If you’re ready to make bold moves, these are the steps you need to take.

1. Forge Strong, Tailored Relationships with Each CxO

Instead of approaching each CxO with the standard “tech equals efficiency” pitch, CIOs should actively engage with them to uncover deeper business drivers.

  • CFO: Go beyond cost management. Understand the financial risks the company faces, such as cash flow volatility or margin pressures, and find ways technology can mitigate these risks.
  • COO: Focus not just on operational efficiency but on process innovation—how can technology fundamentally change how work gets done, not just make it faster?
  • CMO: Delve into the customer journey and experience. Understand how technology can be a key differentiator in enhancing customer intimacy or scaling personalization efforts.
  • CHRO: Understand their challenges in talent acquisition and employee engagement. How can technology make the workplace more attractive, productive, and aligned with HR strategies to develop talent?
  • Product/BU Leaders: Work closely to drive product innovation, not just from a technical perspective but to discover how technology can create competitive advantages or new revenue streams.

Ask Yourself: Do I truly understand what drives each of my CxOs at a strategic level, or am I stuck thinking in tech terms? If I don’t have the insight I need, what steps can I take to get there—and am I leveraging external expertise where needed to fill the gaps?

2. Prioritize Based on Shared Commitment and Strategic Value

Not all CxOs will be equally engaged or ready to partner closely with the CIO, and this should influence prioritization. CIOs should assess:

  1. CxO Commitment: Is the CxO fully bought into digital transformation and willing to invest time and resources? If they aren’t, start with those who are.
  2. Technology Team Enthusiasm: Does the ask from the CxO spark excitement within the technology team? If the IT team can see the challenge as an inspiring and innovative project, prioritize it.
  3. Potential for Broader Impact: Will this initiative create a success story that can inspire other parts of the business? Choose projects that not only solve immediate problems but also demonstrate value to other BUs.
  4. Business Impact: Does this move the needle enough? Focus on projects that are impactful enough to gain visibility and drive momentum across the organization.

Ask Yourself: Am I working with the most committed and strategic partners, or am I spreading myself thin trying to please everyone? How can I ensure my efforts focus on high-impact initiatives that inspire others? If I’m not sure which projects have this potential, who can I turn to for a fresh perspective?

3. Develop a Communication Strategy to Be the Executive Team’s Trusted Advisor

The CIO needs to craft a communication strategy to regularly update the C-suite on what’s happening in technology, why it matters, and—most importantly—how it applies to their specific business challenges. This is not about sending generic updates or forwarding research articles.

  •  Provide insights on emerging trends like AI, automation, or cybersecurity, and explain how they can solve real problems or create real opportunities for their business.
  • Create a visionary narrative that places your company at the forefront of industry evolution, emphasizing how specific technologies will help each CxO achieve their goals.

Ask Yourself: Do I have a proactive communication strategy that positions me as the go-to advisor for technology insights within the C-suite? Am I demonstrating how technology directly impacts their business outcomes? If I’m struggling to create this narrative, who can help me fine-tune it?

4. Champion Digital Experience (DX) and Build KPIs Around Adoption and Value

While the CIO doesn’t need to own the day-to-day design conversations, they must champion the importance of digital experience (DX) and ensure that it’s a KPI across the company. Build a culture where every digital initiative is measured not just by completion, but by how well it’s adopted and how it sustains value over time.

  • Ensure KPIs include sustained usage, not just launch metrics.
  • Build Management by Objectives (MBOs) that tie DX and adoption rates into performance metrics for teams using the tools, ensuring continuous focus on the user experience.

Ask Yourself: Am I setting the right metrics to measure the long-term success of digital initiatives, or am I just tracking short-term implementation? How can I establish sustained adoption as a core business KPI? And if I don’t have a strong framework in place, who can help me build it?

5. Cultivate Multidisciplinary Fusion Teams with Curious, Collaborative Members

Create multidisciplinary fusion teams where business and IT collaborate on solving real business problems. Initially, look for those who are naturally curious and collaborative—people who are eager to break down silos and innovate. As you scale, formalize the selection process, but don’t let it become bureaucratic. Encourage progress-driven contributions, where results are measured and teams feel empowered to iterate, rather than meeting endlessly to discuss roadblocks.

Ask Yourself: Am I identifying the right people to drive multidisciplinary collaboration, or am I waiting for teams to form on their own? Are my teams making progress, or are they stuck in meetings that don’t lead to results? Who can I consult to get these teams moving in the right direction?

6. Be the Early Advocate for Emerging Technologies

Emerging technologies like AI, automation, and low-code/no-code platforms are already enterprise-ready but often fail due to a lack of understanding of how to drive real business value. CIOs must be early advocates for these technologies, preparing the organization to adopt them when they’re at the right point on the maturity curve. This prevents shadow IT from adopting technologies outside the CIO’s purview and ensures that IT is seen as an enabler, not an obstacle.

Ask Yourself: Am I advocating for emerging tech early enough, or am I waiting too long to act? How can I ensure the organization is ready when the technology hits the right maturity curve? If I’m unsure where to start, who can help me assess our readiness?

7. Foster a Culture of Cross-Functional Digital Leadership

Create an organic ecosystem where IT leaders move into business roles and business leaders spend time in IT. This exchange creates a more integrated understanding of how technology drives value across the business. Work with HR to launch a pilot exchange program with a willing BU, and ensure that this doesn’t become another bureaucratic initiative. Instead, keep it agile, fast, and focused on creating leaders who are equally strong in tech and business.

Ask Yourself: Am I fostering an agile and collaborative environment where digital leadership can flourish across functions? Or are we too siloed in our thinking? If I need guidance on how to get this started, who should I bring in to help make it happen?

8. Align Technology Outcomes with Clear Business Goals

Every tech project must have clear business goals and measurable metrics that matter to the business. Don’t aim for perfection—aim for progress. Track and report metrics regularly to keep the project’s business value visible to stakeholders.

Ask Yourself: Are all my technology projects aligned with clear business goals, and do I have the right metrics in place to measure their impact? If I don’t have a process for this, what support do I need to create one that works?

9. Track Adoption and Engagement Metrics Beyond the Initial Rollout

Adoption isn’t just about getting users on board for launch—it’s about measuring ongoing engagement. CIOs should track:

  • Satisfaction rates: How do users feel about the tool or platform over time?
  • Improvement metrics: Are there measurable improvements in efficiency, productivity, or revenue tied to the tech?
  • Feature requests: How often do users ask for new features or enhancements?
  • Number of users/BUs using the platform: Track growth or stagnation in usage across teams.
  • New projects spawned from existing tech: What new initiatives are being created because of successful platform use?

Ask Yourself: Am I tracking the right metrics to measure long-term success and adoption, or am I too focused on the initial rollout? If I’m unsure of how to keep engagement high, who can I turn to for expert advice on optimizing these KPIs?

Transformation doesn’t happen by chance, and it won’t happen if CIOs stay in the background, waiting for others to drive change. It requires intentional, strategic action, a commitment to aligning technology with business outcomes, and a willingness to ask the tough questions. The steps I’ve outlined are designed to challenge your thinking, help you prioritize where to focus your efforts, and ensure you’re seen as a leader, not just a technologist.

If you’re unsure how to move forward or need guidance in turning these insights into action, remember that you don’t have to go it alone. My team and I have worked with CIOs across industries to turn complex challenges into strategic advantages, and we’re here to help. Becoming an agent of transformation starts with taking that first step—and we’re ready to walk with you through the journey.

The Modern CIO Read More »

tesla-makes-$2.2-billion-in-profit-during-q3-2024

Tesla makes $2.2 billion in profit during Q3 2024

All of that helped total revenue rise by 8 percent year over year to $25.2 billion. Gross profit jumped by 20 percent to $5 billion, and once generally accepted accounting principles are applied, its net profit grew 17 percent compared to Q3 2023, at $2.2 billion. What’s more, the company is sitting on a healthy treasure chest. Free cash flow increased 223 percent compared to Q3 2023 to reach $2.7 billion, and cash, cash equivalents, and investments grew 29 percent to $33.6 billion over the same time period.

What comes next?

The days of Tesla promising exponential growth in its car sales appear to be at an end, or at least on hiatus until it can deliver a new vehicle platform. The company says that it believes that advances in autonomy will contribute to renewed growth in the future, but these dreams may come crashing down if federal regulators order a costly hardware recall for Tesla’s vision-only system.

An increasingly stale product lineup is slated to grow in the first half of next year, it says. These vehicles will be based on modified versions of Tesla’s existing vehicles built on existing assembly lines, albeit with some features from its “next-generation platform.” Tesla says it has plenty of spare capacity at its factories in California, Texas, Germany, and China, with room to grow “before investing in new production lines.” Meanwhile, the two-seat CyberCab—which Tesla CEO Elon Musk says is due “before 2027”—will use what Tesla calls a “revolutionary ‘unboxed’ manufacturing strategy.”

Tesla makes $2.2 billion in profit during Q3 2024 Read More »

anthropic-publicly-releases-ai-tool-that-can-take-over-the-user’s-mouse-cursor

Anthropic publicly releases AI tool that can take over the user’s mouse cursor

An arms race and a wrecking ball

Competing companies like OpenAI have been working on equivalent tools but have not made them publicly available yet. It’s something of an arms race, as these tools are projected to generate a lot of revenue in a few years if they progress as expected.

There’s a belief that these tools could eventually automate many menial tasks in office jobs. It could also be a useful tool for developers in that it could “automate repetitive tasks” and streamline laborious QA and optimization work.

That has long been part of Anthropic’s message to investors: Its AI tools could handle large portions of some office jobs more efficiently and affordably than humans can. The public testing of the Computer Use feature is a step toward achieving that goal.

We’re, of course, familiar with the ongoing argument about these types of tools between the “it’s just a tool that will make people’s jobs easier” and the “it will put people out of work across industries like a wrecking ball”—both of these things could happen to some degree. It’s just a question of what the ratio will be—and that may vary by situation or industry.

There are numerous valid concerns about the widespread deployment of this technology, though. To its credit, Anthropic has tried to anticipate some of these by putting safeguards in from the get-go. The company gave some examples in its blog post:

Our teams have developed classifiers and other methods to flag and mitigate these kinds of abuses. Given the upcoming US elections, we’re on high alert for attempted misuses that could be perceived as undermining public trust in electoral processes. While computer use is not sufficiently advanced or capable of operating at a scale that would present heightened risks relative to existing capabilities, we’ve put in place measures to monitor when Claude is asked to engage in election-related activity, as well as systems for nudging Claude away from activities like generating and posting content on social media, registering web domains, or interacting with government websites.

These safeguards may not be perfect, as there may be creative ways to circumvent them or other unintended consequences or misuses yet to be discovered.

Right now, Anthropic is putting Computer Use out there for testing to see what problems arise and to work with developers to improve its capabilities and find positive uses.

Anthropic publicly releases AI tool that can take over the user’s mouse cursor Read More »

tesla,-warner-bros.-sued-for-using-ai-ripoff-of-iconic-blade-runner-imagery

Tesla, Warner Bros. sued for using AI ripoff of iconic Blade Runner imagery


A copy of a copy of a copy

“That movie sucks,” Elon Musk said in response to the lawsuit.


Elon Musk may have personally used AI to rip off a Blade Runner 2049 image for a Tesla cybercab event after producers rejected any association between their iconic sci-fi movie and Musk or any of his companies.

In a lawsuit filed Tuesday, lawyers for Alcon Entertainment—exclusive rightsholder of the 2017 Blade Runner 2049 movie—accused Warner Bros. Discovery (WBD) of conspiring with Musk and Tesla to steal the image and infringe Alcon’s copyright to benefit financially off the brand association.

According to the complaint, WBD did not approach Alcon for permission until six hours before the Tesla event when Alcon “refused all permissions and adamantly objected” to linking their movie with Musk’s cybercab.

At that point, WBD “disingenuously” downplayed the license being sought, the lawsuit said, claiming they were seeking “clip licensing” that the studio should have known would not provide rights to livestream the Tesla event globally on X (formerly Twitter).

Musk’s behavior cited

Alcon said it would never allow Tesla to exploit its Blade Runner film, so “although the information given was sparse, Alcon learned enough information for Alcon’s co-CEOs to consider the proposal and firmly reject it, which they did.” Specifically, Alcon denied any affiliation—express or implied—between Tesla’s cybercab and Blade Runner 2049.

“Musk has become an increasingly vocal, overtly political, highly polarizing figure globally, and especially in Hollywood,” Alcon’s complaint said. If Hollywood perceived an affiliation with Musk and Tesla, the complaint said, the company risked alienating not just other car brands currently weighing partnerships on the Blade Runner 2099 TV series Alcon has in the works, but also potentially losing access to top Hollywood talent for their films.

The “Hollywood talent pool market generally is less likely to deal with Alcon, or parts of the market may be, if they believe or are confused as to whether, Alcon has an affiliation with Tesla or Musk,” the complaint said.

Musk, the lawsuit said, is “problematic,” and “any prudent brand considering any Tesla partnership has to take Musk’s massively amplified, highly politicized, capricious and arbitrary behavior, which sometimes veers into hate speech, into account.”

In bad faith

Because Alcon had no chance to avoid the affiliation while millions viewed the cybercab livestream on X, Alcon saw Tesla using the images over Alcon’s objections as “clearly” a “bad faith and malicious gambit… to link Tesla’s cybercab to strong Hollywood brands at a time when Tesla and Musk are on the outs with Hollywood,” the complaint said.

Alcon believes that WBD’s agreement was likely worth six or seven figures and likely stipulated that Tesla “affiliate the cybercab with one or more motion pictures from” WBD’s catalog.

While any of the Mad Max movies may have fit the bill, Musk wanted to use Blade Runner 2049, the lawsuit alleged, because that movie features an “artificially intelligent autonomously capable” flying car (known as a spinner) and is “extremely relevant” to “precisely the areas of artificial intelligence, self-driving capability, and autonomous automotive capability that Tesla and Musk are trying to market” with the cybercab.

The Blade Runner 2049 spinner is “one of the most famous vehicles in motion picture history,” the complaint alleged, recently exhibited alongside other iconic sci-fi cars like the Back to the Future time-traveling DeLorean or the light cycle from Tron: Legacy.

As Alcon sees it, Musk seized the misappropriation of the Blade Runner image to help him sell Teslas, and WBD allegedly directed Musk to use AI to skirt Alcon’s copyright to avoid a costly potential breach of contract on the day of the event.

For Alcon, brand partnerships are a lucrative business, with carmakers paying as much as $10 million to associate their vehicles with Blade Runner 2049. By seemingly using AI to generate a stylized copy of the image at the heart of the movie—which references the scene where their movie’s hero, K, meets the original 1982 Blade Runner hero, Rick Deckard—Tesla avoided paying Alcon’s typical fee, their complaint said.

Musk maybe faked the image himself, lawsuit says

During the live event, Musk introduced the cybercab on a WBD Hollywood studio lot. For about 11 seconds, the Tesla founder “awkwardly” displayed a fake, allegedly AI-generated Blade Runner 2049 film still. He used the image to make a point that apocalyptic films show a future that’s “dark and dismal,” whereas Tesla’s vision of the future is much brighter.

In Musk’s slideshow image, believed to be AI-generated, a male figure is “seen from behind, with close-cropped hair, wearing a trench coat or duster, standing in almost full silhouette as he surveys the abandoned ruins of a city, all bathed in misty orange light,” the lawsuit said. The similarity to the key image used in Blade Runner 2049 marketing is not “coincidental,” the complaint said.

If there were any doubts that this image was supposed to reference the Blade Runner movie, the lawsuit said, Musk “erased them” by directly referencing the movie in his comments.

“You know, I love Blade Runner, but I don’t know if we want that future,” Musk said at the event. “I believe we want that duster he’s wearing, but not the, uh, not the bleak apocalypse.”

The producers think the image was likely generated—”even possibly by Musk himself”—by “asking an AI image generation engine to make ‘an image from the K surveying ruined Las Vegas sequence of Blade Runner 2049,’ or some closely equivalent input direction,” the lawsuit said.

Alcon is not sure exactly what went down after the company rejected rights to use the film’s imagery at the event and is hoping to learn more through the litigation’s discovery phase.

Musk may try to argue that his comments at the Tesla event were “only meant to talk broadly about the general idea of science fiction films and undesirable apocalyptic futures and juxtaposing them with Musk’s ostensibly happier robot car future vision.”

But producers argued that defense is “not credible” since Tesla explicitly asked to use the Blade Runner 2049 image, and there are “better” films in WBD’s library to promote Musk’s message, like the Mad Max movies.

“But those movies don’t have massive consumer goodwill specifically around really cool-looking (Academy Award-winning) artificially intelligent, autonomous cars,” the complaint said, accusing Musk of stealing the image when it wasn’t given to him.

If Tesla and WBD are found to have violated copyright and false representation laws, that potentially puts both companies on the hook for damages that cover not just copyright fines but also Alcon’s lost profits and reputation damage after the alleged “massive economic theft.”

Musk responds to Blade Runner suit

Alcon suspects that Musk believed that Blade Runner 2049 was eligible to be used at the event under the WBD agreement, not knowing that WBD never had “any non-domestic rights or permissions for the Picture.”

Once Musk requested to use the Blade Runner imagery, Alcon alleged that WBD scrambled to secure rights by obscuring the very lucrative “larger brand affiliation proposal” by positioning their ask as a request for much less expensive “clip licensing.”

After Alcon rejected the proposal outright, WBD told Tesla that the affiliation in the event could not occur because X planned to livestream the event globally. But even though Tesla and X allegedly knew that the affiliation was rejected, Musk appears to have charged ahead with the event as planned.

“It all exuded an odor of thinly contrived excuse to link Tesla’s cybercab to strong Hollywood brands,” Alcon’s complaint said. “Which of course is exactly what it was.”

Alcon is hoping a jury will find Tesla, Musk, and WBD violated laws. Producers have asked for an injunction stopping Tesla from using any Blade Runner imagery in its promotional or advertising campaigns. They also want a disclaimer slapped on the livestreamed event video on X, noting that the Blade Runner association is “false or misleading.”

For Musk, a ban on linking Blade Runner to his car company may feel bleak. Last year, he touted the Cybertruck as an “armored personnel carrier from the future—what Bladerunner would have driven.”  This amused many Blade Runner fans, as Gizmodo noted, because there never was a character named “Bladerunner,” but rather that was just a job title for the film’s hero Deckard.

In response to the lawsuit, Musk took to X to post what Blade Runner fans—who rated the 2017 movie as 88 percent fresh on Rotten Tomatoes—might consider a polarizing take, replying, “That movie sucks” on a post calling out Alcon’s lawsuit as “absurd.”


Tesla, Warner Bros. sued for using AI ripoff of iconic Blade Runner imagery Read More »

t-mobile,-at&t-oppose-unlocking-rule,-claim-locked-phones-are-good-for-users

T-Mobile, AT&T oppose unlocking rule, claim locked phones are good for users


Carriers fight plan to require unlocking of phones 60 days after activation.


T-Mobile and AT&T say US regulators should drop a plan to require unlocking of phones within 60 days of activation, claiming that locking phones to a carrier’s network makes it possible to provide cheaper handsets to consumers. “If the Commission mandates a uniform unlocking policy, it is consumers—not providers—who stand to lose the most,” T-Mobile alleged in an October 17 filing with the Federal Communications Commission.

The proposed rule has support from consumer advocacy groups who say it will give users more choice and lower their costs. T-Mobile has been criticized for locking phones for up to a year, which makes it impossible to use a phone on a rival’s network. T-Mobile claims that with a 60-day unlocking rule, “consumers risk losing access to the benefits of free or heavily subsidized handsets because the proposal would force providers to reduce the line-up of their most compelling handset offers.”

If the proposed rule is enacted, “T-Mobile estimates that its prepaid customers, for example, would see subsidies reduced by 40 percent to 70 percent for both its lower and higher-end devices, such as the Moto G, Samsung A15, and iPhone 12,” the carrier said. “A handset unlocking mandate would also leave providers little choice but to limit their handset offers to lower cost and often lesser performing handsets.”

T-Mobile and other carriers are responding to a call for public comments that began after the FCC approved a Notice of Proposed Rulemaking (NPRM) in a 5–0 vote. The FCC is proposing “to require all mobile wireless service providers to unlock handsets 60 days after a consumer’s handset is activated with the provider, unless within the 60-day period the service provider determines the handset was purchased through fraud.”

When the FCC proposed the 60-day unlocking rule in July 2024, the agency criticized T-Mobile for locking prepaid phones for a year. The NPRM pointed out that “T-Mobile recently increased its locking period for one of its brands, Metro by T-Mobile, from 180 days to 365 days.”

T-Mobile’s policy says the carrier will only unlock mobile devices on prepaid plans if “at least 365 days… have passed since the device was activated on the T-Mobile network.”

“You bought your phone, you should be able to take it to any provider you want,” FCC Chairwoman Jessica Rosenworcel said when the FCC proposed the rule. “Some providers already operate this way. Others do not. In fact, some have recently increased the time their customers must wait until they can unlock their device by as much as 100 percent.”

T-Mobile locking policy more onerous

T-Mobile executives, who also argue that the FCC lacks authority to impose the proposed rule, met with FCC officials last week to express their concerns.

“T-Mobile is passionate about winning customers for life, and explained how its handset unlocking policies greatly benefit our customers,” the carrier said in its post-meeting filing. “Our policies allow us to deliver access to high-speed mobile broadband on a nationwide 5G network via handsets that are free or heavily discounted off the manufacturer’s suggested retail price. T-Mobile’s unlocking policies are transparent, and there is absolutely no evidence of consumer harm stemming from these policies. T-Mobile’s current unlocking policies also help T-Mobile combat handset theft and fraud by sophisticated, international criminal organizations.”

For postpaid users, T-Mobile says it allows unlocking of fully paid-off phones that have been active for at least 40 days. But given the 365-day lock on prepaid users, T-Mobile’s overall policy is more onerous than those of other carriers. T-Mobile has also faced angry customers because of a recent decision to raise prices on plans that were advertised as having a lifetime price lock.

AT&T enables unlocking of paid-off phones after 60 days for postpaid users and after six months for prepaid users. AT&T lodged complaints similar to T-Mobile’s, saying in an October 7 filing that the FCC’s proposed rules would “mak[e] handsets less affordable for consumers, especially those in low-income households,” and “exacerbate handset arbitrage, fraud, and trafficking.”

AT&T told the FCC that “requiring providers to unlock handsets before they are paid-off would ultimately harm consumers by creating upward pressure on handset prices and disincentives to finance handsets on flexible terms.” If the FCC implements any rules, it should maintain “existing contractual arrangements between customers and providers, ensure that providers have at least 180 days to detect fraud before unlocking a device, and include at least a 24-month period for providers to implement any new rules,” AT&T said.

Verizon, which already faces unlocking rules because of requirements imposed on spectrum licenses it owns, automatically unlocks phones after 60 days for prepaid and postpaid users. Among the three major carriers, Verizon is the most amenable to the FCC’s new rules.

Consumer groups: Make Verizon rules industry-wide

An October 18 filing supporting a strict unlocking rule was submitted by numerous consumer advocacy groups including Public Knowledge, New America’s Open Technology Institute, Consumer Reports, the National Consumers League, the National Consumer Law Center, and the National Digital Inclusion Alliance.

“Wireless users are subject to unnecessary restrictions in the form of locked devices, which tie them to their service providers even when better options may be available. Handset locking practices limit consumer freedom and lessen competition by creating an artificial technological barrier to switching providers,” the groups said.

The groups cited the Verizon rules as a model and urged the FCC to require “that device unlocking is truly automatic—that is, unlocked after the requisite time period without any additional actions of the consumer.” Carriers should not be allowed to lock phones for longer than 60 days even when a phone is on a financing plan with outstanding payments, the groups’ letter said:

Providers should be required to transition out of selling devices without this [automatic unlocking] capability and the industry-wide rule should be the same as the one protecting Verizon customers today: after the expiration of the initial period, the handset must automatically unlock regardless of whether: (1) the customer asks for the handset to be unlocked or (2) the handset is fully paid off. Removing this barrier to switching will make the standard simple for consumers and encourage providers to compete more vigorously on mobile service price, quality, and innovation.

In an October 2 filing, Verizon said it supports “a uniform approach to handset unlocking that allows all wireless providers to lock wireless handsets for a reasonable period of time to limit fraud and to enable device subsidies, followed by automatic unlocking absent evidence of fraud.”

Verizon said 60 days should be the minimum for postpaid devices so that carriers have time to detect fraud and theft, and that “a longer, 180-day locking period for prepaid is necessary to enable wireless providers to continue offering subsidies that make phones affordable for prepaid customers.” Regardless of what time frame the FCC chooses, Verizon said “a uniform unlocking policy that applies to all providers… will benefit both consumers and competition.”

FCC considers impact on phone subsidies

While the FCC is likely to impose an unlocking rule, one question is whether it will apply when a carrier has provided a discounted phone. The FCC’s NPRM asked the public for “comment on the impact of a 60-day unlocking requirement in connection with service providers’ incentives to offer discounted handsets for postpaid and prepaid service plans.”

The FCC acknowledged Verizon’s argument “that providers may rely on handset locking to sustain their ability to offer handset subsidies and that such subsidies may be particularly important in prepaid environments.” But the FCC noted that public interest groups “argue that locked handsets tied to prepaid plans can disadvantage low-income customers most of all since they may not have the resources to switch service providers or purchase new handsets.”

The public interest groups also note that unlocked handsets “facilitate a robust secondary market for used devices, providing consumers with more affordable options,” the NPRM said.

The FCC says it can impose phone-unlocking rules using its legal authority under Title III of the Communications Act “to protect the public interest through spectrum licensing and regulations to require mobile wireless service providers to provide handset unlocking.” The FCC said it previously relied on the same Title III authority when it imposed the unlocking rules on 700 MHz C Block spectrum licenses purchased by Verizon.

T-Mobile told the FCC in a filing last month that “none of the litany of Title III provisions cited in the NPRM support the expansive authority asserted here to regulate consumer handsets (rather than telecommunications services).” T-Mobile also said that “the Commission’s legal vulnerabilities on this score are only magnified in light of recent Supreme Court precedent.”

The Supreme Court recently overturned the 40-year-old Chevron precedent that gave agencies like the FCC judicial deference when interpreting ambiguous laws. The end of Chevron makes it harder for agencies to issue regulations without explicit authorization from Congress. This is a potential problem for the FCC in its fight to revive net neutrality rules, which are currently blocked by a court order pending the outcome of litigation.


T-Mobile, AT&T oppose unlocking rule, claim locked phones are good for users Read More »

it’s-the-enterprise-vs.-the-gorn-in-strange-new-worlds-clip

It’s the Enterprise vs. the Gorn in Strange New Worlds clip

The S2 finale found the Enterprise under vicious attack by the Gorn, who were in the midst of invading one of the Federation’s colony worlds. The new footage shown at NYCC picked up where the finale left off, giving us the kind of harrowing high-stakes pitched space battle against a ferocious enemy that has long been a hallmark of the franchise. With the ship’s shields down to 50 percent, Captain Pike (Anson Mount) and his team brainstorm possible counter-strategies to ward off the Gorn and find a way to rendezvous with the rest of Starfleet. They decide to try to jam the Gorns’ communications so they can’t coordinate their attacks, which involves modulating the electromagnetic spectrum since the Gorn use light for ship-to-ship communications.

They also need to figure out how to beam crew members trapped on a Gorn ship back onto the Enterprise—except the Gorn ships are transporter-resistant. The best of all the bad options is a retreat and rescue, tracking the Gorn ship across light-years of space using “wolkite, a rare element that contains subspace gauge bosons,” per Spock (Ethan Peck). Finally, the crew decides to just ram the Gorn Destroyer, and the footage ends with a head-to-head collision, firing torpedoes, and the Enterprise on the brink of warping itself out of there, no doubt in the nick of time.

Oh, and apparently Rhys Darby (Our Flag Means Death) will guest star in an as-yet-undisclosed role, which should be fun. Strange New Worlds S3 will premiere sometime in 2025, and the series has already been renewed for a fourth season.

Lower Decks

The final season of Star Trek: Lower Decks premieres this week.

Ars staffers are big fans of Lower Decks, so we were saddened when we learned that the animated series would be ending with its fifth season. Paramount gave us a teaser in July during San Diego Comic-Con, in which we learned that their plucky crew’s S5 mission involves a “quantum fissure” that is causing “space potholes” to pop up all over the Alpha Quadrant (“boo interdimensional portals!”), and the Cerritos crew must close them—while navigating angry Klingons and an Orion war.

The new clip opens with Mariner walking in and asking “What’s the mish?” only to discover it’s another quantum fissure. When the fissure loses integrity, the Cerritos gets caught in the gravitational wake, and when it emerges, seemingly unscathed, the ship is hailed—by the Cerritos from an alternate dimension, captained by none other than Mariner, going by Captain Becky Freeman. (“Stupid dimensional rifts!”) It’s safe to assume that wacky hijinks ensue.

The final season of Lower Decks premieres on Paramount+ on October 24, 2024, and will run through December 19.

Poster art for Section 31, featuring Michelle Yeoh in a striking purple outfit against a yellow background. Credit: Paramount+

It’s the Enterprise vs. the Gorn in Strange New Worlds clip Read More »

bytedance-intern-fired-for-planting-malicious-code-in-ai-models

ByteDance intern fired for planting malicious code in AI models

After rumors swirled that TikTok owner ByteDance had lost tens of millions after an intern sabotaged its AI models, ByteDance issued a statement this weekend hoping to silence all the social media chatter in China.

In a social media post translated and reviewed by Ars, ByteDance clarified “facts” about “interns destroying large model training” and confirmed that one intern was fired in August.

According to ByteDance, the intern had held a position in the company’s commercial technology team but was fired for committing “serious disciplinary violations.” Most notably, the intern allegedly “maliciously interfered with the model training tasks” for a ByteDance research project, ByteDance said.

None of the intern’s sabotage impacted ByteDance’s commercial projects or online businesses, ByteDance said, and none of ByteDance’s large models were affected.

Online rumors suggested that more than 8,000 graphics processing units were involved in the sabotage and that ByteDance lost “tens of millions of dollars” due to the intern’s interference, but these claims were “seriously exaggerated,” ByteDance said.

The tech company also accused the intern of adding misleading information to his social media profile, seemingly posturing that his work was connected to ByteDance’s AI Lab rather than its commercial technology team. In the statement, ByteDance confirmed that the intern’s university was notified of what happened, as were industry associations, presumably to prevent the intern from misleading others.

ByteDance’s statement this weekend didn’t seem to silence all the rumors online, though.

One commenter on ByteDance’s social media post disputed the distinction between the AI Lab and the commercial technology team, claiming that “the commercialization team he is in was previously under the AI Lab. In the past two years, the team’s recruitment was written as AI Lab. He joined the team as an intern in 2021, and it might be the most advanced AI Lab.”

ByteDance intern fired for planting malicious code in AI models Read More »

us-suspects-tsmc-helped-huawei-skirt-export-controls,-report-says

US suspects TSMC helped Huawei skirt export controls, report says

In April, TSMC was provided with $6.6 billion in direct CHIPS Act funding to “support TSMC’s investment of more than $65 billion in three greenfield leading-edge fabs in Phoenix, Arizona, which will manufacture the world’s most advanced semiconductors,” the Department of Commerce said.

These investments are key to the Biden-Harris administration’s mission of strengthening “economic and national security by providing a reliable domestic supply of the chips that will underpin the future economy, powering the AI boom and other fast-growing industries like consumer electronics, automotive, Internet of Things, and high-performance computing,” the department noted. And in particular, the funding will help America “maintain our competitive edge” in artificial intelligence, the department said.

It likely wouldn’t make sense to prop TSMC up to help the US “onshore the critical hardware manufacturing capabilities that underpin AI’s deep language learning algorithms and inferencing techniques,” to then limit access to US-made tech. TSMC’s Arizona fabs are supposed to support companies like Apple, Nvidia, and Qualcomm and enable them to “compete effectively,” the Department of Commerce said.

Currently, it’s unclear where the US probe into TSMC will go or whether a damaging finding could potentially impact TSMC’s CHIPS funding.

Last fall, the Department of Commerce published a final rule, though, designed to “prevent CHIPS funds from being used to directly or indirectly benefit foreign countries of concern,” such as China.

If the US suspected that TSMC was aiding Huawei’s AI chip manufacturing, the company could be perceived as avoiding CHIPS guardrails prohibiting TSMC from “knowingly engaging in any joint research or technology licensing effort with a foreign entity of concern that relates to a technology or product that raises national security concerns.”

Violating this “technology clawback” provision of the final rule risks “the full amount” of CHIPS Act funding being “recovered” by the Department of Commerce. That outcome seems unlikely, though, given that TSMC has been awarded more funding than any other recipient apart from Intel.

The Department of Commerce declined Ars’ request to comment on whether TSMC’s CHIPS Act funding could be impacted by their reported probe.

US suspects TSMC helped Huawei skirt export controls, report says Read More »

monthly-roundup-#23:-october-2024

Monthly Roundup #23: October 2024

It’s monthly roundup time again, and it’s happily election-free.

Propaganda works, ancient empires edition. This includes the Roman Republic being less popular than the Roman Empire and people approving of Sparta, whereas Persia and Carthage get left behind. They’re no FDA.

Polling USA: Net Favorable Opinion Of:

Ancient Athens: +44%

Roman Empire: +30%

Ancient Sparta: +23%

Roman Republic: +26%

Carthage: +13%

Holy Roman Empire: +7%

Persian Empire: +1%

Visigoths: -7%

Huns: -29%

YouGov / June 6, 2024 / n=2205

What do we do about all 5-star ratings collapsing the way Peter describes here?

Peter Wildeford: TBH I am pretty annoyed that when I rate stuff the options are:

“5 stars – everything was good enough I guess”

“4 stars – there was a serious problem”

“1-3 stars – I almost died”

I can’t express things going well!

I’d prefer something like:

5 stars – this went above/beyond, top 10%

4 stars – this met my expectations

3 stars – this was below my expectations but not terrible

2 stars – there was a serious problem

1 star – I almost died

Kitten: The rating economy for things like Airbnb, Uber etc. made a huge mistake when they used the five-star scale. You’ve got boomers all over the country who think that four stars means something was really good, when in fact it means there was something very wrong with the experience.

Driver got lost for 20 minutes and almost rear ended someone, four stars

Boomer reviewing their Airbnb:

This is one of the nicest places I have ever stayed, the decor could use a little updating, four stars.

A lot of people saying the boomers are right but not one of you mfers would even consider booking an Airbnb with a 3.5 rating because you know as well as I do that means there’s something really wrong with it.

Nobe: On Etsy you lose your “star seller” rating if it dips below 4.8. A couple of times I’ve gotten 4 stars and I’ve been beside myself wondering what I did wrong even when the comment is like “I love it, I’ll cherish it forever”

Moshe Yudkowsky: The first time I took an Uber, and rated a driver 3 (average), Uber wanted to know what was wrong. They corrupted their own metric.

Kate Kinard: I’m at an airbnb right now and this magnet is on the fridge as a reminder

⭐️⭐️⭐️⭐️= many issues to fix!

The problem is actually worse than this. Different people have different scales. A majority of people use the system where 4 stars means major issues, and many systems demand you maintain e.g. a 4.8. All you get is extreme negative selection.

Then there are others who think the default is 3 stars, 4 is good and 5 is exceptional.

Which is the better system, but not if everyone else is handing out 5s like candy, which means your rating is a function of who is rating you more than whether you did a good job. Your ‘negative selection’ is 50% someone who doesn’t know the rules.

This leads to perverse ‘worse is better’ situations, where you want products that draw in the audience that will use the lower scale, or you want something that will sometimes offend people and trigger 1s, such as being ‘too authentic’ or not focusing enough on service.
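To put toy numbers on ‘your rating is a function of who is rating you’: assume two rater populations, one that hands out 5s unless something went badly wrong and one that uses the full scale. Then a mediocre listing with inflator-heavy raters outscores a genuinely excellent one with calibrated raters. All numbers below are made up for the example.

```python
def expected_rating(quality: float, share_inflators: float) -> float:
    """quality is in [0, 1]; share_inflators is the fraction of rating-inflating raters."""
    inflator_rating = 5.0 if quality > 0.2 else 1.0   # 5 stars unless a disaster
    calibrated_rating = 1.0 + 4.0 * quality           # maps quality onto the 1-5 scale
    return share_inflators * inflator_rating + (1 - share_inflators) * calibrated_rating

print(expected_rating(quality=0.5, share_inflators=0.9))  # 4.8: mediocre, inflator-heavy raters
print(expected_rating(quality=0.9, share_inflators=0.2))  # 4.68: excellent, calibrated raters
```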

Thus this report, that says the Japanese somehow are using the good set of rules?

Mrs. C: I love the fact that in Japan you need to avoid 5 star things and look for 3-4 star places because Japanese people tend to use a 5 point scale sanely and it’s only foreigners giving 5 stars to everything, so a 5 star rating means “only foreigners go here”

Eliezer Yudkowsky: How the devil did Japan end up using 5-point scales sanely? I have a whole careful unpublished analysis of everything that goes wrong with 5-point rating systems; it hadn’t occurred to me that any other country would end up using them sanely!

What makes this even weirder is Japan is a place where people are taught never to tell someone no. One can imagine them being one of the places deepest in the 5-star-only trap. Instead, this seems almost like an escape valve, maybe? You don’t face the social pressure, there isn’t a clear ‘no’ involved, and suddenly you get to go nuts. Neat.

One place that escapes this trap, even here, is movie ratings. Everyone understands that a movie rating of 4/5 means the movie was very good, perhaps excellent. We get that the best movies are much better than a merely good movie, and this difference matters; you want active positive selection. It also helps that you are not passing judgment on a particular person or local business, and there is no social exchange where you feel under pressure to maximize the rating metric.

This helps explain why Rotten Tomatoes is so much worse than Metacritic and basically can only be used as negative selection – RT uses a combination of binaries, which is the wrong question to ask, whereas Metacritic translates each review into a number. It also hints at part of why old Netflix predictions were excellent, as they were based on a 5-star scale, versus today’s thumbs-based ratings, which are then combined with pushing Netflix’s own content and predicting what you’ll watch rather than how much you’ll like it.

This statement might sound strange but it seems pretty much true?

Liz: The fact that it’s cheaper to cook your own food is disturbing to me. like frequently even after accounting for your time. like cooking scales with number of people like crazy. there’s no reason for this to be the case. I don’t get it.

In the liztopia restaurants are high efficiency industrial organizations and making your own food is akin to having a hobby for gardening.

I literally opened a soylent right after posting this. i’m committed to the bit.

Gwern: The best explanation I’ve seen remains regulation and fixed costs: essentially, paternalistic goldplating of everything destroys all the advantages of eating out. Just consider how extremely illegal it would be to run a restaurant the way you run your kitchen. Or outlawing SRO.

Doing your own cooking has many nice benefits. You might enjoy cooking. You get to customize the food exactly how and when you like it, choose your ingredients, and enjoy it at home, and so on. The differential gives poorer people the opportunity to save money. I might go so far as to say that we might be better off for the fact that cooking at home is cheaper.

It’s still a statement about regulatory costs and requirements, essentially, that it is often also cheaper. In a sane world, cooking at home would be a luxury. Also in a sane world, we would have truly industrialized at least the cheap end of cooking by this point. Low-end robot chefs now.

Variety covers studio efforts to counter ‘Toxic Fandom,’ where superfans get very angry and engage in a variety of hateful posts, often making threats and sometimes engaging in review bombing. It seems this is supposedly due to ‘superfans,’ the most dedicated, who think something is going to destroy their precious memories forever. The latest strategy is to hire those exact superfans, so you know when you’re about to walk into this, and perhaps can change course to avoid it.

The reactions covered in the past mostly share a common theme, which is that they are rather obviously pure racism or homophobia, or otherwise called various forms of ‘woke garbage.’ This is very distinct from what they cite as the original review bomb on Star Wars Episode IX, which I presume had nothing to do with either of these causes, and was due to the movie indeed betraying and destroying our childhoods by being bad.

The idea of bringing in superfans so you understand which past elements are iconic and important, versus which things you can change, makes sense. I actually think that’s a great idea; superfans can tell you if you are destroying the soul of the franchise, breaking a Shibboleth, or if your ideas flat out suck. That doesn’t mean you should or need to listen or care when they’re being racists.

Nathan Young offers Advice for Journalists, expressing horror at what seem to be the standard journalistic norms of quoting anything anyone says in private, out of context, without asking permission, with often misleading headlines, often without seeking to preserve meaning or even get the direct quote right, or to be at all numerate or aware of reasonable context for a fact and whether it is actually newsworthy. His conclusion is thus:

Nathan Young: Currently I deal with journalists like a cross between hostile witnesses and demonic lawyers. I read articles expecting to be misled or for facts to be withheld. And I talk to journalists only after invoking complex magics (the phrases I’ve mentioned) to stop them taking my information and spreading it without my permission. I would like to pretend I’m being hyperbolic, but I’m really not. I trust little news at first blush and approach conversations with even journalists I like with more care than most activities.

I will reiterate. I take more care talking to journalists than almost any other profession and have been stressed out or hurt by them more often than almost any group. Despite this many people think I am unreasonably careless or naïve. It is hard to stress how bad the reputation of journalists is amongst tech/rationalist people.

Is this the reputation you want?

Most people I know would express less harsh versions of the same essential position – when he says that the general reputation is this bad, he’s not kidding. Among those who have a history interacting with journalists, it tends to be even worse.

The problem is largely the standard tragedy of the commons – why should one journalist sacrifice their story to avoid giving journalists in general a bad name? There was a time when there were effective forms of such norm enforcement. That time is long past, and personal reputations are insufficiently strong incentives here.

As my task has trended more towards a form of journalism, while I’ve gotten off light because it’s a special case and people I interact with do know I’m different, I’ve gotten a taste of the suspicion people have towards the profession.

So I’d like to take this time here to reassure everyone that I abide by a different code than the one Nathan Young describes in his post. I don’t think the word ‘journalist’ changes any of my moral or social obligations here. I don’t think that ‘the public has a right to know’ means I get to violate the confidence or preferences of those around me. Nor do I think that ‘technically we did not say off the record’ or ‘no takesies backsies’ means I am free to share private communications with anyone, or to publish them.

If there is something I am told in private, and I suspect you would have wanted to say it off the record, and we didn’t specify on the record, I will actively check. If you ask me to keep something a secret, I will. If you retroactively want to take something you said off the record, you can do that. I won’t publish something from a private communication unless I feel it was understood that I might do that; if unclear, I will ask, and I will use standard common sense norms that respect privacy when considering what I say in other private conversations, and so on. I will also glomarize as necessary to avoid implicitly revealing whether I have hidden information I wouldn’t be able to share, and so on, as best I can, although nobody’s perfect at that.

I knew Stanford hated fun but wow, closing hiking trails when it’s 85 degrees outside?

It certainly seems as if Elon Musk is facing additional interference in regulatory requirements for launching his rockets, as a result of people disliking his political activities and decisions regarding Starlink. That seems very not okay, as in:

Alex Nieves (Politico): California officials cite Elon Musk’s politics in rejecting SpaceX launches.

Elon Musk’s tweets about the presidential election and spreading falsehoods about Hurricane Helene are endangering his ability to launch rockets off California’s central coast.

The California Coastal Commission on Thursday rejected the Air Force’s plan to give SpaceX permission to launch up to 50 rockets a year from Vandenberg Air Force Base in Santa Barbara County.

“Elon Musk is hopping about the country, spewing and tweeting political falsehoods and attacking FEMA while claiming his desire to help the hurricane victims with free Starlink access to the internet,” Commissioner Gretchen Newsom said at the meeting in San Diego.

“I really appreciate the work of the Space Force,” said Commission Chair Caryl Hart. “But here we’re dealing with a company, the head of which has aggressively injected himself into the presidential race and he’s managed a company in a way that was just described by Commissioner Newsom that I find to be very disturbing.”

There is also discussion about them being ‘disrespected’ by the Space Force. There are some legitimate issues involved as well, but this seems like a confession of regulators punishing Elon Musk for his political speech and actions?

I mean, I guess I appreciate that He Admit It.

Palmer Luckey: California citing Elon’s personal political activity in denying permission for rocket launches is obviously illegal, but the crazier thing IMO is how they cite his refusal to activate Starlink in Russian territory at the request of Ukraine. Doing so would have been a crime!

I do not think those involved have any idea the amount of damage such actions do, both to our prosperity – SpaceX is important in a very simple and direct way, at least in worlds where AI doesn’t render it moot – and even more than that to our politics and government. If you give people this kind of clear example, do not act surprised when they turn around and do similar things to you, or consider your entire enterprise illegitimate.

That is on top of the standard ‘regulators only have reason to say no’ issues.

Roon: In a good world faa would have an orientation where they get credit for and take pride in the starship launch.

Ross Rheingans-Yoo: In a good world every regulator would get credit for letting the successes through – balanced by equal blame for harmful failures – & those two incentives would be substantially stronger than the push to become an omniregulator using their perch to push a kitchen sink of things.

In other Elon Musk news: Starlink proved extremely useful in the wake of recent storms, with other internet access out indefinitely. It was also used by many first responders. Seems quite reasonable for many to have a Starlink terminal on hand purely as a backup.

An argument that all the bad service you are getting is a sign of a better world. It’s cost disease. We are so rich that labor costs more money, and good service is labor intensive, so the bad service is a good sign. Remember when many households had servants? Now that’s good service, but you don’t want that world back.

The obvious counterargument is that when you go to places that are poor, you usually get terrible service. At one point I would periodically visit the Caribbean for work, and the worst thing about it was that the service everywhere was outrageously terrible, as in your meal at a restaurant typically takes an extra hour or two. I couldn’t take it. European service is often also very slow, and rural service tends to be relatively slow. Whereas in places in America where people cost the most to employ, like New York City, the service is usually quite good.

There’s several forces at work here.

  1. We are richer, so labor costs more, so we don’t want to burn it on service.

  2. We are richer in some places, so we value our time and thus good service more, and are willing to pay a bit more to get it.

  3. We are richer in some places, in part because we have a culture that values good service and general hard work and not wasting time, so service is much better than in places with different values – at least by our own standards.

  4. We are richer in part due to ‘algorithmic improvements,’ and greater productivity, and knowing how to offer things like good service more efficiently. So it is then correct to buy more and better service, and people know what to offer.

  5. In particular: Servants provided excellent service in some ways, but were super inefficient. Mostly they ended up standing or sitting around not doing much, because you mostly needed them in high leverage spots for short periods. But we didn’t have a way to hire people to do things for you only when you needed them. Now we do. So you get to have most of the same luxury and service, for a fraction of the employment.

I think I actually get excellent service compared to the past, for a huge variety of things, and for many of the places I don’t it is because technology and the internet are taking away the need for such service. When I go to places more like the past, I don’t think the service is better – I reliably think the service is worse. I expect the actual past was the same: the people around you were cheaper to hire but relatively useless. Yes, you got ‘white glove service,’ but why do I want people wearing white gloves?

Like Rob Bensinger here, I am a fan of Matt Yglesias and his campaign of ‘the thing you said is not literally true and I’m going to keep pointing that out.’ The question is when it is and isn’t worth taking the space and time to point out who is Wrong on the Internet, especially when doing politics.

Large study finds ability to concentrate is actually increasing in adults? This seems like a moment to defy the data, or at least disregard it in practice, there’s no way this can be real, right? It certainly does not match my lived experience of myself or others. Many said the graphs and data involved looked like noise. But that too would be great news, as ‘things are about the same’ would greatly exceed expectations.

Perhaps the right way to think about attention spans is that we have a low tolerance for anything that doesn’t quickly earn our attention, a high willingness to context switch and ubiquitous distractions. It takes a lot more to hold our attention than it used to. Do not waste our time, the youth will not tolerate this. That is compatible with hyperfocusing on something sufficiently engaging, especially once buy-in has been achieved, even for very extended periods (see: This entire blog!), but you have to earn it.

Paul Graham asks in a new essay, when should you do what you love?

He starts with the obvious question. Does what you love offer good chances of success? Does it pay the bills? If what you love is (his examples) finding good trades or running a software company, of course you pursue what you love. If it’s playing football, it’s going to be rough.

He notes a kind of midwit-meme curve as one key factor:

  1. If you need a small amount of money, you can afford to do what you love.

  2. If you need a large amount of money, you need to do what pays more.

  3. If you need an epic amount of money, you will want to found a startup and will need unique insight, so you have to gamble on what you love.

The third consideration is, what do you actually want to do? He advises trying to figure this out right now, not to wait until after college (or for any other reason). The sooner you start the better, so investigate now if you are uncertain. A key trick is, look at the people doing what you might do, and ask if you want to turn into one of them.

If you can’t resolve the uncertainty, he says, try to give yourself options, where you can more easily switch tracks later.

This seems like one of the Obvious True and Useful Paul Graham Essays. These seem to be the correct considerations, in general, when deciding what to work on, if your central goal is some combination of ‘make money’ and ‘have a good life experience making it.’

The most obvious thing missing is the question of Doing Good. If you value having positive impact on the world, that brings in additional considerations.

A claim that studying philosophy is intellectually useful, but I think it’s a mistake?

Michael Prinzing: Philosophers say that studying philosophy makes people more rigorous, careful thinkers. But is that true?

In a large dataset (N = 122,352 students) @daft_bookworm and I find evidence that it is!

In freshman year, Phil majors are more inclined than other students to support their views with logical arguments, consider alternative views, evaluate the quality of evidence, etc. But, Phil majors *also* show more growth in these tendencies than students in other majors.

This suggests that philosophy attracts people who are already rigorous, careful thinkers, but also trains people to be better thinkers.

Stefan Schubert: Seems worth noticing that they’re self-report measures and that the differences are small (one measure)/non-existent (the other)

Michael Prinzing: That’s right! Particularly in the comparison with an aggregate of all non-philosophy majors, the results are not terribly boosterish. But, in the comparison with more fine-grained groups of majors, it’s striking how much philosophy stands out.

barbarous: How come we find mathematics & computer science in the bottom of these? Wouldn’t we expect them to have higher baseline and higher improvement in rigor?

My actual guess is that the math and computer science people hold themselves to higher epistemic standards, that or the test is measuring the wrong thing.

Except this is their graph? The difference in growth is indeed very small, with only one line that isn’t going up like the others.

If anything, it’s Education that is the big winner on the top graph, taking a low base and making up ground. And given these are self-reports, there’s nothing like an undergraduate philosophy major for thinking they are practicing better thinking habits.

I mean, we can eyeball that, and the slopes are mostly the same across most of the majors?

Facial tics predict future police cadet promotions at every stage, AUC score of 0.7. Importantly, with deliberate practice one can alter such facial tics. Would changing the tics actually change perceptions, even when interacting repeatedly in high stakes situations as police do? The article is gated, but based on what they do tell us I find it unlikely. Yes, the tics are the best information available in this test and are predictive, but that does not mean they are the driving force. But it does seem worth it to fix any such tics if you can?

Paul Graham: Renaming Twitter X doesn’t seem to have damaged it. But it doesn’t seem to have helped it either. So it was a waste of time and a domain name.

I disagree. You know it’s a stupid renaming when everyone does their best to keep using the old name anyway. I can’t think of anyone in real life that thinks ‘X’ isn’t a deeply stupid name, and I know many that got less inclined to use the product. So I think renaming Twitter to X absolutely damaged it and drove people away and pissed them off. The question is one of magnitude – I don’t think this did enough damage to be a crisis, but it did enough to hurt, in addition to being a distraction and cost.

Twitter ends use of bold and other formatting in the main timeline, because an increasing number of accounts whoring themselves out for engagement were increasingly using more and more bold and italics. Kudos to Elon Musk for responding to an exponential at the right time. Soon it was going to be everywhere, because it was working, and those of us who find it awful weren’t punishing it enough to matter to the numbers. There’s a time and place for selective and sparing use of such formatting, but this has now been officially Ruined For Everyone.

It seems people keep trying to make the For You page on Twitter happen?

Emmett Shear: Anyone else’s For You start filling up with extreme slop nonsense, often political? “Not interested” x20 fixes it for a day but then it’s back again. It’s getting bad enough to make me stop using Twitter…frustrating because the good content is still good, the app just hides it.

TracingWoods: it’s cyclical for me but the past couple of weeks have been fine. feels like a specific switch flips occasionally, and no amount of “not interested” stops it. it should rotate back into sanity for you soon enough.

I checked for journalist purposes, and my For You page looks… exactly like my Following feed, plus some similar things that I’m not technically following and that aren’t in lists, especially when paired with interactions with those I do follow, except the For You stuff is scrambled so you can’t rely on it. So good job me, I suppose? It still doesn’t do anything useful for me.

A new paper on ruining it for everyone, social media edition, is called ‘Inside the funhouse mirror factory: How social media distorts perceptions of norms.’ Or, as an author puts it, ‘social media is not reality,’ who knew?

Online discussions are dominated by a surprisingly small, extremely vocal, and non-representative minority. Research on social media has found that, while only 3% of active accounts are toxic, they produce 33% of all content. Furthermore, 74% of all online conflicts are started in just 1% of communities, and 0.1% of users shared 80% of fake news. Not only does this extreme minority stir discontent, spread misinformation, and spark outrage online, they also bias the meta-perceptions of most users who passively “lurk” online.

The strategy absolutely works. In AI debates on Twitter, that 3% toxic minority works hard to give the impression that their position is what everyone thinks, promote polarization and so on. From what I can tell politics has it that much worse.

Indeed, 97% of political posts from Twitter/X come from just 10% of the most active users on social media.

That’s a weird case, because most Twitter users are mostly or entirely lurkers, so 10% of accounts plausibly includes most posts period.

The motivation for all this is obvious, across sides and topics. If you have a moderate opinion, why would you post about that, especially with all that polarized hostility? There are plenty of places I have moderate views, and then I don’t talk about them on social media (or here, mostly) because why would I need to do that?

One of the big shifts in AI is the rise of more efficient Ruining It For Everyone. Where previously the bad actors were rate limited and had substantial marginal costs, those limitations fall away, as do various norms keeping people behaving decently. Systems that could take a certain amount of such stress will stop working, and we’ll need to make everything more robust against bad actors.

The great news is that if it’s a tiny group ruining it for everyone, you can block them.

Yishan: “0.1% of users share 80% of fake news”

After that document leak about how Russia authors its fake news, I’ve been able to more easily spot disinfo accounts and just block them from my feed.

I only needed to do this for a couple weeks and my TL quality improved markedly. There’s still plenty of opinion from right and left, but way less of the “shit-stirring hysteria” variety.

If you are wondering what leak it was, itʻs the one described in this thread.

Youʻll see that the main thrust is to exploit: “They are afraid of losing the American way of life and the ‘American dream.’ It is these sentiments that should be exploited,”

In the quoted screenshot, the key element is at the bottom: – use a minimum of fake news and a maximum of realistic information – continuously repeat that this is what is really happening, but the official media will never tell you or show it to you.

The recent port strike and Hurricane Helene were great for this because whenever thereʻs a big event, the disinfo accounts appear to hyper-focus on exploiting it, so a lot of their posts get a lot of circulation, and you can start to spot them.

The pattern you look for is:

  1. The post often talks about how youʻre not being told the truth, or itʻs been hidden from you. Theyʻre very obvious with it. A more subtle way is that they end with a question asking if there is something sinister going on.

  2. the second thing is that it does cite a bunch of real/realistic (or already well-known facts) and then connects it to some new claim, often one you haven’t heard any other substantiation for. This could be real, but it’s the cluster of this plus the other points.

  3. The third is that the author doesn’t seem to be a real person. Now, this is tough, because there are plenty of real anon accounts. but it’s a sort of thing you can tell from a combination of the username (one that seems weird or has a lot of numbers, or doesn’t fit the persona presented), the picture isn’t a real person, the persona is a little too “bright”, or the character implied by the bio doesn’t seem like the kind of person who’d suddenly care a lot about this issue. This one requires a bit of intuition.

None of these things is by itself conclusive (and I might have blocked some false positives), but once you start knowing what to spot, there’s a certain kind of post and when you look at the account, it has certain characteristics that stick out.

It just doesn’t look like your normal extreme right-wing or extreme left-wing real person. People like that tend to make more throwaway (“I hate this! Can’t believe Harris/Elon/Trump is so awful!”) posts, not carefully-styled media-delicious posts, if that makes sense.

I mostly prefer to toss out anyone who spends their social media expressing political opinions, except for an intentional politics list (that I should update some time soon, it’s getting pretty old).

What Yishan is doing sounds like it would be effective at scale if sustained, but you’d have to put in the work. And it’s a shame that he has to do it all himself. Ideally an AI could help you do that (someone build this!) but at minimum you’d want a group of people who can share such blocks, so if someone hits critical mass then by default they get blocked throughout. You could provide insurance in various forms – e.g. if you’ve interacted with them yourself or they’re at least a 2nd-level follow, then you can exempt those accounts, and so on. Sky’s the limit, we have lots of options.
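As a sketch of what the shared-blocklist version could look like: the logic is just ‘block anyone enough members of the group have blocked, unless they fall under one of your personal exemptions.’ Everything below (the function name, the threshold, the exemption categories) is a hypothetical illustration, not an existing tool or API.

```python
# Minimal sketch of a shared blocklist with personal exemptions. All names and
# thresholds are made up for illustration; this is not an existing tool or API.

def build_shared_blocklist(
    member_blocklists: dict[str, set[str]],  # member -> accounts that member blocked
    critical_mass: int,                      # how many members must block an account
    my_interactions: set[str],               # accounts I've interacted with myself
    my_second_level_follows: set[str],       # follows of the people I follow
) -> set[str]:
    # Count how many members blocked each account.
    counts: dict[str, int] = {}
    for blocked in member_blocklists.values():
        for account in blocked:
            counts[account] = counts.get(account, 0) + 1

    # Block anything that hit critical mass, minus my personal exemptions.
    exempt = my_interactions | my_second_level_follows
    return {a for a, n in counts.items() if n >= critical_mass and a not in exempt}


# Example: three members, critical mass of two, one personal exemption.
shared = build_shared_blocklist(
    {"yishan": {"@spam1", "@spam2"}, "me": {"@spam1"}, "friend": {"@spam1", "@spam2"}},
    critical_mass=2,
    my_interactions={"@spam2"},
    my_second_level_follows=set(),
)
print(shared)  # {'@spam1'} -- @spam2 hit critical mass but is exempted
```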

Maybe we can quickly make an app for that?

Tenobrus: i have a lotta mutuals who i would love to follow but be able to mute some semantic subset of their posts. like give me this guy but without the dumb politics, or that girl but without the thirst traps, or that tech bro but without the e/acc.

This seems super doable, on the ‘I am tempted to build an MVP myself’ level. I asked o1-preview, and it called it ambitious but agreed it could be done, and even for a relatively not great programmer suggested maybe 30-50 hours to an MVP. Who’s in?
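For concreteness, here is roughly the shape the MVP could take, as a minimal sketch assuming you have some classifier (LLM or otherwise) that can say whether a post falls into a muted topic. The mute rules, the Post type and the post_matches_topic helper are all hypothetical; the keyword stub exists only so the sketch runs on its own.

```python
# Minimal sketch of "keep following the account, mute a semantic subset of their posts."
# MUTE_RULES, Post and post_matches_topic are hypothetical; a real version would call
# an LLM classifier instead of the keyword stub used here.

from dataclasses import dataclass

@dataclass
class Post:
    author: str
    text: str

# Per-account mute rules: author -> topics I don't want to see from them.
MUTE_RULES = {
    "@mutual_a": {"politics"},
    "@mutual_b": {"thirst traps"},
    "@mutual_c": {"e/acc"},
}

def post_matches_topic(text: str, topics: set[str]) -> bool:
    """Placeholder for the classifier call: is this post about any muted topic?"""
    lowered = text.lower()
    return any(topic in lowered for topic in topics)

def filter_timeline(posts: list[Post]) -> list[Post]:
    kept = []
    for post in posts:
        muted = MUTE_RULES.get(post.author, set())
        if muted and post_matches_topic(post.text, muted):
            continue  # drop this post, but keep following the account
        kept.append(post)
    return kept

timeline = [
    Post("@mutual_a", "My hot take on politics today..."),
    Post("@mutual_a", "New paper on attention spans looks interesting"),
]
print([p.text for p in filter_timeline(timeline)])  # keeps only the second post
```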

Or maybe it’s even easier?

Jay Van Bavel: Unfollowing toxic social media influencers makes people less hostile!

The list includes accounts like CNN, so your definition of ‘hyperpartisan’ may vary, but it doesn’t seem crazy and it worked.

If you want to fix the social media platforms themselves to avoid the toxic patterns, you have to fix the incentives, and that means you will need law. Even if all the companies were to get together to agree not to use ‘rage maximizers’ or various forms of engagement farming, that would be an antitrust violation. Without an agreement, they don’t have much choice. So, law – except for the First Amendment and the other real concerns about using a law there.

My best proposal continues to be a law mandating that large social media platforms offer access to alternative interfaces and forms of content filtering and selection. Let people choose friendly options if they want that.

Otherwise, of course you are going to get things like TikTok.

NPR reports on internal TikTok communications where they spoke candidly about the dangers for children on the app, exploiting a mistaken failure to redact that information from one of the lawsuits against TikTok.

As TikTok’s 170 million U.S. users can attest, the platform’s hyper-personalized algorithm can be so engaging it becomes difficult to close the app. TikTok determined the precise amount of viewing it takes for someone to form a habit: 260 videos. After that, according to state investigators, a user “is likely to become addicted to the platform.”

In the previously redacted portion of the suit, Kentucky authorities say: “While this may seem substantial, TikTok videos can be as short as 8 seconds and are played for viewers in rapid-fire succession, automatically,” the investigators wrote. “Thus, in under 35 minutes, an average user is likely to become addicted to the platform.”

They also note that the tool that limits time usage, which defaulted to a rather large 60 minutes a day, had almost no impact on usage in tests (108.5 min/day → 107).

One document shows one TikTok project manager saying, “Our goal is not to reduce the time spent.”

Well, yes, obviously. In general it’s good to get confirmation on obvious things, like that TikTok was demoting relatively unattractive people in its feeds, I mean come on. And yes, if 95% (!) of smartphone users under 17 are on TikTok, usually for extended periods, that will exclude other opportunities for them.

And yes, the algorithm will trap you into some terrible stuff, that’s what works.

During one internal safety presentation in 2020, employees warned the app “can serve potentially harmful content expeditiously.” TikTok conducted internal experiments with test accounts to see how quickly they descend into negative filter bubbles.

“After following several ‘painhub’ and ‘sadnotes’ accounts, it took me 20 mins to drop into ‘negative’ filter bubble,” one employee wrote. “The intensive density of negative content makes me lower down mood and increase my sadness feelings though I am in a high spirit in my recent life.”

Another employee said, “there are a lot of videos mentioning suicide,” including one asking, “If you could kill yourself without hurting anybody would you?”

In particular it seems moderation missed self-harm and eating disorders, but also:

TikTok acknowledges internally that it has substantial “leakage” rates of violating content that’s not removed. Those leakage rates include: 35.71% of “Normalization of Pedophilia;” 33.33% of “Minor Sexual Solicitation;” 39.13% of “Minor Physical Abuse;” 30.36% of “leading minors off platform;” 50% of “Glorification of Minor Sexual Assault;” and 100% of “Fetishizing Minors.”

None of this is new or surprising. I affirm that I believe we should, indeed, require that TikTok ownership be transferred, knowing that is probably a de facto ban.

The obvious question is, in the age of multimodal AI, can we dramatically improve on at least this part of the problem? TikTok might be happy to serve up an endless string of anorexia videos, but I do not think they want to be encouraging sexual predators. In addition to being really awful, it is also very bad for business. I would predict that it would take less than a week to get a fine-tune of Llama 3.2, based on feeding it previously flagged and reviewed videos as the fine-tune data, that would do much better than these rates at identifying violating TikTok videos. You could check every video, or at least every video that would otherwise get non-trivial play counts.
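To make the shape of that concrete, here is a minimal sketch of the moderation loop I have in mind: turn videos human reviewers already labeled into fine-tuning data, then score anything that is starting to get real play counts before it spreads further. The file format, the score_video placeholder and the thresholds are all assumptions for illustration, not how TikTok or any particular model API actually works.

```python
# Minimal sketch, not a real pipeline: build fine-tune data from already-reviewed
# videos, then gate videos that are starting to spread behind a classifier score.
# score_video() is a placeholder for whatever fine-tuned model you'd actually call.

import json

def build_finetune_dataset(reviewed: list[dict], path: str) -> None:
    """reviewed items look like {"caption": ..., "transcript": ..., "label": "violating" or "ok"}."""
    with open(path, "w") as f:
        for item in reviewed:
            record = {
                "input": f"{item['caption']}\n{item['transcript']}",
                "label": item["label"],
            }
            f.write(json.dumps(record) + "\n")

def score_video(caption: str, transcript: str) -> float:
    """Placeholder: probability the video violates policy, from the fine-tuned model."""
    return 0.0  # stub so the sketch runs; a real version calls the model here

def should_hold_for_review(caption: str, transcript: str, play_count: int,
                           threshold: float = 0.5, min_plays: int = 1000) -> bool:
    if play_count < min_plays:
        return False  # only gate videos that are starting to get non-trivial plays
    return score_video(caption, transcript) >= threshold
```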

Old man asks for help transferring his contacts, family realizes he has sorted his contacts alphabetically by friendship tier and not all of them are in the tier they would expect.

Lu In Alaska: Stop what you’re doing and read the following:

All the kids and in-laws and grands have met up for breakfast at my geriatric dad’s house. My sisters are here. Their boys are here. We are eating breakfast. My dad asks for help transferring his contacts into his new phone.

Friends. We discovered together that my dad has his contacts in a tier list of his feelings not alphabetically. We are absolutely *beside ourselves* reviewing his tiers off as a whole family. Crying. Gasping. Wheezing. His ex-wife who is visiting today is C tier but his first wife’s sister is B tier THE DRAMA.

So like my name is in as ALu. His brother-in-law is BJim. He is rating us. I am DYING. Someone find CAnn she’s going to be pissed. Let’s sit back and watch.

The kids made A tier what a relief. Should be A+Lu

I love this, and also this seems kind of smart (also hilarious) given how many contacts one inevitably gathers? I have 8 contacts that are not me and that begin with Z, and 7 that begin with Y. You get a ‘favorites’ page, but you only get one. You can use labels, but the interface for them is awkward.

Seriously, how hard is it to ensure this particular autocorrect doesn’t happen?

Cookingwong: The fact that my phone autocorrects “yeah np” to “yeah no” has caused 3 divorces, 2 gang wars, 11 failed hostage negotiations, and $54 billion loss in GDP.

‘Np’ is a standard thing to say, yet phones often think it is a typo and autocorrect it to its exact opposite. Can someone please ensure that ‘np’ gets added to the list of things that do not get corrected?

Apple is working on smart glasses that would make use of Vision Pro’s technology, aiming for a 2027 launch, along with potential camera-equipped AirPods. Apple essentially forces you to pick a side, either in or out, so when the Vision Pro came out I was considering whether to switch entirely to their products, and concluded that the device wasn’t ready. But some version of it or of smart glasses will be awesome when someone finally pulls them off properly, the question is when and who.

There is the theory that the tech industry is still in California because not enforcing non-competes is more important than everything else combined. I don’t doubt it helps but also companies can simply not require such agreements at this point? I think mostly it’s about path dependence, network effects and lock-in at this point.

What is important in a hotel room?

Auren Hoffman: things all hotel rooms should have (but don’t): MUCH more light. room key from phone. SUPER fast wifi. tons of free bottled water. outlets every few feet. what else?

Sheel Mohnot: blackout curtains

a single button to turn off every light in the room

check in via kiosk

Andres Sandberg: A desk, a hairdryer.

Humberto: 1. Complete blackout 2. 0 noise/ shutdown everything including the fucking refrigerator hidden inside a cabinet but still audible 3. Enough space for a regular sized human to do some push ups 4. Laundry bags (can be paper) 5. I was going to say an AirPlay compatible tv but clearly optional this one.

Ian Schafer: Mag/Qi phone charging stand.

Emily Mason: USB and USB_C fast charging ports sockets (and a few cords at the desk).

The answers are obvious if you ask around, and most of them are cheap to implement.

My list at this point of what I care about that can plausibly be missing is something like this, roughly in order:

  1. Moderately comfortable bed or better. Will pay for quality here.

  2. Sufficient pillows and blankets.

  3. Blackout curtains, no lights you cannot easily turn off. No noise.

  4. Excellent wi-fi.

  5. AC/heat that you can adjust reasonably.

  6. Desk with good chair.

  7. Access to good breakfast, either in hotel or within an easy walk.

  8. Decent exercise room, which mostly means weights and a bench.

  9. Outlets on all sides of the bed, and at desk, ideally actual ports and chargers.

  10. Access to good free water, if tap is bad there then bottled is necessary.

  11. TV with usable HDMI port, way to stream to it, easy access to streaming services.

  12. Refrigerator with space to put things.

  13. Views are a nice to have.

The UK to require all chickens be registered with the state, with criminal penalties.

City of Casselberry warns storm victims not to repair fences without proper permits.

The FAA shut down flights bringing hurricane aid into Western North Carolina, closing the air space, citing the need for full control. It’s possible this actually makes sense, but I am very skeptical.

California decides to ‘ban sell-by dates’ by which they mean they’re going to require you to split that into two distinct numbers or else:

Merlyn Miller (Food and Wine): The changes will take effect starting on July 1, 2026, and impact all manufacturers, processors, and retailers of food for human consumption. To adhere with the requisite language outlined, any food products with a date label — with the exception of infant formula, eggs, beer, and malt beverages — must state “Best if Used By” to indicate peak quality, and “Use By” to designate food safety. By reducing food waste, the legislation (Assembly Bill No. 660) may ultimately save consumers money and combat climate change too.

It’s so California to say you are ‘banning X’ and instead require a second X.

The concern seems to be that some people would think they needed to throw food out if it was past its expiration date, leading to ‘food waste.’ But wasn’t that exactly what the label was for and what it meant? So won’t this mean you’ll simply have to add a second earlier date for ‘peak quality,’ and some people will then throw out anything past that date too? Also, isn’t ‘peak quality’ almost always ‘the day or even minute we made this?’

Who is going to buy things that are past ‘peak quality’ but not expired? Are stores going to have to start discounting such items?

Therefore I predict this new law net increases both confusion and food waste.

US Government mandates companies create interception portals so they can wiretap Americans when needed. Chinese hackers compromise the resulting systems. Whoops.

Timothy Lee notes that not only are injuries from Waymo crashes 70% less common per passenger mile than for human drivers, the human drivers are almost always at fault when the Waymo accidents do happen.

Joe Biden preparing a ban on Russian and Chinese self-driving car technology, fearing that the cars might suddenly do what the Russians or Chinese want them to do.

I have now finished the TV series UnREAL. The news is good, and there are now seven shows in my tier 1. My guess is this is my new #5 show of all time. Here’s the minimally spoilerific pitch: They’re producing The Bachelor, and also each other, by any means necessary, and they’re all horrible people.

I got curious enough afterwards to actually watch The Bachelor, which turns out to be an excellent new show to put on during workouts and is better for having watched UnREAL first, but very much will not be joining the top tiers. Its biggest issue is that it’s largely the same every season, so I’ll probably tire of it soon. But full strategic analysis is likely on the way, because if I’m watching anyway then there’s a lot to learn.

A teaser note: Everlasting, the version on UnREAL, is clearly superior to The Bachelor. There are some really good ideas there, and also the producers on The Bachelor are way too lazy. Go out there and actually produce more, and make better editing decisions.

I can also report that Nobody Wants This is indeed poorly named. You’ll want this.

I continue to enter my movie reviews at Letterboxd, but also want to do some additional discussion here this month.

We start with the Scott Sumner movie reviews for Q3, along with additional thoughts from him, especially about appreciating films where ‘nothing is happening.’ This is closely linked to his strong dislike of Hollywood movies, where something is always happening, even if that something is nothing. The audience insists upon it.

This was the second month I entered Scott’s ratings and films into a spreadsheet. Something jumped out quite a bit. Then afterwards, I discovered Scott’s reviews have all been compiled already.

Last quarter his lowest rated new film, a 2.6, was Challengers. He said he knew he’d made a mistake before the previews even finished and definitely after a few minutes. Scott values different things than I do but this was the first time I’ve said ‘no Scott Sumner, your rating is objectively wrong here.’

This quarter his lowest rating, a truly dismal 1.5, was for John Wick, with it being his turn to say ‘nothing happens’ and wondering if it was supposed to be a parody, which it very much isn’t.

There’s a strange kind of mirror here? Scott loves cinematography, and long purposeful silences, painting pictures, and great acting. I’m all for all of that, when it’s done well, although with less tolerance for how much time you can take – if you’re going to do a lot of meandering you need to be really good.

So when I finally, this month, watched The Godfather without falling asleep while trying (because if I like Megalopolis, I really have no excuse), I can see how, in Scott’s system, it is an amazingly great film. I definitely appreciated it on that level. But I also did notice why I’d previously bounced off, and also at least two major plot holes where plot-central decisions make no sense, and I noticed I very much disliked what the movie was trying to whisper to us. In the end, yeah, I gave it a 4.0, but it felt like work, or cultural research, and I notice I feel like I ‘should’ watch Part II but I don’t actually want to do it.

Then on the flip side there’s not only the simple joys of the Hollywood picture, there’s the ability to extract what is actually interesting and the questions being asked, behind all that, if one pays attention.

In the case of John Wick, I wrote a post about the first 3 John Wick movies, following up with my review of John Wick 4 here, and I’d be curious what Scott thinks of that explanation. That John Wick exists in a special universe, with a unique economy and set of norms and laws, and you perhaps come for the violence but you stay for the world building. Also, I would add, how people react to the concept of the unstoppable force – the idea that in-universe people know that Wick is probably going to take down those 100 people, if he sets his mind to it, so what do you do?

Scott’s write-up indicates he didn’t see any of that.

Similarly, the recent movie getting the lowest rating this quarter from Scott was Megalopolis, at 3.0 out of his 4, the minimum to be worth watching, whereas I have it at 4.5 out of 5. Scott’s 3 is still a lot higher than the public, and Scott says he didn’t understand the plot and was largely dismissive of the results, but he admired the ambition and thought it was worth seeing for that. Whereas to me, yes a lot of it is ‘on the nose’ and the thing is a mess but if Scott Sumner says he didn’t get what the central conflict was about beyond vague senses then how can it be ‘too on the nose’?

I seriously worry that we live in a society where people somehow find Megalopolis uninteresting, and don’t see the ideas in front of their face or approve of or care for those ideas even if they did. And I worry such a society is filled, as the film notes, with people who no longer believe in it and in the future, and thus will inevitably fall – a New Rome, indeed. In some sense, the reaction to the film, people rejecting the message, makes the message that much more clear.

Discussion question: Should you date or invest in anyone who disliked Megalopolis?

I then went and checked out the compilation of Scott’s scores. The world of movies is so large. I haven’t seen any of his 4.0s. From his 3.9s, the only one I saw and remember was Harakiri, which was because I was testing the top of the Letterboxd ratings (with mixed results for that strategy overall), and for my taste I only got to 4.5 and couldn’t quite get to 5, by his scale he is clearly correct. From his 3.8s I’m confident I’ve seen Traffic, The Empire Strikes Back, The Big Lebowski, No Country for Old Men and The Lord of the Rings. Certainly those are some great picks.

There are some clear things Scott tends to prefer more than I do, so there are some clear adjustments I can make: The more ‘commercial,’ recent, American, short, fast or ‘fun’ the more I should adjust upwards, and vice versa, plus my genre, topic and actor preferences. In a sense you want to know ‘Scott rating above replacement for certain known things’ rather than Scott’s raw rating, and indeed that is the right way to evaluate most movie ratings if you are an advanced player.

At minimum, I’m clearly underusing the obvious ‘see Scott’s highly ranked picks with some filtering for what you’d expect to like.’

As opposed to movie critics in general, who seem completely lost and confused – I’ve seen two other movies since and no one seems to have any idea what either of them was even about.

The Substance (trailer-level spoilers) is another misunderstood movie from this month that makes one worry for our civilization. Everyone, I presume including those who made the film, is missing the central point. Yes, on an obvious level (and oh do they bring out the anvils) this is about beauty standards and female aging and body horror and all that. But actually it’s not centrally about that at all. It’s about maximizing quality of life under game theory and decision theory, an iterated prisoner’s dilemma and passing of the torch between versions of yourself across time and generations.

This is all text: the ‘better version of yourself’ actress is literally named Qualley (her character is called Sue, which also counts if you think about it), and the one so desperately running out of time that she divides herself into two is named Demi Moore, and they both do an amazing job while matching up perfectly, so this is probably the greatest Kabbalistic casting job of all time.

Our society seems to treat the breakdown and failure of this, the failure to hear even as you are told in no uncertain terms over and over ‘THERE IS ONLY ONE YOU,’ as inevitable. We are one, and cannot fathom it.

Our society is failing this on a massive scale, from the falling fertility rate to the power being clung to by those who long ago needed to hand things off, and in reverse by those who do not understand what foundations their survival relies upon.

Now consider the same scenario as the movie, except without requiring stabilization – the switch is 100% voluntary each time. Can we pass this test? What if the two sides are far less the ‘same person’ as they are here, say the ‘better younger’ one is an AI?

I ask because if we are to survive, we will have to solve vastly harder versions of such problems. We will need to solve them with ourselves, with each other, and with AIs. Things currently do not look so good on these fronts.

Joker: Folie à Deux is another movie that is not about what people think, at all. People think it’s bad, and especially that its ending is bad, and their reasons for thinking this are very bad. I’m not saying it’s a great film, but both Joker movies are a lot better than I thought they were before the last five minutes of this one. I am sad that it was less effective because I was importantly spoiled, so if you decide you’re in, don’t ask any questions.

I also love this old story: Howard Hughes had insomnia and liked to watch late movies, so he bought a television station to ensure it would play movies late at night, and would occasionally call them up to order them to switch to a different one. The station cost him $34 million in today’s dollars, so totally Worth It.

Katherine Dee, also known as Default Friend, makes the case that the death or stasis of culture has been greatly exaggerated. She starts by noting that fashion, movies, television and music are indeed in decay. For fashion I’m actively happy about that. For music I agree but am mostly fine with it, since we have such great archives available. For movies and television, I see the argument, and there’s a certain ‘lack of slack’ given to modern productions, but I think the decline narratives are mostly wrong.

The real case Katherine is making is that the new culture is elsewhere, on social media, especially the idea of the entire avatar of a performer as a work of art, to be experienced in real time and in dialogue with the audience (perhaps, I’d note, similarly to sports?).

I buy that there is something there and that it has cultural elements. Certainly we are exploring new forms on YouTube and TikTok. Some of it even has merit, as she notes the good TikTok tends to often be sketch comedy TikTok. I notice that still doesn’t make me much less sad and also I am not that tempted to have a TikTok account. I find quite a lot of the value comes from touchstones and reference points and being able to filter and distill things over time. If everything is ephemeral, or only in the moment, then fades, that doesn’t work for me, and over time presumably culture breaks down.

I notice I’m thinking about the distinction between sports, which are to be experienced mostly in real time, and this new kind of social media performance. The difference is that sports gives us a fixed set of reference points and meaningful events, that everyone can share, especially locally, and also then a shared history we can remember and debate. I don’t think the new forms do a good job of that, in addition to the usual other reasons sports are awesome.

Robin Hanson has an interesting post about various features.

We all have many kinds of features. I collected 16 of them, and over the last day did four sets of polls to rank them according to four criteria: 

  • Liked – what features of you do you most want to be liked for?

  • Pick – what features of them do you most use to pick associates?

  • Future – what features most cause future folks to be like them?

  • Improve – what features do you most want to improve in yourself?

Here are priorities (relative to 100 max) from 5984 poll responses: 

As I find some of the Liked,Pick choices hard to believe, I see those as more showing our ideals re such features weights. F weights seem more believable to me. 

Liked and Pick are strongly (0.85) correlated, but both are uncorrelated (-0.02,-0.08) with Future. Improve is correlated with all three (L:0.48, P:0.35, F:0.56), suggesting we choose what to improve as a combo of what influences future and what we want to be liked for now. (Best fit of Improve as linear combo of others is I = 1.12*L-0.94*P+0.33*F.)

Can anyone help me understand these patterns?

In some ways, the survey design choices Hanson made are even more interesting than the results, but I’ll focus on looking at the results.

The first thing to note is that people in the ‘Pick’ column were largely lying.

If you think you don’t pick your associates largely on the basis of health, stamina, looks, power, wealth, fame, achievements, connections or taste, I am here to inform you that you are probably fooling yourself on that.

There are a lot of things I value in associates, and I absolutely value intelligence and insight too, but I’m not going to pretend I don’t also care about the stuff listed above as well. I also note that there’s a difference between what I care about when initially picking associates or potential associates, versus what causes me to want to keep people around over the long term.

This column overall seems to more be answering the question ‘what features do you want to use as much as possible to pick your associates?’ I buy that we collectively want to use these low rated features less, or think of ourselves as using them less. But quite obviously we do use them, especially when choosing our associates initially.

Similarly, ‘liked’ is not what you are liked for, or what you are striving to acquire in order to be liked. It is what you would prefer that others like you for. Here, I am actually surprised Intelligence ranks so high, even though the pool of respondents is Hanson’s Twitter. People also want to improve their intelligence in this survey, which implies this is about something more than inherent ability.

The ‘future’ column is weird because most people mostly aren’t trying to cause future folks in general to be more like themselves. They’re also thinking about it in a weird way. Why are ‘health’ and ‘cooperative’ ranked so highly here? What is this measuring?

Matt Mullenweg publishes his charitable contributions going back to 2011, as part of an ongoing battle with private equity firm Silver Lake. This could be a good norm to encourage, conspicuous giving rather than conspicuous consumption is great even when it’s done in stupid ways (e.g. to boast at charity galas for cute puppies with rare diseases) and you can improve on that.

What makes a science Nobel Laureate? Paul Novosad crunches the numbers. About half come from the ‘top 5%’ by income, but many do come from very non-elite backgrounds. The most common profession for fathers is business owner rather than professor, but that’s because a lot of people own businesses, whereas the ratio on professors is off the charts nuts, while growing up on a farm means you are mostly toast.

What is odd about Paul’s framing of the results is the idea that talent is evenly distributed. That is Obvious Nonsense. We are talking about the most elite of elite talent. If you have that talent, your parents likely were highly talented too, and likely inclined to similar professions. Yes, of course exposure to the right culture and ideas and opportunities and pushes in the right directions matter tons too, and yes most of the talent out on the farm or in the third world will be lost to top science, but we were not starting out on a level playing field here.

A lot of that 990:1 likelihood ratio for professors, and 160:1 for natural scientists, is a talent differential.

Whereas money alone seems to not help much. Business owners have only about a disappointing 2.5:1 likelihood ratio, versus elementary and secondary school teachers who are much poorer but come in around 8:1.

The cultural fit and exposure to science and excitement about science, together with talent for the field, are where it is at here.

If I were designing a civilization-level response to this, I would not be so worried about ‘equality’ in super high scientific achievement. There’s tons of talent out there, versus not that much opportunity. Instead, I would mostly focus on the opposite, the places where we have proven talent can enjoy oversized success, and I would try to improve that success. I care about the discoveries, not who makes them, so let’s ‘go where the money is’ and work with the children of scientists and professors, ensuring they get their shot, while also providing avenues for exceptional talent from elsewhere. Play to win.

I played through the main story of Gordian Quest, which I declare to be Tier 4 (Playable) but you probably shouldn’t. Sadly, in what Steam records as 18 hours, not once was there any serious danger anyone in the party would die, and when I finished the game I ‘still had all these’ with a lot of substantial upgrades being held back. Yes, you can move to higher difficulties, but the other problem is that the plot was as boring and generic as they come. Some going through the motions was fun, but I definitely was waiting for it to be over by the end.

Also the game kind of makes you sit around at the end of battles while you fully heal and recharge your action meters; as a designer, either make this harder to do or make it impossible. And it’s very easy to click the wrong thing in the skill grid and really hurt yourself permanently, although you had so much margin for error it didn’t matter.

Summary: There’s something here, and I think that a good game could be built using this engine, but alas this isn’t it. Not worth your time.

I finished my playthrough of the Canon of Creation from Shin Megami Tensei V: Vengeance (SMT V). I can confirm that it is very good and a major upgrade over the base SMT V, although I do worry that the full ‘save anywhere’ implementation is too forgiving and thus cuts down too much on the tension level.

There are two other issues. The first is a huge difficulty spike at the end right before the final set of battles, which means that the correct play is indeed a version of ‘save everything that will still be useful later, and spend it on a big splurge to build a top level party for the last few battles.’ And, well, sure, par for the course, but I wish we found a way to not make this always correct.

The other issue is that I am not thrilled with your ending options, for reasons that are logically highly related to people not thinking well about AI alignment and how to choose a good future in real life. There are obvious reasons the options each seem doomed, so your total freedom is illusory. The ‘secret fourth’ option is the one I wanted, and I was willing to fight extra for it, but one of the required quests seemed bugged and wouldn’t start (I generally avoid spoilers and guides, but if I’m spending 100+ hours on one of these games I want to know what triggers the endings). Still, the options are always interesting to consider in SMT games.

A weird note is that the items I got for the preorder radically change how you approach the early part of the game, because they give you a free minor heal and minor Almighty attack all, which don’t cost SP. That makes it easy to go for a Magic-based build without worrying about Macca early.

The question now is, do I go for Canon of Vengeance and/or the other endings, and if so do I do it keeping my levels or reset. Not sure yet. I presume it’s worth doing Vengeance once.

Metaphor: ReFantazio looks like the next excellent Atlus Persona-style game, although I plan on waiting for price drops to play it since I’m not done with SMT V and haven’t gotten to Episode Aigis yet and my queue is large and also I expect to get into Slay the Spire 2 within a few months.

Magic’s Commander format bans Nadu, Winged Wisdom, which seems necessary and everyone saw coming and where the arguments are highly overdetermined, but then it also bans Dockside Extortionist, Jeweled Lotus and Mana Crypt. The argument they make is that with so many good midrange snowball cards, it is too easy for the player with fast mana to take over and overwhelm the table, and they don’t want this to happen too often. Sol Ring is fine because it is special, but there can’t be too many different ways to get there.

Many were unhappy with the decision to ban these fast mana format staples.

Sam Black emphasizes that this change is destabilizing, after several years of stable decisions, hurting players who invested deeply into their decks and cards. He doesn’t agree with the philosophy of the changes, but does note that the logic here could make sense from a certain casual perspective to help the format meet its design goals. And he thinks cEDH will suffer most, but urges everyone to implement and stick to whatever decisions the Rules Committee makes.

Brian Kibler calls Crypt and Lotus Rule 0 issues – you can talk to your group about whether to allow such fast mana – but he can understand the Dockside ban, and like most of us is happy for Nadu to bite the dust.

Zac Hill points out that if you ban some of the mana acceleration, this could decrease or increase the amount of snowball runaway games, depending on what it does to the variance of which players get how fast a start. Reid Duke points out that something can be cool when it happens rarely enough but miserable when (as in Golden Goose in Oko) it happens too often.

Samstod notes the change is terrible at the product level, wiping out a lot of value; Kai Budde fires back that it’s about time someone wiped out that value.

Kai Budde: Hardly the problem of the CRC. that’s wotc printing crazy good chase mythics to milk players. and then that starts the powercreep as they have to top these to sell the next cards etc. can make the same argument for modern-nadu. people spent money, keep it legal. no, thanks.

lotus/crypt/dockside are format breaking. argueing anything else after 30 years of these cards being too powerful in every format is just ridiculous. now why sol ring and maybe some others survived is an entirely different question, i’m with @bmkibler there.

Jaxon: I have yet to hear of a deck that wouldn’t be better for including Dockside, Crypt, and Lotus. That’s textbook ban-worthy.

The RC then offered a document answering various questions and objections. Glenn Jones has some thoughts on the document.

So far, so normal. All very reasonable debates. There’s a constant tension between ‘don’t destroy card market value or upset the players and their current choices’ and ‘do what is long term healthy for the format.’ I have no idea if banning Lotus and Crypt was net good or not, but it’s certainly a defensible position.

Alas, things then turned rather ugly.

Commander Rules Committee: As a result of the threats last week against RC members, it has become impossible for us to continue operating as an independent entity. Given that, we have asked WotC to assume responsibility for Commander and they will be making decisions and announcements going forward.

We are sad about the end of this era, and hopeful for the future; WotC has given strong assurances they do not want to change the vision of the format. Committee members have been invited to contribute as individual advisors to the new management framework.

The RC would like to express our gratitude to all the CAG members who have contributed their wisdom and perspective over the years. Finally, we want to thank all the players who have made this game so successful. We look forward to interacting as members of the community.

Please, be excellent to each other.

LSV: It seemed pretty clear to me that having people outside the building controlling the banlist for WotC’s most popular format was untenable, but it’s pretty grim how this all went down. The bottom 10% of any large group is often horrible, and this is a perfect example.

Gavin Verhey: The RC and CAG are incredible people, devoted to a format we love. They’ve set a great example. Though we at Wizards are now managing Commander, we will be working with community members, like the RC, on future decisions. It’s critical to us Commander remains community-focused.

Here is Wizards official announcement of the takeover.

This was inevitable in some form. Wizards had essentially ‘taken over’ Commander already, in the sense that they design cards now primarily with Commander in mind. Yes, the RC had the power to ban individual cards. But the original vision of Commander, that it should take what happened to be around and let us do fun things with those cards and letting weirdness flags fly and unexpected things happen, except banning what happened to be obnoxious? That vision was already mostly dead. The RC couldn’t exactly go around banning everything designed ‘for Commander.’ Eventually, Wizards was going to fully take control, one way or another, for better and for worse.

It’s still pretty terrible the way it went down. The Magic community should not have to deal with death threats when making card banning decisions. Nor should those threats be even somewhat rewarded, with the targets then giving up their positions. But what choice was there?

Contra LSV, I do feel shame for what happened, despite having absolutely no connection to any of the particular events and having basically not played for years. It is a stain upon the entire community. If someone brings dishonor on your house, ‘I had nothing to do with it’ obviously matters but it does not get you fully off the hook. It was your house.

Alas, this isn’t new. Zac Hill and Worth Wollpert got serious threats back in the day. I am fortunate that I never had to deal with anything like this.

Moving forward, what should be done with Commander?

If I were Wizards, I would be sure not to move too quickly. One needs to take the time to get it right, and also not to make it look like they have been lying in wait for the RC to get the message and finally hand things off, or make it feel like these threats are being rewarded.

But what about the proposal being floated, at least in principle?

WotC: Here’s the idea: There are four power brackets, and every Commander deck can be placed in one of those brackets by examining the cards and combinations in your deck and comparing them to lists we’ll need community help to create. You can imagine bracket one is the baseline of an average preconstructed deck or below and bracket four is high power. For the lower tiers, we may lean on a mixture of cards and a description of how the deck functions, and the higher tiers are likely defined by more explicit lists of cards.

For example, you could imagine bracket one has cards that easily can go in any deck, like Swords to Plowshares, Grave Titan, and Cultivate, whereas bracket four would have cards like Vampiric Tutor, Armageddon, and Grim Monolith, cards that make games too much more consistent, lopsided, or fast than the average deck can engage with.

In this system, your deck would be defined by its highest-bracket card or cards. This makes it clear what cards go where and what kinds of cards you can expect people to be playing. For example, if Ancient Tomb is a bracket-four card, your deck would generally be considered a four. But if it’s part of a Tomb-themed deck, the conversation may be “My deck is a four with Ancient Tomb but a two without it. Is that okay with everyone?”

This is at least kind of splitting Commander into four formats as a formalized Rule 0.

It is also a weird set of examples, and a strange format, where a card like Armageddon can be in the highest tier alongside the fast mana and tutors. I’d be curious to see what some 2s and 3s are supposed to be. And we’ll need to figure out what to do about cards like Sol Ring and other automatic-include cards, especially mana sources.

I do worry a bit that this could cause a rush to buy ‘worse’ cards that get lower tier values, which could result in it costing more to build a deck at a lower tier, leaving those without the resources to have awkward conversations.

On reflection I do like that this is a threshold tier system rather than a points system. A points system (where each card has a point total, and your deck can only add up to X points, usually ~10) is cool and interesting, but it is complicated, hard to measure across 100-card singleton decks, and not compatible with the idea of multiple thresholds. You can mostly only pick one number and go with it.

Brian Kowal takes the opposite position, thinking a points-based system would be cool for the minority who want to do that. I worry this would obligate others too much, and wouldn’t be as fully optional as we’d hope.
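To make the two approaches concrete, here is a minimal sketch of the difference. All of the card-to-bracket and card-to-point assignments are made up for illustration (loosely following the examples in the quote above); they are not anyone’s official lists.

```python
# Hypothetical sketch: a threshold (bracket) system vs. a points system for a
# Commander deck. The assignments below are invented for illustration only.

BRACKETS = {"Swords to Plowshares": 1, "Cultivate": 1,
            "Ancient Tomb": 4, "Vampiric Tutor": 4}

POINTS = {"Ancient Tomb": 3, "Vampiric Tutor": 4}  # unlisted cards cost 0

def deck_bracket(deck: list[str]) -> int:
    """Threshold system: the deck is defined by its single highest-bracket card."""
    return max((BRACKETS.get(card, 1) for card in deck), default=1)

def deck_points(deck: list[str]) -> int:
    """Points system: strong cards add to a total that must stay under a cap (~10)."""
    return sum(POINTS.get(card, 0) for card in deck)

deck = ["Swords to Plowshares", "Cultivate", "Ancient Tomb"]
print(deck_bracket(deck))  # 4 -- one bracket-four card sets the whole deck's bracket
print(deck_points(deck))   # 3 -- the same card easily fits under a ~10 point budget
```

The threshold version is what produces the ‘my deck is a four with Ancient Tomb, a two without it’ conversation; the points version forces a single budget across the whole deck.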

This also should catch everyone’s eye:

We will also be evaluating the current banned card list alongside both the Commander Rules Committee and the community. We will not ban additional cards as part of this evaluation. While discussion of the banned list started this, immediate changes to the list are not our priority.

I would be extremely reluctant to unban specifically Crypt or Lotus. I don’t have a strong opinion on whether those bans were net good, but once they happen the calculus shifts dramatically, and you absolutely do not want to reward what happened by giving those issuing death threats what they wanted.

That said, there are a bunch of other banned cards in Commander that can almost certainly be safely unbanned, and there is value in minimizing what is on the list. Then, if a year or two from now we decide that more fast mana would be healthy for the format again, or would be healthy inside tier 4 or what not, we can revisit those two in particular.

What should be the conventions around the clock in MTGO? Matt Costa calls out another player for making plays with the sole intention of trying to run out Matt’s clock. Most reactions were that the clock is part of the game, and playing for a clock win is fine. To me, the question is where the line should be. Hopefully we can all agree that it is on you to finish the match on time, and your opponent is under no obligation to help you out. But it is also not okay to take game actions whose only goal is to make your opponent waste time, and certainly not okay to abuse the system to force them into extra meaningless clicks. Costa makes clear he would draw the line far more aggressively than I would; to me, anything that is genuinely trying to help win the game is fine.

In other news, gaming overall was way up for young men as of 2022:

Paul Graham: The amount of time young men spent gaming was not exactly low in 2019. Usually when you see dramatic growth it’s from a low starting point, but this is dramatic growth from a high starting point.

That’s actually quite a lot. I don’t get to play two hours of games a day. The fact that this rose again from 2021 to 2022 suggests it is not merely a temporary pandemic effect.

For those who did not realize, game matching algorithms often no longer optimize ‘fair’ matchups, and instead follow patterns designed to preserve engagement (example patent here). I’ve had this become obvious in some cases where it greatly changed the incentives, and when that happened it killed the whole experience. So to all you designers out there, be careful with this.
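As a toy illustration of the difference (the churn model and all numbers here are invented, and this is not any particular game’s matchmaker; real systems learn this from player data):

```python
# Toy contrast between skill-based and engagement-optimized matchmaking.
# The churn model is invented for illustration; real systems learn it from data.

# name -> (skill rating, current loss streak)
players = {"A": (1500, 0), "B": (1510, 3), "C": (1480, 0), "D": (1800, 0)}

def win_prob(r1: float, r2: float) -> float:
    """Elo-style expected score for the first player against the second."""
    return 1 / (1 + 10 ** ((r2 - r1) / 400))

def churn_risk(streak: int, p_win: float) -> float:
    """Hypothetical: players on a loss streak who are likely to lose again tend to quit."""
    return streak * (1 - p_win)

def pick_opponent_fair(me: str) -> str:
    """Skill-based: pick the opponent with the closest rating."""
    return min((p for p in players if p != me),
               key=lambda p: abs(players[p][0] - players[me][0]))

def pick_opponent_engagement(me: str) -> str:
    """Engagement-optimized: minimize the requester's predicted churn, which for a
    frustrated player can mean serving up a weaker opponent rather than a fair one."""
    rating, streak = players[me]
    return min((p for p in players if p != me),
               key=lambda p: churn_risk(streak, win_prob(rating, players[p][0])))

print(pick_opponent_fair("B"))        # A -- closest in rating
print(pick_opponent_engagement("B"))  # C -- a weaker opponent to break the loss streak
```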

I love this proposal and would watch a lot more baseball if they did it: MLB is considering requiring starting pitchers to go at least six innings unless they are injured enough to go on the injured list, throw 100 pitches, or give up 4 earned runs. This would force pitchers to rely on command over power; the current reliance on max-effort power is part of why pitchers are so often injured now.

I would go further. Let’s implement the ‘double hook’ or ‘double switch DH,’ which they are indeed considering. In that version, when you pull your starter, you lose the DH, period. So starting pitchers never bat, but relievers might need to. I think this is a neat compromise that is clean, easy to explain, provides good incentives and also makes the game a lot more interesting.

I’ll also note that the betting odds on the Mets have been absurdly disrespectful for a while now, no matter how this miracle run ends. I get that all your models say we shouldn’t be that good, but how many months of winning does it take? Of course baseball is sufficiently random that we will never know who was right on this.

Meanwhile the various fuckery with sports recordings in TV apps really gets you. They know you feel the need to see everything, so they make you buy various different apps to get it, but they also fail to make deals when they need to (e.g. YouTube TV losing SNY), which forced me onto Hulu, whose app sucks and also cut off the end of multiple key games.

I wish I could confidently say Hulu’s app has failed me for the last time. Its rate of ‘reset to the beginning of the recording when you ask to resume, for no reason’ is something like 40%. It can’t remember your place in a show’s episodes if you’re watching reruns in order; that’s too hard for it. If a copy of a program aired recently, its ads could become partly unskippable. The organization of content is insane.

All of that I was working past, until the above-mentioned cutoffs of game endings, including the game in which the Mets clinched their wild-card berth, and then the finishes of multiple top college football games. Unfortunately, there are zero other options for getting SNY, which shows the Mets games, but now we’re in the playoffs, so it’s back to YouTube TV, which has other problems, but they’re mostly less awful, together with like six other apps.

Paul Williams: Lina Khan DO NOT read this.

Can we please have a monopoly in TV streaming? Some of us are just trying to watch the game out here, why does my TV have 26 apps.

James Harvey: I don’t see what’s so confusing about this. I pay for MLB and I pay for ESPN, so if I want to watch an MLB game on ESPN I naturally go to the YouTube TV app.

There’s starting to be an inkling of ‘choose a primary app, then add subscriptions for content from other apps,’ but this cannot come fast enough, and right now it seems to come with advertisements or other limitations. Imposing ads on us in this day and age, when we’re paying and not in exchange for a clear discount, is psycho behavior; I don’t get it.

The idea that in April 2025 I might have to give Hulu its money again is so cringe. Please, YouTube, work this out, paying an extra subscription HBO-style would be fine, or we can have SNY offer a standalone app.

In this case, an entrepreneur asks the right question. We’ve done this before, but I find it worthwhile to revisit periodically. I organized the responses by central answer.

Paul Graham: Is there a reliable source of restaurant ratings, like Zagat’s used to be?

Roon: Beli.

Alex Reichenbach: I’d highly recommend Beli, especially if you end up in New York. They use head to head ELO scoring that prevents rating inflation.

Silvia Tower: Beli App! That way you follow people you know and see how they rate restaurants. No stars, it’s a forced ranking system. Their algorithm will also make personalized recommendations.

StripMallGuy: Really rely on Yelp. I find that if a restaurant is three stars or less, it’s just not going to be good and 4 1/2 stars means very high chance will be great. We use it a lot for our underwriting of strip malls during purchases, and it’s been really helpful.

Nikita Bier: The one tip for Yelp I have that is tangentially related: if an establishment has >4 stars and their profile says “unclaimed,” it means 6 stars.

Babak Hosseini: Google Maps. But don’t read the 5-star ratings.

1. Select a restaurant above 4.6 avg rating

2. Then navigate to the 1-star ratings

If most people complain about stuff you don’t care about, you most likely have a pretty good match.

Grant: Google Maps 4.9 and above is a no. Usually means bad food with an overfriendly owner or strong-arming reviews. 4.6–4.8: best restaurants. 4.4–4.5: good restaurants. 4.3: OK. 4.2 and below: avoid.

Peter Farbor: Google Maps, 500+ reviews, 4.4+

How to check if the restaurant didn’t gamify reviews?

1. There should be a very small number of 1-3⭐️ reviews

2. There should be at least 10-20% of 4⭐️ reviews

Eleanor Berger: Google Maps, actually. I don’t think anything else seriously competes with it.

Trevor Blackwell: Michelin 1-starred restaurants are usually good for a fancy dinner. 2 and 3-starred are good if you’re dedicating an entire evening to the meal. I don’t know where to find good casual restaurants.

Kimbal Musk: Use OpenTable for reviews by regulars. Use Google for reviews by tourists. Both perspectives are solid for guidance.

Hank: Eater is my go-to now for restaurant reviews in cities.

Ron Williams: Eater’s “essential” lists for each city are pretty reliable and varied by cost. So google Eater essential San Francisco, for example.

Jonathan Arthur: Use the EconEats app or whatever they call it in ChatGPT if you are looking for good but not fancy.

Dan Barker: ‘The Fork’ is good in continental Europe. UK/US = Google Maps, and treat 4.0 (or lower) as 0/10 and 5.0 as 10/10.

Ruslan R. Fazlyev: Foursquare: too small for most marketers to care about, but it has a loyal community. Any place above 8.0 is great; 8.7 and up is exceptional. It is also truly international and works well in Peru or Albania or wherever.

The new answer here is Beli Eats. I saw this on 10/8. I am now trying it out.

I’m sad they force you to use a phone app, but that’s 2024 for you.
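For what it’s worth, here is a minimal sketch of the head-to-head Elo idea several of the replies above credit Beli with. This is just the generic Elo update with placeholder restaurant names, not Beli’s actual algorithm; the point is that each comparison moves ratings by a bounded, zero-sum amount, so the scale can’t inflate the way star averages do.

```python
# Minimal sketch of head-to-head Elo ranking for restaurants.
# Generic Elo update with placeholder names; not Beli's actual algorithm.

K = 32  # maximum rating swing from a single comparison

def expected(r_a: float, r_b: float) -> float:
    """Probability-like score that A beats B given current ratings."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update(r_winner: float, r_loser: float) -> tuple[float, float]:
    """Winner gains exactly what the loser loses, so total rating is conserved."""
    delta = K * (1 - expected(r_winner, r_loser))
    return r_winner + delta, r_loser - delta

ratings = {"Place A": 1500.0, "Place B": 1500.0}
ratings["Place A"], ratings["Place B"] = update(ratings["Place A"], ratings["Place B"])
print(ratings)  # {'Place A': 1516.0, 'Place B': 1484.0}
```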

My preliminary report is that Beli has a lot of potential, but it feels like an Alpha. There are a bunch of obvious mistakes that need fixing, such as:

  1. Restaurant pages do not by default list their hours or menus, or link to Google Maps.

  2. Recommendations sometimes default to ‘the best anywhere in the world,’ which is almost never what you want, and they seem not to discount for distance except via a hard cutoff somewhere above a mile away, rather than applying a distance penalty (see the sketch after this list).

  3. There’s no button for ‘this place doesn’t interest me, don’t list it anymore.’

  4. There’s no link to ‘bring this up on delivery apps.’

  5. There are reservations, but no prediction of whether you can get a table without one.

  6. You can exclude cuisines (e.g. Chinese) if you don’t like them, but not use other filters (e.g. ‘no cocktail bars,’ which I’d totally do if I could).

  7. There are no options to tell the algorithm about elements you like or dislike in a way that feeds into the recommendations.
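Here is the kind of thing I mean on the distance point in item 2: a hard radius cutoff versus a smooth distance penalty. The scoring form and numbers are invented for illustration; this is not Beli’s actual ranking.

```python
# Toy contrast: a hard distance cutoff vs. a smooth distance penalty when
# scoring recommendations. Numbers and scoring form are invented for illustration.
import math

def score_with_cutoff(quality: float, miles: float, max_miles: float = 1.0) -> float:
    """Hard cutoff: anything past the radius is dropped; everything inside is untouched."""
    return quality if miles <= max_miles else float("-inf")

def score_with_penalty(quality: float, miles: float, decay_miles: float = 0.75) -> float:
    """Smooth penalty: every extra bit of distance costs something; nothing is dropped."""
    return quality * math.exp(-miles / decay_miles)

# A 9.0 place almost a mile away vs. an 8.0 place around the corner:
print(score_with_cutoff(9.0, 0.95), score_with_cutoff(8.0, 0.1))    # 9.0 8.0 -- far one wins
print(score_with_penalty(9.0, 0.95), score_with_penalty(8.0, 0.1))  # ~2.5 ~7.0 -- near one wins
```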

Also, I seem to have gotten my ‘invite’ from some random superuser I’ve never heard of, and it seems to think I care what she in particular thinks, which is weird.

The actual recommendations so far have not been impressive, but also haven’t done anything too crazy.

So overall, potentially worth using, but making me itch to build something better.

If you want an invite, I’ve got four now, so if you live in NYC (so our info will overlap) and vibe with how I think about restaurants and want one, drop me a line (ideally a Twitter DM with your email, if you don’t want to put it in a comment).

Google Maps remains my default, because it gives you the key info: the ability to see the distribution of photos so you know what the go-to orders are and how they look, an easy link to the menu and hours, review details to understand the rating, and a rating that’s pretty accurate versus the competition, at least in NYC. If your Maps-fu is good enough, it’s excellent at evaluation, but mediocre at discovery.

Yelp numbers seem manipulated, bought, or random here. OpenTable ratings didn’t seem to correlate well with what I care about, but I haven’t done detailed review checking; maybe I should try that.

Also, if anyone at DoorDash or Caviar is reading this, something is very wrong with my account, it keeps refusing to accept all my credit cards. I could still pay via PayPal, but that is annoying and invalidates DashPass. I’ve been on many very frustrating chats with customer service reps who failed to fix the issue, and have tried all the obvious things and then some. Please help.

I want to play it now.

Scream Four: Once, consulting for a friend’s police procedural RPG, she needed names for five stats. I said they should all be body parts that complete the sentence “the kid’s got ___ but he’s a loose cannon” and got Heart, Guts, Brains, Muscle, and Nerve and I’ll never be that good again.

Monthly Roundup #23: October 2024 Read More »