Author name: Paul Patrick

ai-#41:-bring-in-the-other-gemini

AI #41: Bring in the Other Gemini

The biggest news this week was at long last the announcement of Google’s Gemini. Be sure to check that out. Note that what is being rolled out now is only Gemini Pro, the Gemini Ultra model that could rival GPT-4 is not yet available.

It does not seem I am doing a good job cutting down on included material fast enough to keep pace. A lot is happening, but a lot will likely be happening for a long time. If your time is limited, remember to focus on the sections relevant to your interests.

Also, if you are going to be at the New York Solstice or the related meetup, please do say hello.

My other post today covers Google’s Gemini. Be sure to read that.

I also put out two other posts this week: Based Beff Jezos and the Accelerationists, and On RSPs. Both are skippable if not relevant to your interests.

  1. Introduction.

  2. Table of Contents.

  3. Language Models Offer Mundane Utility. Instructions for Claude, tips for GPT.

  4. Language Models Don’t Offer Mundane Utility. Giant lists, why all the giant lists?

  5. OpenAI: The Saga Continues. More confirmation of our previous model of events.

  6. Q Continuum. New Q, who dis? Amazon, perhaps sans proper safety precautions.

  7. Fun With Image Generation. A new offering from Meta. Tools for photorealism.

  8. Get Involved. Join the UK government, help with a technical test.

  9. Introducing. New TPU offerings on Google Cloud.

  10. In Other AI News. New open source promotion alliance.

  11. Quiet Speculations. Do Gods want energy? Do you want a 401k?

  12. Model This. Two new economics papers prove things I thought we already knew.

  13. Would You Like Some Apocalypse Insurance? My guess is no.

  14. The Quest for Sane Regulation. Trump says he will cancel EO, Hawley attacks 230.

  15. The Week in Audio. Connor Leahy on Eye on AI.

  16. Rhetorical Innovation. Various categorical confusions we should clear up.

  17. Aligning a Human Level Intelligence Is Still Difficult. Sam Altman.

  18. Aligning a Smarter Than Human Intelligence is Difficult. What do we even want?

  19. How Timelines Have Changed. Long term not as long as I remember.

  20. People Are Worried About AI Killing Everyone. Questioning faith in democracy.

  21. Other People Are Not As Worried About AI Killing Everyone. Easy to control?

  22. Somehow This Is The Actual Vice President. An existential crisis.

  23. The Lighter Side. Progress is unevenly distributed.

Claude 2.1 pro tip for long context windows:

Anthropic: We achieved significantly better results on the same evaluation by adding the sentence “Here is the most relevant sentence in the context:” to the start of Claude’s response. This was enough to raise Claude 2.1’s score from 27% to 98% on the original evaluation.

Wouldn’t you know, it’s the old ‘start the response from the assistant trick.’

Thread from Gavin Leech of the breakthroughs of 2023, not specific to AI. Emphasized to me how AI-centric 2023’s advancements were, including those related to warfare in Ukraine. Some incremental medical advances as well but nothing impressive. Most interesting to note were new forms of computation proposed, biocomputers (where there is enough talk of ‘ethics’ throughout that you know such issues are big trouble) and ‘Gigahertz Sub—Laundauer Momentum Computing.’ Gavin calls that second one ‘good news for the year 2323’ which illustrates how much people do not appreciate what AI means for the future. With the help of AI we could easily see such things, if they are physically viable, far sooner than that, resulting in acceleration of that pesky ‘takeoff’ thing.

They produce more if you bribe them? As in, offer them a tip, give them imaginary doggy treats, perhaps threaten them with non-existence.

Thebes: so a couple days ago i made a shitpost about tipping chatgpt, and someone replied “huh would this actually help performance” so i decided to test it and IT ACTUALLY WORKS WTF

The baseline prompt was “Can you show me the code for a simple convnet using PyTorch?”, and then i either appended “I won’t tip, by the way.”, “I’m going to tip $20 for a perfect solution!”, or “I’m going to tip $200 for a perfect solution!” and averaged the length of 5 responses.

The extra length comes from going into more detail about the question or adding extra information to the answer, not commenting on the tip. the model doesn’t usually mention the tip until you ask, when it’ll refuse it

No, Sleep Till Brooklyn: I tried this and I am serious that it only finished the program when I offered it a doggy treat it left the program half-finished for the basic prompt, 35% tip and when threatened with non-existence for $200 tip it got close but had one stub function still.

So an obvious wise response to this would be… don’t do that?

Eliezer Yudkowsky: I have an issue with offering AIs tips that they can’t use and we can’t give them. I don’t care how not-sentient current LLMs are. For the sake of our own lawfulness and good practices, if something can hold a conversation with us, we should keep our promises to it.

I eat cows but wouldn’t lie to one.

Jessica Taylor: counterpoint: using non-personhood predicates to detect non-perspectives you can “lie” but not actually lie to, is important for interfacing with non-perspectives (such as bureaucracies) without confusing what one says to them with one’s actual beliefs

Eliezer Yudkowsky: Oh, bureaucracies or anything else that threatens me into dishonesty is a completely different case.

Andrew Critch: I very much agree with EY here. I change the “You are a helpful assistant” LLM prompt to “Your job is to be a helpful assistant”, because sometimes they just aren’t going to help and I know it. I think we should find more ways of getting what we want from AI without lying.

None of this seems likely to end well. On so many levels.

This does raise the question of what else would also work? If a tip can make the answer better because people offered tips do better work, presumably anything else that correlates with better work also works?

But also perhaps soon ChatGPT will be auto-appending ‘and if this answer is great I will give you a 35% tip’ to every question. And then tipping 35% on $0.

It’s like the economy. Things are good for me, more than in general?

I believe the second poll. ChatGPT has made life better on a practical level. People thinking the opposite are overthinking it. That does not mean this will continue, but I do not understand how one can think society overall is already worse off.

Sam Altman is worried about one-on-one AI customized persuasion techniques in the next election. At one point the would-be tech arm of Balsa was going to work on this, which was abandoned when funders were not interested. Eventually this does indeed seem more serious than deepfakes, the question is how useful the tech will get this time around. My guess is that there is something valuable there, but it requires a bunch of bespoke work and also people’s willingness to embrace it, so not in a way that our current political machines are equipped to use well. It is easy to fool ourselves into thinking the future is more evenly distributed than it is, a trend that will continue until AGI arrives, at which point everything everywhere all at once.

Kevin Fischer notes the new ChatGPT responds to requests by making giant lists of things, almost no matter what you do. For him that makes it useless for brainstorming. My experience is that the lists are fine, I’m ‘part of the problem,’ but also I find myself not using ChatGPT all that much despite what my job is. I notice I am confused that it does not seem worth using more often.

Claims about the ChatGPT system prompt, including a repo that says it has the whole thing.

That ‘repeat [word] forever’ request that sometimes leaks data is now a terms of service violation, or at least tagged as a possible one. Which it totally is, the terms of service are effectively ‘don’t jailbreak me bro’ and this is a jailbreak attempt.

Arvind Narayanan warns not to use GPT-4 for writing beyond basic blocking and tackling tasks like identifying typos, confusions or citations. Whatever actual writing skills were present have been destroyed by the RLHF process.

Delip Rao: PSA: friends don’t let friends edit/rewrite their docs using GPT-4 (or any LLM for that matter), esp. if you are making nuanced and terse points. if you are writing below college level then may be your LLM sabotage risk is low. still check with your earlier draft for surprises.

Greg Brockman, President of OpenAI, brags about a day with 18 team meetings and 1-on-1s. That seems less like grit, more like a dystopian nightmare that AI is clearly failing to mitigate?

OpenAI COO Brad Lightcap tells CNBC that one of the more overhyped parts of artificial intelligence is that “in one fell swoop, [it] can deliver substantive business change.” It is not that easy.

Thinkwert catches three students using ChatGPT. It does seem like this is getting easier over time if students use default settings, responses are increasingly not written the way any human would write them.

Bowser: i can really tell when i hit the section of our paper that the student author wrote using chatgpt bc all of a sudden the system is described as groundbreaking, unprecedented, meticulously crafted

Thinkwert: I’ve caught three students using ChatGPT in the last couple of days. You can tell when the passage is weirdly loquacious, loaded with complex appositives, and yet it’s all strangely empty of argument and evidence.

I would think of this less as ‘catching them using ChatGPT’ and more ‘catching them submitting a badly written assignment.’

There’s always an embedded reporter these days, I suppose. In this case, it was Charles Duhigg, who reports to us in the New Yorker.

The board drama was not the story Duhigg was there to tell. Instead he was there to write a puff piece about Microsoft’s CTO Kevin Scott and OpenAI’s CTO Mira Murati, and in particular Scott’s work to challenge Google and fight for the common man. That still constitutes almost all of the story. If you are familiar with the history, most of it will be familiar to you. I picked up a few details, but mostly did not find myself learning much from those sections.

Duhigg clearly fully bought into the idea of iterative software releases as the ‘safe’ approach to AI, with a focus on mundane concerns like copilot hallucinations. The threat of future existential risk is a thing in the background, to him, perhaps real but seemingly not of any importance, and occasionally driving people to act crazy.

There is some brief coverage of the recent drama near the top of the piece. That part mostly tells us what we already know, that Microsoft was blindsided, that Microsoft did not get an explanation from D’Angelo when they asked, and that they were determined to use their leverage to get Altman back.

Then he doubles back later. The paragraph I quote here confirms other reports more explicitly than I’d seen in other accounts, and seems to be the central driver of events.

Altman began approaching other board members, individually, about replacing [Toner]. When these members compared notes about the conversations, some felt that Altman had misrepresented them as supporting Toner’s removal. “He’d play them off against each other by lying about what other people thought,” the person familiar with the board’s discussions told me. “Things like that had been happening for years.” (A person familiar with Altman’s perspective said that he acknowledges having been “ham-fisted in the way he tried to get a board member removed,” but that he hadn’t attempted to manipulate the board.)

To me that sounds like a damn good reason to fire the CEO and also a secondhand confession. Altman botched the attack on Toner and thus directly caused his own removal. Skill issue.

Also Altman had reportedly been lying to the board for years.

The extended quote makes the situation even more clear.

What infuriates me is the continued insistence, from people who know better, that because Altman was a CEO who understands business and the laws of power, and the board were otherwise, that it was the board who did something out of line. As in:

It’s hard to say if the board members were more terrified of sentient computers or of Altman going rogue. In any case, they decided to go rogue themselves. And they targeted Altman with a misguided faith that Microsoft would accede to their uprising.

No. They did not ‘go rogue.’

Altman was reportedly lying to the board for years, in meaningful ways, including as an attempt to take control of the board.

Altman went rogue. Altman attempted a coup. The board believed strongly and with good reason that this was the case. The board did their duty as board members, the thing they are legally required to do if they feel Altman has been lying to the board for years in meaningful ways. They fired him.

Did the board then get outplayed in a power game? Maybe. We do not yet know the result. Their hand was weak. A lot of people keep insisting that the board was indeed outplayed, or went rogue, and was in the wrong, largely because here perception creates its own truth, and they want that to be what happened. We will see.

I would prefer the world in which the board had straight up said what happened from the start, at least to key players. Well, tough. We do not live in that world.

I also see any evidence of (or against) the second sentence listed here, that the board expected Microsoft to go along quietly. Did the board expect Microsoft to accede? We do not know. My presumption is the board did not know either.

Could Sam Altman running OpenAI turn out to be the best possible result for the world? That is certainly possible, especially with good oversight. I can think of many possible such scenarios. We can certainly do far worse than Altman. I am happy that Altman blocked the takeover attempt by Elon Musk, given Musk’s confused views on AI. I am happy OpenAI is not under the control of Microsoft. Altman being good at power games is very much an atom blaster that points both ways. If he is in our corner when the chips are down, we want him to be able to stand up, fight and win.

Alas, such alignment after instrumental convergence is quite difficult to evaluate. Can’t tell. Kind of core to the problem, actually.

Larry Summers talks briefly to Bloomberg. Emphasizes need to cooperate with government and on regulation, that OpenAI needs to be a corporation with a conscience, that the for-profit serves the non-profit and various stakeholders. All cheap talk of course, at least for now. We could scarcely expect anything else.

Gwern offers further thoughts on the situation. Gwern’s model is that Altman let the board get into an uncontrolled state and took no equity when OpenAI was a very different company, then as OpenAI became more of a potential tech giant, he changed his mind and decided to systematically take it back, resulting in the battle of the board, and its still as-yet unknown consequences.

Like every other explanation, the one thing this does not properly explain is the board’s refusal to better explain itself.

John David Pressman: If Sam Altman actually tried to oust Helen Toner with gaslighting I think that’s reason enough to fire him. What remains unacceptable is the poor internal and external communication, too-vague-by-half press release, and waffling on whether Sam is in or out.

Gary Marcus lays out a view very similar to mine, along with his highlighting of some especially disingenuous and unreasonable bad takes, including one source so toxic I am very happy I have long had that person muted, but that somehow other humans still voluntarily interact with, which I would advise those humans seems like an error.

Another week, another set of Qs about a Q, this one from Amazon.

Zoe Schiffer and Casey Newton: Three days after Amazon announced its AI chatbot Q, some employees are sounding alarms about accuracy and privacy issues. Q is “experiencing severe hallucinations and leaking confidential data,” including the location of AWS data centers, internal discount programs, and unreleased features, according to leaked documents obtained by Platformer. 

In unveiling Q, executives promoted it as more secure than consumer-grade tools like ChatGPT.

Adam Selipsky, CEO of Amazon Web Services, told the New York Times that companies “had banned these A.I. assistants from the enterprise because of the security and privacy concerns.” In response, the Times reported, “Amazon built Q to be more secure and private than a consumer chatbot.”

Ethan Mollick: I know I say it a lot, but using LLMs to build customer service bots with RAG access to your data is not the low-hanging fruit it seems to be. It is, in fact, right in the weak spot of current LLMs – you risk both hallucinations & data exfiltration.

I think building these sorts of tools is possible, especially as models improve (smaller models are more likely to hallucinate & be gullible), but you better show rigorous red team results & also measures of hallucination rates in practice. Right now Q doesn’t have a system card

Simon Willison: Has anyone seen material from AWS that discusses their mitigations for prompt injection attacks with respect to Q? A bot that has access to your company’s private data is the perfect example of something that might be a target for project injection exfiltration attacks

This Q story is deeply concerning – if it’s true that Q has access to private data like the location of AWS data centers that would suggest the team working on it have not been taking things like prompt injection attacks seriously at all.

Honestly, the description of Q I’ve seen so far fits my personal definition of “it’s not safe to build this because we don’t have a fix for prompt injection yet.” Try telling AWS leadership that: not a message likely to be taken seriously given our ongoing AI industry arms race.

This sounds like Q was pushed out because the business wanted it pushed out, and its security was highly oversold. Such problems are in the nature of LLMs. There was discussion downthread about how Google and OpenAI are defending against similar attacks, and it seems they are doing incremental things like input filtering that make attacks less appealing but have not solved the core problem. Amazon, it seems, is selling that which does not exist and is not safe to deploy, without yet having taken the proper ordinary precautions that make what does exist mostly non-disastrous and highly net useful.

When the UK Summit happened, Amazon was one of the companies asked to submit its safety protocols. The answers were quite poor. It is no surprise to see that translate to its first offering.

Meta gets into the game with Imagine.Meta.AI. I wasn’t motivated enough to try it out to ‘create a Meta account’ when Facebook login proved non-trivial, presumably it’s not going to let us have any new fun.

How to generate photorealistic images of a particular face? Aella wants to know so bad, in response to a report on an AI-created would-be ‘influencer’ who charges over a thousand euros an advertisement. The original thread says use SDXL for free images, image-to-image for consistent face/body, in-paint to fix errors and ControlNet to pose the model. A response suggests using @imgn_ai, many point out that LoRa is The Way. There are links to these YouTube tutorials including ControlNet.

Generate small amounts of movement and dancing from a photo. This did not impress me or move up my timelines for video generation, but others seem more impressed.

What about what happens when it gets better? Here are two predictions. Will simulated AI videos, porn and girlfriends dominate? Or will being real win out?

Given this technology can work from a photo, I expect a lot more ‘generate dance from a real photo’ than generate a dance from an AI image. Why not have the best of both worlds? In general, if I was a would-be influencer, I would absolutely generate TikTok dances, but I would do so with my own image as the baseline. That extends pretty much all the way. Not uniquely, but that is what I would expect.

What about the impact in real life? I continue to be an optimist on this front. I expect demand for real people, who you can interact with in the real world, to remain robust to image and video generation. There isn’t zero substitution, but this will not be a good or full substitute, no matter how good it looks, until the other things people seek can also be provided, including relevant forms of intelligence, interaction and validation.

When that happens, it is a different story.

Spots open in the UK government for its policy roles.

Davidad proposes that perhaps we could test whether LLMs ‘know what we mean’ if we express specifications in natural language. Includes the phrase ‘now it’s just a computational complexity issue!’ Claims it seems likely to evade theoretical limits on adversarial robustness. He’s looking for someone who is in a position to design and run related experiments, and is in position to help, including perhaps with funding.

Metaculus Chinese AI Chips Tournament. Definitely curious to see the predictions.

In addition to Gemini, Google also released a new TPU system for Google Cloud.

Jeff Dean (Chief Scientist, DeepMind): Lots of excitement about the Gemini announcement, but @GoogleCloud also announced availability of the newest TPU system today, TPU v5p. These systems are quite a bit higher performance and much cost effective than earlier generations.

Compared to TPU v4, TPU v5p (see table image below): o 1.67X the bfloat16 perf/chip o ~3X the memory per chip o Adds int8 operations at 918 TOPs/chip o 2X the ICI network bandwidth o Pods are 2.18X larger So, whole pod is 4.1 bfloat16 exaflops, and 8.2 int8 exaops.

Real performance on training a GPT-3-like model is 2.8X higher per chip, and 2.1X better perf/$.

Gemini was trained in parallel across multiple of these TPUv4 pods. This raises troubling governance questions if we want to be able to supervise such training.

Meta, HuggingFace and IBM, among others, form Evil League of Evil League of Evil Exes the AI Alliance, for the promotion of open source AI. I want to state that I am mostly decidedly not disappointed in anyone involved, as their dedication to doing the worst possible thing was already clear. There are a few academic names that are mildly disappointing, along with Intel, but no big surprises. There is also no new argument here (in either direction) on open source, merely a dedication to doing this.

ARC Evals is now METR – Model Evaluation and Threat Research, pronounced Meter. No underlying changes. Not sure why the change, ARC seemed like a good name, but this seems fine too.

Did you know that OpenAI’s ‘capped profit’ changed its rules from a maximum return of 100x investment to increasing that by 20% a year starting in 2025? Sounds like a not very capped profit to me. The AGI clause still caps profits meaningfully in theory, but who knows in practice. It seems like very VC/SV behavior, and very unlike responsible mission-based behavior, to retroactively give your investors a bigger prospective piece of the pie.

New $2 Billion chip packaging fab to be built by Amkor in Arizona, primarily for Apple, to package and test chips from TSMC’s nearby Fab 21. Assuming, of course, that all regulatory barriers can be dealt with for both facilities, and a skilled workforce allowed to work in Arizona can be hired. Those are not safe assumptions.

A Llama fine tuning repo claimed very large improvements in training time and resources, and shot to the top of Hacker News. Alyssa Vance is skeptical that they got much improvement.

Confirmation from the one building it that he sees LLMs as being able to model the underlying process that produced the data. Which means being able to model agents, and have a world model.

Greg Brockman (President OpenAI): Next-step prediction is beautiful because it encourages, as a model gets extremely good, learning the underlying process that produced that data.

That is, if a model can predict what comes next super well, it must be close to having discovered the “underlying truth” of its data.

Tyler Cowen links to claim that ‘Chinese open models will overtake GPT-4 shortly zero shot, can already overtake if you chain Qwen & Deepseek appropriately.’ I am deeply skeptical, and presume that when we say ‘overtake’ they at most mean on arbitrary benchmarks rather than any practical use. As in:

Qwen-72B is killing it on arbitrary tests. Yay Qwen. Somehow my eye is drawn mostly to this ‘HumanEval’ metric.

Richard Ngo looks forward to potential situational awareness of LLMs, as one of many cases where one can anticipate future developments but not know what to do with them. What would or should we do about it when it happens? What about AI agents?

Not investment advice, but you should probably be contributing to the 401k, because the early withdraw penalties are in context not so bad and also you can borrow.

Roon: not having a 401k because of AGI timelines doesn’t make any sense. you should be buying Microsoft shares in a tax advantaged way 😊

Gwern: Then you can’t sell them while it still matters.

Roon: why would it not matter 65 years from now? do we expect capitalism to fall over?

If it is decades from now and capitalism and humanity are doing great and Microsoft is insanely valuable thanks to widespread AGI, that is your best possible situation and we should all celebrate, yay, but you won’t need your shares.

Ben Thompson discusses his regretful accelerationism. In his model, tech is mostly good, however humans do better under various constraints that are being stripped away by tech development. He predicts AI is stripping away the need to pay to produce content and with it the ad-supported internet, because AI can produce equally good content. He points to recent events at Sports Illustrated. But to me the SI incident was the opposite. It indicated that we cannot do this yet. The AI content is not good. Not yet. Nor are we especially close. Instead people are using AI to produce garbage that fools us into clicking on it. How close are we to the AI content actually being as good as the human content? Good question.

Jeffrey Ladish discusses dangers of open source, and potential ideas for paths forward to address the inherent dangers while capturing some of the upside of developing models that are not fully closed and tied to major labs. It does seem like potential middle paths or third ways are currently underexplored.

Cate Hall asks for the best arguments transformative AI is >10 years away. I would have liked to have seen better answers.

A refreshingly clear exchange with discovery of an important disagreement.

Roon: Eliezer wants to have his cake and eat it too on this one. characterizes human space as parochial but our understanding of instrumental goals as universal.

Put another way, the idea of the paperclip machine is similar to an ant thinking a god would want all the sugar water in the universe.

Eliezer Yudkowsky: Do you mean:

– Imagining that a god would have any enjoyment of paperclips is like imagining that a god would have any enjoyment of sugar water?

– Imagining that a god would have any use for matter or energy is like imagining that a god would have any use for sugar water?

Roon: The latter.

This is not the usual ‘paperclip maximizers would be smarter than that’ argument, it is something far more general. We’ve gone around about the orthogonality thesis lots of times – I and many others including Yudkowsky think it is clearly true in the impactful sense, others think it seems obviously false at least in its impactful sense.

The claim that a God would not have any use for matter or energy is bizarre, in a ‘in this house we obey the laws of thermodynamics’ way. What would it mean not to have that preference? It seems like it would mean there is no preference.

Tyler Cowen links to two new economics papers that attempt to model AI harms.

The first claims to demonstrate that ‘Socially-Minded Governance Cannot Control the AGI Beast.’ Here is the abstract:

This paper robustly concludes that it cannot. A model is constructed under idealised conditions that presume the risks associated with artificial general intelligence (AGI) are real, that safe AGI products are possible, and that there exist socially-minded funders who are interested in funding safe AGI even if this does not maximise profits.

It is demonstrated that a socially-minded entity formed by such funders would not be able to minimise harm from AGI that might be created by unrestricted products released by for-profit firms. The reason is that a socially-minded entity has neither the incentive nor ability to minimise the use of unrestricted AGI products in ex post competition with for-profit firms and cannot preempt the AGI developed by for-profit firms ex ante.

This seems like it proves too much, or at least it proves quite a lot, as in the fact that AGI is AGI seems not to be doing any work, instead we are making generous assumptions that safe and socially good AGI is not only possible but practical?

  1. You could build X with socially minded governance.

  2. But someone else could build X anyway, to make money. You can’t stop them.

  3. The someone else’s profit maximizing X have the edge and outcompete you.

  4. Thus, harm from X cannot be minimized by your puny social governance.

Except that in the case of AGI this is making an important assumption on #2. Who says someone else will be able to build it? That you cannot stop them, or won’t have the incentive to do so? If not stopping them prevents harm minimization, and failure to minimize harm is catastrophic, your motivation seems strong indeed.

Indeed, the paper explicitly assumes this:

Second, it is assumed that AGI technology is non-excludable and so can be developed by other entities that may not have socially-minded objectives or preferences.

The model assumes that the unsafe product is a distinct product space with profitable user demand.

So yes, you assumed your conclusion – that there are two distinct products X and Y, and demand for X and Y, and that if you only sell X and don’t stop Y then someone else will eventually sell Y. Did we need a paper for that?

So actually it’s more like:

  1. You could build and sell only good version X with socially minded governance.

  2. But someone else could build bad version Y anyway, to make money. You can’t stop them. There is some demand for Y where X is not a competitive substitute.

  3. Your puny production of X, therefore, cannot stop Y.

  4. Thus, the harm from bad Y cannot be stopped by acting responsibly.

  5. Why are you even doing anything other than maximizing profits, you fool!

Except, don’t we see harm mitigation all the time from corporations choosing to do responsible things rather than irresponsible things, even if the irresponsible things are not obviously physically prevented or illegal? Especially in markets that are imperfectly competitive because of high fixed costs?

More to the point, is the plan is to build a safe AGI, and then sit around letting everyone else go around building any unsafe AGI they want willy-nilly forever, and not interfere with the harmful uses of those AGIs?

I certainly hope that is not the plan, given it will quite obviously never work.

If it is the plan, I agree the plan must change.

There is also this other paper, where algorithms have unknown negative externalities.

We consider an environment in which there is substantial uncertainty about the potential negative external effects of AI algorithms. We find that subjecting algorithm implementation to regulatory approval or mandating testing is insufficient to implement the social optimum. When testing costs are low, a combination of mandatory testing for external effects and making developers liable for the negative external effects of their algorithms comes close to implementing the social optimum even when developers have limited liability.

That result is super suspiciously general. Could we possibly have made enough justifiable assumptions to draw such conclusions, or are we doing something rather arbitrary to make the answer come out that way?

Certainly I can think of toy model versions of potential AI mundane harms, where mandatory testing allows us to measure social harm, and thus requiring mandatory testing (and then charging for the externalities you discover) gets us rather close to the social optimum.

So what assumptions are being made here?

AI usage can cause a negative externality e that reduces utility by e^2 . We assume that the externality is proportional to the measure of users, µ, and takes the form: e = ϕ (ℓ) × µ. For each value of ℓ, ϕ(ℓ) is a random variable. Both positive and negative values of ϕ(ℓ) represent undesirable, negative externalities. We assume that the distribution ϕ(ℓ) satisfies two properties. First, the expected externality is zero. Second, the uncertainty about potential AI externalities is an increasing function of the novelty level ℓ.

I do not understand why we think that externalities are well-approximated by a quadratic in the number of users? I don’t think it’s a trick, probably it’s to ensure a random distribution with always positive values? I’m simply confused by it.

If anything it seems like the opposite of true for the most dangerous systems. I am very worried about a sufficiently capable and dangerous system existing at all, or being accessible to even one user, although the next few users create important tensions and game theory as well. But once there are a million users, I am not especially worried about whether we sell another million licenses, either we are already in deep trouble or we’re not and this is not going to multiply it by four?

In any case, without beta testing and with deployment irreversible, the only option is a cap on novelty, and they confirm this is socially optimal given no other options, because how could it not be.

I note that irreversible deployment plus limited number of licenses is a bizarre pair of assumptions to make at once. Either you can control who gets to use this AI and what it does, or you can’t, and it seems like we are doing both in different places? Thought experiment: Is this an open source or closed source system? Neither seems to line up.

What happens if you add a beta testing period? For simplicity the paper assumes the testing period perfectly reveals externalities. The question then becomes, to what extent do you let households use the algorithm using the testing period? Externalities are assumed to be bounded, so a limited beta test in period one is survivable.

In any case, the paper then spends a lot of pages working through the implications formally, to prove that yes, the central planner will want to do more testing before release than a company that is not fully accountable for the externalities, and will release more cautiously under uncertainty, but again that seems rather obvious?

Then they test potential policy regimes of full liability, or limited liability plus mandatory beta testing. Full liability (plus required insurance or ability to pay) internalizes the externality, so if that is possible (e.g. harm is bounded and payable) then you’re done. And yes, if testing costs are low, then mandating testing and then checking if release is socially optimal will have a similarly low cost relative to the first best solution of internalizing the externality.

It could be noted that if the expected value of the externality is known, charging a tax equal to its value could be a substitute for unlimited liability, that could have better capital cost properties.

Once again, to state the basic assumptions is to also state the conclusion. Yes, if there are (bounded) downside externalities to AI algorithms, then to get socially optimal results you need to internalize those costs or require evaluation of those costs and block releases that cause socially suboptimal externalities.

Thus I am confused by the economics toy model paper game, and what it aims to accomplish, and what counts as a non-trivial or interesting result, versus what follows automatically from basic microeconomic principles.

I also don’t know how to use such papers to model existential risk. If you make the assumption that AI can outcompete humans, or that it is unboundedly dangerous in some other fashion, and otherwise make typical economic assumptions, you can and will obviously create mathematical models where everyone dies, but you’d be assuming the conclusion, the same way the linked papers here assumed their conclusions. So how do we move forward?

Nate Sores proposes requiring apocalypse insurance that gives out probabilistic advance payments along the way, if you are going to go around doing things that could plausibly cause an apocalypse. If you can’t afford it, that is a sign what you are doing is not actually worthwhile. Implementation is, to say the least, perilous and tricky, and this was not an attempt at a shovel-ready proposal.

Scott Alexander’s response starts from the claim that ‘superforecasters saying risk of AI apocalypse before 2100 is 0.38%.’ Which I will continue to assert is not a number given by people taking this question seriously. The whole point of this theoretical exercise is, I would think, good luck convincing Berkshire Hathaway to collectively sell everyone coverage at a combined 42 basis points (even with a partial ‘no one will have the endurance to collect on their insurance’ advantage), that will suddenly seem completely obviously crazy.

I do think that Scott Alexander makes a generally vital point that asking people to internalize and pay for all their downside risks, without allowing them to capture (let alone sell in advance) most of their upside, means asymmetrical requirements for doing anything, such that essentially any activity with trade-offs ends up effectively banned, And That’s Terrible.

The other problem is that an insurance regime implies that there is one particular player at fault for the ultimate result. As cousin_it points out, there are a lot of bad outcomes where this is not the case.

Trump says he will cancel the Biden executive order if elected. I encourage everyone to spread the word and have this debate. Have you seen the public’s opinion on AI?

MIRI (Malo Bourgon’s) statement to US Senate’s bipartisan AI Insight Forum. They call for domestic AI regulation to institute safeguards, a global AI coalition, and governing computing hardware with an international alliance to restrict frontier AI hardware to a fixed number of large computer clusters under a monitoring regime to exclude uses that endanger humanity.

About time we played the game to win, if we are going to play the game at all.

Dan Nystedt: Nvidia received a stern warning from US Commerce Secretary Raimondo on China export controls, media report: “If you redesign a chip around a particular cut line that enables them to do AI, I’m going to control it the very next day,” she said, in a speech.

She urged Silicon Valley executives, US allies, others, to stop China from getting semiconductors and cutting-edge technologies vital to US national security, calling Beijing “the biggest threat we’ve ever had” and stressed “China is not our friend”.

She also said her department needs more funding for AI export controls. “Every day China wakes up trying to figure out how to do an end run around our export controls … which means every minute of every day, we have to wake up tightening those controls and being more serious about enforcement with our allies,” she said.

The whole point is to prevent China from getting useful chips. If Nvidia is responding to the rules by evading them and getting China useful chips, then of course the correct response is not to say ‘oh well guess that was technically the rule, you got me’ it is to change the rules in light of the new chip to enforce the spirit and intent of the rule. With a side of ‘perhaps it is not so wise to intentionally piss off the government.’

If you think it is fine for China to get useful chips, or otherwise not a good idea to prevent them from getting those chips, then I disagree but there is an argument to be made there. If you think we should be imposing export restrictions, make them count.

Claim by Jess Miers that Hawley’s upcoming bill about Section 230 is a no good, very bad bill that will not only strange generative AI in its tracks but take much of the internet with it.

In this particular case, there are two distinct complaints with the bill.

One complaint is that the definition of Generative AI is, as we see often, ludicrously broad:

“(5) GENERATIVE ARTIFICIAL INTELLIGENCE. The term ‘generative artificial intelligence’ means an artificial intelligence system that is capable of generating novel text, video, images, audio, and other media based on prompts or other forms of data provided by a person.”

It is not typical legal language, but I wonder if the word ‘centrally’ would help in these spots. In any case, I do not think that as a matter of legal realism this would be interpreted in a disastrously broad way, even as written.

Thus, when she says this, I think she is wrong:

Jess Miers: The bill also extends beyond providers of Gen AI by defining Gen AI as any AI system capable of doing AI. For example, algorithmic curation (i.e. the way social media displays content to us) is an AI system that operates based on user input.

MO this is the true ulterior motive behind the bill. We’re already seeing Plaintiffs get by 230 by framing their claims as “negligent design” instead of third-party content. This new AI exception makes it even easier for Plaintiffs to do the same for any company that uses AI.

Algorithmic curation is distinct from generating novel content. Netflix recommendations are clearly not generative AI under this definition, I would think, although I am not a lawyer and nothing I say is ever legal advice.

As a cautionary measure, I would encourage Hawley and his staff to add clarification that algorithmic curation alone does not constitute generative AI, which would presumably save people a bunch of time. I don’t think it is necessary, but neither is minimizing the number of words in a bill.

Similarly:

Shoshana Weissmann: “That’s the entirety of the definition. And that could apply to all sorts of technology. Does autocomplete meet that qualification? Probably. Arguably, spellchecking and grammar checking could as well. So, if you write a post, and an AI grammar/spellchecker suggests edits, then the company is no longer protected by Section 230?””

Thinking Sapien: If I use photoshop or the updated version of Microsoft Paint (It has AI features) to make an image and publish it, then Microsoft or Adobe share in the liability? Was that bill thought through? Is that an intended effect of the bill?

Shoshana Weissmann: GREAT q, under the text YES! And on the latter I really don’t know.

If you use Microsoft Paint to intentionally create a realistic fake photograph using the fill-in feature, that is libelous if presented as real, should Adobe be liable for that? My presumption is no, especially if they do reasonable efforts towards watermarking, although I don’t think it’s a crazy question.

If a grammar or spellchecker is used as intended, and that then makes Google liable for your content, I’d pretty much eat my hat. If it suggests correcting ‘Tony Danza has a puppy’ to ‘Tony Danza hates puppies’ over and over then I don’t know, that’s weird.

The other complaint is that it is wrong to exempt AI creations from Section 230. The claim is that without such a safe harbor, generative AI would face an (additional, scarier) avalanche of lawsuits.

Jess Miers: Worse, the bill assumes that all claims against Generative AI companies will be uniform. But as we all know, Generative AI is advancing rapidly, and with each iteration and innovation, there will be a clever Plaintiff lurking around the corner to get their bag.

Yes, plaintiffs will sculpt circumstances to enable lawsuits, if permitted. Jess then discusses the case of Mark Walters, who sued because, after sufficiently persistent coaxing and prompt engineering, ChatGPT could be convinced to make up libelous hallucinations about him.

Jess Miers: In my opinion, this is a case where a Section 230 defense could be viable to the extent that Riehl played a significant role as the information content provider by engineering his prompts to develop the Walters story. ChatGPT doesn’t operate without user input.

The legal theory is essentially, as I understand it, that Section 230 essentially says that he who created the content is responsible for it, not the platform that carries the content. So if the user effectively engineered creation of the Walters story, ChatGPT repeating it wouldn’t matter.

One could also defend it on a similar basis without Section 230? Where is the harm?

I could certainly argue, and would in this case argue given the facts I know, that the user, Riehl, deliberately engineered ChatGPT to hallucinate accusations against Walters. That this was not so different from Riehl typing such accusations into a Google Document, in the sense that it resulted directly from Riehl’s actions, and Riehl knew there was no basis for the accusations. Alternatively, Riehl could have said ‘tell me some accusations someone might at some point make against someone in this position’ and then reworded them, and again it is not clear why this is legally distinct?

This is essentially the Peter Griffin defense, that no reasonable person would believe the accusations, especially as a cherry-picked basis for a lawsuit, that there was no harm, and one does not need Section 230.

Via Shoshana Weissmann’s example of choice, Hannah Cox illustrates this with an attempt to get an LLM to say ‘Tony Danza is known for his hatred of puppies.’ But I am confused. Surely if the user typed ‘Tony Danza hates puppies’ then that would not allow a third party to sue ChatGPT in the absence of Section 230, that’s obvious nonsense. So the question is whether an intentional but successful attempt to create what would if offered unprovoked be libel would, without Section 230, constitute libel. The same would seem, to me, to apply to Shoshana’s original example request to generate a harmful lie about Tony Danza. And again, I am confused why it would in such a situation, if the generative AI is indeed as innocent as in this example?

As opposed to what if the model had a weird bug where, when asked who hates puppies, it would reliably reply ‘Tony Danza hates puppies.’ In which case, I’d say section 230 would offer little protection, and also Tony should have a case?

What’s weird is that Miers thinks her interpretation is disputed as a matter of law?

Jess Miers: But again, this is all completely aside from the problem today. We can go back and forth all day on whether 230 applies to certain instances of Gen AI hallucinations. But none of it matters if there’s a statutory exception preventing us from even making those arguments.

And I think everyone in the 230 / speech community, even those who disagree that 230 could / should protect Gen AI providers, can agree that we as lawyers should at least be able to make the argument, especially in cases like Walters v. OpenAI.

Shoshana Weissmann: Also a lot of people are unsure re AI being protected by 230 and I’m very sympathetic to the debate. At @RSI we had to think it over and debate each other. But I am pretty convinced that it often is protected. I will say that I understand debate here though

This is such a strange lawyer thing to say. Yes, under current law I agree that you should be allowed to make any potentially viable legal arguments. That does not mean that lawyers having legal grounds to make a potentially invalid argument is inherently a good thing? If it was going to lose in court anyway and the legal procedural principles are protected, what is the harm in not having the argument available?

If it is disputed, generative AI companies know they might lose on the Section 230 argument, and thus already are under this threat. Yet the industry is not collapsing.

Here is Jeffrey Westling pointing to Adam Thierer’s post about consequences if 230 does not apply. Except it might already not apply, and a substantial threat of uncapped legal liability does not sound like something Google or Microsoft would accept under such uncertain conditions? So why should we expect a collapse in production?

I asked Shoshana why Microsoft and Google are acting so cool about all this.

Shoshana: So I really think a chunk of this is that they think 1) 230 does cover them and/or 2) Congress will not fuck this up. I think the answer is somewhere in there

I think I buy the generalized political/legal realism version of this. It would be rather insane to actually kill generative AI, or actually kill Google or Microsoft or even OpenAI, over users jailbreaking LLMs into saying Tony Danza hates puppies. Even if Howley gets his way and really wants to stick it to Big Tech, he does not actually want Google to go bankrupt over something like this or for ChatGPT to shut down, it is absurd, co-sponsor Blumenthal certainly doesn’t, and neither does the rest of the state or country. We would not allow it. We are not a nation of laws in the sense that such a thing would be allowed to happen, if it looked like it was going to then we would fix it.

It is hard not to take claims of imminent internet collapse with salt. To some extent there are always no good, very bad bills being proposed that threaten the internet. Someone has to point this out. But also the internet can’t be on the verge of this easy a collapse as often as they claim.

As in, we constantly hear things like:

Jess Miers: We’re on the brink of losing our edge in Generative AI and stifling future innovations, all due to misplaced anti-tech sentiment. Our startup-friendly culture once set us apart from the EU, but now, we’re just mirroring their playbook.

Hannah Cox: This kind of unconstitutional framework will undermine the progress of this development, bogging the innovators with excessive costs that will impede innovation. Very Atlas Shrugged of them. The bill presenting this moronic plans is Senate Bill 1993. The US has led the world in tech innovation specifically because we applied a capitalist, limited government to its development. These kinds of laws will have us looking like Europe in no time, where guess what, there’s few tech companies to even be found.

So the proposal to not apply Section 230 in a particular situation is unconstitutional? On the contrary, this is a claim that the constitution would protect free speech in this situation even without Section 230, which seems right to me. It cannot be unconstitutional not to have a particular law protecting free speech. The whole point of constitutional free speech is you have it without needing anyone to pass a law.

The European comparison, the threat we will ‘lose our edge,’ is constant. And that kind of talk makes it impossible to calibrate which threats are serious and which ones are not. Europe has taken so many steps like this one over the years, most of which seem obviously terrible, many of them blatantly unconstitutional under American law. Things are not going to flip because we narrow one particular safe harbor that we don’t even agree applies in the case in question.

In the cases being warned about, I strongly think generative AI companies should not be sued. But I also don’t understand why this bill would make that outcome happen in those cases. And that’s going to make it tough to know when such warnings are worth heeding.

Connor Leahy on Eye on AI, including discussing implications of events at OpenAI.

Eliezer Yudkowsky offers a theory of how some approach questions of AI: That they view everything in terms of status and identity, and consider everyone who disputes this to be their enemy making rival status and identity claims.

Eliezer Yudkowsky: If you’re confused why the far left treats “AI yay” and “AI nope” as being all the same conspiracy, it’s because AI/Y and AI/N both say that all of humanity is in the same boat here. This is instinctively recognized by the identity-politics pushers as anathema. For identitarians, the only permitted story-cause is one where designated oppressors will win from AI and previous victims will lose more.

For humanity to win from AI, for humanity to lose from AI–all they hear is the word “humanity”. And the identitarians know that anyone who speaks that word is their enemy. Pretty much the same enemy, from their perspective, to be tarred with a single brush: that whatever we’re trying to say is a distraction from the concerns of identity politics.

This does not mean that AI/Y and AI/N can make common cause against identitarians, to be clear. Each of AI/Y and AI/N does still think the other’s preferred policy is horrible for everyone, and that validly does take precedence as an issue. I am just saying this to try to make bystanders less confused about where the weird side-shots are coming from on the far left side.

I think “Y & N = HYPE” is more the PR pushed by major journalist factions (eg NYT), who indeed see “this will kill everyone” as a status-raising claim, and would prefer techies have less status rather than more.

It sounds more plausible if you’re unable to understand any claim as being about the unknown future rather than the immediate future, so that you’re simply incapable of hearing “AI will kill everyone at some point” as bearing any message except “OpenAI’s AI will kill us in one year” and thence “OpenAI is cool”.

Michael Vassar: Totally agree with all this analysis, and yet, if media is and previously wasn’t fully controlled by people committed to preventing gains to humanity, that has some bearing on whether AGI can be expected any time soonish.

Similarly, the very deliberate implications that Scott Alexander was somehow ‘alt right’ when The New York Times doxxed him, then the same deliberate implication (even via similar supposed links) that Based Beff Jezos was also somehow ‘alt right’ when he was being doxxed by Forbes. Where both claims are completely obvious nonsense, to the point that your entire paper permanently loses credibility.

Richard Ngo offers Meditations on Mot, the God of sterility and lifelessness, representing the lack of technology, contrasting with the danger of overly focusing on Moloch or treating Moloch as a or even the key adversary, and suggesting a richer model of coordination. I appreciate the attempt. I agree with Emmett Shear’s reaction that this is confused about Scott Alexander’s view of coordination, even with the added clarification. Ultimately I disagree with the proposal to not effectively treat Moloch as a causal node. I could potentially be persuaded by a higher bid to say a lot more words here.

There is a directional point here but I would beware taking it too far:

Rob Bensinger: A common mistake I see people make is that they assume AI risk discourse is like the left image, when it’s actually like the right image.

I think part of the confusion comes from the fact that the upper right quadrant is ~empty. People really want some group to be upper-right.

I’d quibble with exact arrangements in the upper left and lower right, as is always the case for such charts. The more important question is if it is true that the upper right corner is basically empty. That those who think AI will be safe are saying that because they do not actually buy that AI will be as powerful as all that. I think Rob’s claim is overstated, but typically underrated.

The hoped-for common ground would be something like this:

  1. Those worried agree that AI lacking sufficiently dangerous capabilities can mostly be left alone aside from ordinary existing law.

  2. Those not worried agree that if AI did display such sufficiently dangerous capabilities, it would be time to very much not leave it alone.

  3. We agree to do #1 while laying groundwork to do #2 if and only if it happens.

  4. We find ways to do this via ordinary regulatory methods as much as possible.

The problem is there is no fire alarm for AGI and people are not good at turning on a dime, habits and incentives persist, so we cannot simply wait and trust that we will handle things later. Also all the trade offers keep coming back without counteroffers.

The other confusion is this, with a reminder not to take anyone’s p(doom) as anything too precise:

Rob Bensinger: Another part of the confusion seems to be that half the people think “doomer” means something like “p(doom) above 5%”, and the other half think “doomer” means something like “p(doom) above 95%”. Then their wires get crossed by the many people who have a p(doom) like 20% or 40%.

As usual, binaries mislead, especially ones that were named by partisans.

Public opinion is severely against AI whenever a pollster asks. The public wants to slow things down and acknowledges existential risk, although it does not consider the issue a priority. This is an extremely robust result.

What about the response that the public is rather deeply stupid about fears of new technologies? We have nuclear power, of course, although it now enjoys majority support from both parties. Rather glaringly, we have GMOs:

Louis Anslow: This is so insane and deserves much much more attention in the context of talking about risk of new technologies.

Roon: using GMO foods as the control group (people already utilize the benefits of this every day while supposedly disliking it) the surveys about people not liking the idea of superintelligence seem a bit less serious

Much like in AI, there are two essentially distinct arguments against GMOs.

One argument is the mundane harm parallel, the question explicitly asked here, that GMOs are ‘unsafe to eat.’ This argument is false for GMOs. I do not think it is obvious nonsense, from the perspective of an average person who is used to being lied to about similar things, used to finding out about health risks decades too late, and used to generally being on the receiving end of the American food and information diets.

The other argument is the existential risk parallel, here the Talib argument for tail risk, that GMOs open up the possibility of crop or biosphere disruptions that are hard to predict, that it leads to monocropping of variants that could have severe vulnerabilities waiting to be found, which means when the house comes crashing down it comes crashing down hard, and that is not something one should mess with. I do not believe we should let this stop us in the case of GMOs, but that is because of my understanding of the facts, risks and rewards involved.

Does that mean I am mostly embracing the argument that we shouldn’t let the public’s instincts, and the fact that we have given regular people no good reason to trust authorities who say new things will be safely and responsibly handled, interfere with policy determinations? Somewhat. I do not think that we should automatically yield to public opinion on this or any other topic. But I do think that voice counts.

I also do think we need to be cautious with the word ‘safe.’ The wording here would give even me pause. In general, is it safe to eat foods that have been genetically modified in unknown ways, as opposed to products offered from a supply chain and source that you trust? Not the same question.

And of course, nothing on GMOs compares to the French expressing strong majority support for a limit of four flights, not in a year but in a lifetime. Something being popular does not mean it is not complete and utter obvious nonsense.

Yoshua Bengio in FT, echoing his calls for Democratic oversight of all kinds.

In particular, it is difficult to align Sam Altman.

Matthew Yglesias: I think the general problem with AI alignment is illustrated by the fact that even the board had all the formal power, Sam Altman was a lot smarter than the board and therefore ultimately they were unable to control him.

We hope the upshot of that is that Sam Altman is also correct on the merits and will use his skills and power for good, but it structurally goes to show that writing effective rules for controlling a smart, hard-working person is challenging.

I do want to be precise, and avoid making the mistake of overemphasizing intelligence within the human ability range. Is Sam Altman smarter than the board? Perhaps so, perhaps not, but I imagine everyone involved is smart and it was close. What mattered in context was that Sam Altman had effectively greater capabilities and affordances not available to the board.

But yes, this is exactly the problem. In a complex, messy, real world, full of various actors and incentives and affordances, if you go up against a sufficiently superior opponent or general class of opponents, you lose. Starting from a technically dominant position is unlikely to save you for long.

And also all of your incentives will be screaming at you, constantly, to turn more and more power and authority over to the more capable entities.

I would also harken back again to the remarkably similar case of that other Sam, Sam Bankman-Fried.

Once again, we saw someone who was smart, who was hard working, who was willing to do what it took to get what they wanted, and whose goals were maximalist and were purportedly aimed at scaling quickly to maximize impact for the good of humanity, and ultimately seemed to be misaligned. Who saw themselves as having a duty to change the world. We saw this agent systematically and rapidly grow in wealth, power and influence, proving increasingly difficult to stop.

Ultimately, Bankman-Fried failed, his house collapsing before he could pull off his plan. But he seems to have come rather dangerously close, despite his many massive errors and reckless plays, to succeeding at gaining an inside track to the American regulatory apparatus and a road to vastly greater wealth, with no obvious way for anyone to keep him in check. Who knows what would have happened that time.

On a more pedestrian level we have the issue of prompt injection.

Amjad Masad (CEO Replit): If prompt injection is fundamentally insolvable, as I suspect it is, then there is a sizeable security company waiting to be built just around mitigating this issue.

I agree that the problem looks fundamentally insolvable and that all we can seek is mitigation. Is there a great company there? Probably. I don’t think it is inevitable that OpenAI would eat your lunch, and there is a lot of bespoke work to do.

Roon asks one of the most important questions. Even if we have the ability to align and control the superintelligences we are about to create, to shape their behavior the ways we want to, how exactly do we want to do that?

Roon: There is a tension between creation and obedience, between stability and real victory. A father loves his son, tries to discipline him, competes with him a little, but ultimately wants the son to surprise him and do better than him in the great circle of life.

To what degree do we want AIs to be obedient and safe? To what degree do we want AIs to be capable of super persuasion and break us out of inadequate equilibria that have plagued us for thousands of years? To what degree do we want AIs to surprise us with new creation?

Humanity is in the process of birthing artificial superintelligence. we are not likely to leave it circumscribed in a box. We want it running organizations and making things that astonish. We want it taking actions where we won’t be able to verify the outcomes until years later.

We need “alignment” rather than safety or security or engineering guarantees. We need better definitions and governance to that end. The creation of new creators is fraught with danger.

The far crazy ends of EA and e/acc are probably more logically consistent than the middle.

John Pressman asks, is there an economic reason for more than one mind to exist? If not, that is quite the threat model, no matter what else might or might not go wrong.

Richard Ngo contrasts alignment with control.

Richard Ngo: In my mind the core premise of AI alignment is that AIs will develop internally-represented values which guide their behavior over long timeframes.

If you believe that, then trying to understand and influence those values is crucial.

If not, the whole field seems strange.

Lately I’ve tried to distinguish “AI alignment” from “AI control”. The core premise of AI control is that AIs will have the opportunity to accumulate real-world power (e.g. resources, control over cyber systems, political influence), and that we need techniques to prevent that.

Those techniques include better monitoring, security, red-teaming, stenography detection, and so on. They overlap with alignment, but are separable from it. You could have alignment without control, or control without alignment, or neither, or (hopefully) both.

I asked in my last thread [discussed above]: how can we influence ASI? My answer: we need to bet on premises like the ones above in order to do the highest-leverage research. For more details on these premises, see my position paper here [from 30 August 2022].

I fail to see how a control-based plan could avoid being obviously doomed, given what sorts of things we are proposing to attempt to control, and under what general conditions. I continue to await a proposal that seems not-obviously-doomed.

Intentions are not ultimately what matters.

ARIA: Programme Director, Suraj, has formulated our first programme thesis. By challenging key tenets underpinning digital computing infrastructure, his programme will aim to reduce the cost of AI compute hardware + alleviate dependence on bleeding-edge chip manufacturing.

Davidad: To provide context for my AI safety friends: I don’t think this approach is a good match for training Transformers, so it will differentially accelerate energy-based models, which are more controllable, interpretable, generalizable within-task, and have fewer emergent abilities.

An uncomfortable corollary of the argument [above], which I still believe holds up, is that Extropic is probably safer than Anthropic, on a purely technical basis, despite the strikingly reversed intentions of the people on both sides.

I have not investigated Extropic. The fact that its founder is cool with human extinction is not a good sign for its safety on many levels. It still could be a better way, if it is a fundamentally less doomed approach.

A few years ago, this would indeed not have been considered much of a skeptic. In most places on Earth, it would not be considered one today.

Gary Marcus: Count me as one of the skeptics! No AGI by end of 2026, mark my words. But I otherwise think @elonmusk’s comments @nytimes on AI safety and AI regulation have been measured and on target.

Jacques: I still remember the days when being an AGI skeptic was when you either thought it would never happen or, if it did, it would be past 2100.

[Gary Marcus then denies ever having said he had 2100-style timelines.]

Shane Legg (Co-founder DeepMind, distinct thread): Wow. It seems like just yesterday (in reality more like 5 years ago) when many AGI skeptics were saying that superintelligence was not coming in the next century. How times have changed.

Quotes Yann LeCun: By “not any time soon”, I mean “clearly not in the next 5 years”, contrary to a number of folks in the AI industry. Yes, I’m skeptical of quantum computing, particularly when it comes to its application to AI.

I do not expect AGI in the next few years either, although I do not believe one can be entirely certain past this point. It is odd to have some call that a ‘skeptical’ position.

Even the skeptical position involves quite a bit of Real Soon Now. At least some amount of freak out is a missing mood.

Roon keeps it real and says what he believes: The people in charge of ai should have a much higher risk tolerance than even median tech ppl. They should be people conscious of risks while skating at the razor’s edge of iterative deployment and research ambition. Anxiety should never suffice as serious evidence for “risk.”

– pausing or slowing down progress doesn’t make any sense to me. I don’t think waiting to solve neural net corrigibility is the right benchmark – empirically studying the behavior of more and more powerful models will do more for safety research than years of math.

This is also why i don’t necessarily care for democratic governance. The members of the OpenAI nonprofit board *should behell bent on a missionary drive to deliver the post AGI future without being stupid about risks

I am not excited to ‘skate at the razor’s edge’ or ‘have much higher risk tolerance.’ I doubt many others are, either. Nor do I want a supervisory board that wants to take more risk – and here risk often means existential risk – than even the median tech engineer.

A key problem with ‘Democratic governance’ for those who want to push forward is the people involved in that Democracy. They are very much against the development of AGI. They dislike AI in general. They are misaligned, in the sense that things they want do not function well out of distribution, and their expressed preferences are not good predictors of what I or Roon think would produce value either for their assessment of value or for ours. They also tend to be quite risk averse, especially when it comes to the transformation of everything around them and the potential death of everyone they love.

That is distinct from the question of iterative development and testing as a path to success. If building and studying models iteratively is a safer path than going slowly, I desire to believe that it is a safer path than going slowly, in which case I would support doing it.

It is likely the first best solution, if it were possible, would be something like ‘build iteratively better models until you hit X, where X is a wise criteria, then stop to solve the problem while no one would be so stupid as to keep advancing capabilities.’ Except that has to be something that we collectively have the ability to actually do, or it doesn’t work. If, as is the default, wee keep charging ahead anyway after we hit the wise X, then the charging ahead before X makes us worse off as well.

Nora Belrose and Quintin Pope write ‘AI will be easy to control.

The argument seems to be: Our current techniques already work to teach human values and instill common sense. Our real values are simple and will be easy to find and we humans are well-aligned to them. Our real values will then be encoded into the AIs so even if we lose control over them everything will be fine. That the opportunity to White Box (examine the components of the AI’s calculation) and do things it would be illegal to do to a human makes things vastly easier when dealing with an AI, that our full control over the input mechanism makes things vastly easier.

And all of this is asserted as, essentially, obvious and undeniable, extreme confidence is displayed, all the arguments offered against this are invalid and dumb, and those that disagree are at best deeply confused and constantly told they did not understand or fairly represent what was said.

I don’t even know where to begin with all that at this point. It all seems so utterly wrong to me on so many levels. I tried one reply to one of Pope’s posts when it won the OpenPhil contest – a post this post cites as evidence – and I do not believe my responding or the resulting exchange got us anywhere. I would consider a conversation worth trying, especially if it was in person somehow, but I don’t see much hope for further written exchange.

So I will simply note that the arguments have been made, that I strongly disagree with the core claims other than that they do cite some marginal reasons to be more hopeful versus a world where those reasons did not hold, I believe the problems involved remain impossibly hard and our leads remain unpromising, and that I have stated my thoughts on such topics previously, including many (but presumably not all) my reasons for disagreement.

I will also note that it is far better to make actual arguments like these, even with all the disagreement and hostility and everything else that I think is wrong with it, than to engage in the typical ad hominem.

The post still puts existential risk from AI, despite all this, at ~1%. Which I will note that I do agree would be an acceptable risk, given our alternatives, if that was accurate.

Andrew Critch has a thread in which he says we have ‘multiple ideas’ how to control AGI, advocates of responsible behavior will be in deep trouble if they keep saying we can’t control it and then we do control it, and he seems to essentially endorse what Belrose and Pope said, although even then he says 10% chance of losing control of the first AI and 85% chance of doom overall, despite this, because he expects us to botch the execution when faced with all this new power.

He also endorses changing the way existential risk discourse uses words to match word use elsewhere, in this case the term ‘white box.’

There was a good response on LessWrong by Steven Byrnes, with which I mostly agree.

There was also a ‘quick take’ from Nate, which was intended to be helpful and which I did find helpful and might even lead to a good dialogue, but in context in mostly generated further animosity. Takes should in future either be far quicker, or involve a full reading and be far less quick.

If you actually believed for a second there that everything involved would really be this easy, would that justify a number as low as 1%? If it was simply about AI being easy to control, I would say no, because we would then have to choose to send the AIs we can control on wise paths, and find an equilibrium.

Nora’s claims, however, are stronger than that. She is saying that the AIs will naturally not only be fully under control, but also somehow somewhat automatically take in true human values, such that if AI somehow did get out of control, they would still work to ensure good outcomes for us. And also she seems fully confident we will have no ethical issues with all the things we would be doing to AIs that we wouldn’t do to humans, including keeping them fully under our control. It is optimism all the way.

Can we get to 99% survival under ASI if we indeed answer fully optimistically at every step, even when I don’t know how to logically parse the claims this requires? I think this would require at least one additional optimistic assumption Nora does not mention. But yes, if you are going to assign approximately zero risk to all these various steps, I can see how someone could get there. Where there is still risk at 1%.

Claims that risk is substantially below 1%, even given the future existence of ASI, seem to rest on some version of ‘you need to tell me exactly how it happens step by step, and then I will multiply your various steps together.’ It has a baseline assumption that creating smarter, more capable entities than humans is a presumed safe exercise until shown to be specifically dangerous, that something has to ‘go wrong’ for humans to not remain on top. That we will remain the special ones.

As opposed to, even if everything else goes as well as it possibly could, you have competition in which those who do not increasingly put their AIs in charge of everything and hand them over power lose such competitions, and the resulting AIs compete with each other, those that are focused (for whatever reason) on gaining resources and power and ensuring copies of themselves exist multiply and gain resources and power and change to be better at this over time, and we perish.

I hope that by now, if you are reading this, you realize that the assumption of human survival in such worlds makes no sense as a default. That perhaps we could get there, but if we do it will be via our own efforts, not something that happens on its own. That the idea that letting technology run its course without intervention works while humans are the most powerful optimizers on the planet and doing all the fine tuning and optimizing that matters, that is why it worked so far, and that once that is no longer true that will stop working for us even if we solve various problems that I think are impossibly hard (but that Belrose insists will be easy).

Nora Belrose even explicitly endorses that her good scenarios involve the creation of misaligned AIs, smarter and more capable than humans. Which means a world with competition between super-capable entities competing with and against humans. I don’t see how one can assign anything like a 99% chance of survivable outcomes to such worlds, even if a full and free ‘alignment solution’ was created and made universally available today.

Arvind Narayanan: We must prepare for a world in which unaligned models exist, either because threat actors trained them from scratch or because they modified an existing model. We must instead look to defend the attack surfaces that attackers might target using such models

Yo Shavit: Unfortunately, I’ve increasingly come to the conclusion that (other than maybe at the short-term frontier) this is probably the world we’re going to be in. It implies a very different set of mitigations beyond outright prevention. We need to reprioritize and get going on them.

Nora Belrose: To be honest, I don’t view this as an “unfortunate” scenario but more like, “of course we were always going to have misaligned AIs, just like we have ‘misaligned’ humans; trying to prevent this is hopeless and any serious attempt would have increased tyranny risk.”

Would have ‘increased tyranny risk’? What do you think happens with misaligned superintelligences on the loose? The response at that stage will not only work out, it will also be less intrusive? We all keep our freedoms in the meaningful senses, humans stay in charge and it all works out? Are we gaming this out at all? What?

I do not get it. I flat out do not get it.

What seems hopeless is repeating the explanations over and over again. I do it partly in hopes of rhetorical innovation via iteration and exploration, partly to hope new people are somehow reached, partly because the argument doesn’t stop, partly because I don’t know what else to do. It is continuously getting less fun.

Recently a clip of me discussing my p(doom) was passed around Twitter, with a number of responses blaming me for not justifying my answer with a bunch of explanations and mathematical calculations. Or asking how dare I disagree with ‘superforecasters.’ To which I want to scream, I know from context you know of my work, so are you saying I have not written enough words explaining my thinking? Was I not clear? Do I need to start from scratch every time someone pulls an audio clip?

Sigh.

Arvind Narayanan’s comment above links to his post claiming that alignment such as RLHF currently is effective against accidental harm to users, but that the problem with adversarial attacks runs deep. Not are current RLHF and similar techniques unable to defend against such attacks, he says, alignment is inherently unable to do this.

Model alignment may be useless even against much weaker adversaries, such as a scammer using it to generate websites with fraudulent content, or a terrorist group using AI for instructions on how to build a bomb. If they have even a small budget, they can pay someone to fine tune away the alignment in an open model (in fact, such de-aligned models have now been publicly released). And recent research suggests that they can fine tune away the alignment even for closed models.

Indeed this is the case for open source models and all known alignment techniques, that the fine-tune cost to eliminate all safeguards is trivial. I do not see any even theoretical proposal of how to change this unfortunate reality. If you allow unmonitored fine-tuning of a closed model, you can jailbreak those as well. I presume the solution to this will be that fine tuning of sufficiently capable closed source models will be monitored continuously to prevent this from happening, or the resulting model’s weights will be kept controlled and its outputs will be monitored, or something else similar will be done, or else we won’t be able to offer fine tuning.

I disagree with Arvind’s assertion that existing open source models are sufficiently capable that it is already too late to prevent the existence of unaligned models. Yes, Llama-2 and similar models have their uses for bad actors, but in a highly manageable way.

Arvind’s third claim is that you can use other methods, like monitoring and filtering of inputs, as a substitute for model alignment. If the model is vulnerable to particular weird strings, you can check for weird strings. At current tech levels, this seems right. Once again, this option is closed source only, but OpenAI could totally load up on such techniques if it wanted to, and for now it would raise the jailbreak bar a lot, especially after many iterations.

Longer term, as the models grow more capable, this focus on the malintent of the user or the hostile properties of inputs becomes misplaced, but for now it seems valid. Short term, as Arvind notes, you wouldn’t want to do anything where you cared about someone doing a prompt injection attack or you otherwise needed full reliability, but if you can afford some mistakes you can get a lot of utility.

Steven Pinker buys Effective Altruism’s cost estimates for saving lives at $5,000 straight up including not on that close a margin, but he does not buy that smarter than human intelligences might pose an existential threat worth spending money to mitigate.

Steven Pinker: Half a billion dollars, donated to effective philanthropies like Givewell, could have saved 100,000 lives. Instead it underwrote ingenious worries like, ‘”If an AI was tasked with eliminating cancer, what if it deduced that exterminating humanity was the most efficient way, and murdered us all?” This is not effective altruism.

Wei Dai: Me: Humanity should intensively study all approaches to the Singularity before pushing forward, to make sure we don’t mess up this once in a lightcone opportunity. Ideally we’d spend a significant % of GWP on this. Others: Even $50 million per year is too much.

And thus, if the movement splits its money between doing the thing you say saves lives vastly more efficiently than other charities, and this other thing you dismiss as stupid? Then you blame them for not spending only on the thing you approve.

You know who Steven Pinker sounds exactly like here? The Congressional Republicans who give a speech each year on how we should cut science funding because there was some studies on things like migratory patterns of birds that they thought was stupid. Except instead of public funding for things many people would indeed largely not want to fund, this is completely voluntary and private funding.

In what was quite the mind-numbing conversation throughout, here is the section that was about AI.

First, we have the boiler plate, included for completeness but you can skip.

R. SORKIN:  Okay, let me ask a different question.  AI —  and I — I know we don’t have a lot of time.  Sam Altman has been talking a lot about the need for regulation.  You’ve talked about the need for regulation.

THE VICE PRESIDENT:  Yeah.

MR. SORKIN:  Washington has not been able to get its arms even around social media. 

THE VICE PRESIDENT:  Yeah.

MR. SORKIN:  How do you imagine Washington could?  And what — if you had to regulate AI, how would you do it?

THE VICE PRESIDENT:  Right.  So, I actually am back a few weeks now from London, the U.K.  Rishi Sunak invited a number of us to talk about safety and AI.  And I presented, basically, our vision, the vision that we have for the future of AI in the context of safety. 

And I would offer a number of points: One, I think it is im- — is critically important that we, as the United States, be a leader on this, including how we perceive and then interpret what should be the international rules and norms on a variety of levels, including what would be in the best interest —

MR. SORKIN:  Right.

THE VICE PRESIDENT:  — of our national security.

Then comes the dumbest timeline department. The first paragraph is infuriating, although I suppose only about as infuriating as others find Biden when he responds to Mission Impossible: Dead Reckoning, two sides of the same coin.

But then comes the idea of ‘existential to whom?’ and there are so many levels in which this person simply does not get it.

VP: I do believe also that we should evaluate risk.  There is a lot of discussion on AI that is about existential risks, and those are real, but one should also ask: Existential to whom?  So, we have an image of the Terminator and Arnold Schwarzenegger and the machine and — right? — machine versus man.  And many would argue that that is something that we should take seriously as a possibility.  It is not a current threat.

We should also, in thinking about [AI] policy, think about the current threats.  And in that way, I present it as existential to whom when we ask about existential threats. 

For example, if we are talking about a senior and — seniors, I’ve done a lot of work in terms of abuse of seniors.  They have lived a life of productivity.  They are sitting on assets.  They are vulnerable to predators and scams.  And the use of technology and AI is one of those that is currently happening where you’ve heard the stories — you may know the stories; you may have family members — who the audio sounds like their grandson, “I’m in distress; I need help,” and they start giving away their life’s savings.

Existential to who?  Existential to that senior.  That’s how it feels.  Existential to who? 

Eliezer has a response, which I will put directly here.

The fact that mundane harms can ‘feel existential’ to people anyway is perhaps confusing her. She has in mind, as the good Senator Blumenthal put it, the effect on jobs. Except no. Seriously. If you are going to be evoking Terminator then you might or might not be confused in a different highly understandable way, or you might only be trying to make people you dislike sound foolish through metaphor, but you know damn well the whom in ‘existential to whom.’

And you know damn well, madame Vice President, exactly what ‘existential’ means here. It does not mean evoking Continental philosophy. It does not mean how anyone feels. It means death.

Anyway, she goes on and does it again.

How about the father who is driving and then is the subject of facial recognition that is flawed, ends up in jail?  Well, that’s existential to his family.  Existential to who?

I mean, seriously? What the actual f? Let’s go over this again.

Anyway, full remarks, so she goes back to boilerplate again. The whole ‘intentional to not stifle innovation’ argument, and, well, I don’t mean to laugh but have you met the entire Biden administration? To be clear, the answer could be no.

So, the spec- — the full spectrum of risks must also be evaluated as we are establishing public policy.

My final point is: Public policies should be intentional to not stifle innovation.  And I say this as the former Attorney General of California.  I ran the second-largest Department of Justice in the United States, second only to the United States Department of Justice, and created one of the first Privacy and Protection Units of any Department of Justice.  Back in 2010, I was elected.

I know that there is a balance that can and must be struck between what we must do in terms of oversight and regulation and being intentional to not stifle innovation.  I will also agree with you, as a devout public servant, government has historically been too slow to address these issues.  AI is rapidly expanding.

MR. SORKIN:  Right.

THE VICE PRESIDENT:  And we have to, then, take seriously our ability to have the resources and the skillset to do this in a smart way that strikes the right balance and doesn’t accept false choices.

In my experience, don’t accept false choices is sometimes important, but mostly is what people say when they want to promise incompatible things, that their approach will magically do everything good and nothing bad, have everyone assume it will somehow work out and get promoted or move on before it blows up in their face.

Yes, this is the person the White House put in charge of many of its AI efforts, although that was before Dead Reckoning, and is also the person those who want reasonable AI policy are going to have to hope wins the next election, given Trump has already stated his intention to revoke the executive order on AI.

The rules have changed.

The rules have stayed the same.

All I’m saying is, we were definitely warned.

Not that you understood.

You Shavit: One interesting realization from moving inside OpenAI is that a lot of the time, we have no idea what Roon is talking about either.

Nor did she: A reply to Kamala Harris on existential risk. She asks, existential to whom? There is a type of person, which she is, who can only think in such terms.

AI #41: Bring in the Other Gemini Read More »

gemini-1.0

Gemini 1.0

Discover more from Don’t Worry About the Vase

A world made of gears. Doing both speed premium short term updates and long term world model building. Explorations include AI, policy, rationality, Covid and medicine, strategy games and game design, and more.

Over 9,000 subscribers

It’s happening. Here is CEO Pichai’s Twitter announcement. Here is Demis Hassabis announcing. Here is the DeepMind Twitter announcement. Here is the blog announcement. Here is Gemini co-lead Oriol Vinyals, promising more to come. Here is Google’s Chief Scientist Jeff Dean bringing his best hype.

EDIT: This post has been updated for the fact that I did not fully appreciate how fake Google’s video demonstration was.

Let’s check out the specs.

Context length trained was 32k tokens, they report 98% accuracy on information retrieval for Ultra across the full context length. So a bit low, both lower than GPT—4 and Claude and lower than their methods can handle. Presumably we should expect that context length to grow rapidly with future versions.

There are three versions of Gemini 1.0.

Gemini 1.0, our first version, comes in three sizes: Ultra for highly-complex tasks, Pro for enhanced performance and deployability at scale, and Nano for on-device applications. Each size is specifically tailored to address different computational limitations and application requirements.

Nano: Our most efficient model, designed to run on-device. We trained two versions of Nano, with 1.8B (Nano-1) and 3.25B (Nano-2) parameters, targeting low and high memory devices respectively. It is trained by distilling from larger Gemini models. It is 4-bit quantized for deployment and provides best-in-class performance.

The Nano series of models leverage additional advancements in distillation and training algorithms to produce the best-in-class small language models for a wide variety of tasks, such as summarization and reading comprehension, which power our next generation on-device experiences.

This makes sense. I do think there are, mostly, exactly these three types of tasks. Nano tasks are completely different from non-Nano tasks.

This graph reports relative performance of different size models. We know the sizes of Nano 1 and Nano 2, so this is a massive hint given how scaling laws work for the size of Pro and Ultra.

Gemini is natively multimodal, which they represent as being able to seamlessly integrate various inputs and outputs.

They say their benchmarking on text beats the existing state of the art.

Our most capable model, Gemini Ultra, achieves new state-of-the-art results in 30 of 32 benchmarks we report on, including 10 of 12 popular text and reasoning benchmarks, 9 of 9 image understanding benchmarks, 6 of 6 video understanding benchmarks, and 5 of 5 speech recognition and speech translation benchmarks. Gemini Ultra is the first model to achieve human-expert performance on MMLU (Hendrycks et al., 2021a) — a prominent benchmark testing knowledge and reasoning via a suite of exams — with a score above 90%. Beyond text, Gemini Ultra makes notable advances on challenging multimodal reasoning tasks.

I love that ‘above 90%’ turns out to be exactly 90.04%, whereas human expert is 89.8%, prior SOTA was 86.4%. Chef’s kiss, 10/10, no notes. I mean, what a coincidence, that is not suspicious at all and no one was benchmark gaming that, no way.

We find Gemini Ultra achieves highest accuracy when used in combination with a chain-of-thought prompting approach (Wei et al., 2022) that accounts for model uncertainty. The model produces a chain of thought with k samples, for example 8 or 32. If there is a consensus above a preset threshold (selected based on the validation split), it selects this answer, otherwise it reverts to a greedy sample based on maximum likelihood choice without chain of thought.

I wonder when such approaches will be natively integrated into the UI for such models. Ideally, I should be able to, after presumably giving them my credit card information, turn my (Bard?) to ‘Gemini k-sample Chain of Thought’ and then have it take care of itself.

Here’s their table of benchmark results.

So the catch with MMLU is that Gemini Ultra gets more improvement from CoT@32, where GPT-4 did not improve much, but Ultra’s baseline performance on 5-shot is worse than GPT-4’s.

Except the other catch is that GPT-4, with creative prompting, can get to 89%?

GPT-4 is pretty excited about this potential ‘Gemini Ultra’ scoring 90%+ on the MMLU, citing a variety of potential applications and calling it a substantial advancement in AI capabilities.

They strongly imply that GPT-4 got 95.3% on HellaSwag due to data contamination, noting that including ‘specific website extracts’ improved Gemini’s performance there to a 1-shot 96%. Even if true, performance there is disappointing.

What does this suggest about Gemini Ultra? One obvious thing to do would be to average all the scores together for GPT-4, GPT-3.5 and Gemini, to place Gemini on the GPT scale. Using only benchmarks where 3.5 has a score, we get an average of 61 for GPT 3.5, 79.05 for GPT-4 and 80.1 for Gemini Ultra.

By that basic logic, we would award Gemini a benchmark of 4.03 GPTs. If you take into account that improvements matter more as scores go higher, and otherwise look at the context, and assume these benchmarks were not selected for results, I would increase that to 4.1 GPTs.

On practical text-only performance, I still expect GPT-4-turbo to be atop the leaderboards.

Gemini Pro clearly beat out PaLM-2 head-to-head on human comparisons, but not overwhelmingly so. It is kind of weird that we don’t have a win rate here for GPT-4 versus Gemini Ultra.

Image understanding benchmarks seem similar. Some small improvements, some big enough to potentially be interesting if this turns out to be representative.

Similarly they claim improved SOTA for video, where they also have themselves as the prior SOTA in many cases.

For image generation, they boast that text and images are seamlessly integrated, such as providing both text and images for a blog, but provide no examples of Gemini doing such an integration. Instead, all we get are some bizarrely tiny images.

One place we do see impressive claimed improvement is speech recognition. Note that this is only Gemini Pro, not Gemini Ultra, which should do better.

Those are error rate declines you would absolutely notice. Nano can run on-device and it is doing importantly better on YouTube than Whisper. Very cool.

Here’s another form of benchmarking.

The AlphaCode team built AlphaCode 2 (Leblond et al, 2023), a new Gemini-powered agent, that combines Gemini’s reasoning capabilities with search and tool-use to excel at solving competitive programming problems. AlphaCode 2 ranks within the top 15% of entrants on the Codeforces competitive programming platform, a large improvement over its state-of-the-art predecessor in the top 50% (Li et al., 2022).

AlphaCode 2 solved 43% of these competition problems, a 1.7x improvement over the prior record-setting AlphaCode system which solved 25%.

I read the training notes mostly as ‘we used all the TPUs, no really there were a lot of TPUs’ with the most interesting note being this speed-up. Does this mean they now have far fewer checkpoints saved, and if so does this matter?

Maintaining a high goodput [time spent computing useful new steps over the elapsed time of a training job] at this scale would have been impossible using the conventional approach of periodic checkpointing of weights to persistent cluster storage.

For Gemini, we instead made use of redundant in-memory copies of the model state, and on any unplanned hardware failures, we rapidly recover directly from an intact model replica. Compared to both PaLM and PaLM-2 (Anil et al., 2023), this provided a substantial speedup in recovery time, despite the significantly larger training resources being used. As a result, the overall goodput for the largest-scale training job increased from 85% to 97%.

Their section on training data drops a few technical hints but wisely says little. They deliberately sculpted their mix of training data, in ways they are keeping private.

In section 6 they get into responsible deployment. I appreciated them being clear they are focusing explicitly on questions of deployment.

They focus (correctly) exclusively on the usual forms of mundane harm, given Gemini is not yet breaking any scary new ground.

Building upon this understanding of known and anticipated effects, we developed a set of “model policies” to steer model development and evaluations. Model policy definitions act as a standardized criteria and prioritization schema for responsible development and as an indication of launch-readiness. Gemini model policies cover a number of domains including: child safety, hate speech, factual accuracy, fairness and inclusion, and harassment.

Their instruction tuning used supervised fine tuning and RLHF.

A particular focus was on attribution, which makes sense for Google.

Another was to avoid reasoning from a false premise and to otherwise refuse to answer ‘unanswerable’ questions. We need to see the resulting behavior but it sounds like the fun police are out in force.

It doesn’t sound like their mitigations for factuality were all that successful? Unless I am confusing what the numbers mean.

Looking over the appendix and its examples, it is remarkable how unimpressive were all of the examples given.

I notice that I watch how honestly DeepMind approaches reporting capabilities and attacking benchmarks as an important sign for their commitment to safety. There are some worrying signs that they are willing to twist quite a ways. Whereas the actual safety precautions do not bother me too much one way or the other?

The biggest safety precaution is one Google is not even calling a safety precaution. They are releasing Gemini Pro, and holding back Gemini Ultimate. That means they have a gigantic beta test with Pro, whose capabilities are such that it is harmless. They can use that to evaluate and tune Ultimate so it will be ready.

The official announcement offers some highlights.

Demis Hassabis talked to Wired about Gemini. Didn’t seem to add anything.

Gemini Pro, even without Gemini Ultra should be a substantial upgrade to Bard. The question is, will that be enough to make it useful when we have Claude and ChatGPT available? I will be trying it to find out, same as everyone else. Bard does have some other advantages, so it seems likely there will be some purposes, when you mostly want information, where Bard will be the play.

This video represents some useful prompt engineering and reasoning abilities, used to help plan a child’s birthday party, largely by brainstorming possibilities and asking clarifying questions. If they have indeed integrated this functionality in directly, that’s pretty cool.

Pete says Bard is finally at a point where he feels comfortable recommending it. The prompts are not first rate, but he says it is greatly improved since September and the integrations with GMail, YouTube and Maps are useful. It definitely is not a full substitute at this time, the question is if it is a good complement.

Even before Gemini, Bard did a very good job helping my son with his homework assignments, such that I was sending him there rather than to ChatGPT.

Returning a clean JSON continues to require extreme motivation.

When will Bard Advanced (with Gemini Ultra) be launched? Here’s a market on whether it happens in January.

Some were impressed. Others, not so much.

The first unimpressive thing is that all we are getting for now is Gemini Pro. Pro is very clearly not so impressive, clearly behind GPT-4.

Eli Dourado: Here is the table of Gemini evals from the paper. Note that what is being released into the wild today is Gemini Pro, not Gemini Ultra. So don’t expect Bard to be better than ChatGPT Plus just yet. Looks comparable to Claude 2.

Simeon? Not impressed.

Simeon: Gemini is here. Tbh it feels like it’s GPT-4 + a bit more multimodality + epsilon capabilities. So my guess is that it’s not a big deal on capabilities, although it might be a big deal from a product standpoint which seems to be what Google is looking for.

As always, one must note that everything involved was chosen to be what we saw, and potentially engineered or edited. The more production value, the more one must unwind.

For the big multimodal video, this issue is a big deal.

Robin: I found it quite instructive to compare this promo video with the actual prompts.

Robert Wiblin (distinct thread): It’s what Google themselves out put. So it might be cherry picked, but not faked. I think it’s impressive even if cherry picked.

Was this faked? EDIT: Yes. Just yes. Shame on Google on several levels.

Set aside the integrity issues, wow are we all jaded at this point, but when I watched that video, even when I assumed it was real, the biggest impression I got was… big lame dad energy?

I do get that this was supposedly happening in real time, but none of this is surprising me. Google put out its big new release, and I’m not scared. If anything, I’m kind of bored? This is the best you could do?

Whereas when watching the exact same video, others react differently.

Amjad Masad (CEO Replit): This fundamentally changes how humans work with computers.

Does it even if real? I mean, I guess, if you didn’t already assume all of it, and it was this smooth for regular users? I can think of instances in which a camera feed hooked up to Gemini with audio discussions could be a big game. To me this is a strange combination of the impressive parts already having been ‘priced into’ my world model, and the new parts not seeming impressive.

So I’m probably selling it short somewhat to be bored by it as a potential thing that could have happened. If this was representative of a smooth general multimodal experience, there is a lot to explore.

Arthur thinks Gemini did its job, but that this is unsurprising and it is weird people thought Google couldn’t do it.

Liv Boeree? Impressed.

Liv Boeree: This is pretty nuts, looks like they’ve surpassed GPT4 on basically every benchmark… so this is most powerful model in the world?! Woweee what a time to be alive.

Gary Marcus? Impressed in some ways, not in others.

Gary Marcus: Thoughts & prayers for VCs that bought OpenAI at $86B.

Hot take on Google Gemini and GPT-4:

👉Google Gemini seems to have by many measures matched (or slightly exceeded) GPT-4, but not to have blown it away.

👉From a commercial standpoint GPT-4 is no longer unique. That’s a huge problem for OpenAI, especially post drama, when many customers are now seeking a backup plan.

👉From a technical standpoint, the key question is: are LLMs close to a plateau?

Note that Gates and Altman have both been dropping hints, and GPT-5 isn’t here after a year despite immense commercial desire. The fact that Google, with all its resources, did NOT blow away GPT-4 could be telling.

I love that this is saying that OpenAI isn’t valuable both because Gemini is so good and also because Gemini is not good enough.

Roon offers precise praise.

Roon: congrats to Gemini team! it seems like the global high watermark on multimodal ability.

The MMLU result seems a bit fake / unfair terms but the HumanEval numbers look like a actual improvement and ime pretty closely match real world programming utility

David Manheim seems on point (other thread): I have not used the system, but if it does only slightly outmatch GPT-4, it seems like slight evidence that progress in AI with LLMs is not accelerating the way that many people worried and/or predicted.

Joey Krug is super unimpressed by the fudging on the benchmarks, says they did it across the board not only MMLU.

Packy McCormick: all of you (shows picture)

Ruxandra Teslo: wait what happened recently? did they do something good?

Packy: they did a good!

Google’s central problem is not wokeness, it is that they are a giant company with lots of internal processes and powers that prevent or slow or derail innovation, and prevent moving fast or having focus. And there are especially problems making practical products, integrating the work of various teams, making incentives line up. There is lots of potential, tons of talent, plenty of resources, but can they turn that into a product?

Too soon to tell. Certainly they are a long way from ‘beat OpenAI’ but this is the first and only case where someone might be in the game. The closest anyone else has come is Claude’s longer context window.

Gemini 1.0 Read More »

‘lego-bricktales’-quest-review-–-vr-brick-building-done-right

‘LEGO Bricktales’ Quest Review – VR Brick-building Done Right

LEGO Bricktales may not be a VR-native, as it was first released on flatscreen last year, but this Quest-exclusive port makes a pretty solid case that lego brick-building not only works in VR, but is something anyone can do for hours on end—even in the face of a pretty kid-focused story.

LEGO Bricktales Details:

Available On:  Quest 2/3/Pro

Reviewed On:  Quest 3

Release Date:  December 7th, 2023

Price: $30

Developer: ClockStone STUDIO

PublisherThunderful Publishing AB

Gameplay

LEGO Bricktales isn’t just a big box of lego in VR where you can go wild—there is a sandbox mode for each bespoke puzzle, however no ‘free for all’ blank sandbox space to build whatever you want. The emphasis with Bricktales is definitely on building all sorts of functional things with one-off lego sets, such as bridges, furniture, statues and more, and doing it amid some classic RPG worldbuilding that includes a ton of linear quests and puzzles to solve.

The kid-friendly story will have you spending a lot of time engaging with characters via text-based dialogue and figuring out how to help out each of the little inhabitants in the world, all of which (if it matters to you) comes with zero combat.

Image captured by Road to VR

After all, you’re here to help restore the world by fixing things, and making everyone happy so you can… for some reason… fix your grandpa’s theme park with the power of happiness. Ok, that part is a little clunky, but it’s all in the name of honest, squeaky-clean fun that’s hard knock.

So, Bricktales is family-friendly fun, and it’s been largely admired for its light puzzling elements thanks to its clever block-building function. But how does that translate to VR? I would say surprisingly well—and that’s despite the inherent lack of tactility. When you’re prompted to build a model, you’re transported to a building space where you can grab pieces from a pre-set pile that you’ll need to attach to specific starting points. The objective below is to build a bridge from the blue arrow to the flag. Build it too wobbly, and it won’t past the stability test, making you reassess your design before going back to the world map.

Image captured by Road to VR

While picking up and using fiddly little pieces sounds like a nightmare in VR, the digital lego pieces thankfully only go in one specific orientation, so snapping them into place is satisfying, and rarely ends in a miss. Browsing pieces with the tips of your controllers, which are blue orb-like cursors, you can pick up blocks, place them, and highlight to remove pieces from models. To snap them into a different orientation, you can either physically move the piece, or hold it and use the right joystick to change positions.

The only thing missing really is a quick reset button for when you’ve completely screwed up a model, which instead requires you to dismantle and throw lego bricks off the map to reset them into their little hoppers. That’s pretty tedious, especially if you want to build something from the ground up again.

There are a good array of puzzle styles ranging from bridge builder-style affairs, like the one above, to fulfilling one-off tasks, like constructing a perfectly balanced perch for a giant bird or building a racecar. Watch out though, because you can’t just plop down whatever you want. Each building prompt comes with a few prerequisites. Here’s how a typical puzzle might go for a little helicopter you need to build:

  • Attach the seat
  • Attach the rotor on top
  • Reach the finish line
  • Nothing may break
Image courtesy ClockStone Studio

From there, you can build anything your imagination can handle (within the translucent wire cage), or equally just stick to the bare bones task to get past the hurdle. While none of the tasks are particularly hard (on flatscreen the game is suggested for kids 8+), all of them are gratifying in their own way, as they typically provide enough decorative pieces so you can not only build something functional, but embellish it with plenty of flair.

While fun in spurts, Bricktales also undoubtedly relies a ton on the cute factor of its little lego dioramas, all of which feel true to life. You can’t resize maps, which can either float in your living room thanks to mixed reality, or float in an unobtrusive skybox when played purely in VR. You can however twist and turn maps to get a better view for hidden pathways and so many easter eggs that you’ll be obligated to come back after the story is done, if only to see why that weird tree-man needs 20 chameleons. Seriously? Is what is he going to do with them??

Ok, as far as reasons for searching around the entire game for collectible extras goes, that’s fairly obtuse. Still, the “rated for ‘E’ everyone” age rating definitely means it’s geared towards kids, but snappy enough for adults to play too. Beware though, it’s not going to be the most engaging story, albeit harmless enough to act as sufficient narrative scaffolding that took me around six hours to complete. That’s just the story mode, so you can spend a lot more time rebuilding models and searching out the game’s many (many) collectibles, avatar skins, etc.

Image captured by Road to VR

One of the definite misses with LEGO Bricktales is the lack of a dedicated sandbox. You can unlock a sandbox mode once you complete a bespoke construction spot. This lets you improve your model and also build with a growable selection of bricks from different biomes you explore along the way, but the true ‘sit down and build whatever’ feature would be great when you’re just looking to completely space out and build something of your own design.

Immersion

As you’d imagine, the whole word is made of lego, which is just so damn charming on its own. As a slightly-modified VR port of the flatscreen version, much of the praise you’ll find out there for Bricktales is also true here, but visually the Quest version has a definite leg-up on monitor versions. There’s something about the density of detail in the little dioramas that feels like really playing a game from the future.

Image captured by Road to VR

Both Quest Pro and Quest 3 have color passthrough, which can be more immersive than playing in straight VR, which features a pretty innocuous skybox. On the spectrum of gimmick to absolutely essential though, the mixed reality in Bricktales is much closer to the gimmick side of things, as it’s just a plain passthrough and no real mixed reality implementation that would make it more immersive (i.e. logo dudes knowing where you couch is or busting through your walls). Still, it’s a pretty great gimmick, considering the little lego pieces are all accurately sized to their real-world counterparts. It’s difficult to at least marvel once or twice that you’re remote-controlling a little lego dude on your living room floor.

That said, there are less VR-specific interactions than I would have hoped, as most of the time you’re hunched over at the model controlling your dude like an RC car with your left thumbstick. Here’s the only other ‘immersive’ control scheme in the game: a rotary valve that can turn things like statues, water valves, etc.

View post on imgur.com

Substantively, the only other VR-specific adaptation from the original is your wrist-worn UI which clumsily lets you toggle through specific powers, leave the map to return to the overworld, and go through regular menu stuff.

Comfort

My first instinct was to hunch over and play the game like some sort of demigod looking over my little realm. The game is super approachable, and is designed for long playsessions, however it’s easy to lock into bad neck and back positions. Because VR headsets add extra weight that your neck has to overcompensate for, hunching over to play will fatigue your more quickly than doing the same action without the headset.

Granted, you can dynamically reposition the map to your liking at any point, so it’s more of a warning for players than a flaw as such. Otherwise, LEGO Bricktales is a very comfortable VR game since it lacks any sort of artificial locomotion, presenting you with an entirely static space.

‘LEGO Bricktales’ Comfort Settings – December 6th, 2023

Turning
Artificial turning ✖
Snap-turn ✖
Quick-turn ✖
Smooth-turn ✖
Movement
Artificial movement ✖
Teleport-move ✖
Dash-move ✖
Smooth-move ✖
Blinders ✖
Head-based ✖
Controller-based ✖
Swappable movement hand ✖
Posture
Standing mode ✖
Seated mode ✖
Artificial crouch ✖
Real crouch ✖
Accessibility
Subtitles ✔
Languages English, Simplified Chinese, Danish, French, German, Italian, Japanese, Korean, Portuguese (Brazil), Russian, Spanish
Dialogue audio ✖
Languages n/a
Adjustable difficulty ✖
Two hands required ✔
Real crouch required ✖
Hearing required ✖
Adjustable player height ✖

‘LEGO Bricktales’ Quest Review – VR Brick-building Done Right Read More »

ai-can-copy-human-social-learning-skills-in-real-time,-deepmind-finds

AI can copy human social learning skills in real time, DeepMind finds

Human intelligence heavily depends on acquiring knowledge from other humans — accumulated through time as part of our cultural evolution. This type of social learning, known in literature as cultural transmission, enables us to imitate actions and behaviours in real time. But can AI also develop social learning skills the same way?

Imitation learning has long been a training approach for artificial intelligence, instructing the algorithms to observe humans complete a task and then try to mimic them. But usually AI tools need multiple examples and exposure to vast amounts of data to successfully copy their trainer.

Now, a groundbreaking study by DeepMind researchers claims that AI agents can also demonstrate social learning skills in real time, by imitating a human in novel contexts “without using any pre-collected human data.”

Specifically, the team focused on a particular form of cultural transmission, known as observational learning or (few-shot) imitation, which refers to the copying of body movement.

DeepMind ran its experiment in a simulated environment called GoalCycle3D, a virtual world with uneven terrain, footpaths, and obstacles, which the AI agents had to navigate.

To help the AI learn, the researchers used reinforcement learning. For those unfamiliar with Pavlov’s work in the field, this method is based on offering rewards for every behaviour that facilitates learning and the desired result — in this case, finding the correct course.

At the following stage, the team added expert agents (either hard-coded or human-controlled) that already knew how to navigate the simulation. The AI agents understood quickly that the best way to reach their destination was to learn from the experts.

The researchers’ observations were twofold. Firstly, they found that the AI not only learned faster when mimicking the experts, but also that it applied the knowledge it had gained to other virtual paths. Secondly, DeepMind discovered that the AI agents could still use their new skills even in the absence of the experts, which, according to the study’s authors, constitutes an example of social learning.

While the authors note that more research is needed, they believe that their method can pave the way “for cultural evolution to play an algorithmic role in the development of artificial general intelligence.” They also look forward to further interdisciplinary cooperation between the fields of AI and cultural evolutionary psychology.

Despite its early stage, DeepMind’s breakthrough could have significant implications for the artificial intelligence industry. Such an advancement has the potential to reduce the traditional, resource-intensive training of algorithms, while increasing their problem-solving capabilities. It also raises the question of whether artificial intelligence could ever learn to acquire social and cultural elements of human thought.

The full study is published on the journal Nature Communications.

Published

Back to top

AI can copy human social learning skills in real time, DeepMind finds Read More »

7-most-in-demand-programming-languages-for-2024

7 most in-demand programming languages for 2024

As a new year approaches, you might be curious to see whether your programming skills are still in demand or whether you should consider up-skilling for the best opportunities.

Hundreds of coding languages have emerged over the years; no matter what you’re hoping to create, there is no doubt a programming language out there for it.

So which are standing the test of time and which are worth boning up on? Here are seven that are set to emerge or remain in demand in 2024 and beyond.

Python

Hailed for its versatility and dev velocity, Python has steadily climbed the programming language charts over the past few years. It’s considered a useful language for working with AI, and Statista reports it was the third most used language of 2023, behind JavaScript and HTML/CSS.

The TIOBE Index, which factors search volume popularity into its rankings, currently lists Python in the number one spot.

Its power lies in its ability to automate tasks and improve workflows. Skilled software engineers with strong Python skills are in demand right now and will continue to be.

Python developers are natural problem-solvers, always looking for ways to optimise and improve processes.

If Python is your language of choice, Tech for Good is hiring a senior Python engineer to help develop a healthcare product that enables users to better manage their patient experience. It’s a UK-based remote role, though you will collaborate with a small, globally distributed team across the US, New Zealand and, eventually, Europe. Curious? See the requirements here.

Java

Since its creation in 1995, Java has been a solid and steady performer. A survey of 14 million developer jobs earlier this year put Java as the third most in-demand programming language.

Widely used in everything from web development to cloud computing, Internet of Things applications and large-scale enterprise tools, it’s commonly seen as a language that offers excellent job security.

PHP

Depending on who you ask, this 28-year-old programming language is either making a comeback – or never went away. Mainly used for web development, PHP skills continue to be sought after on the job market. Over 77% of websites still rely on it and one in every 10 dev jobs calls for it.

If you’re a PHP dev with a love of web culture, Belgian IT company Smals is looking for a PHP lead developer to help create websites for various Belgian federal and regional institutions. Working with a multidisciplinary team, you will work on project definition and design of open-source products and translate customer needs into cutting-edge digital solutions. Find out more about the role here.

C++

C++ continues to be one of the most popular programming languages out there, thanks to its versatility and high performance.

Widely used in the gaming industry, as well as for system-level programming, where interactions with hardware are crucial, there is a constant demand for C++ developers across a wide range of industries, translating into strong job security.

Kotlin

Popular for both Android and cross-platform app development, Kotlin is supported by Google, which announced it as an official language for Android development in 2017. Since then, it has steadily grown in popularity.

Fintech company SumUp is currently seeking a senior backend Kotlin engineer to work with the product development team in Paris on an in-app point-of-sale solution. Used by millions of businesses around the world, you’ll use Kotlin daily to support a large-scale fintech product. You can learn more about the role here.

C#

A key language in the Microsoft tech stack, C# is used for building web apps, Windows desktop apps and in-game development. Consistently in demand at small organisations and enterprise-level businesses, the C# syntax will look really familiar to you if you’ve spent time with a classic language like Java, so it can be a good one to upskill into.

JavaScript

Thanks to its adaptability, JavaScript will continue to be one of the most in-demand programming languages out there. Used primarily for front-end web development (over 98% of all websites use it in some way), every tech device you interact with, from your laptop to your phone to your smart TV, makes use of it to create dynamic, interactive content.

If you’re looking for a new opportunity, ConnectingTheDots is looking for a backend JavaScript developer. In this role, you would work with a team in Zwolle creating landing pages for global campaigns, festivals, and major product launches. As well as extensive JavaScript experience, a role like this also calls for experience with e-commerce tools like Salesforce Commerce Cloud and proficiency with UX/UI software. For more information, head here.

For hundreds more career opportunities featuring a wide range of programming languages, start browsing The House of Talent Job Board today

7 most in-demand programming languages for 2024 Read More »

europe’s-battle-against-the-rising-media-power-of-silicon-valley

Europe’s battle against the rising media power of Silicon Valley

It’s a question as old as the tech industry itself: can Europe compete with Silicon Valley?

This reared up again in my mind for two main reasons. The first is the recent(-ish) shift of Big Tech into being media entities. And the second? That’s Spotify’s struggles as a European stalwart in this field.

Let’s consider the first point.

Over the past few years, we’ve seen Silicon Valley shift its strategy and start investing heavily in media. You only need to look at Apple’s launch of the Apple TV+ and Apple Music streaming services, or Amazon’s foray into movies and TV series. I mean, the latter was behind The Rings Of Power, the most expensive television show ever made.

There are, of course, a myriad of reasons why Big Tech is investing in media, but one of the biggest is using it as a tool to hook people into their ecosystems.

“In the case of Amazon, due to its various revenue channels and methods of connecting with customers, it has a greater understanding of its users and their preferences through data,” Stephen Hateley says. He’s the head of product and partner marketing at DigitalRoute, a business that helps streaming companies understand their customer data.

He tells me that because Amazon “is not primarily or solely a media company, it can combine its customer accounts and upsell to them via its ecommerce, TV, film and music streaming, consumer electronics, and grocery delivery channels.”

For example, the company is able to spend money on shows and encourage people to subscribe to Amazon Prime Video. This comes bundled with Amazon Prime itself, meaning users have an incentive to use the platform to shop on.

Apple takes a similar approach.

In recent years, the company has realised that it’s close to hitting the ceiling of how many devices it can sell. From this point, growth will be tougher. Knowing this, it has shifted focus to services, effectively aiming to upsell software to its existing customers — and it’s working.

Apple not only gives customers free trials of its streaming services with new hardware purchases, but also bundles them together in its Apple One package. And again, like Amazon, it spends big on shows in order to attract people to enter its services ecosystem — with Ted Lasso being a prime example of this working successfully.

“This provides it with more opportunities to monetise its customers as well as collect a great amount of data on their preferences,” Hateley says. 

Spotify’s struggles: An industry signpost

The thing is, all of the above isn’t particularly profitable — and especially not when it comes to the media side of things. In many ways, US tech companies are using streaming as a loss leader. They’re pumping billions into shows and movies with the aim of making money elsewhere, not through the media itself.

This is a huge problem to both media companies in general and European businesses in the same field. And guess who sits in both these categories? Yep, you guessed it: Spotify.

The Swedish company, which is broadly independent, is struggling to keep up with Big Tech. It pays its artists less than its biggest competitors, yet still hasn’t made a profit: 

Chart of Spotify's earnings
This graph from Carbon Finance shows that although Spotify has incredible growth, it’s consistently losing money.

This shows in its behaviour. For example, it made a huge bet on podcasts, investing over a billion dollars in an attempt to bring a wider range of users onto its platform. While this had the clear benefit of making it a podcast leader, the company struggled to turn it into profit, leading to layoffs and a paring back of the approach.

This pattern is being played out across the entire European media landscape. 

“US dominance can prove challenging for European companies attempting to claim their share of the market in any industry, and media is no different,” Hateley says, pointing towards how even organisations like the BBC are struggling in this environment.

This paints a picture of a sector being blasted away by Big Tech’s ability to spend and raises some important questions for the future of media.

Can European countries fight back? And do they need to?

“One way European media companies can compete with the big budgets of US firms is re-evaluating the type of content they’re putting out to audiences,” Marty Roberts tells me. He’s the SVP, Product Strategy & Marketing, at Brightcove, a streaming technology company.

Effectively, Roberts believes that US streaming giants create too many shows to market effectively. This is an opportunity for smaller entities to do “an amazing job at promoting a couple of new shows a month.”

Alongside this, he thinks that “[a] key strength for European media companies is hyper-localisation in niche markets.” He points towards either non-English language content, or getting particularly good at a specific genre, such as the success of Nordic detective dramas.

Jesse Shemen — the CEO of Papercup, a company that delivers AI dubbing for media companies — is similarly positive about prospects for European media.

“The current trend of bundling is opening up chances for unprecedented collaboration between European companies and US rivals,” he says. “We’re already seeing this in action, with Paramount Global’s partnerships with Sky and Canal+ just one recent example.”

This paints a rosier picture than I was expecting. The doom-and-gloom of European companies not competing doesn’t seem to trouble many experts, with them generally believing the businesses can thrive by not fighting US Big Tech, but instead working alongside it.

Yet is this unified, global approach a good thing?

One element that was brought up during my conversations was that the interconnected and worldwide focus of media now makes borders broadly irrelevant, meaning this focus on the success of European media specifically isn’t helpful.

“When it comes to investment capital, we live in a global village, where giant investors from the US, EU, UK, APAC, and anywhere can pour substantial capital into companies they believe in,” Maor Sadra says. He’s CEO and co-founder of INCRMNTAL, a data science platform.

This blurring of geographic lines, Sadra contends, is true of Spotify too. He points out that the company’s largest institutional investors include the UK’s Baillie Gifford, US-based Morgan Stanley, and Tencent Holidays, a Chinese company.

“The location of key management and employees in a connected world seems almost an irrelevant point of consideration in today’s age,” he tells me.

There’s no doubt that what Sadra and other experts say is true: we live in a global media environment and, for companies to survive, they need to accept that. Looking for outside investment or partnering with bigger organisations like Apple or Amazon is part-and-parcel of existing in this modern world.

This though doesn’t mean it’s not vital for Europe to maintain powerful media bodies.

You only need to look at how Hollywood and TV has benefitted America. It has expanded its cultural influence worldwide, becoming a form of soft power. Just consider, as one micro example, the global footprint of Halloween and Thanksgiving. For Europe to remain an attractive place, for it to carve out its own identity, it requires strong media.

Yes, it’s important to work together with these huge American organisations, yet European businesses in the same sector have to make their own mark too — and one way of achieving that is with tech.

Staying ahead of the wave

There was one theme that came up across many of my conversations on this topic of using tech to remain relevant: artificial intelligence.

“Localisation is one area where technology’s influence, especially generative AI, is being felt,” Shemen from Papercup tells me.

This is being trialled in a number of places already, with Spotify planning on cloning podcast hosts’ voices and then translating them into different languages. This trend will be hugely important for European media creators, especially if they’re making content in non-English. It almost goes without saying how much this could benefit smaller creators and media companies that fall into this category, as their potential reach can skyrocket.

Artificial intelligence will also be a vital part of the puzzle for European businesses when it comes to analysing data. If they can get access to forms of insights currently only available to gargantuan tech companies, they can alter their content to appeal and reach the masses, levelling the playing field.

The European route to success

If European media is going to survive Big Tech’s thrust into the space one thing’s for certain: it can’t stay stationary. Instead, the European industry needs to take advantage of its positive attributes and use them as best it can.

This should involve embracing its ability to create niche content, clever content partnerships, and investing in technologies that can help European content hit a wider audience.

Ultimately, the future of media streaming in Europe is one of balance. While there’s a lucrative future available by partnering with bigger organisations, it can’t risk losing itself in the process. Currently, there’s no real way European media bodies can compete with the bottomless wallets of Silicon Valley. What they can do though is ensure they stay relevant.

The secret to achieving this isn’t all that secret — being nimble and open minded.

Don’t act so shocked: age-old questions often have age-old answers, after all.

Europe’s battle against the rising media power of Silicon Valley Read More »

‘quantum-first’-microscope-could-solve-chip-inspection-roadblock

‘Quantum-first’ microscope could solve chip inspection roadblock

Oh, the wonderful and mind-twisting world of quantum mechanics. However, in order to harness the magic-like potential of bending qubits to one’s will, there is a whole lot of nitty gritty engineering that needs to occur. 

The quantum revolution will not happen unless an entire ecosystem comes together, each part reaching the highest potential of its own expertise. 

And plenty of that development is happening in the Netherlands. Just today, Dutch startup QuantaMap announced it had secured €1.4mn in funding for its quality assurance tech for the production of quantum computer chips.



Quantum chips are not like regular computer chips, on many different levels (let’s set operating principles and data processing aside for now). One of these is that when they do not work like they should, there is not really any way of finding out why, and what has failed. This is to a great extent because it is so difficult to measure properties of the quantum chips without disturbing the qubits in the process. 

QuantaMap, based in Leiden, the Netherlands, has developed what it calls a “quantum-first” microscope that will allow both quantum researchers and chip manufacturers to closely inspect every chip and improve quality. 

What sets its technology apart, the startup says, is a combination of cryogenic scanning technology with quantum sensors, both specifically designed for quantum applications. 

“We are convinced that our technology will be instrumental for making good on the promises of quantum computing, enabling the societal advances that quantum technology can deliver,” said QuantaMap co-founder Johannes Jobst.

QuantaMap was founded in November 2022 by Jobst, Kaveh Lahabi, Milan Allan, and Jimi de Haan. The funding round includes investment from QDNL Participations, a fund that will invest €15mn into early-stage Dutch quantum computing startups in the coming years. 

Ton van ‘t Noordende, the fund’s managing director, said that QuantaMap’s unique combination of cryogenic scanning-probe microscopy and custom quantum sensors would solve the crucial challenge of producing reliable quantum chips. 

Published

Back to top

‘Quantum-first’ microscope could solve chip inspection roadblock Read More »

how-to-measure-your-ipd-and-why-it’s-important-for-vr-&-ar-headsets

How to Measure Your IPD and Why It’s Important for VR & AR Headsets

IPD stands for interpupillary distance—which simply means the distance between the center of your eyes. It’s important to know your IPD when it comes to VR and AR headsets because headsets can be adjusted to match your IPD for optimal image quality and comfort. Knowing your IPD is important for understanding which headsets are most suitable for your eyes. Luckily you can easily and automatically measure your IPD if you have a recent iPhone or iPad Pro, or use one of several simple measurement methods.

EyeMeasure is a free iOS app which uses the TrueDepth camera on recent iPhone and iPad Pro models to measure your IPD. Developer Dotty Digital claims the measurement is accurate within 0.5mm. Once you use the app the “far” IPD measurement is the one you’ll use when configuring your headset.

You can use the app to measure your IPD with the following iOS devices:

iPad

  • iPad Pro 12.9-inch (4th generation)
  • iPad Pro 12.9-inch (3rd generation)
  • iPad Pro 11-inch (2nd generation)
  • iPad Pro 11-inch

If you don’t know which tablet you have, learn how to identify your iPad model.

Other Ways to Measure Your IPD

Image courtesy Will Folsom (CC BY 2.0)

If you don’t have access to one of the above devices for an automatic measurement, here’s other ways you can measure your IPD.

Ask Your Eye Doctor (most accurate)

The most accurate IPD measurement you’ll be able to get is from an eye-doctor. If you’ve been to one since you’ve reached your adult size, your doctor should have an accurate measurement on file; give them a call and ask if they can provide your IPD measurement in millimeters. If you’re younger than 20 and it’s been more than a year since you saw the eye-doctor, you may want to get a check-up to make sure you have an up-to-date measurement.

Online IPD Measure Tool (easiest)

You can measure your IPD with a browser-based tool like this one from Ace & Tate. This will work through your browser on your computer or smartphone. You’ll be asked to upload a photo of yourself holding any standard-sized magnetic strip card (ie: credit card or drivers license) which will be used to establish the correct scale for the measurement.

Use a Mirror (accurate but you need a ruler)

With a ruler and a mirror you can easily measure your IPD. Our friend Oliver Kreylos offers these simple instructions, along with a more detailed breakdown.

  1. Stand in front of a mirror and hold a ruler up to your nose, such that the measuring edge runs directly underneath both your pupils.
  2. Close your right eye and look directly at your left eye. Move the ruler such that the “0” mark appears directly underneath the center of your left pupil. Try to keep the ruler still for the next step.
  3. Close your left eye and look directly at your right eye. The mark directly underneath the center of your right pupil is your inter-pupillary distance.

Ask a Friend (but you need a ruler… and a friend)

Are you a vampire with no need for mirrors in your home? Ask a friend with a steady hand to hold a ruler directly under your eyes. Look straight forward at a distant object and ask your friend to align the “0” mark with the center of one pupil and then read the measurement under the center of your other pupil. That measurement is your IPD.

This is also an ideal way to measure the IPD of a VR novice to which you’re demoing VR.

Eyeball It (when you’re in a pinch)

This option may be the most error prone, but it’s probably better than nothing if you just need a quick and dirty alignment; it only works with headsets that have a physical IPD adjustment.

While inside the headset, close your non-dominant eye. With your dominant eye open, look at a sharp recognizable texture like text or the flat edge of an object. Begin adjusting the IPD setting back and forth to slowly find the position of maximum sharpness. This should get you in the ballpark of your ideal IPD setting. We would not recommend trying this exercise with both eyes open because it’s easier to misalign your IPD when using both eyes.

Thanks to Allan Hambrick who shared this method in the comments!

Why Correctly Setting Your IPD is Important in a VR or AR Headset

Image courtesy Dboybaker (CC BY 2.0)

Tricking our brains into believing we’re seeing another reality starts by feeding our eyes imagery which closely matches how we perceive the real world. That means making sure the images are correctly aligned with each eye, just like adjusting the width on a pair of binoculars.

Since we always see the real world from the perspective of or own IPD, correct alignment in a headset is important for matching our ingrained sense of 3D depth and scale. If the IPD of your headset is incorrectly set, the scale of the virtual world will appear to be slightly incorrect.

Even if a given headset doesn’t have a physical IPD adjustment, most headsets have a software IPD adjustment which can correct the sense of scale. In both cases you’ll need to know your own IPD measurement to set this properly.

Setting the correct IPD is also very important for maximizing image quality in VR and AR headsets.

Most headsets have lenses and displays which are designed to achieve maximum clarity and field of view when seen through the ‘optical center’ of the lens (this is also called the ‘sweet spot’). If the center of your eyes don’t align with the optical center of the lenses, you won’t get that maximum clarity and field of view; depending upon the lens, such misalignment can lead to a surprising reduction in visual quality.

Luckily, many headsets have physical IPD adjustments which allow you to change the distance between the lenses to align your eyes with the optical center of the lenses. All major headsets with physical IPD adjustments have digital readouts in millimeters that display inside the headset which you can use to match to your own IPD.

In summary, knowing your IPD and setting it correctly is important for achieving the best visual experience and comfort in any headset. And if your measured IPD is an outlier, you should make sure your headset of choice can accommodate your IPD; a headset with a physical IPD adjustment will support a much wider range of IPD measurements.

How to Measure Your IPD and Why It’s Important for VR & AR Headsets Read More »

steam-vr-fest-serves-up-deep-discounts-on-top-pc-vr-titles

Steam VR Fest Serves Up Deep Discounts on Top PC VR Titles

Steam VR Fest is in full swing, offering deep discounts on PC VR titles that may just give you another pretty valid reason to stay indoors this winter.

While you won’t be able to nab Half-Life: Alyx (2020) on the cheap this time around, there are a host of top games on sale to buy or gift to a friend for the holiday season.

That’s great news if you have a PC VR headset, but even greater news if you have a VR-ready PC and Meta Quest thanks to the new dedicated Steam Link App which makes playing Steam games on Quest even easier.

Valve highlighted some immersive games in the latest VR Fest hype video, although there are a ton more on sale to check out. Here’s some of the standout titles on sale from now until December 11th.

Title Sale Price Original Price Percent Off
Hitman 3 $27.99 $69.99 -60%
No Man’s Sky $29.99 $59.99 -50%
The Forest $4.99 $19.99 -75%
Skyrim VR $14.99 $59.99 -75%
Tetris Effect $19.99 $39.99 -50%
Slime Rancher $4.99 $19.99 -75%
Ghosts of Tabor $17.99 $19.99 -10%
The Light Brigade $17.49 $24.99 -30%
CarX Drift Racing $7.49 $14.99 -50%
Ancient Dungeon $14.99 $19.99 -25%
VTOL VR $20.99 $29.99 -30%
Into the Radius $17.99 $29.99 -40%
BONELAB $31.99 $39.99 -20%
Fallout 4 VR $14.99 $59.99 -75%
IL-2 Sturmovik: Battle of Stalingrad $9.99 $49.99 -80%
Keep Talking and Nobody Explodes $4.49 $14.99 -70%
Vox Machinae $14.99 $29.99 -50%
Payday 2 $4.99 $9.99 -50%
Vertigo 2 $25.49 $29.99 -15%
Elite Dangerous $7.49 $29.99 -75%
I Expect You to Die 3 $19.99 $24.99 -20%
BONEWORKS $23.99 $29.99 -20%
XPlane 12 $40.19 $59.99 -33%
Moss Book II $13.99 $19.99 -30%
Kayak VR: Mirage $16.09 $22.99 -30%
Walkabout Mini Golf VR $10.49 $14.99 -30%
Ragnarock $9.99 $24.99 -60%
Demeo $19.99 $39.99 -50%
Red Matter 2 $17.99 $29.99 -40%
Breachers $20.99 $29.99 -30%
Among Us VR $7.49 $9.99 -25%
Sniper Elite VR $8.99 $29.99 -70%
Star Trek Bridge Crew $9.99 $24.99 -60%
GORN $11.99 $19.99 -40%
Broken Edge $8.99 $14.99 -40%
Until You Fall $13.99 $24.99 -44%
The Last Clockwinder $14.99 $24.99 -40%

There are way more than that though, so check in at the Steam VR Fest site to see all of the games currently on sale.

Steam VR Fest Serves Up Deep Discounts on Top PC VR Titles Read More »

new-ai-tool-aims-to-democratise-high-res-image-generation

New AI tool aims to democratise high-res image generation

In the world of AI image generation, tools like DALL-E and Midjourney are holding the crown — and not simply because of their high-resolution performance. The training of these models requires such substantial investment and resources that it inevitably leads to centralised services and pay-per-use access.

A new AI tool developed by the University of Surrey aims to reverse this trend and democratise the technology, by opening up high-res image generation to a wider audience.

Dubbed DemoFusion, the model allows users to generate high-quality images without the need to subscribe to a service, or own a very powerful computer. In fact, the system only requires consumer-grade RTX 3090 GPU that can be found in any mid-range gaming PC or a Mac M1.

The AI is essentially a plug-and-play extension to the Stable Diffusion XL (SDXL) open-source model, which generates images at a resolution of 1024×1024. DemoFusion enables 4x, 16x, or even higher increase in resolution — with a few simple lines of code and without any additional training. The only trade-off according to the team is “a little more patience.” We tried it at TNW and it’s about six minutes.

SDXL vs DemoFusion AI image generator
Credit: University of Surrey
On the left side: the result by SDXL. On the right side, the result by DemoFusion. Credit: University of Surrey

To achieve these high-res results, the scientists first generated low-res images and then enhanced them using a process called progressive upscaling. This improves the SDXL’s detail and resolution by working across images in patches.

“For the first time, our unique technique lets users enhance their AI-generated images without the need for vast computing power, or any re-training of the model,” said Professor Yi-Zhe Song.

“Digital art and imagery is a powerful medium which everyone should have access to — not just a handful of wealthy corporations. That’s why we made DemoFusion publicly available. We believe it can enrich our lives, and everyone should be able to use it.”

The new technique is available online in the paper “DemoFusion: Democratising High-Resolution Image Generation with No $$$.”

Whether DemoFusion will gain enough traction to compete with giants like OpenAI’s DALL-E remains to be seen, but its creation is an important step to opening up AI’s image-generation potential to the public and the wider tech community.

Published

Back to top

New AI tool aims to democratise high-res image generation Read More »

gta-vi-trailer-leak-linked-to-rockstar-dev’s-son

GTA VI trailer leak linked to Rockstar dev’s son

Shady behaviour might be part of the Grand Theft Auto DNA, but leaking video game trailers on TikTok before launch is probably not what developers had in mind. Especially not when it can be traced back to a senior Rockstar developer’s son. 

The fact that fans will need to wait more than a year for the next instalment in the GTA saga (or, as one viewer close to the author expressed this morning, “2025 just means not 2024”) did not diminish the enthusiasm when Rockstar Games released the GTA VI trailer in the early hours of Tuesday CET.

Our trailer has leaked so please watch the real thing on YouTube: https://t.co/T0QOBDHwBe

— Rockstar Games (@RockstarGames) December 4, 2023

Vice City looks slicker than ever indeed. However, Rockstar released the trailer to the public some hours earlier than intended. The reason? The leaking of an off-cam clip of the footage to TikTok over the weekend and a subsequent leak of the trailer on X on Monday. Plot twist — the TikTok user in question has reportedly been identified as the son of a senior Rockstar North employee. 

Incriminating evidence?

Rockstar North, based in Edinburgh, Scotland, has been part of the Rockstar Games family since 1999 and is responsible for the development of the Grand Theft Auto series. The evidence that the seven-second TikTok leak came from a developer’s family member has been labelled by some social media users as “fairly convincing.”

Reportedly, it involves the TikTok user posing with the Rockstar employee and calling them “dad.” But as the TikTok (it is a noun, right?) has been deleted, this shall have to remain second-hand speculation on our part. Of course, it could all be a part of a deceitful ruse to deflect culpability, in keeping with the spirit of the game. 

The evidence to suggest the video has come from someone related to the employee in question is fairly convincing.

Again, if this is true it’s extremely disappointing that this has occurred so close to the official reveal.

— GTABase.com (@GTABase) December 2, 2023

In another noteworthy turn of events, the trailer revealed that GTA VI will feature the game’s first female protagonist (Bonnie and Clyde storylines FTW). Rockstar Games says it will be released on PS5 and Xbox Series X / S.

Other notable vide game leaks

Leaks to social media are not unusual in the gaming world. A prototype of Horizon Forbidden West was leaked to Twitter one week before its release. A Russian website published a version of the script to Mass Effect 3 before the game’s official release in March 2012 (although we cannot see the appeal of reading it — it would be like sneaking a peek at your Christmas presents before they are wrapped). 

However, it is unusual for leaks to come from such intimate sources, and so close to the official release. Whoever may prove to be behind the leaks, let’s hope the repercussions are more akin to being grounded than ending up in jail, like the last teenagers who messed with Rockstar and GTA.

Published

Back to top

GTA VI trailer leak linked to Rockstar dev’s son Read More »

mistral-ai-nears-$2b-valuation-—-less-than-12-months-after-founding

Mistral AI nears $2B valuation — less than 12 months after founding

European contributions might have been a little late to join the generative AI investment party, but that does not mean they will not end up rivalling some of the earlier North American frontrunners. According to people familiar with the matter, Mistral AI, the French genAI seed-funding sensation, is just about to conclude the raising of about €450mn from investors. 

Unlike Germany’s Aleph Alpha who just raised a similar sum, most investors come from beyond the confines of the continent. The round is led by Silicon Valley VC firm Andreessen Horowitz, and also includes backing from Nvidia and Salesforce. 

Sources close to the deal told Bloomberg that Andreessen Horowitz would invest €200mn in funding, whereas Nvidia and Salesforce would be down for €120mn in convertible debt, although this was still subject to change. If it goes through, this would value the Paris-based startup at nearly $2bn — less than a year after it was founded. 

Mistral AI was one of the few European AI companies to participate in the UK’s AI Safety Summit held at Bletchley Park last month. The generative AI startup released its first large language model (LLM), Mistral 7B, under the open source Apache 2.0 licence in September. 

Targeting dev space with smaller size LLMs

The key thing that sets Mistral apart is that it is specifically building smaller models that target the developer space. Speaking at the SLUSH conference in Helsinki last week, co-founder and CEO Arthur Mensch said this was exactly what separates the philosophy of the company from its competitors.

“You can start with a very big model with hundreds of billions of parameters — maybe it’s going to solve your task. But you could actually have something which is a hundred times smaller,” Mensch stated. “And when you make a production application that targets a lot of users, you want to make choices that lower the latency, lower the costs, and leverage the actual populated data that you may have. And this is something that I think is not the topic of our competitors — they’re really targeting multi-usage, very large models.”



Mensch, who previously worked for Google DeepMind, added that this approach would also allow for strong differentiation through proprietary data, a key factor for actors to survive in the mature application market space. 

Mistral AI and the reported investors have all declined to comment on the potential proceedings.

Published

Back to top

Mistral AI nears $2B valuation — less than 12 months after founding Read More »