RNA

ChatGPT’s success could have come sooner, says former Google AI researcher


A co-author of Attention Is All You Need reflects on ChatGPT’s surprise and Google’s conservatism.

Jakob Uszkoreit Credit: Jakob Uszkoreit / Getty Images

In 2017, eight machine-learning researchers at Google released a groundbreaking research paper called Attention Is All You Need, which introduced the Transformer AI architecture that underpins almost all of today’s high-profile generative AI models.

The Transformer has made a key component of the modern AI boom possible by translating (or transforming, if you will) input chunks of data called “tokens” into another desired form of output using a neural network. Variations of the Transformer architecture power language models like GPT-4o (and ChatGPT), audio synthesis models that run Google’s NotebookLM and OpenAI’s Advanced Voice Mode, video synthesis models like Sora, and image synthesis models like Midjourney.
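
For readers curious what "attention" actually computes, here is a minimal NumPy sketch of scaled dot-product self-attention, the core operation the paper introduced. The matrix sizes and random inputs are arbitrary placeholders for illustration, not anything taken from a production model.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model) token embeddings; w_*: learned projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v        # queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])    # every token scores every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the sequence
    return weights @ v                         # each output mixes all value vectors

# Toy usage: 5 tokens with 8-dimensional embeddings (sizes chosen arbitrarily)
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # (5, 8)
```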

At TED AI 2024 in October, one of those eight researchers, Jakob Uszkoreit, spoke with Ars Technica about the development of transformers, Google’s early work on large language models, and his new venture in biological computing.

In the interview, Uszkoreit revealed that while his team at Google had high hopes for the technology’s potential, they didn’t quite anticipate its pivotal role in products like ChatGPT.

The Ars interview: Jakob Uszkoreit

Ars Technica: What was your main contribution to the Attention Is All You Need paper?

Jakob Uszkoreit (JU): It’s spelled out in the footnotes, but my main contribution was to propose that it would be possible to replace recurrence [from Recurrent Neural Networks] in the dominant sequence transduction models at the time with the attention mechanism, or more specifically self-attention. And that it could be more efficient and, as a result, also more effective.

Ars: Did you have any idea what would happen after your group published that paper? Did you foresee the industry it would create and the ramifications?

JU: First of all, I think it’s really important to keep in mind that when we did that, we were standing on the shoulders of giants. And it wasn’t just that one paper, really. It was a long series of works by some of us and many others that led to this. And so to look at it as if this one paper then kicked something off or created something—I think that is taking a view that we like as humans from a storytelling perspective, but that might not actually be that accurate of a representation.

My team at Google was pushing on attention models for years before that paper. It’s a lot longer of a slog with much, much more, and that’s just my group. Many others were working on this, too, but we had high hopes that it would push things forward from a technological perspective. Did we think that it would play a role in really enabling, or at least apparently, seemingly, flipping a switch when it comes to facilitating products like ChatGPT? I don’t think so. I mean, to be very clear in terms of LLMs and their capabilities, even around the time we published the paper, we saw phenomena that were pretty staggering.

We didn’t get those out into the world in part because of what really is maybe a notion of conservatism around products at Google at the time. But we also, even with those signs, weren’t that confident that stuff in and of itself would make that compelling of a product. But did we have high hopes? Yeah.

Ars: Since you knew there were large language models at Google, what did you think when ChatGPT broke out into a public success? “Damn, they got it, and we didn’t?”

JU: There was a notion of, well, “that could have happened.” I think it was less of a, “Oh dang, they got it first” or anything of the like. It was more of a “Whoa, that could have happened sooner.” Was I still amazed by just how quickly people got super creative using that stuff? Yes, that was just breathtaking.

Jakob Uszkoreit presenting at TED AI 2024. Credit: Benj Edwards

Ars: You weren’t at Google at that point anymore, right?

JU: I wasn’t anymore. And in a certain sense, you could say the fact that Google wouldn’t be the place to do that factored into my departure. I left not because of what I didn’t like at Google as much as I left because of what I felt I absolutely had to do elsewhere, which is to start Inceptive.

But it was really motivated by just an enormous, not only opportunity, but a moral obligation in a sense, to do something that was better done outside in order to design better medicines and have very direct impact on people’s lives.

Ars: The funny thing with ChatGPT is that I was using GPT-3 before that. So when ChatGPT came out, it wasn’t that big of a deal to some people who were familiar with the tech.

JU: Yeah, exactly. If you’ve used those things before, you could see the progression and you could extrapolate. When OpenAI developed the earliest GPTs with Alec Radford and those folks, we would talk about those things despite the fact that we weren’t at the same companies. And I’m sure there was this kind of excitement, how well-received the actual ChatGPT product would be by how many people, how fast. That still, I think, is something that I don’t think anybody really anticipated.

Ars: I didn’t either when I covered it. It felt like, “Oh, this is a chatbot hack of GPT-3 that feeds its context in a loop.” And I didn’t think it was a breakthrough moment at the time, but it was fascinating.

JU: There are different flavors of breakthroughs. It wasn’t a technological breakthrough. It was a breakthrough in the realization that at that level of capability, the technology had such high utility.

That, and the realization that, because you always have to take into account how your users actually use the tool that you create, and you might not anticipate how creative they would be in their ability to make use of it, how broad those use cases are, and so forth.

That is something you can sometimes only learn by putting something out there, which is also why it is so important to remain experiment-happy and to remain failure-happy. Because most of the time, it’s not going to work. But some of the time it’s going to work—and very, very rarely it’s going to work like [ChatGPT did].

Ars: You’ve got to take a risk. And Google didn’t have an appetite for taking risks?

JU: Not at that time. But if you think about it, if you look back, it’s actually really interesting. Google Translate, which I worked on for many years, was actually similar. When we first launched Google Translate, the very first versions, it was a party joke at best. And we took it from that to being something that was a truly useful tool in not that long of a period. Over the course of those years, the stuff that it sometimes output was so embarrassingly bad at times, but Google did it anyway because it was the right thing to try. But that was around 2008, 2009, 2010.

Ars: Do you remember AltaVista’s Babel Fish?

JU: Oh yeah, of course.

Ars: When that came out, it blew my mind. My brother and I would do this thing where we would translate text back and forth between languages for fun because it would garble the text.

JU: It would get worse and worse and worse. Yeah.

Programming biological computers

After his time at Google, Uszkoreit co-founded Inceptive to apply deep learning to biochemistry. The company is developing what he calls “biological software,” where AI compilers translate specified behaviors into RNA sequences that can perform desired functions when introduced to biological systems.

Ars: What are you up to these days?

JU: In 2021 we co-founded Inceptive in order to use deep learning and high throughput biochemistry experimentation to design better medicines that truly can be programmed. We think of this as really just step one in the direction of something that we call biological software.

Biological software is a little bit like computer software in that you have some specification of the behavior that you want, and then you have a compiler that translates that into a piece of computer software that then runs on a computer exhibiting the functions or the functionality that you specify.

You specify a piece of a biological program and you compile that, but not with an engineered compiler, because life hasn’t been engineered like computers have been engineered. But with a learned AI compiler, you translate that or compile that into molecules that when inserted into biological systems, organisms, our cells exhibit those functions that you’ve programmed into.

A pharmacist holds a bottle containing Moderna’s bivalent COVID-19 vaccine. Credit: Getty | Mel Melcon

Ars: Is that anything like how the mRNA COVID vaccines work?

JU: A very, very simple example of that are the mRNA COVID vaccines where the program says, “Make this modified viral antigen” and then our cells make that protein. But you could imagine molecules that exhibit far more complex behaviors. And if you want to get a picture of how complex those behaviors could be, just remember that RNA viruses are just that. They’re just an RNA molecule that when entering an organism exhibits incredibly complex behavior such as distributing itself across an organism, distributing itself across the world, doing certain things only in a subset of your cells for a certain period of time, and so on and so forth.

And so you can imagine that if we managed to even just design molecules with a teeny tiny fraction of such functionality, of course with the goal not of making people sick, but of making them healthy, it would truly transform medicine.

Ars: How do you not accidentally create a monster RNA sequence that just wrecks everything?

JU: The amazing thing is that medicine for the longest time has existed in a certain sense outside of science. It wasn’t truly understood, and we still often don’t truly understand their actual mechanisms of action.

As a result, humanity had to develop all of these safeguards and clinical trials. And even before you enter the clinic, all of these empirical safeguards prevent us from accidentally doing [something dangerous]. Those systems have been in place for as long as modern medicine has existed. And so we’re going to keep using those systems, and of course with all the diligence necessary. We’ll start with very small systems, individual cells in future experimentation, and follow the same established protocols that medicine has had to follow all along in order to ensure that these molecules are safe.

Ars: Thank you for taking the time to do this.

JU: No, thank you.

Benj Edwards is Ars Technica’s Senior AI Reporter and founded the site’s dedicated AI beat in 2022. He’s also a widely cited tech historian. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

DNA-based bacterial parasite uses completely new DNA-editing method

Top row: individual steps in the reaction process. Bottom row: cartoon diagram of the top, showing the position of each DNA and RNA strand. Credit: Hiraizumi et al.

While CRISPR is probably the most prominent gene-editing technology, there are a variety of others, some developed before, others since. And people have been developing CRISPR variants to perform more specialized functions, like altering specific bases. In all of these cases, researchers are trying to balance a number of competing factors: convenience; flexibility; specificity and precision for the editing; low error rates; and so on.

So, having additional options for editing can be a good thing, enabling new ways of balancing those different needs. On Wednesday, a pair of papers in Nature describe a DNA-based parasite that moves itself around bacterial genomes through a mechanism that hasn’t been previously described. It’s nowhere near ready for use in humans, but it may have some distinctive features that make it worth further development.

Going mobile

Mobile genetic elements, commonly called transposons, are quite common in many species—they make up nearly half the sequences in the human genome, for example. They are indeed mobile, showing up in new locations throughout the genome, sometimes by cutting themselves out and hopping to new locations, other times by sending a copy out to a new place in the genome. For any of this to work, they need to have an enzyme that cuts DNA and specifically recognizes the right transposon sequence to insert into the cut.

The specificity of that interaction, which ensures the system only inserts new copies of itself, and the ability to cut DNA are exactly the features we’d want for gene editing, which makes these systems worth understanding better.

Bacterial genomes tend to have very few transposons—the extra DNA isn’t really in keeping with the bacterial reproduction approach of “copy all the DNA as quickly as possible when there’s food around.” Yet bacterial transposons do exist, and a team of scientists based in the US and Japan identified one with a rather unusual feature. As an intermediate step in moving to a new location, the two ends of the transposon (called IS110) are linked together to form a circular piece of DNA.

In its circular form, the DNA sequences at the junction act as a signal that tells the cell to make an RNA copy of nearby DNA (termed a “promoter”). When linear, each of the two bits of DNA on either side of the junction lacks the ability to act as a signal; it only works when the transposon is circular. And the researchers confirmed that there is in fact an RNA produced by the circular form, although the RNA does not encode for any proteins.

So, the research team looked at over 100 different relatives of IS110 and found that they could all produce similar non-protein-coding RNAs, all of which shared some key features. These included stretches where nearby sections of the RNA could base-pair with each other, leaving an unpaired loop of RNA in between. Two of these loops contained sequences that base-paired either with the transposon itself or with the sites in the E. coli genome where it inserted.

That suggests that the RNA produced by the circular form of the transposon helped to act as a guide, ensuring that the transposon’s DNA was specifically used and only inserted into precise locations in the genome.

Editing without precision

To confirm this was right, the researchers developed a system where the transposon would produce a fluorescent protein when it was properly inserted into the genome. They used this to show that mutations in the loop that recognized the transposon would stop it from being inserted into the genome—and that it was possible to direct it to new locations in the genome by changing the recognition sequences in the second loop.

To show this was potentially useful for gene editing, the researchers blocked the production of the transposon’s own RNA and fed it a replacement RNA that worked. So, you could potentially use this system to insert arbitrary DNA sequences into arbitrary locations in a genome. It could also be used with targeting RNAs that caused specific DNA sequences to be deleted. All of this is potentially very useful for gene editing.

Emphasis on “potentially.” The problem is that the targeting sequences in the loops are quite short, with the insertion site targeted by a recognition sequence that’s only four to seven bases long. At the short end of this range, you’d expect that a random string of bases would have an insertion site about once every 250 bases.
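
To see where that figure comes from: with four possible bases at each position, a specific four-base recognition sequence is expected to occur by chance about once every 4^4 = 256 positions. Here is a quick, purely illustrative check on a synthetic random sequence (the motif below is made up, not one used in the study):

```python
# Rough check of the "about once every 250 bases" estimate: a specific 4-base
# motif should appear in random sequence roughly once every 4**4 = 256 positions.
import random

random.seed(1)
genome = "".join(random.choice("ACGT") for _ in range(1_000_000))  # synthetic sequence
motif = "GATC"  # hypothetical 4-base recognition sequence

hits = sum(genome[i:i + 4] == motif for i in range(len(genome) - 3))
print(hits, "matches; expected about", len(genome) // 256)
```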

That relatively low specificity showed. For insertion, various experiments saw accuracies ranging from a close-to-being-useful 94 percent down to a positively threatening 50 percent. For deletion experiments, the low end of the range was a catastrophic 32 percent accuracy. So, while this has some features of an interesting gene-editing system, there’s a lot of work to do before it could fulfill that potential. It’s possible that these recognition loops could be made longer to add the sort of specificity that would be needed for editing vertebrate genomes, but we simply don’t know at this point.

Mutations in a non-coding gene associated with intellectual disability

Splice of life

A gene that only makes an RNA is linked to neurodevelopmental problems.

The spliceosome is a large complex of proteins and RNAs.

Almost 1,500 genes have been implicated in intellectual disabilities; yet for most people with such disabilities, genetic causes remain unknown. Perhaps this is in part because geneticists have been focusing on the wrong stretches of DNA when they go searching. To rectify this, Ernest Turro—a biostatistician who focuses on genetics, genomics, and molecular diagnostics—used whole genome sequencing data from the 100,000 Genomes Project to search for areas associated with intellectual disabilities.

His lab found what is so far the most common genetic association with neurodevelopmental abnormality. And the gene they identified doesn’t even make a protein.

Trouble with the spliceosome

Most genes include instructions for how to make proteins. That’s true. And yet human genes are not arranged linearly—or rather, they are arranged linearly, but not contiguously. A gene containing the instructions for which amino acids to string together to make a particular protein—hemoglobin, insulin, collagen, albumin, whatever protein you like—is modular. It contains part of the amino acid sequence, then it has a chunk of DNA that is largely irrelevant to that sequence, then a bit more of the protein’s sequence, then another chunk of random DNA, back and forth until the end of the protein. It’s as if each of these prose paragraphs were separated by a string of unrelated letters (but not a meaningful paragraph from a different article).

In order to read this piece through coherently, you’d have to take out the letters interspersed between its paragraphs. And that’s exactly what happens with genes. In order to read the gene through coherently, the cell has machinery that splices out the intervening sequences and links up the protein-making instructions into a continuous whole. (This doesn’t happen in the DNA itself; it happens to an RNA copy of the gene.) That machinery is, fittingly, called the spliceosome.
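
As a toy illustration of what that splicing step does, here is a short sketch; the sequence and exon coordinates below are invented for illustration, not taken from any real gene.

```python
# Toy illustration of splicing: keep the protein-coding stretches (exons) and
# drop the intervening sequences (introns) from the RNA copy of a gene.
# The sequence and coordinates are invented for illustration only.
pre_mrna = "AUGGCU" + "GUAAGUUUUCAG" + "CCAGAU" + "GUAUGCCUAG" + "GGUUAA"
exons = [(0, 6), (18, 24), (34, 40)]  # (start, end) positions of the exon segments

mature_mrna = "".join(pre_mrna[start:end] for start, end in exons)
print(mature_mrna)  # AUGGCUCCAGAUGGUUAA (a contiguous coding sequence)
```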

About a hundred proteins make up the spliceosome. But the gene just found to be so strongly associated with neurodevelopmental disorders doesn’t encode any of them. Rather, it encodes one of five RNA molecules that are also part of the spliceosome complex and interact with the RNAs being spliced. Mutations in this gene were found to be associated with a syndrome whose symptoms include intellectual disability, seizures, short stature, neurodevelopmental delay, drooling, motor delay, hypotonia (low muscle tone), and microcephaly (a small head).

Supporting data

The researchers buttressed their finding by examining three other databases; in all of them, they found more people with the syndrome who had mutations in this same gene. The mutations occur in a remarkably conserved region of the genome, suggesting that it is very important. Most of the mutations arose anew in the affected people—i.e., they were not inherited from their parents—but in one case a particular mutation in the gene was inherited. Based on this, the researchers concluded that this variant may cause a less severe disorder than the other mutations.

Many studies that look for genes associated with diseases have focused on searching catalogs of protein coding genes. These results suggest that we could have been missing important mutations because of this focus.

Nature Medicine, 2024. DOI: 10.1038/s41591-024-03085-5
