Google went ahead with plans to launch Gemini for Workspace today. The big news is the pricing information, and you can see the Workspace pricing page is new, with every plan offering a “Gemini add-on.” Google’s old AI-for-Business plan, “Duet AI for Google Workspace,” is dead, though it never really launched anyway.
Google has a blog post explaining the changes. Google Workspace starts at $6 per user per month for the “Starter” package, and the AI “Add-on,” as Google is calling it, is an extra $20 monthly cost per user (all of these prices require an annual commitment). That is a massive price increase over the normal Workspace bill, but AI processing is expensive. Google says this business package will get you “Help me write in Docs and Gmail, Enhanced Smart Fill in Sheets and image generation in Slides.” It also includes the “1.0 Ultra” model for the Gemini chatbot—there’s a full feature list here. This $20 plan is subject to a usage limit for Gemini AI features of “1,000 times per month.”
Gemini for Google Workspace represents a total rebrand of the AI business product and some amount of consistency across Google’s hard-to-follow, constantly changing AI branding. Duet AI never really launched to the general public. The product, announced in August, only ever had a “Try” link that led to a survey, and after filling it out, Google would presumably contact some businesses and allow them to pay for Duet AI. Gemini Business now has a checkout page, and any Workspace business customer can buy the product today with just a few clicks.
Google’s second plan is “Gemini Enterprise,” which doesn’t come with any usage limits, but it’s also only available through a “contact us” link and not a normal checkout procedure. Enterprise is $30 per user per month, and it “includes additional capabilities for AI-powered meetings, where Gemini can translate closed captions in more than 100 language pairs, and soon even take meeting notes.”
On Wednesday, Google announced a new family of AI language models called Gemma, which are free, open-weights models built on technology similar to the more powerful but closed Gemini models. Unlike Gemini, Gemma models can run locally on a desktop or laptop computer. It’s Google’s first significant open large language model (LLM) release since OpenAI’s ChatGPT started a frenzy for AI chatbots in 2022.
Gemma models come in two sizes: Gemma 2B (2 billion parameters) and Gemma 7B (7 billion parameters), each available in pre-trained and instruction-tuned variants. In AI, parameters are values in a neural network that determine AI model behavior, and weights are a subset of these parameters stored in a file.
Developed by Google DeepMind and other Google AI teams, Gemma pulls from techniques learned during the development of Gemini, which is the family name for Google’s most capable (public-facing) commercial LLMs, including the ones that power its Gemini AI assistant. Google says the name comes from the Latin gemma, which means “precious stone.”
While Gemma is Google’s first major open LLM since the launch of ChatGPT (it has released smaller research models such as FLAN-T5 in the past), it’s not Google’s first contribution to open AI research. The company cites the development of the Transformer architecture, as well as releases like TensorFlow, BERT, T5, and JAX as key contributions, and it would not be controversial to say that those have been important to the field.
Owing to lesser capability and high confabulation rates, smaller open-weights LLMs have been more like tech demos until recently, as some larger ones have begun to match GPT-3.5 performance levels. Still, experts see source-available and open-weights AI models as essential steps in ensuring transparency and privacy in chatbots. Google Gemma is not “open source,” however, since that term usually refers to a specific type of software license with few restrictions attached.
In reality, Gemma feels like a conspicuous play to match Meta, which has made a big deal out of releasing open-weights models (such as LLaMA and Llama 2) since February of last year. That technique stands in opposition to AI models like OpenAI’s GPT-4 Turbo, which is only available through the ChatGPT application and a cloud API and cannot be run locally. A Reuters report on Gemma focuses on the Meta angle and surmises that Google hopes to attract more developers to its Vertex AI cloud platform.
We have not used Gemma yet; however, Google claims the 7B model outperforms Meta’s Llama 2 7B and 13B models on several benchmarks for math, Python code generation, general knowledge, and commonsense reasoning tasks. It’s available today through Kaggle, a machine-learning community platform, and Hugging Face.
In other news, Google paired the Gemma release with a “Responsible Generative AI Toolkit,” which Google hopes will offer guidance and tools for developing what the company calls “safe and responsible” AI applications.
One of Google’s most lucrative businesses consists of packaging its free consumer apps with a few custom features and extra security and then selling them to companies. That’s usually called “Google Workspace,” and today it offers email, calendar, docs, storage, and video chat. Soon, it sounds like Google is gearing up to offer an AI chatbot for businesses. Google’s latest chatbot is called “Gemini” (it used to be “Bard”), and the latest early patch notes spotted by Dylan Roussei of 9to5Google and TestingCatalog.eth show descriptions for new “Gemini Business” and “Gemini Enterprise” products.
The patch notes say that Workspace customers will get “enterprise-grade data protections” and Gemini settings in the Google Workspace Admin console and that Workspace users can “use Gemini confidently at work” while “trusting that your conversations aren’t used to train Gemini models.”
These “early patch notes” for Bard/Gemini have been a thing for a while now. Apparently, some people have ways of making the site spit out early patch notes, and in this case, they were independently confirmed by two different people. I’m not sure the date (scheduled for February 21) is trustworthy, though.
Normally, you would expect a Google app to be included in the “Business Standard” version of Workspace, which is $12 per user per month, but it sounds like Gemini won’t be included. Google describes the products as “new Gemini Business and Gemini Enterprise plans” [emphasis ours] and implores existing paying Google Workspace users to “upgrade today to Gemini Business or Gemini Enterprise.” Roussei says the “upgrade today” link goes to the Duet AI Workspace page, Google’s first attempt at “AI for business,” which hasn’t been updated with any new plans just yet.
It’s unclear how much of the Duet AI business plan is surviving the Gemini rollout. Duet was announced in August 2023 as a few “help me write” buttons in Gmail, Docs, and other Workspace apps, which would all open chatbots that can control the various apps. Duet AI was supposed to have an “initial offering” price of an additional $30 per user per month, but it has been six months now, and Duet AI still isn’t generally available to businesses. The “try Duet AI” link goes to a “request a trial” contact form. Six months is an eternity in Google’s rapidly evolving AI plans; it’s a good bet Duet is replaced by all this Gemini stuff. Will it still be an extra $30, or did everyone scoff at that price?
If this $30-extra-for-AI plan ever ships, that would mean a typical AI-infused Workspace account would cost a total of $45 per user per month. That sounds like a lot, but generative AI products currently take a huge amount of processing, which means they cost a lot. Right now, everyone is in land-grab mode, trying to get as many users as possible, but generally, the big players are all losing money. Nvidia’s market-leading AI cards can cost around $10,000 to $40,000 apiece, and that’s not even counting the ongoing electricity costs.
On Monday, Will Smith posted a video on his official Instagram feed that parodied an AI-generated video of the actor eating spaghetti that went viral last year. With the recent announcement of OpenAI’s Sora video synthesis model, many people have noted the dramatic jump in AI-video quality over the past year compared to the infamous spaghetti video. Smith’s new video plays on that comparison by showing the actual actor eating spaghetti in a comical fashion and claiming that it is AI-generated.
Captioned “This is getting out of hand!”, the Instagram video uses a split-screen layout to show the original AI-generated spaghetti video created by a Reddit user named “chaindrop” in March 2023 on the top, labeled with the subtitle “AI Video 1 year ago.” Below that, in a box titled “AI Video Now,” the real Smith shows 11 video segments of himself actually eating spaghetti by slurping it up while shaking his head, pouring it into his mouth with his fingers, and even nibbling on a friend’s hair. 2006’s “Snap Yo Fingers” by Lil Jon plays in the background.
In the Instagram comments section, some people expressed confusion about the new (non-AI) video, with one writing, “I’m still in doubt if second video was also made by AI or not.” In a reply, someone else wrote, “Boomers are gonna loose [sic] this one. Second one is clearly him making a joke but I wouldn’t doubt it in a couple months time it will get like that.”
We have not yet seen a model with the capability of Sora attempt to create a new Will-Smith-eating-spaghetti AI video, but the result would likely be far better than what we saw last year, even if it contained obvious glitches. Given how things are progressing, we wouldn’t be surprised if by 2025, video synthesis AI models can replicate the parody video created by Smith himself.
It’s worth noting for history’s sake that despite the comparison, the video of Will Smith eating spaghetti did not represent the state of the art in text-to-video synthesis at the time of its creation in March 2023 (that title would likely apply to Runway’s Gen-2, which was then in closed testing). However, the spaghetti video was reasonably advanced for open-weights models at the time, having used the ModelScope AI model. More capable video synthesis models had already been released at that time, but due to the humorous cultural reference, it’s arguably more fun to compare today’s AI video synthesis to Will Smith grotesquely eating spaghetti than to teddy bears washing dishes.
On Friday, Bloomberg reported that Reddit has signed a contract allowing an unnamed AI company to train its models on the site’s content, according to people familiar with the matter. The move comes as the social media platform nears the introduction of its initial public offering (IPO), which could happen as soon as next month.
Reddit initially revealed the deal, which is reported to be worth $60 million a year, earlier in 2024 to potential investors of an anticipated IPO, Bloomberg said. The Bloomberg source speculates that the contract could serve as a model for future agreements with other AI companies.
After an era where AI companies utilized AI training data without expressly seeking any rightsholder permission, some tech firms have more recently begun entering deals where some content used for training AI models similar to GPT-4 (which runs the paid version of ChatGPT) comes under license. In December, for example, OpenAI signed an agreement with German publisher Axel Springer (publisher of Politico and Business Insider) for access to its articles. Previously, OpenAI has struck deals with other organizations, including the Associated Press. Reportedly, OpenAI is also in licensing talks with CNN, Fox, and Time, among others.
In April 2023, Reddit founder and CEO Steve Huffman told The New York Times that the company planned to charge AI companies for access to its almost two decades’ worth of human-generated content.
If the reported $60 million/year deal goes through, it’s quite possible that if you’ve ever posted on Reddit, some of that material may be used to train the next generation of AI models that create text, still pictures, and video. Even without the deal, experts have discovered in the past that Reddit has been a key source of training data for large language models and AI image generators.
While we don’t know if OpenAI is the company that signed the deal with Reddit, Bloomberg speculates that Reddit’s ability to tap into AI hype for additional revenue may boost the value of its IPO, which might be worth $5 billion. Despite drama last year, Bloomberg states that Reddit pulled in more than $800 million in revenue in 2023, growing about 20 percent over its 2022 numbers.
Advance Publications, which owns Ars Technica parent Condé Nast, is the largest shareholder of Reddit.
On Thursday, designer Matt Webb unveiled a new iPhone app called Galactic Compass, which always points to the center of the Milky Way galaxy—no matter where Earth is positioned on our journey through the stars. The app is free and available now on the App Store.
While using Galactic Compass, you set your iPhone on a level surface, and a big green arrow on the screen points the way to the Galactic Center, which is the rotational core of the spiral galaxy all of us live in. In that center is a supermassive black hole known as Sagittarius A*, a celestial body from which no matter or light can escape. (So, in a way, the app is telling us what we should avoid.)
But truthfully, the location of the galactic core at any given time isn’t exactly useful, practical knowledge—at least for people who aren’t James Tiberius Kirk in Star Trek V. But it may inspire a sense of awe about our place in the cosmos.
“It is astoundingly grounding to always have a feeling of the direction of the center of the galaxy,” Webb told Ars Technica. “Your perspective flips. To begin with, it feels arbitrary. The middle of the Milky Way seems to fly all over the sky, as the Earth turns and moves in its orbit.”
Webb’s journey to creating Galactic Compass began a decade ago as an offshoot of his love for casual astronomy. “About 10 years ago, I taught myself how to point to the center of the galaxy,” Webb said. “I lived in an apartment where I had a great view of the stars, so I was using augmented reality apps to identify them, and I gradually learned my way around the sky.”
While Webb initially used an astronomy app to help locate the Galactic Center, he eventually taught himself how to always find it. He described visualizing himself on the surface of the Earth as it spins and tilts, understanding the ecliptic as a line across the sky and recognizing the center of the galaxy as an invisible point moving predictably through the constellation Sagittarius, which lies on the ecliptic line. By visualizing Earth’s orbit over the year and determining his orientation in space, he was able to point in the right direction, refining his ability through daily practice and comparison with an augmented reality app.
With a little help from AI
In 2021, Webb imagined turning his ability into an app that would help take everyone on the same journey, showing a compass that points toward the galactic center instead of Earth’s magnetic north. “But I can’t write apps,” he said. “I’m a decent enough engineer, and an amateur designer, but I’ve never figured out native apps.”
That’s where ChatGPT comes in, transforming Webb’s vision into reality. With the AI assistant as his coding partner, Webb progressed step by step, crafting a simple app interface and integrating complex calculations for locating the galactic center (which involves calculating the user’s azimuth and altitude).
Still, coding with ChatGPT has its limitations. “ChatGPT is super smart, but it’s not embodied like a human, so it falls down on doing the 3D calculations,” Webb said. “I had to learn a lot about quaternions, which are a technique for combining 3D rotations, and even then, it’s not perfect. The app needs to be held flat to work simply because my math breaks down when the phone is upright! I’ll fix this in future versions.”
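We haven’t seen the source code of Webb’s app, but the underlying astronomy he describes is standard. Here is a minimal sketch of the azimuth/altitude calculation for the Galactic Center, using the published J2000 coordinates of Sagittarius A* and a textbook sidereal-time approximation; everything in it is our own illustration, not the app’s actual implementation:

```python
import math
import time

# Sagittarius A* (the Galactic Center), J2000 equatorial coordinates.
SGR_A_RA = 266.417   # right ascension, degrees
SGR_A_DEC = -29.008  # declination, degrees

def gmst_degrees(unix_time):
    """Approximate Greenwich Mean Sidereal Time in degrees."""
    jd = unix_time / 86400.0 + 2440587.5  # Julian date from Unix time
    d = jd - 2451545.0                    # days since the J2000 epoch
    return (280.46061837 + 360.98564736629 * d) % 360.0

def alt_az(ra, dec, lat, lst):
    """Convert equatorial coordinates (degrees) to altitude/azimuth for
    an observer at latitude `lat` with local sidereal time `lst` (degrees).
    Azimuth is measured from north, increasing eastward."""
    ha = math.radians((lst - ra) % 360.0)  # hour angle
    dec_r, lat_r = math.radians(dec), math.radians(lat)
    alt = math.asin(math.sin(dec_r) * math.sin(lat_r)
                    + math.cos(dec_r) * math.cos(lat_r) * math.cos(ha))
    az = math.atan2(-math.sin(ha) * math.cos(dec_r) * math.cos(lat_r),
                    math.sin(dec_r) - math.sin(alt) * math.sin(lat_r))
    return math.degrees(alt), math.degrees(az) % 360.0

# Example: where is the Galactic Center from London right now?
# LST = GMST + east longitude (London is ~0.13 degrees west).
lst = (gmst_degrees(time.time()) - 0.1278) % 360.0
altitude, azimuth = alt_az(SGR_A_RA, SGR_A_DEC, 51.5074, lst)
```

Note that this is the easy half of the problem; as Webb says, mapping that direction onto the phone’s own 3D orientation (the quaternion part) is where things get hard.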
Webb is no stranger to ChatGPT-powered creations that are more fun than practical. Last month, he launched a Kickstarter for an AI-rhyming poetry clock called the Poem/1. With his design studio, Acts Not Facts, Webb says he uses “whimsy and play to discover the possibilities in new technology.”
Whimsical or not, Webb insists that Galactic Compass can help us ponder our place in the vast universe, and he’s proud that it recently peaked at #87 in the Travel chart for the US App Store. In this case, though, it’s spaceship Earth that is traveling the galaxy while every living human comes along for the ride.
“Once you can follow it, you start to see the galactic center as the true fixed point, and we’re the ones whizzing and spinning. There it remains, the supermassive black hole at the center of our galaxy, Sagittarius A*, steady as a rock, eternal. We go about our days; it’s always there.”
Lax content moderation on X (aka Twitter) has disrupted coordinated efforts between social media companies and law enforcement to tamp down on “propaganda accounts controlled by foreign entities aiming to influence US politics,” The Washington Post reported.
Now propaganda is “flourishing” on X, The Post said, while other social media companies are stuck in endless cycles, watching some of the propaganda that they block proliferate on X, then inevitably spread back to their platforms.
Meta, Google, and then-Twitter began coordinating takedown efforts with law enforcement and disinformation researchers after Russian-backed influence campaigns manipulated their platforms in hopes of swaying the 2016 US presidential election.
The next year, all three companies promised Congress to work tirelessly to stop Russian-backed propaganda from spreading on their platforms. The companies created explicit election misinformation policies and began meeting biweekly to compare notes on propaganda networks each platform uncovered, according to The Post’s interviews with anonymous sources who participated in these meetings.
Sources told The Post that the last X meeting attendee was Irish intelligence expert Aaron Rodericks—who was allegedly disciplined for liking an X post calling Musk “a dipshit.” Rodericks was subsequently laid off when Musk dismissed the entire election integrity team last September, and after that, X apparently ditched the biweekly meeting entirely and “just kind of disappeared,” a source told The Post.
In 2023, for example, Meta flagged 150 “artificial influence accounts” identified on its platform, of which “136 were still present on X as of Thursday evening,” according to The Post’s analysis. X’s seeming oversight extends to all but eight of the 123 “deceptive China-based campaigns” connected to accounts that Meta flagged last May, August, and December, The Post reported.
The Post’s report also provided an exclusive analysis from the Stanford Internet Observatory (SIO), which found that 86 propaganda accounts that Meta flagged last November “are still active on X.”
The majority of these accounts—81—were China-based accounts posing as Americans, SIO reported. These accounts frequently ripped photos from Americans’ LinkedIn profiles, then changed the real Americans’ names while posting about both China and US politics, as well as people often trending on X, such as Musk and Joe Biden.
Meta has warned that China-based influence campaigns are “multiplying,” The Post noted, while X’s standards remain seemingly too relaxed. Even accounts linked to criminal investigations remain active on X. One “account that is accused of being run by the Chinese Ministry of Public Security,” The Post reported, remains on X despite its posts being cited by US prosecutors in a criminal complaint.
Prosecutors connected that account to “dozens” of X accounts attempting to “shape public perceptions” about the Chinese Communist Party, the Chinese government, and other world leaders. The accounts also comment on hot-button topics like the fentanyl problem or police brutality, seemingly to convey “a sense of dismay over the state of America without any clear partisan bent,” Elise Thomas, an analyst for a London nonprofit called the Institute for Strategic Dialogue, told The Post.
Some X accounts flagged by The Post had more than 1 million followers. Five have paid X for verification, suggesting that their disinformation campaigns—targeting hashtags to confound discourse on US politics—are seemingly being boosted by X.
SIO technical research manager Renée DiResta criticized X’s decision to stop coordinating with other platforms.
“The presence of these accounts reinforces the fact that state actors continue to try to influence US politics by masquerading as media and fellow Americans,” DiResta told The Post. “Ahead of the 2022 midterms, researchers and platform integrity teams were collaborating to disrupt foreign influence efforts. That collaboration seems to have ground to a halt, Twitter does not seem to be addressing even networks identified by its peers, and that’s not great.”
Musk shut down X’s election integrity team because he claimed that the team was actually “undermining” election integrity. But analysts are bracing for floods of misinformation to sway 2024 elections, as some major platforms have removed election misinformation policies just as rapid advances in AI technologies have made misinformation spread via text, images, audio, and video harder for the average person to detect.
It seems apparent that propaganda accounts from foreign entities on X will use every tool available to get eyes on their content, perhaps expecting Musk’s platform to be the slowest to police them. According to The Post, some of the X accounts spreading propaganda are using what appears to be AI-generated images of Biden and Donald Trump to garner tens of thousands of views on posts.
It’s possible that X will start tightening up on content moderation as elections draw closer. Yesterday, X joined Amazon, Google, Meta, OpenAI, TikTok, and other Big Tech companies in signing an agreement to fight “deceptive use of AI” during 2024 elections. Among the top goals identified in the “AI Elections accord” are identifying where propaganda originates, detecting how propaganda spreads across platforms, and “undertaking collective efforts to evaluate and learn from the experiences and outcomes of dealing” with propaganda.
On Thursday, OpenAI announced Sora, a text-to-video AI model that can generate 60-second-long photorealistic HD video from written descriptions. While it’s only a research preview that we have not tested, it reportedly creates synthetic video (but not audio yet) at a fidelity and consistency greater than any text-to-video model available at the moment. It’s also freaking people out.
“It was nice knowing you all. Please tell your grandchildren about my videos and the lengths we went to to actually record them,” wrote Wall Street Journal tech reporter Joanna Stern on X.
“This could be the ‘holy shit’ moment of AI,” wrote Tom Warren of The Verge.
“Every single one of these videos is AI-generated, and if this doesn’t concern you at least a little bit, nothing will,” tweeted YouTube tech journalist Marques Brownlee.
For future reference—since this type of panic will someday appear ridiculous—there’s a generation of people who grew up believing that photorealistic video must be created by cameras. When video was faked (say, for Hollywood films), it took a lot of time, money, and effort to do so, and the results weren’t perfect. That gave people a baseline level of comfort that what they were seeing remotely was likely to be true, or at least representative of some kind of underlying truth. Even when the kid jumped over the lava, there was at least a kid and a room.
The prompt that generated the video above: “A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.”
Technology like Sora pulls the rug out from under that kind of media frame of reference. Very soon, every photorealistic video you see online could be 100 percent false in every way. Moreover, every historical video you see could also be false. How we confront that as a society and work around it while maintaining trust in remote communications is far beyond the scope of this article, but I tried my hand at offering some solutions back in 2020, when all of the tech we’re seeing now seemed like a distant fantasy to most people.
In that piece, I called the moment that truth and fiction in media become indistinguishable the “cultural singularity.” It appears that OpenAI is on track to bring that prediction to pass a bit sooner than we expected.
Prompt: Reflections in the window of a train traveling through the Tokyo suburbs.
OpenAI has found that, like other AI models that use the transformer architecture, Sora scales with available compute. Given far more powerful computers behind the scenes, AI video fidelity could improve considerably over time. In other words, this is the “worst” that AI-generated video is ever going to look. There’s no synchronized sound yet, but that might be solved in future models.
How (we think) they pulled it off
AI video synthesis has progressed by leaps and bounds over the past two years. We first covered text-to-video models in September 2022 with Meta’s Make-A-Video. A month later, Google showed off Imagen Video. And just 11 months ago, an AI-generated version of Will Smith eating spaghetti went viral. In May of last year, what was previously considered to be the front-runner in the text-to-video space, Runway Gen-2, helped craft a fake beer commercial full of twisted monstrosities, generated in two-second increments. In earlier video-generation models, people pop in and out of reality with ease, limbs flow together like pasta, and physics doesn’t seem to matter.
Sora (which means “sky” in Japanese) appears to be something altogether different. It’s high-resolution (1920×1080), can generate video with temporal consistency (maintaining the same subject over time) that lasts up to 60 seconds, and appears to follow text prompts with a great deal of fidelity. So, how did OpenAI pull it off?
OpenAI doesn’t usually share insider technical details with the press, so we’re left to speculate based on theories from experts and information given to the public.
OpenAI says that Sora is a diffusion model, much like DALL-E 3 and Stable Diffusion. It generates a video by starting with noise and “gradually transforms it by removing the noise over many steps,” the company explains. It “recognizes” objects and concepts listed in the written prompt and pulls them out of the noise, so to speak, until a coherent series of video frames emerges.
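OpenAI hasn’t published Sora’s code, but the denoising loop it describes is the standard diffusion (DDPM-style) recipe. A toy NumPy sketch of that reverse process follows, with the learned network replaced by a do-nothing stub; the function names, noise schedule, and tensor shapes are all our own illustrative choices, not anything from Sora itself:

```python
import numpy as np

def predict_noise(x_t, t):
    # Stand-in for the trained network (in Sora's case, reportedly a
    # diffusion transformer). A real model is trained to estimate the
    # noise present at step t; here we return zeros purely so the
    # sampling loop below runs end to end.
    return np.zeros_like(x_t)

def reverse_diffusion(shape, steps=50, seed=0):
    """Toy sampling loop: start from pure noise and iteratively
    remove predicted noise over many steps."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, steps)  # variance schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)          # start: pure static
    for t in reversed(range(steps)):
        eps = predict_noise(x, t)
        # Estimate the slightly less noisy sample for the previous step
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:                           # re-inject a bit of noise
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

# Eight tiny "frames" of 16x16 RGB, denoised in 50 steps.
frames = reverse_diffusion((8, 16, 16, 3))
```

The point of the sketch is the shape of the process, not the output: each pass peels away a little noise, and after many steps, a trained model’s objects and scenes emerge from the static.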
Sora is capable of generating videos all at once from a text prompt, extending existing videos, or generating videos from still images. It achieves temporal consistency by giving the model “foresight” of many frames at once, as OpenAI calls it, solving the problem of ensuring a generated subject remains the same even if it falls out of view temporarily.
OpenAI represents video as collections of smaller groups of data called “patches,” which the company says are similar to tokens (fragments of a word) in GPT-4. “By unifying how we represent data, we can train diffusion transformers on a wider range of visual data than was possible before, spanning different durations, resolutions, and aspect ratios,” the company writes.
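OpenAI’s write-up doesn’t give exact patch dimensions, but the idea itself is easy to illustrate: chop a video tensor into small spacetime blocks and flatten each one into a vector, the visual analogue of a text token. A NumPy sketch (the sizes here are arbitrary, chosen only for illustration):

```python
import numpy as np

def patchify(video, pt=2, ph=4, pw=4):
    """Cut a video tensor of shape (T, H, W, C) into flattened
    spacetime patches spanning pt frames and ph x pw pixels each."""
    T, H, W, C = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    v = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    v = v.transpose(0, 2, 4, 1, 3, 5, 6)  # group the patch axes together
    return v.reshape(-1, pt * ph * pw * C)  # (num_patches, patch_dim)

video = np.zeros((8, 16, 16, 3))  # 8 frames of 16x16 RGB
patches = patchify(video)         # 4*4*4 = 64 patches, each 96 values long
```

Because every clip reduces to a flat sequence of such patches, one transformer can be fed videos of any duration, resolution, or aspect ratio, which is the unification the company is describing.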
An important tool in OpenAI’s bag of tricks is that its use of AI models is compounding. Earlier models are helping to create more complex ones. Sora follows prompts well because, like DALL-E 3, it utilizes synthetic captions that describe scenes in the training data generated by another AI model like GPT-4V. And the company is not stopping here. “Sora serves as a foundation for models that can understand and simulate the real world,” OpenAI writes, “a capability we believe will be an important milestone for achieving AGI.”
One question on many people’s minds is what data OpenAI used to train Sora. OpenAI has not revealed its dataset, but based on what people are seeing in the results, it’s possible OpenAI is using synthetic video data generated in a video game engine in addition to sources of real video (say, scraped from YouTube or licensed from stock video libraries). Nvidia’s Dr. Jim Fan, who is a specialist in training AI with synthetic data, wrote on X, “I won’t be surprised if Sora is trained on lots of synthetic data using Unreal Engine 5. It has to be!” Until confirmed by OpenAI, however, that’s just speculation.
Appall and scorn ripped through scientists’ social media networks Thursday as several egregiously bad AI-generated figures circulated from a peer-reviewed article recently published in a reputable journal. Those figures—which the authors acknowledge in the article’s text were made by Midjourney—are all uninterpretable. They contain gibberish text and, most strikingly, one includes an image of a rat with grotesquely large and bizarre genitals, as well as a text label of “dck.”
On Thursday, the publisher of the review article, Frontiers, posted an “expression of concern,” noting that it is aware of concerns regarding the published piece. “An investigation is currently being conducted and this notice will be updated accordingly after the investigation concludes,” the publisher wrote.
The article in question is titled “Cellular functions of spermatogonial stem cells in relation to JAK/STAT signaling pathway,” which was authored by three researchers in China, including the corresponding author Dingjun Hao of Xi’an Honghui Hospital. It was published online Tuesday in the journal Frontiers in Cell and Developmental Biology.
Frontiers did not immediately respond to Ars’ request for comment, but we will update this post with any response.
The first figure in the paper, the one containing the rat, drew immediate attention as scientists began widely sharing it and commenting on it on social media platforms, including Bluesky and the platform formerly known as Twitter. From a distance, the anatomical image is clearly all sorts of wrong. But looking closer only reveals more flaws, including the labels “dissilced,” “Stemm cells,” “iollotte sserotgomar,” and “dck.” Many researchers expressed surprise and dismay that such a blatantly bad AI-generated image could pass through the peer-review system and whatever internal processing is in place at the journal.
But the rat’s package is far from the only problem. Figure 2 is less graphic but equally mangled. While it’s intended to be a diagram of a complex signaling pathway, it instead is a jumbled mess. One scientific integrity expert questioned whether it provided an overly complicated explanation of “how to make a donut with colorful sprinkles.” Like the first image, the diagram is rife with nonsense text and baffling images. Figure 3 is no better, offering a collage of small circular images that are densely annotated with gibberish. The image is supposed to provide visual representations of how the signaling pathway from Figure 2 regulates the biological properties of spermatogonial stem cells.
Some scientists online questioned whether the text was also AI-generated. One user noted that AI detection software determined that it was likely to be AI-generated; however, as Ars has reported previously, such software is unreliable.
The images, while egregious examples, highlight a growing problem in scientific publishing. A scientist’s success relies heavily on their publication record, with a large volume of publications, frequent publishing, and articles appearing in top-tier journals, all of which earn scientists more prestige. The system incentivizes less-than-scrupulous researchers to push through low-quality articles, which, in the era of AI chatbots, could potentially be generated with the help of AI. Researchers worry that the growing use of AI will make published research less trustworthy. As such, research journals have recently set new authorship guidelines for AI-generated text to try to address the problem. But for now, as the Frontiers article shows, there are clearly some gaps.
One week after its last major AI announcement, Google appears to have upstaged itself. Last Thursday, Google launched Gemini Ultra 1.0, which supposedly represented the best AI language model Google could muster—available as part of the renamed “Gemini” AI assistant (formerly Bard). Today, Google announced Gemini Pro 1.5, which it says “achieves comparable quality to 1.0 Ultra, while using less compute.”
Congratulations, Google, you’ve done it. You’ve undercut your own premier AI product. While Ultra 1.0 is possibly still better than Pro 1.5 (what even are we saying here), Ultra was presented as a key selling point of the “Gemini Advanced” tier of Google’s Google One subscription service. And now it’s looking a lot less advanced than it did seven days ago. All this is on top of the confusing name-shuffling Google has been doing recently. (Just to be clear—although it’s not really clarifying at all—the free version of Bard/Gemini currently uses the Pro 1.0 model. Got it?)
Google claims that Gemini 1.5 represents a new generation of LLMs that “delivers a breakthrough in long-context understanding,” and that it can process up to 1 million tokens, “achieving the longest context window of any large-scale foundation model yet.” Tokens are fragments of a word. The first part of the claim about “understanding” is contentious and subjective, but the second part is probably correct. OpenAI’s GPT-4 Turbo can reportedly handle 128,000 tokens in some circumstances, and 1 million is quite a bit more—about 700,000 words. A larger context window allows for processing longer documents and having longer conversations. (The Gemini 1.0 model family handles 32,000 tokens max.)
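Those context-window claims are easier to compare side by side with a quick back-of-the-envelope conversion. A minimal sketch, assuming roughly 0.7 English words per token (the same ratio behind the 1 million tokens ≈ 700,000 words figure above); the constant is a rough rule of thumb, not an exact tokenizer property:

```python
# Rough comparison of the context windows mentioned above, converted to an
# approximate word count using ~0.7 English words per token (a rough average).
WORDS_PER_TOKEN = 0.7

windows = {
    "Gemini 1.0 family": 32_000,
    "GPT-4 Turbo": 128_000,
    "Gemini 1.5 Pro (full)": 1_000_000,
}

for name, tokens in windows.items():
    approx_words = int(tokens * WORDS_PER_TOKEN)
    print(f"{name}: {tokens:,} tokens ≈ {approx_words:,} words")
```

Run it and the gap is obvious: the full Gemini 1.5 window holds roughly eight times as much text as GPT-4 Turbo's reported maximum, and about 30 times the Gemini 1.0 family's.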
But any technical breakthroughs are almost beside the point. What should we make of a company that trumpeted its AI supremacy to the world just last week, only to partially supersede that a week later? Is it a testament to the rapid rate of AI technical progress in Google’s labs, a sign that red tape held back Ultra 1.0 for too long, or simply poor coordination between research and marketing? We honestly don’t know.
So back to Gemini 1.5. What is it, really, and how will it be available? Google implies that like 1.0 (which had Nano, Pro, and Ultra flavors), it will be available in multiple sizes. Right now, Pro 1.5 is the only model Google is unveiling. Google says that 1.5 uses a new mixture-of-experts (MoE) architecture, which means the system selectively activates different “experts” or specialized sub-models within a larger neural network for specific tasks based on the input data.
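Google hasn't published Gemini 1.5's internals, but the general MoE idea can be sketched in a few lines: a gating function scores every expert, and only the top-scoring few are actually evaluated, which is how an MoE model can match a larger dense model while using less compute. A toy sketch under stated assumptions (the scale-by-a-constant experts, the fixed gate, and the top-2 routing here are all illustrative inventions, not Gemini's actual design):

```python
import math

def softmax(scores):
    # Convert raw gate scores into a probability distribution over experts
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(x, gate, experts, top_k=2):
    """Evaluate only the top_k experts chosen by the gate, then mix their
    outputs weighted by the (renormalized) gate probabilities."""
    weights = softmax(gate(x))
    # Indices of the top_k highest-weighted experts
    chosen = sorted(range(len(experts)), key=lambda i: weights[i])[-top_k:]
    norm = sum(weights[i] for i in chosen)
    out = [0.0] * len(x)
    for i in chosen:  # the remaining experts are never run, saving compute
        y = experts[i](x)
        out = [o + (weights[i] / norm) * yi for o, yi in zip(out, y)]
    return out

# Toy setup: each "expert" just scales the input by a different factor,
# and the gate returns fixed scores so experts 2 and 3 win with top_k=2.
experts = [lambda v, k=k: [k * vi for vi in v] for k in (1.0, 2.0, 3.0, 4.0)]
gate = lambda v: [0.0, 1.0, 2.0, 3.0]
result = moe_layer([1.0, 1.0], gate, experts)
```

With this setup the output lands between the two chosen experts' scale factors (3x and 4x), weighted toward the higher-scoring one; the key point is that experts 0 and 1 are never evaluated at all.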
Google says that Gemini 1.5 can perform “complex reasoning about vast amounts of information,” and gives an example of analyzing a 402-page transcript of Apollo 11’s mission to the Moon. It’s impressive to process documents that large, but the model, like every large language model, is highly likely to confabulate interpretations across large contexts. We wouldn’t trust it to soundly analyze 1 million tokens without mistakes, so that’s putting a lot of faith in poorly understood LLM hands.
For those interested in diving into technical details, Google has released a technical report on Gemini 1.5 that appears to show Gemini performing favorably versus GPT-4 Turbo on various tasks, but it’s also important to note that the selection and interpretation of those benchmarks can be subjective. The report does give some numbers on how much better 1.5 is compared to 1.0, saying it’s 28.9 percent better than 1.0 Pro at “Math, Science & Reasoning” and 5.2 percent better at those subjects than 1.0 Ultra.
But for now, we’re still kind of shocked that Google would launch this particular model at this particular moment in time. Is it trying to get ahead of something that it knows might be just around the corner, like OpenAI’s unreleased GPT-5, for instance? We’ll keep digging and let you know what we find.
Google says that a limited preview of 1.5 Pro is available now for developers via AI Studio and Vertex AI with a 128,000 token context window, scaling up to 1 million tokens later. Gemini 1.5 apparently has not come to the Gemini chatbot (formerly Bard) yet.
On Tuesday, the United States Patent and Trademark Office (USPTO) published guidance on inventorship for AI-assisted inventions, clarifying that while AI systems can play a role in the creative process, only natural persons (human beings) who make significant contributions to the conception of an invention can be named as inventors. It also rules out using AI models to churn out patent ideas without significant human input.
The USPTO says this position is supported by “the statutes, court decisions, and numerous policy considerations,” including the Executive Order on AI issued by President Biden. We’ve previously covered attempts by Dr. Stephen Thaler, repeatedly rejected by US courts, to have an AI program called “DABUS” named as the inventor on a US patent (a process begun in 2019).
Even though an AI model itself cannot be named an inventor or joint inventor on a patent, using AI assistance to create an invention does not necessarily disqualify a human from holding a patent, as the USPTO explains:
“While AI systems and other non-natural persons cannot be listed as inventors on patent applications or patents, the use of an AI system by a natural person(s) does not preclude a natural person(s) from qualifying as an inventor (or joint inventors) if the natural person(s) significantly contributed to the claimed invention.”
However, the USPTO says that significant human input is required for an invention to be patentable: “Maintaining ‘intellectual domination’ over an AI system does not, on its own, make a person an inventor of any inventions created through the use of the AI system.” So a person simply overseeing an AI system isn’t suddenly an inventor. The person must make a significant contribution to the conception of the invention.
If someone does use an AI model to help create patents, the guidance describes how the application process would work. First, patent applications for AI-assisted inventions must name “the natural person(s) who significantly contributed to the invention as the inventor,” and additionally, applications must not list “any entity that is not a natural person as an inventor or joint inventor, even if an AI system may have been instrumental in the creation of the claimed invention.”
Reading between the lines, it seems the contributions made by AI systems are akin to contributions made by other tools that assist in the invention process. The document does not explicitly say that the use of AI is required to be disclosed during the application process.
Even with the published guidance, the USPTO is seeking public comment on the newly released guidelines and issues related to AI inventorship on its website.
Mozilla got a new “interim” CEO just a few days ago, and the first order of business appears to be layoffs. Bloomberg was the first to report that the company is cutting about 60 jobs, or 5 percent of its workforce. A TechCrunch report has a company memo that followed these layoffs, detailing one product shutdown and a “scaling back” of a few others.
Mozilla started as the open source browser/email company that rose from the ashes of Netscape. Firefox and Thunderbird have kept on trucking since then, but the mozilla.org/products page is a great example of what the strategy has been lately: “Firefox is just the beginning!” reads the very top of the page; it then goes on to detail a lot of projects that aren’t in line with Mozilla’s core work of making a browser. There’s Mozilla Monitor (a data breach checker), Mozilla VPN, Pocket (a news reader app), Firefox Relay (for making burner email accounts), and Firefox Focus, a privacy-focused version of Firefox.
That’s not even a comprehensive list of recent Mozilla products. From 2017–2020, there was “Firefox Send,” an encrypted file transfer service, and a VR-focused “Firefox Reality” browser that lasted from 2018 to 2022. In 2022, Mozilla launched a $35 million venture capital fund called Mozilla Ventures. Not all Mozilla side-projects are losers—the memory-safe Rust programming language was spun out of Mozilla in 2020 and has seen rapid adoption in the Linux kernel and Android.
Mozilla is a tiny company that competes with some of the biggest tech companies in the world: Apple, Google, and Microsoft. It’s also very important to the web as a whole, as Firefox is the only major browser that can’t trace its lineage back to Apple and WebKit (Chrome’s Blink engine is a WebKit fork, and Microsoft Edge is a Chromium fork). So you would think focusing on Firefox would be a priority, but the company continually struggles with focus.
The Mozilla Corporation gets about 80 percent of its revenue from Google—also its primary browser competitor—via a search deal, so Mozilla isn’t exactly a healthy company. These non-browser projects could be seen as a search for a less vulnerable revenue stream, but none have put a huge dent in the bottom line.
TechCrunch managed to get an internal company memo that details a few “strategic corrections” for the myriad Mozilla products. Mozilla has a “mozilla.social” Mastodon instance that, according to the memo, was originally intended to “effectively shape the future of social media,” but the company now says the social group will get a “much smaller team.” Mozilla says it will also “reduce our investments” in Mozilla VPN, Firefox Relay, and something the memo calls “Online Footprint Scrubber” (that sounds like Mozilla Monitor?). It’s also shutting down “Mozilla Hubs,” a 3D virtual world it launched in 2018—that’s right, there was also a metaverse project! The memo says that “demand has moved away from 3D virtual worlds” and that “this is impacting all industry players.” The company is also cutting jobs at “MozProd,” its infrastructure team.
While chasing the trends of VR and metaverse didn’t work out, Mozilla now wants to chase another hot new trend: AI! The memo says: “In 2023, generative AI began rapidly shifting the industry landscape. Mozilla seized an opportunity to bring trustworthy AI into Firefox, largely driven by the Fakespot acquisition and the product integration work that followed. Additionally, finding great content is still a critical use case for the Internet. Therefore, as part of the changes today, we will be bringing together Pocket, Content, and the AI/ML teams supporting content with the Firefox Organization. More details on the specific organizational changes will follow shortly.” Mozilla paid an undisclosed sum in 2023 to buy a company called Fakespot, which uses AI to identify fake product reviews. Specifically citing “generative AI” leads us to believe the company wants to build a chatbot or webpage summarizer.
The TechCrunch report interprets the memo, saying, “It now looks like Mozilla may refocus on Firefox once more,” but the memo does not give an affirmative statement on “Firefox the browser” being important or seeing additional investments. In 2020, the company had another round of layoffs and said it wanted to “refocus the Firefox organization on core browser growth,” but nothing seems to have come of that. Firefox’s market share is about 3 percent of all browsers, and that number goes down every year.