Google balks at $270M fine after training AI on French news sites’ content

Google has agreed to pay 250 million euros (about $273 million) to settle a dispute in France after breaching years-old commitments to inform and pay French news publishers when referencing and displaying their content in search results and when using it to train Google’s AI-powered chatbot, Gemini.

According to France’s competition watchdog, the Autorité de la Concurrence (ADLC), Google dodged many commitments to deal with publishers fairly. Most recently, it never notified publishers or the ADLC before training Gemini (initially launched as Bard) on publishers’ content or displaying content in Gemini outputs. Google also waited until September 28, 2023, to introduce easy options for publishers to opt out, which made it impossible for publishers to negotiate fair deals for that content, the ADLC found.

“Until this date, press agencies and publishers wanting to opt out of this use had to insert an instruction opposing any crawling of their content by Google, including on the Search, Discover and Google News services,” the ADLC noted, warning that “in the future, the Autorité will be particularly attentive as regards the effectiveness of opt-out systems implemented by Google.”

To address breaches of four out of seven commitments in France—which the ADLC imposed in 2022 for a period of five years to “benefit” publishers by ensuring Google’s ongoing negotiations with them were “balanced”—Google has agreed to “a series of corrective measures,” the ADLC said.

Google is not happy with the fine, which it described as “not proportionate” partly because the fine “doesn’t sufficiently take into account the efforts we have made to answer and resolve the concerns raised—in an environment where it’s very hard to set a course because we can’t predict which way the wind will blow next.”

According to Google, regulators everywhere need to clearly define fair use of content when developing search tools and AI models, so that search companies and AI makers always know “whom we are paying for what.” Currently in France, Google contends, the scope of Google’s commitments has shifted from just general news publishers to now also include specialist publications and listings and comparison sites.

The ADLC agreed that “the question of whether the use of press publications as part of an artificial intelligence service qualifies for protection under related rights regulations has not yet been settled,” but noted that “at the very least,” Google was required to “inform publishers of the use of their content for their Bard software.”

Regarding Bard/Gemini, Google said that it “voluntarily introduced a new technical solution called Google-Extended to make it easier for rights holders to opt out of Gemini without impact on their presence in Search.” It has now also committed to better explain to publishers both “how our products based on generative AI work and how ‘Opt Out’ works.”
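As a rough illustration of the opt-out difference the ADLC describes, the sketch below uses Python's standard `urllib.robotparser` to evaluate a robots.txt file that blocks only the `Google-Extended` token while leaving ordinary crawling allowed. The robots.txt contents, the `allowed` helper, and the `news.example` URL are hypothetical, and Python's parser is only a stand-in for however Google's own crawlers interpret these rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a publisher site (not taken from any real site).
# It opts out of the Google-Extended token only; every other crawler is allowed.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

# Hypothetical "all or nothing" alternative: block Google's crawler entirely.
BLOCK_GOOGLEBOT_TXT = """\
User-agent: Googlebot
Disallow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url (per Python's parser)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


ARTICLE_URL = "https://news.example/politics/story.html"  # placeholder URL

ai_training_allowed = allowed(ROBOTS_TXT, "Google-Extended", ARTICLE_URL)
search_crawl_allowed = allowed(ROBOTS_TXT, "Googlebot", ARTICLE_URL)
blanket_block_allowed = allowed(BLOCK_GOOGLEBOT_TXT, "Googlebot", ARTICLE_URL)

print("Google-Extended may fetch:", ai_training_allowed)
print("Googlebot may fetch:", search_crawl_allowed)
print("Googlebot may fetch under blanket block:", blanket_block_allowed)
```

With a separate token, a publisher can refuse AI use without disappearing from Search; with only the blanket rule, refusing one means refusing both.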

Google said that it agreed to the settlement “because it’s time to move on” and “focus on the larger goal of sustainable approaches to connecting people with quality content and on working constructively with French publishers.”

“Today’s fine relates mostly to [a] disagreement about how much value Google derives from news content,” Google’s blog said, claiming that “a lack of clear regulatory guidance and repeated enforcement actions have made it hard to navigate negotiations with publishers, or plan how we invest in news in France in the future.”

What changes did Google agree to make?

Google defended its position as “the first and only platform to have signed significant licensing agreements” in France, benefiting 280 French press publishers and “covering more than 450 publications.”

With these publishers, the ADLC found that Google breached requirements to “negotiate in good faith based on transparent, objective, and non-discriminatory criteria,” to consistently “make a remuneration offer” within three months of a publisher’s request, and to provide information for publishers to “transparently assess their remuneration.”

Google also breached commitments to “inform editors and press agencies of the use of their content by its service Bard” and of Google’s decision to link “the use of press agencies’ and publishers’ content by its artificial intelligence service to the display of protected content on services such as Search, Discover and News.”

Regarding negotiations, the ADLC found that Google not only failed to be transparent with publishers about remuneration, but also failed to keep the ADLC informed of information necessary to monitor whether Google was honoring its commitments to fairly pay publishers. Partly “to guarantee better communication,” Google has agreed to appoint a French-speaking representative in its Paris office, along with other steps the ADLC recommended.

According to the ADLC’s announcement (translated from French), Google appears to have fallen short in negotiations in two ways: it failed to meet non-discrimination criteria, treating publishers in different situations identically to their disadvantage, and it did not mention “all the services that could generate revenues for the negotiating party.”

“According to the Autorité, not taking into account differences in attractiveness between content does not allow for an accurate reflection of the contribution of each press agency and publisher to Google’s revenues,” the ADLC said.

Another problem: Google set a minimum remuneration threshold of 100 euros, which it has now agreed to drop.

This threshold, “in its very principle, introduces discrimination between publishers that, below a certain threshold, are all arbitrarily assigned zero remuneration, regardless of their respective situations,” the ADLC found.

Microsoft partners with OpenAI-rival Mistral for AI models, drawing EU scrutiny

The European Approach —

15M euro investment comes as Microsoft hosts Mistral’s GPT-4 alternatives on Azure.

Velib bicycles are parked in front of the headquarters of the U.S. computer and micro-computing company Microsoft on January 25, 2023 in Issy-les-Moulineaux, France.

On Monday, Microsoft announced plans to offer AI models from Mistral through its Azure cloud computing platform. The announcement came alongside a 15 million euro non-equity investment in the French firm, which is often seen as a European rival to OpenAI. Since then, the investment deal has faced scrutiny from European Union regulators.

Microsoft’s deal with Mistral, known for its large language models akin to OpenAI’s GPT-4 (which powers the subscription versions of ChatGPT), marks a notable expansion of Microsoft’s AI portfolio at a time when the company’s well-known investment in California-based OpenAI has raised regulatory eyebrows. The new deal with Mistral drew particular attention from regulators because Microsoft’s investment could convert into equity (partial ownership of Mistral as a company) during Mistral’s next funding round.

The development has intensified ongoing investigations into Microsoft’s practices, particularly related to the tech giant’s dominance in the cloud computing sector. According to Reuters, EU lawmakers have voiced concerns that Mistral’s recent lobbying for looser AI regulations might have been influenced by its relationship with Microsoft. These apprehensions are compounded by the French government’s denial of prior knowledge of the deal, despite earlier lobbying for more lenient AI laws in Europe. The situation underscores the complex interplay between national interests, corporate influence, and regulatory oversight in the rapidly evolving AI landscape.

Avoiding American influence

The EU’s reaction to the Microsoft-Mistral deal reflects broader tensions over the role of Big Tech companies in shaping the future of AI and their potential to stifle competition. Calls for a thorough investigation into Microsoft and Mistral’s partnership have been echoed across the continent, according to Reuters, with some lawmakers accusing the firms of attempting to undermine European legislative efforts aimed at ensuring a fair and competitive digital market.

The controversy also touches on the broader debate about “European champions” in the tech industry. France, along with Germany and Italy, had advocated for regulatory exemptions to protect European startups. However, the Microsoft-Mistral deal has led some, like MEP Kim van Sparrentak, to question the motives behind these exemptions, suggesting they might have inadvertently favored American Big Tech interests.

“That story seems to have been a front for American-influenced Big Tech lobby,” said van Sparrentak, as quoted by Reuters. Van Sparrentak has been a key architect of the EU’s AI Act, which has not yet been passed. “The Act almost collapsed under the guise of no rules for ‘European champions,’ and now look. European regulators have been played.”

MEP Alexandra Geese also expressed concerns over the concentration of money and power resulting from such partnerships, calling for an investigation. Max von Thun, Europe director at the Open Markets Institute, emphasized the urgency of investigating the partnership, criticizing Mistral’s reported attempts to influence the AI Act.

Also on Monday, amid the partnership news, Mistral announced Mistral Large, a new large language model (LLM) that Mistral says “ranks directly after GPT-4 based on standard benchmarks.” Mistral has previously released several open-weights AI models that have made news for their capabilities, but Mistral Large will be a closed model only available to customers through an API.

Everybody’s talking about Mistral, an upstart French challenger to OpenAI

A challenger appears —

“Mixture of experts” Mixtral 8x7B helps open-weights AI punch above its weight class.

An illustration of a robot holding a French flag, figuratively reflecting the rise of AI in France due to Mistral. It’s hard to draw a picture of an LLM, so a robot will have to do.

On Monday, Mistral AI announced a new AI language model called Mixtral 8x7B, a “mixture of experts” (MoE) model with open weights that reportedly matches OpenAI’s GPT-3.5 in performance. Others have made that claim in the past, but this time it is being taken seriously by AI heavyweights such as OpenAI’s Andrej Karpathy and Nvidia’s Jim Fan. That means we’re closer to having a ChatGPT-3.5-level AI assistant that can run freely and locally on our devices, given the right implementation.

Mistral, based in Paris and founded by Arthur Mensch, Guillaume Lample, and Timothée Lacroix, has seen a rapid rise in the AI space recently. It has been quickly raising venture capital to become a sort of French anti-OpenAI, championing smaller models with eye-catching performance. Most notably, Mistral’s models run locally with open weights that can be downloaded and used with fewer restrictions than closed AI models from OpenAI, Anthropic, or Google. (In this context “weights” are the computer files that represent a trained neural network.)

Mixtral 8x7B can process a 32K token context window and works in French, German, Spanish, Italian, and English. It works much like ChatGPT in that it can assist with compositional tasks, analyze data, troubleshoot software, and write programs. Mistral claims that it outperforms Meta’s much larger LLaMA 2 70B (70 billion parameter) large language model and that it matches or exceeds OpenAI’s GPT-3.5 on certain benchmarks, as seen in the chart below.

A chart of Mixtral 8x7B performance vs. LLaMA 2 70B and GPT-3.5, provided by Mistral.

The speed at which open-weights AI models have caught up with OpenAI’s top offering of a year ago has taken many by surprise. Pietro Schirano, the founder of EverArt, wrote on X, “Just incredible. I am running Mistral 8x7B instruct at 27 tokens per second, completely locally thanks to @LMStudioAI. A model that scores better than GPT-3.5, locally. Imagine where we will be 1 year from now.”

LexicaArt founder Sharif Shameem tweeted, “The Mixtral MoE model genuinely feels like an inflection point — a true GPT-3.5 level model that can run at 30 tokens/sec on an M1. Imagine all the products now possible when inference is 100% free and your data stays on your device.” To which Andrej Karpathy replied, “Agree. It feels like the capability / reasoning power has made major strides, lagging behind is more the UI/UX of the whole thing, maybe some tool use finetuning, maybe some RAG databases, etc.”

Mixture of experts

So what does mixture of experts mean? As this excellent Hugging Face guide explains, it refers to a machine-learning model architecture where a gate network routes input data to different specialized neural network components, known as “experts,” for processing. The advantage of this is that it enables more efficient and scalable model training and inference, as only a subset of experts is activated for each input, reducing the computational load compared to monolithic models with equivalent parameter counts.

In layperson’s terms, a MoE is like having a team of specialized workers (the “experts”) in a factory, where a smart system (the “gate network”) decides which worker is best suited to handle each specific task. This setup makes the whole process more efficient and faster, as each task is done by an expert in that area, and not every worker needs to be involved in every task, unlike in a traditional factory where every worker might have to do a bit of everything.
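The routing idea can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not Mixtral's implementation: each "expert" is a single random linear map, the gate is another random linear map, and `top_k=2` is an illustrative choice. All names here (`make_expert`, `moe_forward`, and so on) are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def make_expert(dim, rng):
    """Toy expert: one random dim x dim linear map (real experts are feed-forward blocks)."""
    weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
    return lambda x: matvec(weights, x)


def moe_forward(x, gate_weights, experts, top_k=2):
    """Route x to the top_k highest-scoring experts and mix their outputs."""
    logits = matvec(gate_weights, x)  # one gate score per expert
    ranked = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)
    chosen = ranked[:top_k]  # only these experts are evaluated
    mix = softmax([logits[i] for i in chosen])  # renormalize over the chosen experts
    out = [0.0] * len(x)
    for weight, idx in zip(mix, chosen):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, chosen


rng = random.Random(0)
DIM, NUM_EXPERTS = 4, 8
experts = [make_expert(DIM, rng) for _ in range(NUM_EXPERTS)]
gate_weights = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

token = [0.5, -1.0, 0.25, 2.0]
output, used = moe_forward(token, gate_weights, experts)
print("experts used:", used, "of", NUM_EXPERTS)
```

Only the chosen experts run for a given input, which is where the compute savings come from; the unchosen experts still occupy memory.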

OpenAI has been rumored to use a MoE system with GPT-4, accounting for some of its performance. In the case of Mixtral 8x7B, the name implies that the model is a mixture of eight 7 billion-parameter neural networks, but as Karpathy pointed out in a tweet, the name is slightly misleading because, “it is not all 7B params that are being 8x’d, only the FeedForward blocks in the Transformer are 8x’d, everything else stays the same. Hence also why total number of params is not 56B but only 46.7B.”
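Karpathy's arithmetic can be checked with a back-of-the-envelope count. The dimensions below (model width 4096, feed-forward width 14336, 32 layers, 32 attention heads with 8 key/value heads, a 32,000-token vocabulary) are assumptions drawn from the publicly released Mixtral configuration rather than from this article, and the `count_params` helper is a hypothetical simplification that ignores any details beyond attention, feed-forward, router, normalization, and embedding weights.

```python
def count_params(num_experts, dim=4096, hidden=14336, layers=32,
                 n_heads=32, n_kv_heads=8, vocab=32000):
    """Approximate parameter count for a Mixtral-style Transformer.

    All default dimensions are assumed values, not figures from the article.
    """
    head_dim = dim // n_heads
    # Attention: query and output projections are dim x dim;
    # key and value projections are smaller (grouped-query attention).
    attention = 2 * dim * dim + 2 * dim * (n_kv_heads * head_dim)
    # One feed-forward block has three dim x hidden weight matrices.
    feed_forward = 3 * dim * hidden
    # The router (gate) exists only when there is more than one expert.
    router = dim * num_experts if num_experts > 1 else 0
    norms = 2 * dim  # two normalization weight vectors per layer
    per_layer = attention + num_experts * feed_forward + router + norms
    # Input embedding, output head, and a final normalization vector.
    shared = 2 * vocab * dim + dim
    return layers * per_layer + shared


dense_total = count_params(num_experts=1)   # a single "7B"-class model
moe_total = count_params(num_experts=8)     # only the feed-forward blocks are 8x'd
naive_total = 8 * dense_total               # what the "8x7B" name suggests

print(f"dense model:     {dense_total / 1e9:.1f}B")
print(f"naive 8 x dense: {naive_total / 1e9:.1f}B")
print(f"mixture model:   {moe_total / 1e9:.1f}B")
```

Under these assumptions the attention and embedding weights are counted once rather than eight times, which is why the total lands near 46.7B instead of the figure the name implies.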

Mixtral is not the first “open” mixture of experts model, but it is notable for its strong performance at a relatively small parameter count. It’s out now, available on Hugging Face and BitTorrent under the Apache 2.0 license. People have been running it locally using an app called LM Studio. Also, Mistral began offering beta access to an API for three levels of Mistral models on Monday.
