Dropbox lays off 20% of staff, says it overinvested and underperformed

Dropbox is laying off 528 employees in a move that will reduce its global workforce by 20 percent, CEO Drew Houston announced today.

Houston wrote that Dropbox’s core file sync and sharing “business has matured, and we’ve been working to build our next phase of growth with products like Dash,” an “AI-powered universal search” product targeted to business customers. The company’s “current structure and investment levels” are “no longer sustainable,” according to Houston.

“We continue to see softening demand and macro headwinds in our core business,” Houston wrote. “But external factors are only part of the story. We’ve heard from many of you that our organizational structure has become overly complex, with excess layers of management slowing us down.”

Dropbox previously cut 500 employees in an April 2023 round of layoffs. At the time, Houston said that Dropbox’s business was profitable but growth was slowing.

Today, Houston said that Dropbox is “still not delivering at the level our customers deserve or performing in line with industry peers. So we’re making more significant cuts in areas where we’re over-invested or underperforming while designing a flatter, more efficient team structure overall.”

In a Securities and Exchange Commission filing, Dropbox said it expects to “make total cash expenditures of approximately $63 million to $68 million in connection with the reduction in force, primarily consisting of severance payments, employee benefits and related costs.” Laid-off employees are eligible for 16 weeks of pay, plus one additional week of pay for each year of tenure, Houston wrote. He also said the laid-off workers “will receive their Q4 equity vest” and will be eligible for a pro-rated payment equivalent to their 2024 bonus target.

Android Trojan that intercepts voice calls to banks just got more stealthy

Much of the new obfuscation comes from hiding malicious code in a .dex file that the apps decrypt and load dynamically at runtime. As a result, Zimperium initially believed the malicious apps it was analyzing were part of a previously unknown malware family. Then the researchers dumped the .dex file from an infected device’s memory and performed static analysis on it.

“As we delved deeper, a pattern emerged,” Ortega wrote. “The services, receivers, and activities closely resembled those from an older malware variant with the package name com.secure.assistant.” That package name allowed the researchers to link the samples to the FakeCall Trojan.

Many of the new features don’t appear to be fully implemented yet. Besides the obfuscation, other new capabilities include:

Bluetooth Receiver

This receiver functions primarily as a listener, monitoring Bluetooth status and changes. Notably, there is no immediate evidence of malicious behavior in the source code, raising questions about whether it serves as a placeholder for future functionality.

Screen Receiver

Similar to the Bluetooth receiver, this component only monitors the screen’s state (on/off) without revealing any malicious activity in the source code.

Accessibility Service

The malware incorporates a new service inherited from the Android Accessibility Service, granting it significant control over the user interface and the ability to capture information displayed on the screen. The decompiled code shows methods such as onAccessibilityEvent() and onCreate() implemented in native code, obscuring their specific malicious intent.

While the provided code snippet focuses on the service’s lifecycle methods implemented in native code, earlier versions of the malware give us clues about possible functionality:

  • Monitoring Dialer Activity: The service appears to monitor events from the com.skt.prod.dialer package (the stock dialer app), potentially allowing it to detect when the user is attempting to make calls using apps other than the malware itself.
  • Automatic Permission Granting: The service seems capable of detecting permission prompts from the com.google.android.permissioncontroller (system permission manager) and com.android.systemui (system UI). Upon detecting specific events (e.g., TYPE_WINDOW_STATE_CHANGED), it can automatically grant permissions for the malware, bypassing user consent.
  • Remote Control: The malware enables remote attackers to take full control of the victim’s device UI, allowing them to simulate user interactions, such as clicks, gestures, and navigation across apps. This capability enables the attacker to manipulate the device with precision.

Phone Listener Service

This service acts as a conduit between the malware and its Command and Control (C2) server, allowing the attacker to issue commands and execute actions on the infected device. Like its predecessor, the new variant provides attackers with a comprehensive set of capabilities (see the table below). Some functionalities have been moved to native code, while others are new additions, further enhancing the malware’s ability to compromise devices.

The Kaspersky post from 2022 said that the only language supported by FakeCall was Korean and that the Trojan appeared to target several specific banks in South Korea. Last year, researchers from security firm ThreatFabric said the Trojan had begun supporting English, Japanese, and Chinese, although there were no indications people speaking those languages were actually targeted.

Downey Jr. plans to fight AI re-creations from beyond the grave

Robert Downey Jr. has declared that he will sue any future Hollywood executives who try to re-create his likeness using AI digital replicas, as reported by Variety. His comments came during an appearance on the “On With Kara Swisher” podcast, where he discussed AI’s growing role in entertainment.

“I intend to sue all future executives just on spec,” Downey told Swisher when discussing the possibility of studios using AI or deepfakes to re-create his performances after his death. When Swisher pointed out he would be deceased at the time, Downey responded that his law firm “will still be very active.”

The Oscar winner expressed confidence that Marvel Studios would not use AI to re-create his Tony Stark character, citing his trust in decision-makers there. “I am not worried about them hijacking my character’s soul because there’s like three or four guys and gals who make all the decisions there anyway and they would never do that to me,” he said.

Downey currently performs on Broadway in McNeal, a play that examines corporate leaders in AI technology. During the interview, he freely critiqued tech executives—Variety pointed out a particular quote from the interview where he criticized tech leaders who potentially do negative things but seek positive attention.

Hospitals adopt error-prone AI transcription tools despite warnings

In one case from the study cited by AP, when a speaker described “two other girls and one lady,” Whisper added fictional text specifying that they “were Black.” In another, the audio said, “He, the boy, was going to, I’m not sure exactly, take the umbrella.” Whisper transcribed it to, “He took a big piece of a cross, a teeny, small piece … I’m sure he didn’t have a terror knife so he killed a number of people.”

An OpenAI spokesperson told the AP that the company appreciates the researchers’ findings and that it actively studies how to reduce fabrications and incorporates feedback in updates to the model.

Why Whisper confabulates

Whisper’s unsuitability for high-risk domains stems from its propensity to sometimes confabulate, or plausibly make up, inaccurate outputs. The AP report says, “Researchers aren’t certain why Whisper and similar tools hallucinate,” but that isn’t true. We know exactly why Transformer-based AI models like Whisper behave this way.

Whisper is based on technology that is designed to predict the next most likely token (chunk of data) that should appear after a sequence of tokens provided by a user. In the case of ChatGPT, the input tokens come in the form of a text prompt. In the case of Whisper, the input is tokenized audio data.

The transcription output from Whisper is a prediction of what is most likely, not what is most accurate. Accuracy in Transformer-based outputs is typically proportional to the presence of relevant accurate data in the training dataset, but it is never guaranteed. If there is ever a case where there isn’t enough contextual information in its neural network for Whisper to make an accurate prediction about how to transcribe a particular segment of audio, the model will fall back on what it “knows” about the relationships between sounds and words it has learned from its training data.
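
To make that failure mode concrete, here is a toy sketch of the greedy decoding loop at the heart of models like this one. This is not Whisper’s actual code; the model and tokenizer objects are hypothetical stand-ins. The point is structural: the loop always emits the most probable next token, so it produces a fluent-sounding transcript even when the audio gives it little to go on.

    import numpy as np

    def greedy_transcribe(audio_features, model, tokenizer, max_len=100):
        """Toy decoder: always emit the single most probable next token."""
        tokens = [tokenizer.sot]  # start-of-transcript token
        for _ in range(max_len):
            # Hypothetical API: score every candidate next token given the audio
            # features and the tokens emitted so far.
            logits = model.next_token_logits(audio_features, tokens)
            next_token = int(np.argmax(logits))  # most probable, not most accurate
            if next_token == tokenizer.eot:  # end-of-transcript token
                break
            tokens.append(next_token)
        # If the audio is noisy or unlike the training data, argmax still returns
        # something plausible-sounding -- that is the confabulation.
        return tokenizer.decode(tokens)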

Removal of Russian coders spurs debate about Linux kernel’s politics

“Remove some entries due to various compliance requirements. They can come back in the future if sufficient documentation is provided.”

That two-line comment, submitted by major Linux kernel maintainer Greg Kroah-Hartman, accompanied a patch that removed about a dozen names from the kernel’s MAINTAINERS file. “Some entries” notably had either Russian names or .ru email addresses. “Various compliance requirements” meant, in this case, sanctions against Russia and Russian companies stemming from that country’s invasion of Ukraine.

This merge did not go unnoticed. Replies on the kernel mailing list asked about this “very vague” patch. Kernel developer James Bottomley wrote that “we” (seemingly speaking for Linux maintainers) had “actual advice” from Linux Foundation counsel. Employees of companies on the Treasury Department’s Office of Foreign Assets Control list of Specially Designated Nationals and Blocked Persons (OFAC SDN), or connected to them, will have their collaborations “subject to restrictions,” and “cannot be in the MAINTAINERS file.” “Sufficient documentation” would mean evidence that someone does not work for an OFAC SDN entity, Bottomley wrote.

A number of replies followed, questioning the commit’s legitimacy and suddenness, whether it had been forced by the US, and the fact that it had not gone through normal review, along with broader questions about the separation of open source code from international politics. Linux creator Linus Torvalds entered the thread with, “Ok, lots of Russian trolls out and about.” He wrote: “It’s entirely clear why the change was done,” noted that it would not be reverted by “Russian troll factories,” and added that “the ‘various compliance requirements’ are not just a US thing.”

Phone tracking tool lets government agencies follow your every move

Both operating systems display a list of apps and whether each is permitted location access always, never, only while the app is in use, or only after prompting for permission each time. Both also allow users to choose whether the app sees precise locations down to a few feet or only a coarse-grained location.

For most users, it makes sense to let photo, transit, or map apps access a precise location. For other classes of apps—say, those for Internet jukeboxes at bars and restaurants—an approximate location can be helpful, but precise, fine-grained access is likely overkill. And for other apps, there’s no reason for them ever to know the device’s location. With a few exceptions, there’s little reason for apps to always have location access.

Not surprisingly, Android users who want to block intrusive location gathering have more settings to change than iOS users. The first thing to do is access Settings > Security & Privacy > Ads and choose “Delete advertising ID.” Then, promptly ignore the long, scary warning Google provides and hit the button confirming the decision at the bottom. If you don’t see that setting, good for you. It means you already deleted it. Google provides documentation here.

iOS, by default, doesn’t give apps access to “Identifier for Advertisers,” Apple’s version of the unique tracking number assigned to iPhones, iPads, and Apple TVs. Apps, however, can display a window asking that the setting be turned on, so it’s useful to check. iPhone users can do this by accessing Settings > Privacy & Security > Tracking. Any apps with permission to access the unique ID will appear. While there, users should also turn off the “Allow Apps to Request to Track” button. While in iOS Privacy & Security, users should navigate to Apple Advertising and ensure Personalized Ads is turned off.

Additional coverage of Location X from Haaretz and NOTUS is here and here. The New York Times, the other publication given access to the data, hadn’t posted an article at the time this Ars post went live.

At TED AI 2024, experts grapple with AI’s growing pains


A year later, a compelling group of TED speakers moves from “what’s this?” to “what now?”

The opening moments of TED AI 2024 in San Francisco on October 22, 2024. Credit: Benj Edwards

SAN FRANCISCO—On Tuesday, TED AI 2024 kicked off its first day at San Francisco’s Herbst Theater with a lineup of speakers that tackled AI’s impact on science, art, and society. The two-day event brought a mix of researchers, entrepreneurs, lawyers, and other experts who painted a complex picture of AI with fairly minimal hype.

The second annual conference, organized by Walter and Sam De Brouwer, marked a notable shift from last year’s broad existential debates and proclamations of AI as being “the new electricity.” Rather than sweeping predictions about, say, looming artificial general intelligence (although there was still some of that, too), speakers mostly focused on immediate challenges: battles over training data rights, proposals for hardware-based regulation, debates about human-AI relationships, and the complex dynamics of workplace adoption.

The day’s sessions covered a wide breadth: physicist Carlo Rovelli explored consciousness and time, Project CETI researcher Patricia Sharma demonstrated attempts to use AI to decode whale communication, Recording Academy CEO Harvey Mason Jr. outlined music industry adaptation strategies, and even a few robots made appearances.

The shift from last year’s theoretical discussions to practical concerns was particularly evident during a presentation from Ethan Mollick of the Wharton School, who tackled what he called “the productivity paradox”—the disconnect between AI’s measured impact and its perceived benefits in the workplace. Already, organizations are moving beyond the gee-whiz period after ChatGPT’s introduction and into the implications of widespread use.

Sam De Brouwer and Walter De Brouwer organized TED AI and selected the speakers. Benj Edwards

Drawing from research claiming AI users complete tasks faster and more efficiently, Mollick highlighted a peculiar phenomenon: While one-third of Americans reported using AI in August of this year, managers often claim “no one’s using AI” in their organizations. Through a live demonstration using multiple AI models simultaneously, Mollick illustrated how traditional work patterns must evolve to accommodate AI’s capabilities. He also pointed to the emergence of what he calls “secret cyborgs”—employees quietly using AI tools without management’s knowledge. Regarding the future of jobs in the age of AI, he urged organizations to view AI as an opportunity for expansion rather than merely a cost-cutting measure.

Some giants in the AI field made an appearance. Jakob Uszkoreit, one of the eight co-authors of the now-famous “Attention is All You Need” paper that introduced Transformer architecture, reflected on the field’s rapid evolution. He distanced himself from the term “artificial general intelligence,” suggesting people aren’t particularly general in their capabilities. Uszkoreit described how the development of Transformers sidestepped traditional scientific theory, comparing the field to alchemy. “We still do not know how human language works. We do not have a comprehensive theory of English,” he noted.

Stanford professor Surya Ganguli presenting at TED AI 2024. Benj Edwards

And refreshingly, the talks went beyond AI language models. For example, Isomorphic Labs Chief AI Officer Max Jaderberg, who previously worked on Google DeepMind’s AlphaFold 3, gave a well-received presentation on AI-assisted drug discovery. He detailed how AlphaFold has already saved “1 billion years of research time” by discovering the shapes of proteins and showed how AI agents are now capable of running thousands of parallel drug design simulations that could enable personalized medicine.

Danger and controversy

While hype was less prominent this year, some speakers still spoke of AI-related dangers. Paul Scharre, executive vice president at the Center for a New American Security, warned about the risks of advanced AI models falling into malicious hands, specifically citing concerns about terrorist attacks with AI-engineered biological weapons. Drawing parallels to nuclear proliferation in the 1960s, Scharre argued that while regulating software is nearly impossible, controlling physical components like specialized chips and fabrication facilities could provide a practical framework for AI governance.

ReplikaAI founder Eugenia Kuyda cautioned that AI companions could become “the most dangerous technology if not done right,” suggesting that the existential threat from AI might come not from science fiction scenarios but from technology that isolates us from human connections. She advocated for designing AI systems that optimize for human happiness rather than engagement, proposing a “human flourishing metric” to measure its success.

Ben Zhao, a University of Chicago professor associated with the Glaze and Nightshade projects, painted a dire picture of AI’s impact on art, claiming that art schools were seeing unprecedented enrollment drops and galleries were closing at an accelerated rate due to AI image generators, though we have yet to dig through the supporting news headlines he momentarily flashed up on the screen.

Some of the speakers represented polar opposites of each other, policy-wise. For example, copyright attorney Angela Dunning offered a defense of AI training as fair use, drawing from historical parallels in technological advancement. A litigation partner at Cleary Gottlieb, which has previously represented the AI image generation service Midjourney in a lawsuit, Dunning quoted Mark Twain as saying “there is no such thing as a new idea” and argued that copyright law allows for building upon others’ ideas while protecting specific expressions. She compared current AI debates to past technological disruptions, noting how photography, once feared as a threat to traditional artists, instead sparked new artistic movements like abstract art and pointillism. “Art and science can only remain free if we are free to build on the ideas of those that came before,” Dunning said, challenging more restrictive views of AI training.

Copyright lawyer Angela Dunning quoted Mark Twain in her talk about fair use and AI. Benj Edwards

Dunning’s presentation stood in direct opposition to Ed Newton-Rex, who had earlier advocated for mandatory licensing of training data through his nonprofit Fairly Trained. In fact, the same day, Newton-Rex’s organization unveiled a “Statement on AI training” signed by many artists that says, “The unlicensed use of creative works for training generative AI is a major, unjust threat to the livelihoods of the people behind those works, and must not be permitted.” The issue has not yet been legally settled in US courts, but clearly, the battle lines have been drawn, and no matter which side you take, TED AI did a good job of giving both perspectives to the audience.

Looking forward

Some speakers explored potential new architectures for AI. Stanford professor Surya Ganguli highlighted the contrast between AI and human learning, noting that while AI models require trillions of tokens to train, humans learn language from just millions of exposures. He proposed “quantum neuromorphic computing” as a potential bridge between biological and artificial systems, suggesting a future where computers could potentially match the energy efficiency of the human brain.

Also, Guillaume Verdon, founder of Extropic and architect of the Effective Accelerationism (often called “E/Acc”) movement, presented what he called “physics-based intelligence” and claimed his company is “building a steam engine for AI,” potentially offering energy efficiency improvements up to 100 million times better than traditional systems—though he acknowledged this figure ignores cooling requirements for superconducting components. The company had completed its first room-temperature chip tape-out just the previous week.

The Day One sessions closed out with predictions about the future of AI from OpenAI’s Noam Brown, who emphasized the importance of scale in expanding future AI capabilities, and from University of Washington professor Pedro Domingos, who spoke about “co-intelligence,” saying, “People are smart, organizations are stupid” and proposing that AI could be used to bridge that gap by drawing on the collective intelligence of an organization.

When I attended TED AI last year, some obvious questions emerged: Is this current wave of AI a fad? Will there be a TED AI next year? I think the second TED AI answered these questions well—AI isn’t going away, and there are still endless angles to explore as the field expands rapidly.

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a widely-cited tech historian. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

FortiGate admins report active exploitation 0-day. Vendor isn’t talking.

Citing the Reddit comment, Beaumont took to Mastodon to explain: “People are quite openly posting what is happening on Reddit now, threat actors are registering rogue FortiGates into FortiManager with hostnames like ‘localhost’ and using them to get RCE.”

Beaumont wasn’t immediately available to elaborate. In the same thread, another user said that based on the brief description, it appears attackers are somehow stealing a digital certificate that authenticates a device to a customer network, loading it onto a FortiGate device they own, and then registering that device into the customer network.

The person continued:

From there, they can configure their way into your network or possibly take other admin actions (eg. possibly sync configs from trustworthy managed devices to their own?) It’s not super clear from these threads. The mitigation to prevent unknown serial numbers suggests that a speedbump to fast onboarding prevents even a cert-bearing(?) device from being included into the fortimanager.

Beaumont went on to say that based on evidence he’s seen, China-state hackers have “been hopping into internal networks using this one since earlier in the year, looks like.”

60,000 devices exposed

After this post went live on Ars, Beaumont published a post saying the vulnerability likely resides in FGFM, the FortiGate-to-FortiManager protocol that allows FortiGate firewall devices to communicate with the manager over port 541. As Beaumont pointed out, the Shodan search engine shows more than 60,000 such connections exposed to the Internet.
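
For admins wondering whether their own FortiManager is part of that exposed population, a crude first check is simply whether TCP 541 answers from the outside. Here is a minimal sketch in Python; the hostname is a placeholder, and an open port only indicates exposure, not that the device is vulnerable.

    import socket

    def fgfm_port_open(host, port=541, timeout=3.0):
        """Return True if the FGFM port (TCP 541) accepts a connection."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    if __name__ == "__main__":
        print(fgfm_port_open("fortimanager.example.com"))  # placeholder hostname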

Beaumont wrote:

There’s one requirement for an attacker: you need a valid certificate to connect. However, you can just take a certificate from a FortiGate box and reuse it. So, effectively, there’s no barrier to registering.

Once registered, there’s a vulnerability which allows remote code execution on the FortiManager itself via the rogue FortiGate connection.

From the FortiManager, you can then manage the legit downstream FortiGate firewalls, view config files, take credentials and alter configurations. Because MSPs — Managed Service Providers — often use FortiManager, you can use this to enter internal networks downstream.

Because of the way FGFM is designed — NAT traversal situations — it also means if you gain access to a managed FortiGate firewall you then can traverse up to the managing FortiManager device… and then back down to other firewalls and networks.

To make matters harder for FortiGate customers and defenders, the company’s support portal was returning connection errors at the time this post went live on Ars that prevented people from accessing the site.

Basecamp-maker 37Signals says its “cloud exit” will save it $10M over 5 years

Lots of pointing at clouds

In March, AWS made data transfer out of AWS free for customers moving off its servers, spurred in part by European regulations. Trade publications are full of trend stories about rising cloud costs and explainers on why companies are repatriating. Stories of major players’ cloud reversals, like that of Dropbox, have become talking points for the cloud-averse.

Not everyone believes the sky is falling. Lydia Leong, a cloud computing analyst at Gartner, wrote on her own blog about how “the myth of cloud repatriation refuses to die.” A large part of this, Leong writes, is in how surveys and anecdotal news stories confuse various versions of “repatriation” from managed service providers to self-hosted infrastructure.

“None of these things are in any way equivalent to the notion that there’s a broad or even common movement of workloads from the cloud back on-premises, though, especially for those customers who have migrated entire data centers or the vast majority of their IT estate to the cloud,” writes Leong.

Both Leong and Rich Hoyer, director of the FinOps group at SADA, suggest that framing the issue as simply “cloud versus on-premises” is too simplistic. A poorly architected split between cloud and on-prem, vague goals and measurements of cloud “cost” and “success,” and fuzzy return-on-investment math, Hoyer writes, are feeding alarmist takes on cloud costs.

For its part, AWS has itself testified that it faces competition from the on-premises IT movement, although it did so as part of a “Cloud Services Market Investigation” by UK market competition authorities. Red Hat and Citrix have suggested that, at a minimum, hybrid approaches have regained ground after a period of cloud primacy.

Those kinds of measured approaches don’t have the same broad reach as declaring an “exit” and putting a very round number on it, but it’s another interesting data point.

Ars has reached out to AWS and will update this post with comment.

Finally upgrading from isc-dhcp-server to isc-kea for my homelab

Broken down that way, the migration didn’t look terribly scary—and it’s made easier by the fact that the Kea default config files come filled with descriptive comments and configuration examples to crib from. (And, again, ISC has done an outstanding job with the docs for Kea. All versions, from deprecated to bleeding-edge, have thorough and extensive online documentation if you’re curious about what a given option does or where to apply it—and, as noted above, there are also the supplied sample config files to tear apart if you want more detailed examples.)

Configuration time for DHCP

We have two Kea applications to configure, so we’ll do DHCP first and then get to the DDNS side. (Though the DHCP config file also contains a bunch of DDNS stuff, so I guess if we’re being pedantic, we’re setting both up at once.)

The first file to edit, if you installed Kea via package manager, is /etc/kea/kea-dhcp4.conf. The file should already have some reasonably sane defaults in it, and it’s worth taking a moment to look through the comments and see what those defaults are and what they mean.

Here’s a lightly sanitized version of my working kea-dhcp4.conf file:

    "Dhcp4":       "control-socket":         "socket-type": "unix",        "socket-name": "https://arstechnica.com/tmp/kea4-ctrl-socket"      ,      "interfaces-config":         "interfaces": ["eth0"],        "dhcp-socket-type": "raw"      ,      "dhcp-ddns":         "enable-updates": true      ,      "ddns-conflict-resolution-mode": "no-check-with-dhcid",      "ddns-override-client-update": true,      "ddns-override-no-update": true,      "ddns-qualifying-suffix": "bigdinosaur.lan",      "authoritative": true,      "valid-lifetime": 86400,      "renew-timer": 43200,      "expired-leases-processing":         "reclaim-timer-wait-time": 3600,        "hold-reclaimed-time": 3600,        "max-reclaim-leases": 0,        "max-reclaim-time": 0      ,      "loggers": [      {        "name": "kea-dhcp4",        "output_options": [          {            "output": "syslog",            "pattern": "%-5p %mn",            "maxsize": 1048576,            "maxver": 8          }        ],        "severity": "INFO",        "debuglevel": 0              ],      "reservations-global": false,      "reservations-in-subnet": true,      "reservations-out-of-pool": true,      "host-reservation-identifiers": [        "hw-address"      ],      "subnet4": [        {          "id": 1,          "subnet": "10.10.10.0/24",          "pools": [            {              "pool": "10.10.10.170 - 10.10.10.254"            }          ],          "option-data": [            {              "name": "subnet-mask",              "data": "255.255.255.0"            },            {              "name": "routers",              "data": "10.10.10.1"            },            {              "name": "broadcast-address",              "data": "10.10.10.255"            },            {              "name": "domain-name-servers",              "data": "10.10.10.53"            },            {              "name": "domain-name",              "data": "bigdinosaur.lan"            }          ],          "reservations": [            {              "hostname": "host1.bigdinosaur.lan",              "hw-address": "aa:bb:cc:dd:ee:ff",              "ip-address": "10.10.10.100"            },            {              "hostname": "host2.bigdinosaur.lan",              "hw-address": "ff:ee:dd:cc:bb:aa",              "ip-address": "10.10.10.101"            }          ]              ]    }  }

The first stanzas set up the control socket on which the DHCP process listens for management API commands (we’re not going to set up the management tool, which is overkill for a homelab, but this will ensure the socket exists if you ever decide to go in that direction). They also set up the interface on which Kea listens for DHCP requests, and they tell Kea to listen for those requests in raw socket mode. You almost certainly want raw as your DHCP socket type (see here for why), but this can also be set to udp if needed.
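
If you ever want to confirm the control socket is actually listening, you don’t need the full management stack; Kea’s control channel just speaks JSON over that Unix socket. Here is a rough sketch in Python, assuming the socket path from the config above and a user with permission to read it; “list-commands” is one of the daemon’s built-in commands, and a “result” of 0 in the reply means the command succeeded.

    import json
    import socket

    def kea_command(command, sock_path="/tmp/kea4-ctrl-socket"):
        """Send one management-API command to kea-dhcp4 and return the parsed reply."""
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            s.connect(sock_path)
            s.sendall(json.dumps({"command": command}).encode())
            reply = s.recv(65536)  # replies to simple commands are small
        return json.loads(reply)

    if __name__ == "__main__":
        print(kea_command("list-commands"))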

OpenAI releases ChatGPT app for Windows

On Thursday, OpenAI released an early version of its first ChatGPT app for Windows, following a Mac version that launched in May. Currently, it’s only available to subscribers of Plus, Team, Enterprise, and Edu versions of ChatGPT, and users can download it for free in the Microsoft Store for Windows.

OpenAI is positioning the release as a beta test. “This is an early version, and we plan to bring the full experience to all users later this year,” OpenAI writes on the Microsoft Store entry for the app. (Interestingly, ChatGPT shows up as being rated “T for Teen” by the ESRB in the Windows store, despite not being a video game.)

A screenshot of the new Windows ChatGPT app captured on October 18, 2024. Credit: Benj Edwards

Upon opening the app, users must log into a paying ChatGPT account, and from there, the app is basically identical to the web browser version of ChatGPT. You can currently use it to access several models: GPT-4o, GPT-4o with Canvas, o1-preview, o1-mini, GPT-4o mini, and GPT-4. Also, it can generate images using DALL-E 3 or analyze uploaded files and images.

If you’re running Windows 11, you can instantly call up a small ChatGPT window when the app is open using an Alt+Space shortcut (it did not work in Windows 10 when we tried). That could be handy for asking ChatGPT a quick question at any time.

A screenshot of the new Windows ChatGPT app listing in the Microsoft Store captured on October 18, 2024. Credit: Benj Edwards

And just like the web version, all the AI processing takes place in the cloud on OpenAI’s servers, which means an Internet connection is required.

So as usual, chat like somebody’s watching, and don’t rely on ChatGPT as a factual reference for important decisions—GPT-4o in particular is great at telling you what you want to hear, whether it’s correct or not. As OpenAI says in a small disclaimer at the bottom of the app window: “ChatGPT can make mistakes.”

Cheap AI “video scraping” can now extract data from any screen recording


Researcher feeds screen recordings into Gemini to extract accurate information with ease.

Recently, AI researcher Simon Willison wanted to add up his charges from using a cloud service, but the payment values and dates he needed were scattered among a dozen separate emails. Inputting them manually would have been tedious, so he turned to a technique he calls “video scraping,” which involves feeding a screen recording video into an AI model, similar to ChatGPT, for data extraction purposes.

What he discovered seems simple on its surface, but the quality of the result has deeper implications for the future of AI assistants, which may soon be able to see and interact with what we’re doing on our computer screens.

“The other day I found myself needing to add up some numeric values that were scattered across twelve different emails,” Willison wrote in a detailed post on his blog. He recorded a 35-second video scrolling through the relevant emails, then fed that video into Google’s AI Studio tool, which allows people to experiment with several versions of Google’s Gemini 1.5 Pro and Gemini 1.5 Flash AI models.

Willison then asked Gemini to pull the price data from the video and arrange it into a special data format called JSON (JavaScript Object Notation) that included dates and dollar amounts. The AI model successfully extracted the data, which Willison then formatted as a CSV (comma-separated values) table for spreadsheet use. After he double-checked for errors as part of his experiment, the accuracy of the results—and what the video analysis cost to run—surprised him.
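
For anyone who wants to reproduce that workflow outside of AI Studio, here is a rough sketch using Google’s google-generativeai Python SDK. The file name, prompt, and API key are placeholders, and the final line assumes the model returns bare JSON rather than wrapping it in markdown.

    import json
    import time

    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")  # placeholder

    # Upload the screen recording via the Files API and wait for processing.
    video = genai.upload_file("emails-scroll.mp4")  # example file name
    while video.state.name == "PROCESSING":
        time.sleep(2)
        video = genai.get_file(video.name)

    model = genai.GenerativeModel("gemini-1.5-flash-002")
    prompt = ("List every charge shown in this video as a JSON array of "
              "objects with 'date' and 'amount' fields.")
    response = model.generate_content([video, prompt])

    charges = json.loads(response.text)  # assumes bare JSON in the response
    print(charges)

From there, turning the JSON into a CSV for a spreadsheet is a couple of lines with Python’s csv module.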

A screenshot of Simon Willison using Google Gemini to extract data from a screen capture video. Credit: Simon Willison

“The cost [of running the video model] is so low that I had to re-run my calculations three times to make sure I hadn’t made a mistake,” he wrote. Willison says the entire video analysis process ostensibly cost less than one-tenth of a cent, using just 11,018 tokens on the Gemini 1.5 Flash 002 model. In the end, he actually paid nothing because Google AI Studio is currently free for some types of use.

Video scraping is just one of many new tricks possible when the latest large language models (LLMs), such as Google’s Gemini and GPT-4o, are actually “multimodal” models, allowing audio, video, image, and text input. These models translate any multimedia input into tokens (chunks of data), which they use to make predictions about which tokens should come next in a sequence.

A term like “token prediction model” (TPM) might be more accurate than “LLM” these days for AI models with multimodal inputs and outputs, but a generalized alternative term hasn’t really taken off yet. But no matter what you call it, having an AI model that can take video inputs has interesting implications, both good and potentially bad.

Breaking down input barriers

Willison is far from the first person to feed video into AI models to achieve interesting results (more on that below, and here’s a 2015 paper that uses the “video scraping” term), but as soon as Gemini launched its video input capability, he began to experiment with it in earnest.

In February, Willison demonstrated another early application of AI video scraping on his blog, where he took a seven-second video of the books on his bookshelves, then got Gemini 1.5 Pro to extract all of the book titles it saw in the video and put them in a structured, or organized, list.

Converting unstructured data into structured data is important to Willison, because he’s also a data journalist. Willison has created tools for data journalists in the past, such as the Datasette project, which lets anyone publish data as an interactive website.

To every data journalist’s frustration, some sources of data prove resistant to scraping (capturing data for analysis) due to how the data is formatted, stored, or presented. In these cases, Willison delights in the potential for AI video scraping because it bypasses these traditional barriers to data extraction.

“There’s no level of website authentication or anti-scraping technology that can stop me from recording a video of my screen while I manually click around inside a web application,” Willison noted on his blog. His method works for any visible on-screen content.

Video is the new text

An illustration of a cybernetic eyeball. Credit: Getty Images

The ease and effectiveness of Willison’s technique reflect a noteworthy shift now underway in how some users will interact with token prediction models. Rather than requiring a user to manually paste or type in data in a chat dialog—or detail every scenario to a chatbot as text—some AI applications increasingly work with visual data captured directly on the screen. For example, if you’re having trouble navigating a pizza website’s terrible interface, an AI model could step in and perform the necessary mouse clicks to order the pizza for you.

In fact, video scraping is already on the radar of every major AI lab, although they are not likely to call it that at the moment. Instead, tech companies typically refer to these techniques as “video understanding” or simply “vision.”

In May, OpenAI demonstrated a prototype version of its ChatGPT Mac App with an option that allowed ChatGPT to see and interact with what is on your screen, but that feature has not yet shipped. Microsoft demonstrated a similar “Copilot Vision” prototype concept earlier this month (based on OpenAI’s technology) that will be able to “watch” your screen and help you extract data and interact with applications you’re running.

Despite these research previews, OpenAI’s ChatGPT and Anthropic’s Claude have not yet implemented a public video input feature for their models, possibly because it is relatively computationally expensive for them to process the extra tokens from a “tokenized” video stream.

For the moment, Google is heavily subsidizing user AI costs with its war chest from Search revenue and a massive fleet of data centers (to be fair, OpenAI is subsidizing, too, but with investor dollars and help from Microsoft). But costs of AI compute in general are dropping by the day, which will open up new capabilities of the technology to a broader user base over time.

Countering privacy issues

As you might imagine, having an AI model see what you do on your computer screen can have downsides. For now, video scraping is great for Willison, who will undoubtedly use the captured data in positive and helpful ways. But it’s also a preview of a capability that could later be used to invade privacy or autonomously spy on computer users on a scale that was once impossible.

A different form of video scraping caused a massive wave of controversy recently for that exact reason. Apps such as the third-party Rewind AI on the Mac and Microsoft’s Recall, which is being built into Windows 11, operate by feeding on-screen video into an AI model that stores extracted data into a database for later AI recall. Unfortunately, that approach also introduces potential privacy issues because it records everything you do on your machine and puts it in a single place that could later be hacked.

To that point, although Willison’s technique currently involves uploading a video of his data to Google for processing, he is pleased that he can still decide what the AI model sees and when.

“The great thing about this video scraping technique is that it works with anything that you can see on your screen… and it puts you in total control of what you end up exposing to the AI model,” Willison explained in his blog post.

It’s also possible in the future that a locally run open-weights AI model could pull off the same video analysis method without the need for a cloud connection at all. Microsoft Recall runs locally on supported devices, but it still demands a great deal of unearned trust. For now, Willison is perfectly content to selectively feed video data to AI models when the need arises.

“I expect I’ll be using this technique a whole lot more in the future,” he wrote, and perhaps many others will, too, in different forms. If the past is any indication, Willison—who coined the term “prompt injection” in 2022—seems to always be a few steps ahead in exploring novel applications of AI tools. Right now, his attention is on the new implications of AI and video, and yours probably should be, too.

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a widely-cited tech historian. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.
