image recognition

New camera design can ID threats faster, using less memory

Image out the windshield of a car, with other vehicles highlighted by computer-generated brackets.

Elon Musk, back in October 2021, tweeted that “humans drive with eyes and biological neural nets, so cameras and silicon neural nets are only way to achieve generalized solution to self-driving.” The problem with his logic has been that human eyes are way better than RGB cameras at detecting fast-moving objects and estimating distances. Our brains have also surpassed all artificial neural nets by a wide margin at general processing of visual inputs.

To bridge this gap, a team of scientists at the University of Zurich developed a new automotive object-detection system that brings digital camera performance much closer to that of human eyes. “Unofficial sources say Tesla uses multiple Sony IMX490 cameras with 5.4-megapixel resolution that [capture] up to 45 frames per second, which translates to perceptual latency of 22 milliseconds. Comparing [these] cameras alone to our solution, we already see a 100-fold reduction in perceptual latency,” says Daniel Gehrig, a researcher at the University of Zurich and lead author of the study.

Replicating human vision

When a pedestrian suddenly jumps in front of your car, multiple things have to happen before a driver-assistance system initiates emergency braking. First, the pedestrian must be captured in images taken by a camera. The time this takes is called perceptual latency: the delay between a visual stimulus occurring and its appearance in the readout from a sensor. Then, the readout needs to get to a processing unit, which adds a network latency of around 4 milliseconds.

The processing to classify the image of a pedestrian takes further precious milliseconds. Once that is done, the detection goes to a decision-making algorithm, which takes some time to decide to hit the brakes. All this processing is known as computational latency. Overall, the reaction time is anywhere from 0.1 to 0.5 seconds. If the pedestrian runs at 12 km/h, they would travel between 0.3 and 1.7 meters in this time. Your car, if you’re driving 50 km/h, would cover 1.4 to 6.9 meters. In a close-range encounter, this means you’d most likely hit them.
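The distances quoted above follow from simple speed-times-time arithmetic. As a quick check, here is a minimal Python sketch; the function name `distance_m` is ours, not something from the study, and it assumes constant speed over the whole reaction window.

```python
def distance_m(speed_kmh: float, seconds: float) -> float:
    """Distance in meters covered at a constant speed given in km/h."""
    meters_per_second = speed_kmh / 3.6  # 1 km/h = 1000 m / 3600 s
    return meters_per_second * seconds


# Reaction-time window from the article: 0.1 s to 0.5 s.
for label, speed_kmh in (("pedestrian", 12.0), ("car", 50.0)):
    near = distance_m(speed_kmh, 0.1)
    far = distance_m(speed_kmh, 0.5)
    print(f"{label}: {near:.1f} m to {far:.1f} m")
```

Run as written, this prints 0.3 m to 1.7 m for the pedestrian and 1.4 m to 6.9 m for the car, matching the figures in the text.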

Gehrig and Davide Scaramuzza, a professor at the University of Zurich and a co-author on the study, aimed to shorten those reaction times by bringing the perceptual and computational latencies down.

The most straightforward way to lower the former would be to use standard high-speed cameras that simply register more frames per second. But even with a 30-45 fps camera, a self-driving car would generate nearly 40 terabytes of data per hour. Fitting something that would significantly cut the perceptual latency, like a 5,000 fps camera, would overwhelm a car’s onboard computer in an instant, and the computational latency would go through the roof.
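To see why frame rate sets a floor on perceptual latency, it helps to convert frames per second into the time between frames. The sketch below assumes the worst-case delay is one full frame interval, which is the reading implied by the 22-millisecond figure quoted for a 45 fps camera; the function name `frame_interval_ms` is purely illustrative.

```python
def frame_interval_ms(fps: float) -> float:
    """Time between consecutive frames, in milliseconds."""
    return 1000.0 / fps


# Frame rates mentioned in the article.
for fps in (30, 45, 5000):
    print(f"{fps} fps -> {frame_interval_ms(fps):.1f} ms between frames")
```

Under that assumption, 45 fps works out to about 22.2 ms and 5,000 fps to 0.2 ms, roughly the 100-fold gap Gehrig describes.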

So, the Swiss team used something called an “event camera,” which mimics the way biological eyes work. “Compared to a frame-based video camera, which records dense images at a fixed frequency—frames per second—event cameras contain independent smart pixels that only measure brightness changes,” explains Gehrig. Each of these pixels starts with a set brightness level. When the change in brightness exceeds a certain threshold, the pixel registers an event and sets a new baseline brightness level. All the pixels in the event camera are doing that continuously, with each registered event manifesting as a point in an image.
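The per-pixel behavior Gehrig describes can be sketched in a few lines of Python. This is a toy model under our own assumptions, not the team’s implementation: the class name `EventPixel`, the `observe` method, the linear brightness scale, and the +1/-1 sign convention for brighter/darker are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class EventPixel:
    """Toy model of one event-camera pixel (names are illustrative only)."""
    baseline: float   # brightness level the pixel currently compares against
    threshold: float  # how large a change must be to count as an event

    def observe(self, brightness: float) -> int:
        """Return +1 (brighter) or -1 (darker) on an event, 0 otherwise."""
        change = brightness - self.baseline
        if abs(change) <= self.threshold:
            return 0  # change too small: the pixel stays silent
        self.baseline = brightness  # an event resets the reference level
        return 1 if change > 0 else -1


pixel = EventPixel(baseline=100.0, threshold=15.0)
samples = [100.0, 105.0, 120.0, 121.0, 90.0]
events = [pixel.observe(b) for b in samples]
print(events)
```

With these sample values, only the jump to 120 and the drop to 90 produce events; the small fluctuations in between generate no data at all, which is the source of the bandwidth savings.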

This makes event cameras particularly good at detecting high-speed movement and allows them to do so using far less data. The problem with putting them in cars has been that they had trouble detecting things that moved slowly or didn’t move at all relative to the camera. To solve that, Gehrig and Scaramuzza went for a hybrid system, where an event camera was combined with a traditional one.

Famous xkcd comic comes full circle with AI bird-identifying binoculars


Who watches the bird watchers —

Swarovski AX Visio, billed as first “smart binoculars,” names species and tracks location.

Benj Edwards – Jan 15, 2024 6:04 pm UTC

The Swarovski Optik Visio binoculars, with an excerpt of a 2014 xkcd comic strip called “Tasks” in the corner.

xkcd / Swarovski

Last week, Austria-based Swarovski Optik introduced the AX Visio 10×32 binoculars, which the company says can identify over 9,000 species of birds and mammals using image recognition technology. The company is calling the product the world’s first “smart binoculars,” and they come with a hefty price tag—$4,799.

“The AX Visio are the world’s first AI-supported binoculars,” the company says in the product’s press release. “At the touch of a button, they assist with the identification of birds and other creatures, allow discoveries to be shared, and offer a wide range of practical extra functions.”

The binoculars, aimed mostly at bird watchers, gain their ability to identify birds from the Merlin Bird ID project, created by Cornell Lab of Ornithology. As confirmed by a hands-on demo conducted by The Verge, the user looks at an animal through the binoculars and presses a button. A red progress circle fills in while the binoculars process the image, then the identified animal name pops up on the built-in binocular HUD screen within about five seconds.

In 2014, a famous xkcd comic strip titled Tasks depicted someone asking a developer to create an app that, when a user takes a photo, will check whether the user is in a national park (deemed easy due to GPS) and check whether the photo is of a bird (to which the developer says, “I’ll need a research team and five years”). The caption below reads, “In CS, it can be hard to explain the difference between the easy and the virtually impossible.”

The xkcd comic titled “Tasks” from September 24, 2014.

It’s been just over nine years since the comic was published, and while identifying the presence of a bird in a photo was solved some time ago, these binoculars arguably go further by identifying the species of the bird in the photo (they also keep track of location using GPS). While apps to identify bird species already exist, this feature is now packed into a handheld pair of binoculars.

According to Swarovski, the development of the AX Visio took approximately five years, involving around 390 “hardware parts.” The binoculars incorporate a neural processing unit (NPU) for object recognition processing. The company claims that the device will have a long product life cycle, with ongoing updates and improvements. The company also mentions “an open programming interface” in its press release, potentially allowing industrious users (or handy hackers) to expand the unit’s features over time.

The Swarovski Optik Visio binoculars.

Swarovski Optik

The binoculars, which feature industrial design from Marc Newson, include a built-in digital camera, compass, GPS, and discovery-sharing features that can “immediately show your companion where you have seen an animal.” The Visio unit also wirelessly ties into the “SWAROVSKI OPTIK Outdoor App” that can run on a smartphone. The app manages sharing photos and videos captured through the binoculars. (As an aside, we’ve come a long way from computer-connected gadgets that required pesky serial cables in the late 1990s.)

Swarovski says the AX Visio will be available at select retailers and online starting February 1, 2024. While this tech is at a premium price right now, given the speed of tech progress and market competition, we may see similar image-recognizing features built into much cheaper models in the years ahead.
