Microsoft Azure


CrowdStrike fixes start at “reboot up to 15 times” and get more complex from there

turning it off and back on again, and again, and again —

Admins can also restore backups or manually delete CrowdStrike’s buggy driver.


Airlines, payment processors, 911 call centers, TV networks, and other businesses have been scrambling this morning after a buggy update to CrowdStrike’s Falcon security software caused Windows-based systems to crash with a dreaded blue screen of death (BSOD) error message.

We’re updating our story about the outage with new details as we have them. Microsoft and CrowdStrike both say that “the affected update has been pulled,” so what’s most important for IT admins in the short term is getting their systems back up and running again. According to guidance from Microsoft, fixes range from annoying but easy to incredibly time-consuming and complex, depending on the number of systems you have to fix and the way your systems are configured.

Microsoft’s Azure status page outlines several fixes. The first and easiest is simply to reboot affected machines over and over, which gives them multiple chances to try to grab CrowdStrike’s non-broken update before the bad driver can cause the BSOD. Microsoft says that some of its customers have had to reboot their systems as many as 15 times to pull down the update.
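
For admins with fleets of Azure-hosted Windows VMs, that advice can be scripted rather than clicked through. Below is a minimal sketch, assuming the Azure CLI is installed and logged in; the resource group, VM name, retry count, and wait time are hypothetical placeholders, not values from Microsoft's guidance.

```python
"""Minimal sketch: restart an Azure VM repeatedly so it gets multiple
chances to pull CrowdStrike's corrected update before the faulty driver
loads. Assumes the Azure CLI ('az') is installed and authenticated; the
resource group and VM name are placeholders."""
import subprocess
import time

RESOURCE_GROUP = "my-resource-group"   # placeholder
VM_NAME = "affected-vm"                # placeholder
MAX_REBOOTS = 15                       # Microsoft reports up to 15 may be needed

for attempt in range(1, MAX_REBOOTS + 1):
    print(f"Reboot attempt {attempt}/{MAX_REBOOTS}")
    # 'az vm restart' blocks until the restart operation completes
    subprocess.run(
        ["az", "vm", "restart",
         "--resource-group", RESOURCE_GROUP,
         "--name", VM_NAME],
        check=True,
    )
    # Give the guest OS time to boot and attempt the Falcon update
    time.sleep(300)
    # In practice you would probe the machine (e.g., via a health check or
    # 'az vm get-instance-view') and stop early once it stays up.
```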

Early guidance for fixing the CrowdStrike bug is simply to reboot systems over and over again so that they can try to grab a non-broken update.

Microsoft

If rebooting doesn’t work

If rebooting multiple times isn’t fixing your problem, Microsoft recommends restoring your systems using a backup from before 04:09 UTC on July 18 (just after midnight on Friday, Eastern time), when CrowdStrike began pushing out the buggy update. CrowdStrike says a reverted version of the file was deployed at 05:27 UTC.

If these simpler fixes don’t work, you may need to boot your machines into Safe Mode so you can manually delete the file that’s causing the BSOD errors. For virtual machines, Microsoft recommends attaching the virtual disk to a known-working repair VM so the file can be deleted, then reattaching the virtual disk to its original VM.
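
For the virtual-machine route, the Azure CLI’s vm-repair extension automates copying a broken VM’s OS disk onto a fresh repair VM and swapping it back afterward. The sketch below illustrates that generic workflow, not official CrowdStrike-specific guidance; the resource group, VM name, and repair credentials are placeholders.

```python
"""Minimal sketch of the detach-and-repair route for an Azure VM, using the
Azure CLI's 'vm-repair' extension (generic repair workflow; not official
CrowdStrike-specific guidance). Resource group, VM name, and credentials
are placeholders."""
import subprocess

RG, VM = "my-resource-group", "broken-vm"   # placeholders

def az(*args: str) -> None:
    """Run one Azure CLI command and fail loudly if it errors."""
    subprocess.run(["az", *args], check=True)

# One-time: install the repair extension
az("extension", "add", "--name", "vm-repair")

# Copy the broken VM's OS disk and attach it to a newly created repair VM
az("vm", "repair", "create",
   "--resource-group", RG, "--name", VM,
   "--repair-username", "repairadmin",
   "--repair-password", "<placeholder-password>",
   "--verbose")

# ...log in to the repair VM and delete the bad CrowdStrike driver file from
# the attached disk (see the next snippet), then swap the fixed disk back:
az("vm", "repair", "restore",
   "--resource-group", RG, "--name", VM, "--verbose")
```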

The file in question is a CrowdStrike driver located at Windows/System32/Drivers/CrowdStrike/C-00000291*.sys. Once it’s gone, the machine should boot normally and grab a non-broken version of the driver.
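
As an illustration, a short script like the following could remove that file once the affected volume is reachable from Safe Mode, a recovery environment, or a repair VM. The drive letter is an assumption; on a repair VM the attached disk will usually mount under a different letter.

```python
"""Minimal sketch: remove the problematic CrowdStrike channel file from an
affected Windows volume. The drive letter is a placeholder; adjust it to
wherever the affected volume is mounted."""
from pathlib import Path

DRIVE = "C:/"  # placeholder; often D:/ or E:/ on a repair VM
target_dir = Path(DRIVE) / "Windows" / "System32" / "Drivers" / "CrowdStrike"

for bad_file in target_dir.glob("C-00000291*.sys"):
    print(f"Deleting {bad_file}")
    bad_file.unlink()
```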

Deleting that file on each and every one of your affected systems individually is time-consuming enough, but it’s even more time-consuming for customers using Microsoft’s BitLocker drive encryption to protect data at rest. Before you can delete the file on those systems, you’ll need the recovery key that unlocks those encrypted disks and makes them readable (normally, this process is invisible, because the system can just read the key stored in a physical or virtual TPM).
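
For systems that are still bootable (or as advance preparation), Windows’ built-in manage-bde tool can print a volume’s protectors, including the 48-digit recovery password, so it can be escrowed; a locked disk attached to a repair VM can then be opened with that password. A minimal sketch, assuming an elevated Windows admin session; the drive letters are placeholders, and managed fleets would normally pull keys from a key-management system such as Active Directory or Entra ID instead.

```python
"""Minimal sketch: dump a volume's BitLocker protectors (including the
numerical recovery password, if one is configured) using the built-in
manage-bde tool, so the key can be escrowed before it is needed.
Drive letters are placeholders; run from an elevated prompt."""
import subprocess

DRIVE = "C:"  # placeholder: the encrypted system volume

result = subprocess.run(
    ["manage-bde", "-protectors", "-get", DRIVE],
    capture_output=True, text=True, check=True,
)
print(result.stdout)

# From a recovery environment or repair VM, the locked volume can later be
# opened with the escrowed 48-digit key before the driver file is deleted:
#   manage-bde -unlock D: -RecoveryPassword <48-digit-key>
```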

This can cause problems for admins who aren’t using key management to store their recovery keys, since (by design!) you can’t access an encrypted drive without its recovery key. On Mastodon, cryptography and infrastructure engineer Tony Arcieri compared the situation to a “self-inflicted ransomware attack,” in which an attacker encrypts the disks on your systems and withholds the key until they get paid.

And even if you do have a recovery key, your key management server might also be affected by the CrowdStrike bug.

We’ll continue to track recommendations from Microsoft and CrowdStrike about fixes as each company’s respective status pages are updated.

“We understand the gravity of the situation and are deeply sorry for the inconvenience and disruption,” wrote CrowdStrike CEO George Kurtz on X, formerly Twitter. “We are working with all impacted customers to ensure that systems are back up and they can deliver the services their customers are counting on.”



Major outages at CrowdStrike, Microsoft leave the world with BSODs and confusion

Y2K24 —

Nobody’s sure who’s at fault for each outage: Microsoft, CrowdStrike, or both.

A passenger sits on the floor as long queues form at the check-in counters at Ninoy Aquino International Airport, on July 19, 2024 in Manila, Philippines.

Ezra Acayan/Getty Images

Millions of people outside the IT industry are learning what CrowdStrike is today, and that’s a real bad thing. Meanwhile, Microsoft is also catching blame for global network outages, and between the two, it’s unclear as of Friday morning just who caused what.

After cybersecurity firm CrowdStrike shipped an update to its Falcon Sensor software that protects mission-critical systems, blue screens of death (BSODs) started taking down Windows-based systems. The problems started in Australia and followed the dateline from there.

TV networks, 911 call centers, and even the Paris Olympics were affected. Banks and financial systems in India, South Africa, Thailand, and other countries fell as computers suddenly crashed. Some individual workers discovered that their work-issued laptops were booting to blue screens on Friday morning. The outages took down not only Starbucks mobile ordering, but also a single motel in Laramie, Wyoming.

Airlines, never the most agile of networks, were particularly hard-hit, with American Airlines, United, Delta, and Frontier among the US airlines overwhelmed Friday morning.

CrowdStrike CEO “deeply sorry”

Fixes suggested by both CrowdStrike and Microsoft for endlessly crashing Windows systems range from “reboot it up to 15 times” to individual driver deletions within detached virtual OS disks. The presence of BitLocker drive encryption on affected devices further complicates matters.

CrowdStrike CEO George Kurtz posted on X (formerly Twitter) at 5:45 am Eastern time that the firm was working on “a defect found in a single content update for Windows hosts,” with Mac and Linux hosts unaffected. “This is not a security incident or cyberattack. The issue has been identified, isolated and a fix has been deployed,” Kurtz wrote. Kurtz told NBC’s Today Show Friday morning that CrowdStrike is “deeply sorry for the impact that we’ve caused to customers.”

As noted on Mastodon by LittleAlex, Kurtz was the chief technology officer of security firm McAfee when, in April 2010, that firm shipped an update that deleted a crucial Windows XP file, causing widespread outages and requiring system-by-system repair.

The costs of such an outage will take some time to become known and will be hard to measure. Cloud cost analyst CloudZero estimated mid-morning Friday, based on an earlier estimate, that the CrowdStrike incident had already cost $24 billion.

Multiple outages, unclear blame

Microsoft services were, in a seemingly terrible coincidence, also down overnight Thursday into Friday. Multiple Azure services went down Thursday evening, with the cause cited as “a backend cluster management workflow [that] deployed a configuration change causing backend access to be blocked between a subset of Azure Storage clusters and compute resources in the Central US region.”

A spokesperson for Microsoft told Ars in a statement Friday that the CrowdStrike update was not related to its July 18 Azure outage. “That issue has fully recovered,” the statement read.

News reporting on these outages has so far blamed either Microsoft, CrowdStrike, or an unclear mixture of the two as the responsible party for various outages. It may be unavoidable, given that the outages are all happening on one platform, Windows. Microsoft itself issued an “Awareness” regarding the CrowdStrike BSOD issue on virtual machines running Windows. The firm was frequently updating it Friday, with a fix that may or may not surprise IT veterans.

“We’ve received feedback from customers that several reboots (as many as 15 have been reported) may be required, but overall feedback is that reboots are an effective troubleshooting step at this stage,” Microsoft wrote in the bulletin. Alternately, Microsoft recommends that customers with a backup from “before 19:00 UTC on the 18th of July” restore it, or attach the OS disk to a repair VM in order to delete the file (Windows/System32/Drivers/CrowdStrike/C-00000291*.sys) at the heart of the boot loop.

Security consultant Troy Hunt was quoted as describing the dual failures as “the largest IT outage in history,” saying, “basically what we were all worried about with Y2K, except it’s actually happened this time.”

United Airlines told Ars that it was “resuming some flights, but expect schedule disruptions to continue throughout Friday,” and had issued waivers for customers to change travel plans. American Airlines posted early Friday that it had re-established its operations by 5 am Eastern, but expected delays and cancellations throughout Friday.

Ars has reached out to CrowdStrike for comment and will update this post with any response.

This is a developing story and this post will be updated as new information is available.



Microsoft Releases Initial Azure Cloud Rendering Support for Quest 2 & Quest Pro

Microsoft announced it has released a public preview of Azure Remote Rendering support for Meta Quest 2 and Quest Pro, something that promises to allow devs to render complex 3D content in the cloud and stream it to those VR headsets in real time.

Azure Remote Rendering, which already supports desktop and the company’s AR headset HoloLens 2, notably uses a hybrid rendering approach to combine remotely rendered content with locally rendered content.

With Quest 2 and Quest Pro now supported, developers can integrate Microsoft’s Azure cloud rendering capabilities to do things like view large and complex models on Quest.
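
On the cloud side of that integration, rendering sessions can be managed from Azure SDKs. Below is a rough sketch using the azure-mixedreality-remoterendering Python package; the account details are placeholders, the Quest headset itself consumes the session through Microsoft’s native/Unity runtime rather than this client, and the parameter names should be verified against the current SDK documentation.

```python
"""Rough sketch of the cloud-side half of Azure Remote Rendering: starting a
rendering session with the 'azure-mixedreality-remoterendering' package.
Account values and the session ID are placeholders; the headset consumes the
session through Microsoft's native/Unity integration, not this client."""
from azure.core.credentials import AzureKeyCredential
from azure.mixedreality.remoterendering import (
    RemoteRenderingClient,
    RenderingSessionSize,
)

# Placeholder account details from an Azure Remote Rendering resource
ACCOUNT_ID = "<arr-account-id>"
ACCOUNT_DOMAIN = "<arr-account-domain>"   # e.g. "eastus.mixedreality.azure.com"
ACCOUNT_KEY = "<arr-account-key>"
ARR_ENDPOINT = "https://remoterendering.eastus2.mixedreality.azure.com"

client = RemoteRenderingClient(
    endpoint=ARR_ENDPOINT,
    account_id=ACCOUNT_ID,
    account_domain=ACCOUNT_DOMAIN,
    credential=AzureKeyCredential(ACCOUNT_KEY),
)

# Request a standard-size cloud rendering session with a 30-minute lease;
# the poller completes once the remote GPU session is ready to stream frames.
poller = client.begin_rendering_session(
    session_id="quest-demo-session",
    size=RenderingSessionSize.STANDARD,
    lease_time_minutes=30,
)
session = poller.result()
print(session.id, session.status)
```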

Microsoft says in a developer blog post that one such developer, Fracture Reality, has already integrated Azure Remote Rendering into its JoinXR platform, enhancing its CAD review workflows for engineering clients.

Image courtesy Microsoft, Fracture Reality

The JoinXR model above, which was said to take 3.5 minutes to upload, contains 12.6 million polygons and 8K images.

While streaming XR content from the cloud isn’t a new phenomenon—Nvidia initially released its own CloudXR integration for AWS, Microsoft Azure, and Google Cloud in 2021—Microsoft offering direct integration is a hopeful sign that the company hasn’t given up on VR, and is actively looking to bring enterprise deeper into the fold.

If you’re looking to integrate Azure’s cloud rendering tech into your project, check out Microsoft’s step-by-step guide here.
