Late last week, just as the US Independence Day holiday was due to begin, a new ransomware attack emerged targeting the servers and customers of Kaseya, the vendor of the VSA endpoint management tool. So far, up to 1,500 customers may have been attacked as a result, and all Kaseya customers have been asked to shut down their VSA management servers. Is this event just another step in the ongoing ransomware battle, or does it raise broader issues about the state of on-premises infrastructure management principles?
From the information provided so far, it appears the Kaseya incident was a supply chain attack by REvil, a group that targets software providers whose solutions manage thousands of on-premises IT infrastructure deployments. VSA inadvertently acted as a “super spreader” for ransomware, infecting thousands of customers. The impact so far: Swedish grocery retailer Coop and many smaller organisations, including dentists and nurseries, have been forced to shut down, with potentially around 60 MSPs and 1,500 businesses affected. REvil is demanding $70 million for a universal decryption tool.
There is plenty of online discussion about how the attack took place, with one theory being that the VSA servers were exposed to the Internet and subject to a SQL injection exploit. Kaseya is working with customers to identify potentially compromised infrastructure and put remediation in place; however, at the time of writing, both the on-premises and SaaS-based VSA solutions are unusable.
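The exact exploit chain hasn’t been confirmed, so treat the SQL injection theory as speculation. The underlying weakness is easy to illustrate, though. Here’s a minimal Python sketch, using sqlite3 as a stand-in for a real management server’s database, with hypothetical table and column names, showing how a query built by string concatenation can be subverted while a parameterised query cannot:

```python
import sqlite3

# In-memory database standing in for a management server's backend store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agents (id INTEGER, name TEXT)")
conn.execute("INSERT INTO agents VALUES (1, 'server01'), (2, 'server02')")

def find_agent_vulnerable(name):
    # Query assembled by string interpolation -- attacker input becomes SQL.
    query = f"SELECT id, name FROM agents WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def find_agent_safe(name):
    # Parameterised query -- input is always treated as data, never as SQL.
    return conn.execute(
        "SELECT id, name FROM agents WHERE name = ?", (name,)
    ).fetchall()

payload = "' OR '1'='1"
print(find_agent_vulnerable(payload))  # every row returned: injection succeeded
print(find_agent_safe(payload))        # empty result: payload is just a string
```

The vulnerable version ends up executing `... WHERE name = '' OR '1'='1'`, which matches every row; the safe version matches nothing because the payload is compared as an ordinary string.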
This link from Sophos has a good explanation of how the exploit works.
Attacking management infrastructure has emerged as a new opportunity for hackers, most notably with the recent SolarWinds attack on its Orion software. Hackers were able to insert malicious code into Orion updates, providing a back door to install further malware into the systems managed by Orion. The implications of this kind of hack are obvious: many thousands of IT infrastructure environments can be compromised, either for commercial exploitation or to infiltrate government organisations. The SolarWinds compromise method looks to have been different from the approach used on Kaseya; we’ll discuss the nuances in a moment.
As we examine how IT organisations can mitigate the challenges of supply-chain attacks, it’s worth looking at the history of how we’ve ended up in a position where hackers can easily infiltrate IT systems. Over the past three decades, we’ve moved from a “walled garden” model for both data centres and access points to one where systems are globally connected, frequently over the open Internet. I’ve highlighted the change in a diagram I first created about eight years ago.
Over time, IT has changed significantly, but in gradual steps.
- Networking has become ubiquitous, moving from closed internal networks to widespread use of the global Internet. Many organisations leave management infrastructure on the open Internet, making them vulnerable to attacks. It appears this mistake was also made with Kaseya VSA servers.
- The Internet and global networks require applications to be exposed to the general public, again via public networks. Done right, this is not a problem but does introduce additional attack vectors that have to be mitigated.
- Applications are becoming more complex. Single-server instance applications have transitioned to virtual machines and containerised applications. What defines a single application could consist of components from many platforms, all linked together as micro-services.
- Hybrid cloud introduces multiple challenges for IT. Each cloud service provider implements services in unique ways, which are also different from on-premises tools. With many more implementations to learn, there’s always a risk of misconfiguration and inadvertent exposure.
- We’re still using platforms and tools on the open Internet that were designed for closed networks. Today’s Windows systems, for example, are derivations of a platform built before the widespread introduction of the Internet.
One final point to consider is how IT organisations update software.
Blind and Auto Updates
How many applications do you have on your smartphone with an update pending? I currently have 113, and the count is almost always over 100. I rarely update applications unless the modification fixes a vulnerability or the application stops working. My mainframe background includes software management work with SMP/E in the 1990s. This laborious tool provided the process to load and deploy new IBM mainframe software in a controlled and audited manner. I apply the same logic to application updates on my iPhone and other devices: updates are done when they’re needed, not just to reduce the number pending.
All too often, I’ve found that “bug fix” updates introduce new features and functionality that change the way I use an application. I’d much prefer fix updates to be managed separately from feature updates. Of course, on a mobile device, that level of complexity is generally too much for the average user and would put significant overhead on application developers to maintain multiple release threads.
However, I’ve also seen this kind of lazy behaviour in enterprise applications that change workflow and functionality while also fixing critical bugs that should be addressed separately. Admittedly, update workflows have improved in recent years, but general maintenance of software and patches isn’t seen as a glamorous or important task and is commonly left to automation.
Another major challenge for IT organisations is managing trust and security. In the 1980s and 1990s, the administrator was all-powerful, including the backup admin who could access any files for any purpose. We all now know about implementing minimal trust rules, with elevated privileges used only where necessary.
However, despite these advances, centralised tools still take an all-or-nothing approach to systems management, either using domain-wide admin credentials or requiring significant effort to assign only the essential roles to management tools. It’s far too easy to simply grant global admin access and move on. I blame some of these issues on O/S vendors that have spent more time messing with the look and feel of their products than fixing core requirements like maintenance and patching.
Value the Administrator
I’m going to be somewhat controversial and suggest that a move to a DevOps world where everyone wants to be a rockstar programmer is not a good long-term strategy for the enterprise. As a community, we need to go back to first principles and appreciate the work done by efficient system administrators. We need to stop taking shortcuts and bypassing process as a way to reduce costs. These short-term approaches have a significant impact in the long run when a business is exposed through a data breach or ransomware attack.
I also believe we need to re-invest in infrastructure management, specifically in the processes used to deploy and update software. Application privileges should be set at the lowest level possible. Platform and O/S vendors should ensure their products expose RBAC for every function that can be performed without full administrator access. Most of all, we need to implement these controls in our IT infrastructure as quickly as possible.
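As a sketch of what lowest-possible privilege looks like in practice, here’s a minimal role-based access control model in Python. The role and permission names are hypothetical, not any vendor’s API; the point is that a management agent carries only the permissions its task requires, never a blanket admin credential.

```python
# Minimal RBAC sketch -- hypothetical roles and permissions, not a vendor API.
ROLES = {
    "patch-agent":  {"read_inventory", "apply_updates"},
    "backup-agent": {"read_files"},
    # The credential a management tool should almost never need:
    "full-admin":   {"read_inventory", "apply_updates", "read_files",
                     "modify_users", "run_arbitrary_commands"},
}

def authorise(role: str, permission: str) -> bool:
    """Allow an action only if the role explicitly grants the permission."""
    return permission in ROLES.get(role, set())

# A patching tool running as "patch-agent" can do its job...
print(authorise("patch-agent", "apply_updates"))           # True
# ...but if that agent is compromised, it cannot pivot to arbitrary commands.
print(authorise("patch-agent", "run_arbitrary_commands"))  # False
```

The same idea scales up to real RBAC systems: grant narrowly scoped roles to tools and service accounts, audit the grants, and reserve full admin rights for interactive, logged use.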
What about the risk presented by vendors developing insecure software? Open Source could be one solution, as it offers transparency of code development, but there’s still a place for commercial software. Businesses need to take time to deploy new solutions rather than relying on automated updates. I also see a place for third-party vulnerability testing, and for more commercial software companies to offer bug bounties.
Overall, though, simply keeping management servers off the open Internet, using private networks and VPNs, and implementing sensible security policies goes a long way towards protecting systems. That strategy needs to be accompanied by a well-implemented data protection process.
The Architect’s View™
As someone who has worked as an administrator, architect, and developer, I see each of these roles as unique and specific, with very different skill sets that don’t sit well together in a single person. The move to generalists has resulted in a loss of the ability to focus on a specific task and do it well. We need a renaissance in specialist skills, even if that results in a higher cost for the business, because the long-term cost could be the loss of the company itself.
It’s time to love your system administrator, and for businesses to value the role for the protection it delivers to the ongoing success of the company.
Copyright (c) 2007-2021 – Post #f686 – Brookend Ltd, first published on https://www.architecting.it/blog, do not reproduce without permission.