Last year saw some egregious attacks that called into question the security of the open source software supply chain. Hundreds of thousands of computers were infected by a deliberately corrupted version of a free security software utility, CCleaner. The same week, a group of hackers, and later a security researcher working with a journalist, added deliberately corrupted Python libraries to the Python Package Index (PyPI), Python’s public package repository. Thousands of Python programmers working at corporate, government, and military sites unwittingly incorporated those libraries into their apps.
So, does this mean that open source software is no longer safe to use?
Not exactly. To defend themselves, enterprises first need to understand how the open source software supply chain works. Nearly every device in our lives contains a complex system of embedded open source software and runtime libraries.
Open source software development has produced packages and libraries for pretty much any common task. Anyone can create packages, and anyone can use others’ packages. This promiscuous sharing increases productivity for everyone: developers can borrow from and improve upon one another’s work, reducing the amount of code that they must write individually.
Unfortunately, this openness makes it difficult to know who has uploaded what software, and it allows people to maliciously alter a package or library in the supply chain. In the case of PyPI, the attackers used “typosquatting.” They uploaded a library called “bzip” that impersonated “bz2file.” Many casual library users didn’t notice the difference, and when they used the altered library, it pinged a tracking website so that the authors could see where it was being used. In another attack, someone simply submitted a new version of an existing standard library package, so the name was identical but the new version was malicious.
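One way to picture a defense against misspelled package names is a check that compares a requested name against the packages an organization already trusts. The following is a minimal sketch, not a method used by PyPI or by any tool named in this article. The `KNOWN_PACKAGES` list, the `possible_typosquat` function, and the 0.8 similarity cutoff are all assumptions chosen for illustration.

```python
import difflib

# Hypothetical allowlist of packages the organization actually uses.
KNOWN_PACKAGES = ["bz2file", "requests", "urllib3", "setuptools", "cryptography"]


def possible_typosquat(name, known=KNOWN_PACKAGES, cutoff=0.8):
    """Return the approved package that `name` closely resembles, or None.

    An exact match is an approved name, so it is never flagged.
    """
    candidate = name.lower()
    if candidate in known:
        return None
    # get_close_matches ranks the known names by string similarity and
    # keeps only those scoring at or above `cutoff` (an assumed threshold).
    matches = difflib.get_close_matches(candidate, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for requested in ["urlib3", "requests"]:
        print(requested, "->", possible_typosquat(requested))
```

A check like this only catches near-misspellings. An impersonating package whose name is plausible but not textually close to the real one would pass, which is why name checks are at best one layer among several.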
To complicate matters, widespread infection often isn’t an attacker’s motivation. In the case of CCleaner, the more than 100,000 infected machines were just collateral damage; the attackers were after only 18 or so targeted companies. All they needed was one compromised package used by the target. It didn’t have to be a popular one.
The Python Software Foundation, GitHub, and others have taken important steps toward catching these types of vulnerabilities, but there is more that enterprises and the broader open source community can do to prevent them.
Enterprises can operate their own private package repositories that are controlled and audited, typically by the IT organization. This way they can control which versions of packages can be used, and the right teams will be notified of security vulnerabilities that need addressing. Another technique is “version pinning,” in which the organization deliberately limits libraries to known good versions. The organization must actively manage versioning and dependencies, but there are various tools available to simplify and automate the process. Pinning addresses two situations: it blocks a new, malicious version of an existing package, and it lets the organization move everyone to a new version when a major vulnerability is discovered in the current one (as in the Equifax case).
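To make version pinning concrete, here is a small sketch of an audit that compares what is installed in a Python environment against a pin list. It assumes Python 3.8 or later for the standard library’s `importlib.metadata`. The `PINNED` mapping, the version number in it, and the `check_pins` function are hypothetical. In practice, pins usually live in a reviewed requirements file with `name==version` entries, and pip can also be asked to verify file hashes with its `--require-hashes` option.

```python
from importlib import metadata

# Hypothetical pin list; a real one would come from a reviewed requirements file.
PINNED = {"requests": "2.31.0"}


def check_pins(pinned, installed_version=metadata.version):
    """Return {package: (expected, found)} for every package that violates its pin.

    `found` is None when the package is not installed at all.
    `installed_version` can be swapped out so the check is testable.
    """
    violations = {}
    for name, expected in pinned.items():
        try:
            found = installed_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            violations[name] = (expected, found)
    return violations


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {found}")
```

Run as part of a build or deployment step, a check along these lines would flag an environment that has drifted from the approved versions, whether by accident or because a newer, unvetted release was pulled in.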
Meanwhile, open source communities must grapple with unfettered access to packages, a difficult task when this access is what has made many of these communities so productive and innovative in the first place. Security scanning and package signing are techniques used by commercial app stores, such as those run by Apple and Microsoft, but these can be hard for open source communities to scale. Still, simply curating a package repository, even without signing, can be an effective barrier.
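Curation without signing can be as simple as recording a cryptographic digest for each approved file and refusing anything that does not match. The sketch below illustrates that idea under stated assumptions: the manifest format, the file name, and the function names `sha256_of` and `is_approved` are invented for this example and do not describe how any particular repository works.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_approved(filename: str, data: bytes, manifest: dict) -> bool:
    """Accept a file only if it is listed in the manifest and its digest matches.

    `manifest` is a hypothetical mapping of file name -> expected SHA-256 hex digest,
    maintained by whoever curates the repository.
    """
    expected = manifest.get(filename)
    return expected is not None and expected == sha256_of(data)


if __name__ == "__main__":
    # Hypothetical curated entry, recorded when the package was reviewed.
    manifest = {"example-1.0.tar.gz": sha256_of(b"reviewed contents")}
    print(is_approved("example-1.0.tar.gz", b"reviewed contents", manifest))
    print(is_approved("example-1.0.tar.gz", b"tampered contents", manifest))
```

A digest check of this kind confirms only that a file is the one that was reviewed. It says nothing about who published it, which is the gap that signing is meant to close.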
Regardless of your role in the open source supply chain, more attention must be paid to security if we are to prevent future attacks. Security professionals are familiar with security through obscurity, the mistaken belief that if software is hard to understand, it’s hard to attack. Last year’s attacks show that insecurity through promiscuity, the unmanaged incorporation of open source code into the software supply chain, is our new reality.