GitHub is under automated attack by millions of cloned repositories filled with malicious code



Thanks to a combination of sophisticated methodology and social engineering, this particular attack seems to be very difficult to stop.


GitHub has evolved into a crucial hub for programmers globally, serving as an extensive repository and knowledge base for open-source coding projects, data storage, and code management. However, it's currently under siege by an automated attack involving the creation and cloning of numerous malicious code repositories. Despite efforts by developers to remove these repositories, a significant number persist, with new ones appearing regularly.

An unidentified attacker has orchestrated an automated process that forks and clones existing repositories, injecting its own malicious code obscured under seven layers of obfuscation, as reported by Ars Technica. These nefarious repositories closely resemble legitimate ones, leading some users to unwittingly fork them, inadvertently exacerbating the attack's impact.

When a developer uses an affected repository, a hidden payload unpacks the seven layers of obfuscation, revealing malicious Python code and a binary executable. The code then harvests confidential data and login credentials and sends them to a control server.
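To illustrate why stacked layers defeat a naive text search, here is a minimal and deliberately harmless sketch. The reports cited here do not say which encodings the attacker uses, so the choice of base64 and the helper names `wrap` and `unwrap` are assumptions made purely for illustration. The sketch only encodes and decodes a string and never executes it.

```python
import base64

def wrap(payload: bytes, layers: int) -> bytes:
    """Apply base64 encoding repeatedly (hypothetical stand-in for obfuscation layers)."""
    for _ in range(layers):
        payload = base64.b64encode(payload)
    return payload

def unwrap(blob: bytes, layers: int) -> bytes:
    """Peel the same number of base64 layers off again. Nothing is executed."""
    for _ in range(layers):
        blob = base64.b64decode(blob)
    return blob

# A harmless placeholder string standing in for hidden code.
original = b"print('hello')"
hidden = wrap(original, 7)

# The readable text no longer appears anywhere in the wrapped blob,
# so a scanner that only searches for known strings would not see it.
assert original not in hidden

# Peeling all seven layers recovers the original bytes exactly.
assert unwrap(hidden, 7) == original
```

A scanner therefore has to peel the layers (or observe behaviour at run time) before it can inspect what a repository actually contains, which is part of what makes automated detection harder.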

The research and data teams at security provider Apiiro have been closely monitoring a resurgence of the attack since its initial minor occurrences in May of the previous year. Although GitHub removes affected repositories promptly, Apiiro says that GitHub's automated detection still misses many instances, and that manually uploaded copies in particular slip through the cracks.

Researchers estimate that the attack now encompasses millions of uploaded or forked repositories. At that scale, even a 1% miss rate would leave thousands, and possibly tens of thousands, of compromised repositories on the platform.
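As a back-of-the-envelope check on the miss-rate arithmetic, the sketch below multiplies a repository count by a miss rate. The totals are hypothetical, since the researchers give only an order of magnitude ("millions"), and the function name `missed_repositories` is invented for this example.

```python
def missed_repositories(total_repos: int, miss_rate: float) -> int:
    """Estimate how many malicious repositories survive a given detection miss rate."""
    return round(total_repos * miss_rate)

# Hypothetical totals: the article only says "millions".
for total in (1_000_000, 5_000_000):
    survivors = missed_repositories(total, 0.01)
    print(f"{total:,} repositories at a 1% miss rate leaves about {survivors:,}")
```

Even at the low end of "millions", a detection system that catches 99% of malicious repositories would still leave a five-figure number of them online.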

Initially documented as relatively small-scale, the attack consisted of several packages containing early iterations of the malicious code. However, it has since evolved in both size and sophistication. Researchers have pinpointed several factors contributing to the success of the operation, including the vast user base of GitHub and the increasing complexity of the employed techniques.

What's particularly intriguing is the fusion of sophisticated automated attack techniques with basic human tendencies. While the obfuscation methods have grown increasingly intricate, the attack still relies heavily on social engineering: developers are confused into picking the malicious copy over the legitimate repository, and so help it spread. This compounding effect makes the attack significantly harder to identify and mitigate.

This approach has proven remarkably effective thus far. GitHub has not addressed the attack directly, but it released a general statement reassuring users. In it, the company emphasized its commitment to detecting, analyzing, and removing content and accounts that violate its Acceptable Use Policies. According to the statement, GitHub employs both manual reviews and scalable detection methods, including machine learning that continuously evolves and adapts to counter adversarial attacks.

The attack highlights the downside of GitHub's popularity. The same openness and vast user base that make the platform an indispensable resource for developers worldwide also appear to have left it vulnerable. The effectiveness of the attack method underscores how difficult the problem is to resolve completely, leaving GitHub with an uphill battle that it has yet to win.