At Xygeni, we have been busy over the past year implementing a comprehensive system that monitors public open-source package registries for suspicious activity or inherently malicious code.
This infrastructure lets us analyze, in real time, every package published each day, and also monitor anomalous user behavior.
While the presence of malicious code in package registries is well known, other strange or unusual behaviors go completely unnoticed by the regular users of these registries. Even for those of us who keep a close watch on them, it is not easy to find an explanation for such behavior.
What characterizes the Xygeni team is our persistence when facing a problem, so we dug deeper into this.
It all started with a stats spike
As I mentioned earlier, as part of our infrastructure, we monitor many activity metrics in public open-source package registries.
In the last days of March, my colleague Carmen raised the alarm: we had started to observe a substantial deviation in the number of packages published on NPM. After an initial review, we found that this deviation was primarily due to a relatively small number of users whose package publication rate had suddenly skyrocketed.
This prompted us to review our own process and its associated data within the platform. The review determined that the data was correct, which was reassuring but also somewhat puzzling!
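A deviation of this kind can be surfaced with very simple statistics. The following is a minimal sketch, not the pipeline we actually run: it assumes the daily publication counts and the (user, package) publication events are already available as plain Python lists. The function names, the 7-day window and the threshold are illustrative assumptions.

```python
from collections import Counter
from statistics import mean, stdev


def find_spikes(daily_counts, window=7, threshold=3.0):
    """Return the indices of days whose count is more than `threshold`
    standard deviations above the mean of the preceding `window` days."""
    spikes = []
    for i in range(window, len(daily_counts)):
        baseline = daily_counts[i - window:i]
        mu = mean(baseline)
        # Floor sigma at 1.0 so a nearly flat baseline does not turn
        # tiny fluctuations into spikes (an arbitrary, assumed floor).
        sigma = max(stdev(baseline), 1.0)
        if (daily_counts[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes


def top_publishers(events, n=5):
    """Rank users by number of published packages.

    `events` is an iterable of (user, package_name) tuples.
    """
    return Counter(user for user, _ in events).most_common(n)
```

Once a spike day is flagged, ranking the publishers for that day shows whether the increase is spread across the registry or concentrated in a handful of accounts, which is what we observed here.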
Upon further inspection, we found that these packages shared several common characteristics:
- The publishers were newly created users, at most one or two months old.
- The packages had seemingly random names, probably generated from a dictionary, such as enormous_mite-smiletea, erick-mangut94-sukiwir, exotic_reptile-appteadev or obvious_tuna-appteadev. The appendix lists some of the assets that we found.
- The content of these packages was very similar, and often identical.
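The three characteristics above lend themselves to a simple triage heuristic. The sketch below is only an illustration under stated assumptions: the input layout (a list of dicts with `name`, `publisher_created` and `files`), the 60-day cutoff and the name pattern are hypothetical, and the pattern is merely inferred from a few of the sample names, so it will miss other naming schemes such as erick-mangut94-sukiwir.

```python
import hashlib
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed pattern: two lowercase words joined by "_", then "-suffix",
# e.g. "enormous_mite-smiletea". Inferred from samples, not exhaustive.
NAME_PATTERN = re.compile(r"^[a-z]+_[a-z]+-[a-z0-9]+$")


def content_fingerprint(files):
    """Hash a package's files (mapping of path -> bytes), independent of order."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")
        digest.update(files[path])
        digest.update(b"\0")
    return digest.hexdigest()


def triage(packages, now, max_account_age_days=60):
    """Return a mapping of package name -> list of matched heuristics."""
    # Group package names by content fingerprint to spot identical content.
    by_fingerprint = defaultdict(list)
    for pkg in packages:
        by_fingerprint[content_fingerprint(pkg["files"])].append(pkg["name"])

    report = {}
    for pkg in packages:
        reasons = []
        if now - pkg["publisher_created"] <= timedelta(days=max_account_age_days):
            reasons.append("new_publisher")
        if NAME_PATTERN.match(pkg["name"]):
            reasons.append("generated_name")
        if len(by_fingerprint[content_fingerprint(pkg["files"])]) > 1:
            reasons.append("duplicate_content")
        report[pkg["name"]] = reasons
    return report
```

None of these signals is conclusive on its own; a package that matches all three is simply a better candidate for manual review.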
In this context, we observed that one of the packages included a Python script, possibly by accident, which appears to be used to publish the packages automatically.
Unraveling the net
The following timeline chart shows the publication activity of some of these publishers over the last month:
Here are the full statistics related to this event that we can share:
Nothing relevant emerged from reviewing some of these users' packages individually. This led us to analyze the overall picture of all of them, searching for patterns that might provide clues about what was going on.
After all, we have a good handful of packages to analyze:
These are the key points we can highlight from the analysis:
- Many of the NPM packages are interrelated. Some are referenced as dependencies of others.
- Although these are strange packages that probably no one knows exist, some have many weekly downloads. It is hard to accept that a normal user might want to download packages like 0mc03esisd.
- Some have associated repositories. Sometimes the owners of these repositories appear to be dummy accounts, similar to the ones on NPM.
Note: Users upload additional packages in waves, so the provided data may vary slightly.
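The interrelation between packages can be explored by treating dependencies as edges of a graph and grouping packages into connected clusters. This is a minimal sketch under assumptions: the input is a hypothetical mapping of package name to its declared dependency names (as they would appear in each package.json), and only dependencies that point to another package in the same suspicious set are considered.

```python
from collections import deque


def dependency_clusters(manifests):
    """Group packages that are linked to each other through dependencies.

    `manifests` maps a package name to an iterable of dependency names.
    Returns a list of sets of package names, largest cluster first.
    """
    # Build an undirected graph restricted to packages in the analyzed set.
    graph = {name: set() for name in manifests}
    for name, deps in manifests.items():
        for dep in deps:
            if dep in graph:
                graph[name].add(dep)
                graph[dep].add(name)

    seen = set()
    clusters = []
    for start in graph:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        clusters.append(component)
    return sorted(clusters, key=len, reverse=True)
```

Large clusters of otherwise unknown packages that depend on one another are consistent with an attempt to make each package look more widely used than it is.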
This is an example of one of the related repositories. It contains the code of some of the published packages, along with other files such as a tea.yaml file.
Meeting the actor(s)
Whether this is the work of a single author or multiple collaborators is difficult to determine. However, we do have some evidence that narrows down the origin of this activity:
- Language: Certain comments in the code are in Indonesian.
- Location: Certain owners of the associated repositories indicate their location in Indonesia.
- Activity metadata: The location that we can identify through the activity metadata on GitHub also points to Indonesia.
Five o’clock tea
Although Indonesia was a Dutch colony for most of its colonial history, Java was briefly under British rule in the early 1800s, so it would fit perfectly here to have tea. However, in this case, it’s a rather special kind of tea.
Do you remember the tea.yaml files we saw earlier? At the time, we didn’t show their content, but it looked like this:
These files appear to be related to the Tea protocol. But what exactly is it? We found https://tea.xyz/blog/250k-grant-for-open-source-developers, which describes its purpose:
“Tea is shaking up the digital world by addressing the long-standing issue of inadequate compensation for open-source developers. Solving this issue is more urgent than ever, which is why we have decided to deploy $250K in grants ahead of the launch of the protocol. This initial stage aims to support maintainers of open-source projects that have a material impact on the open-source software ecosystem and a teaRank greater than 30 ahead of the tea Protocol Incentivized Testnet.”
According to the tea protocol docs:
“For a project to be registered on the tea Protocol it requires a tea.yaml file which serves as the project's constitution to govern its number of contributors and number of votes required to carry out certain actions.”
After accessing the tea network, we searched for one of the packages whose reputation we suspected was being inflated. We found it:
That seems to explain what is going on here: certain users are inflating their open-source projects with fictitious popularity in order to game the teaRank and collect the rewards offered by the tea association. After all, money is the oldest driving force, along with sex.
Conclusion and final thoughts
Although we found no evidence of malicious code in these NPM packages, and their authors likely never intended anyone to use them, they may violate the Terms of Service (TOS) of NPM, GitHub, or the tea association. Enforcing those terms falls outside the scope of our work, but the affected parties have been notified and will take whatever action they deem appropriate. In any case, the abusive use of these platforms affects all of us and our organizations in some way.
What is clear is that any seemingly innocuous open-source package can harbor behaviors difficult to understand without full contextual information.
As a developer, I find it challenging to manually vet, with a reasonable level of assurance, all the open-source packages used in the applications we develop and maintain. Without a tool that automates this control, the task could consume countless hours of our teams' time.
There is always the option to look the other way, trusting that the open-source realms are safe places. Fortunately, I believe this mindset is not very widespread today; this case has been practically harmless, but we recently experienced a much more serious incident with the XZ Backdoor, and it has not been the only one.
If you have read this far, I hope you have enjoyed all the intricacies of this curious event. Don’t forget to spend a few minutes reviewing what Xygeni can contribute to the security of your organization.
Appendix: Related assets
Most relevant NPM users by number of published packages: vndra, wanzaty, artknight404, seblakkuah, Mikrositer, kellyman17
A tiny sample of their related packages: mikrositer, cryndex, arts-dao, seblakkuah, vndrabnb, vndrave, depfif, depsik, depeit, depnin, Dopon, kellymanteasproject.
Some of the related GitHub repositories containing tea.yaml files: