Dissecting a Multi-Stage Infostealer Malware Package

The Python package threadfluent, discovered on PyPI, has been identified as malicious: it contains a conventional infostealer. Its defining characteristic, however, is the use of advanced code obfuscation techniques that obscure its payload and make analysis particularly challenging. The malware applies three layers of obfuscation repeatedly across its attack chain, creating significant hurdles for manual deconstruction.

Our Malware Early Warning team reported the package to the PyPI security team, who quickly removed it. In this article, we explain the stages of threadfluent’s attack and the methods it uses to hide its purpose, focusing on how infostealer malware and code obfuscation work together.

Cheat Sheet: Infostealer Malware and Code Obfuscation

What is Infostealer Malware?

Infostealer malware acts like a sneaky thief in your digital world. It quietly grabs sensitive information, such as your passwords, credit card numbers, or personal details, without you noticing. These programs record what you type (keylogging), hijack your browser, or scan files on your device to collect valuable data. The scariest part? They excel at staying under the radar.

What is Code Obfuscation?

Code Obfuscation is basically the art of hiding things in plain sight, often used by hackers to make their malicious code harder to spot or understand. Imagine someone writing code in a way that’s deliberately confusing, encrypting parts of it, or scrambling how it looks to throw off antivirus tools or researchers. It’s a clever but frustrating tactic that helps malware stay active longer and avoid detection.

Obfuscation Layers of ThreadFluent Infostealer Malware

1. Triple Encryption

The obfuscation process begins with a heavily encrypted payload embedded in the fluent.py file. The initial encoding and decryption process can be summarized as follows:

encoded_data = base64_module.b64decode("... Base64 with encrypted payload, IV and RSA-encrypted keys ...")
# ... Sequential decryption: ChaCha20, Blowfish and AES ...
encrypted_payload = AES_Cipher.new(aes_key, AES_Cipher.MODE_CBC, iv=aes_iv).decrypt(encrypted_payload[16:])
final_payload = unpad_function(encrypted_payload, AES_Cipher.block_size)
exec(zlib_module.decompress(bz2_module.decompress(final_payload)))

This obfuscation process involves:

Compression: Using zlib (the DEFLATE format behind gzip) and bzip2.
Symmetric encryption: Chained encryption with ChaCha20, Blowfish, and AES.
Asymmetric encryption: RSA used to encrypt the symmetric keys.

The result is encoded in Base64 and inserted as encoded_data. Each iteration builds on the previous, creating a nested encryption structure that makes it exceptionally difficult to uncover the actual payload without reverse engineering.

A Quick Note: Maybe you’re thinking this is just an encryption technique, but it’s more like a matryoshka doll of encryption designed specifically to obfuscate. Each layer adds complexity to hide the malware’s true intent.
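The last step of this layer (unpad, then two decompression passes) can be reproduced for analysis without calling exec. The sketch below covers only that step and assumes the decryption has already been done, that the padding is PKCS#7 (a common default for AES-CBC, not confirmed for this sample), and that the cipher block size is 16 bytes. The function names are hypothetical.

```python
import bz2
import zlib


def pkcs7_unpad(data: bytes, block_size: int = 16) -> bytes:
    """Strip PKCS#7 padding, validating it rather than trusting it."""
    if not data or len(data) % block_size:
        raise ValueError("length is not a multiple of the block size")
    pad_len = data[-1]
    if not 1 <= pad_len <= block_size:
        raise ValueError("invalid PKCS#7 padding length")
    if data[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("invalid PKCS#7 padding bytes")
    return data[:-pad_len]


def recover_source(decrypted: bytes) -> str:
    """Undo padding and both compression passes; return text, never run it."""
    compressed = pkcs7_unpad(decrypted)
    # Same order as the loader: bz2 first, then zlib.
    plain = zlib.decompress(bz2.decompress(compressed))
    return plain.decode("utf-8", errors="replace")
```

Returning the text instead of executing it keeps the analyst in control: the recovered stage can be written to disk, hashed, and inspected before anyone decides what to do with it.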

2. Gzip Decompression

In many intermediate steps, the malware employs a straightforward but effective obfuscation method: zlib (gzip-style) compression combined with Base64 encoding.

import base64, zlib; eval(zlib.decompress(base64.b64decode(<STR>)).decode('utf-8'))

This technique ensures the obfuscated payload remains hidden throughout the stages.
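A loader of this shape can be unwrapped statically. The sketch below parses the one-liner with the ast module, which builds a syntax tree without running anything, pulls out the literal passed to b64decode, and decodes it. It assumes the argument is a plain string or bytes literal, as in the snippet above; the function name is hypothetical.

```python
import ast
import base64
import zlib


def peel_zlib_base64(loader_source: str) -> str:
    """Decode the first b64decode(<literal>) payload found in a loader."""
    for node in ast.walk(ast.parse(loader_source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "b64decode"
                and node.args
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, (str, bytes))):
            raw = base64.b64decode(node.args[0].value)
            # Return the inner source as text; it is never evaluated here.
            return zlib.decompress(raw).decode("utf-8", errors="replace")
    raise ValueError("no b64decode(<literal>) call found")
```

Working on the syntax tree rather than on raw text means extra whitespace or a renamed module alias in the loader does not break the extraction.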

3. Code Concatenation and Dynamic Compilation

The final layer involves splitting the obfuscated payload across multiple variables, which are dynamically concatenated and decoded during execution:

import base64, zlib

FEQXCBDOZKHVIhT = '...'
# ....(111 other variables with base-64 encoded values) ...

zZytWuvuEdBXSrO = (
  eval(<'FEQXCBDOZKHVIhT' encoded with '\x' escapes>) +
  eval(<next variable, encoded>) +
  eval("zlib.decompress(base64.b64decode(<VAR>)).decode('utf-8')") + ...
)

eval(compile(base64.b64decode(eval(zZytWuvuEdBXSrO)).decode('utf-8'), '<app>', 'exec'))

This stage is particularly cumbersome to reverse engineer manually, as each variable must be decoded and recombined to reveal the next payload.
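Part of that manual work can be scripted. As a minimal sketch, the code below collects every module-level string assignment with the ast module (no code is executed) and then tries to Base64-decode each value, decompressing it with zlib when possible. It assumes the variables hold plain string literals, as in the excerpt above; how the fragments are ordered and joined still has to be read from the sample. Both function names are hypothetical.

```python
import ast
import base64
import binascii
import zlib
from typing import Dict, Optional


def collect_string_assignments(source: str) -> Dict[str, str]:
    """Map variable names to the string literals assigned at module level."""
    table: Dict[str, str] = {}
    for node in ast.parse(source).body:
        if (isinstance(node, ast.Assign)
                and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)
                and isinstance(node.value, ast.Constant)
                and isinstance(node.value.value, str)):
            table[node.targets[0].id] = node.value.value
    return table


def try_decode(value: str) -> Optional[bytes]:
    """Base64-decode a value; zlib-decompress the result when it is zlib data."""
    try:
        raw = base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return None  # not Base64 at all
    try:
        return zlib.decompress(raw)
    except zlib.error:
        return raw  # Base64 only, no compression layer
```

With the table in hand, an analyst can decode all 112 fragments in one pass and concentrate on reconstructing the concatenation order.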

By combining these obfuscation techniques, the infostealer malware in threadfluent evades detection, showcasing the critical role of Code Obfuscation in modern cyberattacks.

The Multi-Stage Attack Chain of ThreadFluent

The threadfluent infostealer malware comprises four distinct stages, each introducing additional layers of Code Obfuscation. These stages serve to not only deliver the payload but also protect it against detection and analysis.

Stage 0: Initial Code Execution

The entry point is the __init__.py file, which initiates execution of the obfuscated payload in fluent.py. This file uses multi-layered obfuscation (as described in steps 1–3) to produce the Stage 1 dropper:

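import os, threading

def main():
    # Only Windows hosts ('nt') load the obfuscated module
    if os.name == 'nt':
        from . import fluent
        fluent.main()

# Runs at import time, in a background daemon thread
thread = threading.Thread(target=main, daemon=True)
thread.start()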
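Code that starts work as soon as a package is imported is a useful triage signal. The sketch below is a simple heuristic, not a detector: it parses a file with ast (nothing is imported or run) and lists module-level statements whose value is a call. The function name is hypothetical, and legitimate packages also make calls at import time, so the output only marks lines worth reading.

```python
import ast
from typing import List, Tuple


def import_time_calls(source: str) -> List[Tuple[int, str]]:
    """List module-level statements that call something when imported."""
    findings: List[Tuple[int, str]] = []
    for node in ast.parse(source).body:
        value = getattr(node, "value", None)
        if isinstance(node, (ast.Expr, ast.Assign)) and isinstance(value, ast.Call):
            segment = ast.get_source_segment(source, node) or ""
            findings.append((node.lineno, segment))
    return findings
```

Applied to the __init__.py shown above, this would report the Thread construction and the thread.start() call, the two lines that trigger the payload.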

Stage 1: Initial Dropper

The first dropper, once deobfuscated, downloads additional malicious code, puts.py, from the GitHub repository Red-haired-shanks-1337/repuests (warning: this repository may host malicious code).

The dropper embeds puts.py into the local installation of the popular requests library and modifies the library’s __init__.py file to deobfuscate and execute the next stage. This deobfuscation consists of reversing many applications of obfuscation layers 1 to 3.
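Tampering of this kind leaves a trace on disk: a file inside an installed package that the installer never wrote. As a minimal sketch, the function below compares the .py files under a package directory with a list of recorded paths and returns the extras. The function name is hypothetical, and the recorded list is assumed to use paths relative to site-packages (for example "requests/__init__.py"), which is how installers record them.

```python
from pathlib import Path
from typing import Iterable, List


def unexpected_files(package_dir: Path, recorded: Iterable[str]) -> List[str]:
    """Return .py files on disk that the install record does not list."""
    known = {Path(name).as_posix() for name in recorded}
    root = package_dir.parent  # recorded paths are relative to site-packages
    extras: List[str] = []
    for path in sorted(package_dir.rglob("*.py")):
        rel = path.relative_to(root).as_posix()
        if rel not in known:
            extras.append(rel)
    return extras
```

On a real installation, the recorded list could be built from importlib.metadata.files("requests"). This check only finds added files such as puts.py; spotting the modified __init__.py would additionally require comparing file hashes against the install record.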

Stage 2: Second Dropper

The second dropper, once deobfuscated by the same process, downloads yet another payload, ssl.py, from the GitHub repository, renames it to udp.py, and creates a lockfile so that the malware does not reinstall itself repeatedly. The dropper then runs udp.py, which deobfuscates itself and executes the final infostealer payload.

Stage 3: Final Infostealer

The final stage is a fully functional infostealer that employs advanced evasion techniques:

Anti-analysis features:
- Detects antimalware tools and terminates if it finds any blacklisted processes, IPs, hostnames, or virtual environments.
Data exfiltration:
- Captures an exhaustive set of data, including IP and MAC addresses, VPN data, browser cookies and history, credit card data, crypto wallet credentials, and a screenshot, among others.
- Sends the stolen data (encrypted) via a Telegram channel, using a message template.

Deobfuscation Strategy

To analyze similar packages, researchers must systematically reverse their obfuscation layers:

Emulate Obfuscation Steps: Write scripts to simulate the transformations (e.g., decryption, decompression) without executing the payload.
Reverse Encryption Layers: Extract and decrypt the payload using the embedded keys and algorithms.
Iterative Analysis: Process each intermediate stage iteratively to reconstruct the final payload.

Automated tools and sandboxed environments are essential to streamline this analysis.
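The iterative step can be automated for the simplest layer type. The sketch below repeatedly looks for a b64decode('...') literal, decodes and decompresses it, and keeps every intermediate stage; nothing is executed. It assumes each stage is a Base64 plus zlib one-liner like the one shown earlier, and it stops at the first stage it cannot handle, which is where manual analysis or another decoder takes over. All names are hypothetical.

```python
import base64
import hashlib
import re
import zlib
from typing import List

# The string literal passed to b64decode(...) in a loader line.
B64_CALL = re.compile(r"""b64decode\(\s*b?(['"])([A-Za-z0-9+/=]+)\1\s*\)""")


def peel_all(source: str, max_layers: int = 50) -> List[str]:
    """Return every stage, outermost first, without executing any of them."""
    stages = [source]
    for _ in range(max_layers):
        match = B64_CALL.search(stages[-1])
        if match is None:
            break  # no further loader found
        try:
            inner = zlib.decompress(base64.b64decode(match.group(2)))
        except (ValueError, zlib.error):
            break  # different layer type: leave it for another decoder
        stages.append(inner.decode("utf-8", errors="replace"))
    return stages


def fingerprint(stage: str) -> str:
    """Short SHA-256 prefix for tracking a stage across analysis notes."""
    return hashlib.sha256(stage.encode("utf-8")).hexdigest()[:16]
```

Keeping all stages, rather than only the last one, preserves the evidence trail, and the max_layers bound guards against a sample that nests layers without end.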


The Threat Actor

The package appears to have been published from two PyPI accounts, ABIRHOSSAIN10 and anomilano785, both of which have since been removed.

There is a (currently active) GitHub account named ABIRHOSSAIN10, which could be related to the campaign. The Red-haired-shanks-1337 GitHub user that owns the repository used for downloading stage 2 and 3 payloads is also under analysis.

Although attribution is always difficult and the actor could belong to a red team, the kind and volume of information exfiltrated point to a malicious actor.

Conclusion: The Role of Code Obfuscation in Malware Evolution

The threadfluent package exemplifies modern malware obfuscation in its multi-stage delivery. Coupled with the evasion techniques employed, this package demonstrates a clear effort to evade detection and resist analysis.

This analysis underscores the importance of collaborative efforts and automated tools in combating such threats. By dissecting and understanding these methods, security professionals can develop better defenses against similar malware in the future.

How Xygeni’s MEW System Prevented the Malicious threadfluent Package from Reaching Production

Xygeni’s Malware Early Warning (MEW) system was instrumental in stopping the threadfluent Python package, a malicious dependency containing advanced code obfuscation and a multi-stage infostealer malware, before it could compromise production environments. Here’s how:

Detection Through Static Analysis: MEW spotted threadfluent’s hidden code, finding suspicious signs like layered encryption and unusual dependency activity.
Runtime Reachability Assessment: The system found that the package’s code could run during use, marking it as a serious threat without actually running it.
CI/CD Pipeline Protection: MEW blocked threadfluent in real time, stopped it from being included in builds, and prevented it from reaching production. It sent developers actionable alerts, saving valuable time.
Threat Intelligence Sharing: Xygeni quickly reported the package to PyPI and shared its findings with its global security network, stopping it from spreading further.

By catching threadfluent early, Xygeni’s MEW system made sure the malicious code didn’t harm software supply chains or CI/CD workflows.

Protect Your CI/CD Pipelines for Free

Prevent threats like threadfluent with Xygeni’s Malware Early Warning system. Start your free trial today and secure your software supply chain.

Luis Rodríguez

Luis Rodríguez is a physicist, mathematician, and CISSP. He is currently co-founder and CTO at Xygeni Security. He has over 20 years of experience in software security and has participated in projects such as SAST and SCA. He currently focuses on software supply chain security.

Daniel Martín

Daniel Martín is a cybersecurity expert and member of the Xygeni research security team, specializing in SDLC security and application vulnerability mitigation.
