The Commit That Opened a Backdoor
A new feature that handles user-uploaded objects has just been merged. All tests pass, but weeks later a penetration test reveals that attackers can execute commands on the server: the result of insecure deserialization hidden in the code. The gap between deserialization and remote code execution can be dangerously small, especially when serialized data from open source packages or internal services is trusted without validation.
What Is Deserialization in Code, Why It’s Risky, and Where It Hides
In developer terms, what is deserialization? It is the process of taking serialized data (JSON, XML, binary formats, or language-specific object streams) and turning it back into objects in memory.
On its own, deserialization is harmless and common, for example:
- Java: Reading objects with ObjectInputStream
- Python: Loading data with pickle.load
- Node.js: Parsing JSON with JSON.parse
The risk appears when deserialization is applied to untrusted data without validation. An attacker can craft input that triggers a gadget chain: a sequence of code paths already present in the application or its libraries, invoked in unintended ways during deserialization, which can lead to remote code execution.
Unsafe deserialization is often found in:
- Open source packages with unsafe defaults
- Custom code that assumes serialized input is trustworthy
- Third-party APIs that return serialized objects, which are consumed without verification
- Build artifacts such as:
  - Java .ser files containing pre-serialized objects.
  - Python .pkl model files in machine learning workflows.
  - Serialized configuration objects embedded in Docker images or deployment containers.
- Test fixtures such as:
  - Old serialized test data copied from production snapshots.
  - Serialized payloads downloaded from external repositories for performance or regression tests.
These files can be introduced into a repository and loaded automatically during tests or deployments, triggering unsafe deserialization in CI/CD pipelines before the code even reaches production.
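As a rough illustration of how such files can be located before anything loads them, the Python sketch below walks a directory and flags files by extension (.ser, .pkl, .pickle) or by their first bytes. Java serialization streams begin with the bytes AC ED 00 05, and pickles written with protocol 2 or later begin with the PROTO opcode (0x80) followed by the protocol number. The function names and suffix list are illustrative assumptions, and the byte check is only a heuristic that can also match unrelated binary files.

```python
from pathlib import Path

# Java serialization streams start with STREAM_MAGIC (0xACED) + STREAM_VERSION (0x0005).
JAVA_MAGIC = b"\xac\xed\x00\x05"
# Illustrative list of suffixes commonly used for serialized artifacts.
SUSPECT_SUFFIXES = {".ser", ".pkl", ".pickle"}


def looks_serialized(path: Path) -> bool:
    """Heuristic: flag a file by its suffix or by its leading bytes."""
    if path.suffix.lower() in SUSPECT_SUFFIXES:
        return True
    with path.open("rb") as fh:
        head = fh.read(4)
    if head == JAVA_MAGIC:
        return True
    # Pickle protocols 2-5 begin with the PROTO opcode (0x80) and the protocol number.
    return len(head) >= 2 and head[0] == 0x80 and 2 <= head[1] <= 5


def find_serialized_artifacts(root: str) -> list[str]:
    """Return sorted paths of files under root that look like serialized data."""
    return sorted(
        str(p) for p in Path(root).rglob("*") if p.is_file() and looks_serialized(p)
    )
```

Run in CI, a non-empty result can route the change to review instead of letting tests or deployment scripts load those files blindly.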
From Insecure Deserialization to Remote Code Execution: The Attack Path
The exploitation chain for insecure deserialization typically follows these steps:
- Untrusted input enters the application.
- Unsafe deserialization recreates objects without restrictions.
- A gadget chain triggers existing functionality in unintended ways.
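- The gadget chain reaches a dangerous operation, such as running a system command, giving the attacker remote code execution that can then be used to escalate privileges.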
Variants:
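- Java: gadget chains built from classes in widely used libraries such as Apache Commons Collections.
- Python: loading .pkl model files that embed objects whose __reduce__ method names a callable to run on load.
- Node.js: passing fields from parsed JSON to eval() or dynamic imports.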
Flow:
Untrusted Input → Deserialization → Gadget Chain → Remote Code Execution
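To make the "Deserialization → Gadget Chain" step concrete, the harmless Python sketch below shows why pickle.loads is dangerous on untrusted data: an object's __reduce__ method tells pickle which callable to invoke when the bytes are loaded. Here that callable is just print, standing in for whatever a real payload would name; the Payload class is purely illustrative.

```python
import pickle


class Payload:
    """Harmless stand-in for a malicious object (illustrative only)."""

    def __reduce__(self):
        # __reduce__ tells pickle how to rebuild this object: "call print(msg)".
        # A real attack would name a dangerous callable here instead of print.
        return (print, ("side effect during unpickling",))


blob = pickle.dumps(Payload())  # serializing does NOT call print
result = pickle.loads(blob)     # deserializing DOES: prints the message
print(result)                   # None: loads() returned print()'s result, not a Payload
```

The consuming code never mentions print; the bytes alone decided what ran during loading. This is the same mechanism a tampered .pkl model file can abuse.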
How to Spot Insecure Deserialization Before Merge
Catching insecure deserialization before code merges is far cheaper than fixing it after deployment. A combination of automated analysis and proactive testing works best:
- Static Application Security Testing (SAST):
  - Configure scanners to detect risky APIs such as ObjectInputStream.readObject, pickle.load, and yaml.load without a safe loader.
  - Scan both source code and build/test artifacts for insecure deserialization patterns.
  - Display findings directly in pull requests so developers can address them before merging.
- CI/CD Integration:
  - Example workflow: Commit → SAST scan → PR alert → Fix before merge
  - Block merges on critical insecure deserialization findings to prevent unsafe code from reaching production branches.
- Unit Tests with Simulated Malicious Inputs:
  - Create harmless, controlled payloads that mimic common serialized attack objects.
  - Test how the application handles them; it should reject, sanitize, or log the input instead of processing it blindly.
  - Include these tests in the automated pipeline so they run on every PR, catching unsafe deserialization behavior early.
  - Ensure test payloads are non-executable and safe to store in the repository, focusing purely on detection logic.
This layered approach combines automated scanning with developer-owned tests, ensuring that insecure deserialization paths are identified and removed long before they can become remote code execution vulnerabilities.
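A minimal sketch of such a unit test follows, assuming a hypothetical parse_upload function that should accept only JSON objects with a string "name" field. The "malicious" input is deliberately inert: a pickle of a plain dictionary, which has the binary shape of a pickle-based payload but executes nothing, so the test exercises rejection logic without storing a working exploit.

```python
import json
import pickle
import unittest


def parse_upload(raw: bytes) -> dict:
    """Hypothetical loader under test: accepts only JSON objects with a string "name"."""
    try:
        data = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError("rejected: not valid UTF-8 JSON") from exc
    if not isinstance(data, dict) or not isinstance(data.get("name"), str):
        raise ValueError("rejected: unexpected structure")
    return data


class ParseUploadTests(unittest.TestCase):
    def test_rejects_pickle_stream(self):
        # Inert stand-in for an attack payload: a pickled plain dict. It is never
        # loaded here; the loader must refuse it because it is not JSON.
        payload = pickle.dumps({"name": "x"})
        with self.assertRaises(ValueError):
            parse_upload(payload)

    def test_rejects_unexpected_structure(self):
        # Valid JSON that does not match the expected shape.
        with self.assertRaises(ValueError):
            parse_upload(b'{"__class__": "os.system"}')

    def test_accepts_expected_input(self):
        self.assertEqual(parse_upload(b'{"name": "report"}'), {"name": "report"})
```

Running this with python -m unittest on every PR means a regression that starts accepting binary serialized input fails the build.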
Prevention Strategies for Developers: Stopping Deserialization from Becoming Remote Code Execution
- Trust boundaries: Only deserialize from authenticated, verified sources.
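- Safe APIs:
  - Java: Avoid native serialization for untrusted input; where it is unavoidable, install an ObjectInputFilter allow-list (JEP 290) before calling readObject().
  - Python: Prefer json.loads() to pickle.loads(), and never unpickle data from untrusted sources.
  - Node.js: Avoid eval() or other dynamic code execution on parsed input.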
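- Allow-lists and schemas: Restrict deserialization to an explicit allow-list of expected types, and validate JSON input against a schema.
- Dependency hygiene: Monitor dependencies for CVEs involving deserialization or remote code execution, and upgrade or replace affected libraries promptly.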
- Code reviews: Add deserialization safety checks to PR review templates.
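For Python code that cannot drop pickle entirely, the standard library documentation describes restricting globals by overriding Unpickler.find_class. The sketch below follows that pattern with a small allow-list of built-in types; the specific set is an assumption to adapt to your data. It narrows what a stream can reference, but it is a mitigation rather than a guarantee, and JSON remains the safer choice for untrusted input.

```python
import builtins
import io
import pickle

# Illustrative allow-list: built-in types this application expects to see.
SAFE_BUILTINS = {"set", "frozenset", "complex", "range", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # find_class runs whenever the stream names a class or function to import.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Everything else (os.system, subprocess.Popen, library gadgets, ...) is refused.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Unpickle bytes, resolving only allow-listed globals."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain containers such as dicts, lists, and sets load normally, while any stream that references a module-level function outside the allow-list raises UnpicklingError before that function can be called.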
Tooling note: Tools like Xygeni scan code and dependencies for insecure deserialization before merge, identifying high-risk areas so developers can fix them early.
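The trust-boundary rule can be enforced mechanically by signing data when it is produced and verifying the signature before deserializing it. The sketch below uses HMAC-SHA256 from Python's standard library with a JSON body; the placeholder key and the "tag.body" layout are illustrative assumptions, and a real key would come from a secret store. A valid tag only proves the data came from a key holder, so this complements a safe format rather than replacing it.

```python
import hashlib
import hmac
import json

# Placeholder key for illustration only; load real keys from a secret store.
SECRET_KEY = b"example-key-not-for-production"


def _tag(body: bytes) -> bytes:
    return hmac.new(SECRET_KEY, body, hashlib.sha256).hexdigest().encode("ascii")


def sign(payload: dict) -> bytes:
    """Serialize payload as JSON and prefix it with an HMAC tag: b'<hex tag>.<json>'."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return _tag(body) + b"." + body


def verify_and_load(blob: bytes) -> dict:
    """Check the tag in constant time before parsing anything."""
    tag, sep, body = blob.partition(b".")
    if not sep or not hmac.compare_digest(tag, _tag(body)):
        raise ValueError("signature check failed; refusing to deserialize")
    return json.loads(body)
```

Because verification happens before json.loads, tampered or foreign data is rejected without ever being parsed.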
Example Detection Patterns Across Languages (Safe Pseudo-Code)
All examples below are sanitized pseudo-code showing detection patterns, not working exploits:
Java – Detecting unsafe API usage:
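```java
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;

// BAD: Accepting untrusted input without validation
ObjectInputStream in = new ObjectInputStream(userInputStream);
Object obj = in.readObject(); // Unsafe: gadget chains fire inside readObject(), before any check

// GOOD (Java 9+): install an allow-list filter *before* reading
// ("com.example.dto" is a placeholder for your own data classes)
ObjectInputStream safeIn = new ObjectInputStream(userInputStream);
safeIn.setObjectInputFilter(
        ObjectInputFilter.Config.createFilter("com.example.dto.*;java.lang.*;java.util.*;!*"));
Object safeObj = safeIn.readObject(); // rejected classes throw InvalidClassException
process(safeObj); // userInputStream and process() are placeholders
```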
Python – Avoiding unsafe deserialization:
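```python
import json
import pickle

# BAD: Loading untrusted serialized data directly
# (untrusted_input stands for bytes or str received from outside the application)
data = pickle.loads(untrusted_input)  # Unsafe: unpickling can call arbitrary functions

# GOOD: Parse as JSON, then check the structure before use
data = json.loads(untrusted_input)  # yields only dicts, lists, strings, numbers, booleans, None
if not isinstance(data, dict) or not isinstance(data.get("name"), str):
    raise ValueError("Unexpected payload shape")  # minimal schema check; "name" is illustrative
```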
Node.js – Preventing dynamic code execution:
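```javascript
// BAD: Executing code from parsed data
const obj = JSON.parse(untrustedInput);
eval(obj.code); // Unsafe: allows arbitrary code execution

// GOOD: Use fixed logic and accept only expected properties
const safeObj = JSON.parse(untrustedInput);
if (typeof safeObj !== "object" || safeObj === null || typeof safeObj.action !== "string") {
  throw new Error("Unexpected payload shape");
}
handleRequest({ action: safeObj.action }); // copy known fields only; "action" and handleRequest are placeholders
```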
Automating Detection in DevSecOps Pipelines: Catching Deserialization Before It Reaches Production
Automating the detection of insecure deserialization helps ensure that vulnerabilities are caught and fixed before they can lead to remote code execution in production.
Pipeline Scanning
- Run SAST on source code, configuration files, and build artifacts at every commit.
- Detect insecure deserialization patterns in both application code and dependencies.
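To show what a SAST rule for this looks like under the hood, the sketch below uses Python's ast module to flag pickle.load and pickle.loads calls, plus yaml.load calls that do not pass SafeLoader or CSafeLoader. It is an illustrative subset of rules, not a replacement for a real scanner: it only matches calls written as module.function(...), so aliases such as import pickle as p or from pickle import loads slip through.

```python
import ast

# Calls flagged unconditionally (illustrative subset of rules).
ALWAYS_RISKY = {("pickle", "load"), ("pickle", "loads")}
SAFE_YAML_LOADERS = {"SafeLoader", "CSafeLoader"}


def find_risky_deserialization(source: str, filename: str = "<string>"):
    """Return sorted (line, message) findings for risky deserialization calls."""
    findings = []
    for node in ast.walk(ast.parse(source, filename)):
        # Only match calls of the form module.function(...).
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)):
            continue
        call = (node.func.value.id, node.func.attr)
        if call in ALWAYS_RISKY:
            findings.append((node.lineno, f"{call[0]}.{call[1]}() on possibly untrusted data"))
        elif call == ("yaml", "load"):
            # The loader may be passed positionally or as Loader=...
            if len(node.args) > 1:
                loader = node.args[1]
            else:
                loader = next((kw.value for kw in node.keywords if kw.arg == "Loader"), None)
            # Accept yaml.SafeLoader (Attribute node) or SafeLoader (Name node).
            loader_name = getattr(loader, "attr", None) or getattr(loader, "id", None)
            if loader_name not in SAFE_YAML_LOADERS:
                findings.append((node.lineno, "yaml.load() without a safe loader"))
    return sorted(findings)


def main(paths) -> int:
    """Print findings for each file; return 1 if anything was found (fails CI)."""
    total = 0
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            for line, message in find_risky_deserialization(fh.read(), path):
                print(f"{path}:{line}: {message}")
                total += 1
    return 1 if total else 0
```

In a pipeline step, sys.exit(main(changed_files)) turns any finding into a failed check on the pull request.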
Artifact Inspection
- Scan .ser, .pkl, and other serialized files for unsafe patterns before deploying or even running tests.
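For .pkl files specifically, Python's pickletools module can disassemble a pickle without executing it. The sketch below lists opcodes that import globals or call and construct objects; plain data (dicts, lists, strings, numbers) needs none of them, so any hit in a file expected to hold plain data deserves review. The opcode set is an illustrative assumption, legitimate pickles of custom classes (including many ML models) will also trigger it, and Java .ser files need separate tooling.

```python
import pickletools

# Opcodes that make the unpickler import a global or call/construct an object.
SUSPICIOUS_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX", "REDUCE", "BUILD",
}


def inspect_pickle(data: bytes):
    """Return (byte offset, opcode name, argument) for suspicious opcodes.

    pickletools.genops only decodes the stream; nothing is imported or executed.
    """
    return [
        (pos, opcode.name, arg)
        for opcode, arg, pos in pickletools.genops(data)
        if opcode.name in SUSPICIOUS_OPCODES
    ]


def inspect_pickle_file(path: str):
    """Read a .pkl artifact as raw bytes and inspect it without loading it."""
    with open(path, "rb") as fh:
        return inspect_pickle(fh.read())
```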
Pull Request Blocking
- Block merges if unsafe deserialization is detected.
- Show actionable feedback in PRs to speed up remediation.
Unit Test Enforcement
- Include unit tests with simulated malicious inputs in the CI/CD pipeline.
- Fail builds if the application processes unsafe serialized data instead of rejecting it.
Avoiding False Positives Without Weakening Rules
- Do not disable detection rules to “silence” alerts; this can allow real insecure deserialization to pass undetected.
- Use a controlled allow-list for known safe patterns or dependencies.
- Require security validation before approving allow-list entries.
- Keep the allow-list under version control and review it periodically to ensure all exceptions remain justified and safe.
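One way to keep exceptions reviewable is to store the allow-list as a JSON file in the repository and ignore any entry that lacks an approver and a reason. The sketch below assumes a hypothetical finding format (rule, file, severity) and hypothetical allow-list fields (approved_by, reason); it blocks the merge whenever an unwaived critical or high finding remains.

```python
import json

BLOCKING_SEVERITIES = ("critical", "high")


def load_allowlist(path: str) -> list:
    """Read the version-controlled allow-list (a JSON array of entries)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def split_findings(findings: list, allowlist: list):
    """Return (blocking, waived). Entries without approved_by and reason are ignored."""
    approved = {
        (entry["rule"], entry["file"])
        for entry in allowlist
        if entry.get("approved_by") and entry.get("reason")
    }
    blocking, waived = [], []
    for finding in findings:
        key = (finding["rule"], finding["file"])
        (waived if key in approved else blocking).append(finding)
    return blocking, waived


def should_block_merge(findings: list, allowlist: list) -> bool:
    """True if any non-waived finding has a blocking severity."""
    blocking, _ = split_findings(findings, allowlist)
    return any(f["severity"] in BLOCKING_SEVERITIES for f in blocking)
```

Because unapproved entries are silently ineffective, adding a line to the allow-list without a recorded security review cannot unblock a merge.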
Xygeni’s Role
- Integrates directly into CI/CD pipelines to scan both source code and build artifacts.
- Detects insecure deserialization patterns and risky dependencies early in the lifecycle.
- Supports policy-based allow-listing with mandatory security review, balancing detection accuracy with developer productivity.
Staying Ahead of Remote Code Execution Through Secure Deserialization
Insecure deserialization can remain unnoticed until it becomes a direct path to remote code execution. Preventing it requires:
- Understanding what deserialization is and how it can be abused.
- Embedding automated detection in the development workflow.
- Regularly reviewing dependencies, build artifacts, and serialized data used in tests.
Practical role of Xygeni in this process:
- Source Code Scanning: Identifies insecure deserialization patterns across multiple languages before the code is merged.
- Artifact and Dependency Analysis: Detects risky serialized files (.ser, .pkl, embedded config) and third-party components with known vulnerabilities.
- Policy-Based Controls: Supports a controlled allow-list with security validation, ensuring necessary exceptions don’t introduce real risks.
- Developer Feedback in Context: Flags the exact location and cause of insecure deserialization inside pull requests, allowing developers to fix issues immediately and confirm the mitigation through re-scans.
By integrating checks like these directly into CI/CD, teams can catch and remediate insecure deserialization before it ever has a chance to escalate into remote code execution in production.