Malware Detection via CNN
Classifying binaries by API-call sequences encoded as images.
- role: Researcher
- duration: 5 months
- team: 2
- stack: Python · TensorFlow · CNN · Cybersecurity
“CNNs don't only see pictures. They see patterns.”
The problem
Malware families leave behavioral fingerprints in the API calls they invoke. Signature-based detection misses anything it has not seen before. We wanted to see whether a CNN, normally a vision tool, could spot malicious patterns in call sequences encoded as 2D arrays.
The approach
We collected runtime traces from binaries, normalized the API-call vocabulary, and encoded each call sequence into a fixed-size, image-like tensor (ordered, bucketed, padded). A convolutional network was then trained on those tensors. Goodware dwarfs malware in real-world data, so we handled the class imbalance with targeted augmentation and balanced sampling.
We tracked training in TensorBoard across dozens of runs and ablations. The final model clearly outperformed the baseline signature-matching approach on held-out families.
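The encoding step can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the function names (`normalize`, `build_vocab`, `encode_trace`), the reserved ids, the shared "unknown" bucket, the row-major layout, and the scaling to [0, 1] are all assumptions made for the example.

```python
from typing import Dict, List, Sequence

PAD_ID = 0      # assumed: id reserved for padding short traces
UNKNOWN_ID = 1  # assumed: shared bucket for calls outside the vocabulary


def normalize(call: str) -> str:
    """Normalize an API-call name (here: trim whitespace and lowercase)."""
    return call.strip().lower()


def build_vocab(traces: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign an integer id to every distinct normalized call, in first-seen order."""
    vocab: Dict[str, int] = {}
    for trace in traces:
        for call in trace:
            key = normalize(call)
            if key not in vocab:
                vocab[key] = len(vocab) + 2  # ids 0 and 1 are reserved
    return vocab


def encode_trace(
    trace: Sequence[str],
    vocab: Dict[str, int],
    height: int = 4,
    width: int = 4,
) -> List[List[float]]:
    """Turn one call sequence into a height x width grid of values in [0, 1]."""
    size = height * width
    # Keep call order, map unknown calls to the shared bucket, truncate long traces.
    ids = [vocab.get(normalize(c), UNKNOWN_ID) for c in trace][:size]
    # Pad short traces so every example has the same shape.
    ids += [PAD_ID] * (size - len(ids))
    # The largest id in use is len(vocab) + 1; divide by it to scale into [0, 1].
    scale = len(vocab) + 1
    flat = [i / scale for i in ids]
    # Row-major reshape: consecutive calls sit next to each other in a row.
    return [flat[r * width:(r + 1) * width] for r in range(height)]


traces = [["CreateFile", "WriteFile"], ["createfile ", "RegSetValue"]]
vocab = build_vocab(traces)
image = encode_trace(["WriteFile", "NeverSeenCall"], vocab, height=2, width=2)
```

In this sketch, every trace ends up with the same shape regardless of its length, which is what lets a standard convolutional network consume it. The grid size and the layout are the main design choices, since they determine which calls end up as neighbors under a convolution kernel.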
The stack
TensorFlow for the CNN · Python end-to-end for the encoding pipeline · custom augmentation for imbalanced classes · TensorBoard for visibility.
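As one way to picture balanced sampling for imbalanced classes, the sketch below draws the same number of examples from each class for every batch. It is an assumption-laden illustration, not the project's code: the `balanced_batch` helper is hypothetical, and oversampling the minority class with replacement is just one simple strategy among several.

```python
import random
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple


def balanced_batch(
    samples: Sequence[Any],
    labels: Sequence[int],
    batch_size: int,
    rng: random.Random,
) -> List[Tuple[Any, int]]:
    """Draw a batch with an equal number of examples from every class."""
    by_class: Dict[int, List[Any]] = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    classes = sorted(by_class)
    per_class = batch_size // len(classes)

    batch: List[Tuple[Any, int]] = []
    for label in classes:
        # Sample with replacement so a small class can still fill its quota.
        picks = rng.choices(by_class[label], k=per_class)
        batch.extend((pick, label) for pick in picks)

    rng.shuffle(batch)  # avoid feeding the model one class at a time
    return batch


# Toy data: 95 goodware examples (label 0) and 5 malware examples (label 1).
samples = list(range(100))
labels = [0] * 95 + [1] * 5
batch = balanced_batch(samples, labels, batch_size=8, rng=random.Random(0))
```

With this scheme the model sees both classes equally often during training even though the underlying data is heavily skewed. The trade-off is that minority examples repeat, which is one reason to pair sampling with augmentation.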
Reflection
The paper’s contribution wasn’t the model; it was the encoding. Treating a call sequence as a 2D pattern unlocked a whole set of tools that the sequence-model literature had abandoned.