Malware Detection via CNN

The problem

Malware families leave behavioral fingerprints in the system calls they invoke, but signature-based detection misses anything it has not seen before. We wanted to see whether a CNN, normally a vision tool, could spot malicious patterns in call sequences encoded as 2D grids.

The approach

Collect runtime traces from executing binaries, normalize the API-call vocabulary, and encode each call sequence into a fixed-size, image-like tensor (ordered, bucketed, padded). Train a convolutional network on the result. Handle the real-world class imbalance (goodware dwarfs malware) with targeted augmentation and balanced sampling.
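
The encoding and balanced-sampling steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual pipeline: the function names, the A/W suffix folding, the modulo bucketing, and the grid size are all hypothetical, and it assumes "bucketed" means hashing call ids into a fixed number of buckets, which the write-up does not confirm.

```python
import random
from collections import defaultdict

PAD = 0  # assumption: bucket 0 is reserved for padding


def normalize(call):
    """Hypothetical normalization: fold Windows ANSI/Unicode variants
    (CreateFileA, CreateFileW) into one lowercase name."""
    call = call.strip()
    if len(call) > 1 and call[-1] in "AW" and call[-2].islower():
        call = call[:-1]
    return call.lower()


def build_vocab(traces):
    """Assign ids 1..N to normalized call names, sorted for determinism."""
    names = sorted({normalize(c) for trace in traces for c in trace})
    return {name: i for i, name in enumerate(names, start=1)}


def encode(trace, vocab, side=32, n_buckets=64):
    """Map a call sequence to a side x side grid of bucket ids, in call order."""
    unknown = len(vocab) + 1  # shared id for calls missing from the vocabulary
    cells = side * side
    ids = [vocab.get(normalize(c), unknown) for c in trace][:cells]  # truncate
    buckets = [1 + i % (n_buckets - 1) for i in ids]  # values 1..n_buckets-1
    buckets += [PAD] * (cells - len(buckets))  # right-pad short traces
    return [buckets[r * side:(r + 1) * side] for r in range(side)]  # row-major


def balanced_batch(labels, batch_size, rng):
    """Draw the same number of sample indices per class, with replacement."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    per_class = batch_size // len(by_class)
    batch = []
    for label in sorted(by_class):
        batch += rng.choices(by_class[label], k=per_class)
    rng.shuffle(batch)
    return batch
```

With `side=2` and `n_buckets=16`, the trace `CreateFileA, WriteFile, CloseHandle` encodes to `[[3, 4], [2, 0]]`: three bucket ids in call order, then one padding cell. Sampling with replacement means the rare class repeats within an epoch, which is one reason to pair it with augmentation.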

TensorBoard tracked training across dozens of runs and ablations. The final model meaningfully outperformed the baseline signature-matching approach on held-out families, meaning families absent from the training data.
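
For a sense of what one tracked run might look like, here is a minimal Keras sketch. The write-up does not describe the architecture, so the layer sizes, the input shape, the `logs/<run_name>` directory layout, and the function names are all assumptions made for illustration.

```python
import tensorflow as tf


def build_cnn(side=32):
    """Small binary classifier over side x side single-channel grids
    (hypothetical architecture, not the one from the project)."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(side, side, 1)),
        tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # P(malware)
    ])
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=[tf.keras.metrics.AUC(name="auc")],
    )
    return model


def train(model, x, y, run_name, epochs=5):
    """Fit with a per-run TensorBoard log directory so runs can be compared."""
    tb = tf.keras.callbacks.TensorBoard(log_dir=f"logs/{run_name}")
    return model.fit(x, y, epochs=epochs, validation_split=0.2, callbacks=[tb])
```

Giving each run or ablation its own `run_name` keeps its curves separate when browsing with `tensorboard --logdir logs`.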

The stack

TensorFlow for the CNN · Python end-to-end for the encoding pipeline · custom augmentation for imbalanced classes · TensorBoard for visibility.

Reflection

The paper’s contribution wasn’t the model; it was the encoding. Treating a call sequence as a 2D pattern unlocked a whole set of tools that the sequence-model literature had abandoned.