
Insider Threat Detection Dashboard

Tech Stack
Python · Power BI · Pandas · Statistical Analysis

Project Overview

Insider threats are notoriously difficult to detect because they look like legitimate user activity.

This project demonstrates an end-to-end analytical workflow—from raw log processing to statistical modeling—culminating in an interactive Power BI dashboard.

By analyzing user behavior logs (logons, USB device activity, and file transfers), I built a Python-based risk-scoring engine that flags behavioral anomalies, filtering out the roughly 99% of activity that is routine "normal" noise to surface the small fraction of users who warrant investigation.

Final SOC Dashboard highlighting top risk users


The Analytical Workflow

This project followed a structured, multi-stage data science lifecycle:

1. Exploratory Data Analysis (EDA)

Before detecting anomalies, I first had to define "normal." Using Python and Pandas, I processed the CERT Insider Threat Dataset (r4.2, over 5 GB) to visualize company-wide logon patterns.
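A minimal sketch of the hour-by-weekday count matrix that sits behind a logon heatmap follows. It assumes the CERT r4.2 `logon.csv` layout (a `date` column formatted like `01/04/2010 07:15:00` and an `activity` column holding `Logon`/`Logoff`); the function name `logon_heatmap` and the sample rows are illustrative, not taken from the project notebook.

```python
import pandas as pd

DATE_FORMAT = "%m/%d/%Y %H:%M:%S"  # assumed CERT r4.2 timestamp format


def logon_heatmap(logons: pd.DataFrame) -> pd.DataFrame:
    """Count logon events per (weekday, hour) cell; 0 = Monday, 6 = Sunday."""
    events = logons[logons["activity"] == "Logon"]  # drop Logoff rows
    ts = pd.to_datetime(events["date"], format=DATE_FORMAT)
    grid = pd.crosstab(ts.dt.dayofweek.rename("weekday"), ts.dt.hour.rename("hour"))
    # Guarantee a full 7 x 24 grid even if some cells never saw a logon
    return grid.reindex(index=range(7), columns=range(24), fill_value=0)


# Placeholder rows, not real CERT data
sample = pd.DataFrame({
    "date": ["01/04/2010 07:15:00", "01/04/2010 08:02:00", "01/04/2010 18:00:00",
             "01/05/2010 07:40:00", "01/09/2010 23:10:00"],
    "user": ["AAA0001", "BBB0002", "AAA0001", "AAA0001", "CCC0003"],
    "activity": ["Logon", "Logon", "Logoff", "Logon", "Logon"],
})

heat = logon_heatmap(sample)
print(heat.loc[0, 7], heat.loc[5, 23])  # Monday 07:00 and Saturday 23:00 buckets
```

The resulting 7-by-24 frame can be passed directly to a plotting call such as `seaborn.heatmap` to produce the kind of figure shown here.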

The resulting heatmap clearly identified the standard 7 AM to 6 PM workday, providing the baseline required to flag after-hours anomalies.

Heatmap showing normal logon patterns
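With the 7 AM to 6 PM baseline in hand, after-hours and weekend logons can be flagged and tallied per user. The sketch below is a hypothetical version of that step: `offhours_counts` and the `WORKDAY_*` constants are illustrative names, it assumes CERT-style `date`, `user`, and `activity` columns, and it counts a weekend logon outside working hours under both indicators.

```python
import pandas as pd

DATE_FORMAT = "%m/%d/%Y %H:%M:%S"  # assumed CERT r4.2 timestamp format
WORKDAY_START = 7   # 7 AM, from the heatmap baseline
WORKDAY_END = 18    # 6 PM


def offhours_counts(logons: pd.DataFrame) -> pd.DataFrame:
    """Per-user counts of after-hours and weekend logons (hypothetical helper)."""
    events = logons[logons["activity"] == "Logon"].copy()
    ts = pd.to_datetime(events["date"], format=DATE_FORMAT)
    # A logon before 7 AM or at/after 6 PM counts as after-hours on any day
    events["after_hours"] = (ts.dt.hour < WORKDAY_START) | (ts.dt.hour >= WORKDAY_END)
    events["weekend"] = ts.dt.dayofweek >= 5  # Saturday = 5, Sunday = 6
    # Summing booleans per user gives the number of flagged logons
    return events.groupby("user")[["after_hours", "weekend"]].sum().astype(int)


# Placeholder rows, not real CERT data
sample = pd.DataFrame({
    "date": ["01/04/2010 06:30:00", "01/09/2010 10:00:00",
             "01/05/2010 09:00:00", "01/05/2010 19:30:00"],
    "user": ["AAA0001", "AAA0001", "BBB0002", "BBB0002"],
    "activity": ["Logon", "Logon", "Logon", "Logoff"],
})

counts = offhours_counts(sample)
print(counts)
```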

2. The Risk-Scoring Engine

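I engineered a composite risk score for all 1,000 users. Instead of relying on raw counts, which generate false positives and cannot be compared across indicators measured on very different scales, I calculated Z-scores (the number of standard deviations a user's value lies from the mean across all users) for three key indicators: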

  • Total volume of file copies.
  • Total number of after-hours logons.
  • Total number of weekend logons.

This statistical approach automatically bubbled the most suspicious users to the top of the list.

Ranked list of suspects with Risk Scores
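The composite score can be sketched as follows, assuming one row per user and one column per indicator. Summing the three Z-scores with equal weight is an assumption made here for illustration; this write-up does not document the exact weighting. The column names and user IDs are placeholders.

```python
import pandas as pd


def composite_risk(features: pd.DataFrame) -> pd.Series:
    """Z-score each indicator column, then sum into one score per user.

    Equal weighting of the indicators is an assumption for this sketch.
    """
    # ddof=0: population standard deviation across all users
    z = (features - features.mean()) / features.std(ddof=0)
    # A constant column has std 0 and yields NaN; treat it as carrying no signal
    z = z.fillna(0.0)
    return z.sum(axis=1).sort_values(ascending=False).rename("risk_score")


features = pd.DataFrame(
    {
        "file_copies": [10, 10, 10, 50],
        "after_hours_logons": [0, 1, 0, 9],
        "weekend_logons": [0, 0, 1, 5],
    },
    index=pd.Index(["U1", "U2", "U3", "U4"], name="user"),  # placeholder IDs
)

scores = composite_risk(features)
print(scores.head())
```

Standardizing first puts file-copy volume (which can run into the thousands) and logon counts (often in the tens) on the same scale, so no single raw count dominates the sum.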


Dashboard Architecture

The processed data and risk scores were ingested into Power BI. I constructed a Star Schema data model to enable rapid drill-down from the high-level risk scores to the granular event logs.

Power BI Data Model (Star Schema)
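To illustrate what a star schema means structurally, the pandas sketch below splits a flat event log into a central fact table and two dimension tables (users with their risk score, and calendar days). The column names (`timestamp`, `event_type`, `date_key`), the table names, and the score values are hypothetical; the project's actual model lives in Power BI, so this only mirrors its general shape.

```python
import pandas as pd


def build_star_schema(events: pd.DataFrame, scores: pd.Series):
    """Split a flat event log into a fact table plus user and date dimensions."""
    events = events.copy()
    events["date_key"] = events["timestamp"].dt.normalize()  # midnight of each day

    # Dimension: one row per user, carrying the composite risk score
    dim_user = scores.rename("risk_score").rename_axis("user").reset_index()

    # Dimension: one row per calendar day, with attributes useful for slicing
    dim_date = (
        events[["date_key"]].drop_duplicates().sort_values("date_key").reset_index(drop=True)
    )
    dim_date["weekday"] = dim_date["date_key"].dt.day_name()
    dim_date["is_weekend"] = dim_date["date_key"].dt.dayofweek >= 5

    # Fact: one row per event, linked to the dimensions by user and date_key
    fact_events = events[["user", "date_key", "timestamp", "event_type"]]
    return fact_events, dim_user, dim_date


events = pd.DataFrame({
    "user": ["U1", "U1", "U2"],
    "timestamp": pd.to_datetime(["2010-01-04 08:00", "2010-01-04 22:00", "2010-01-09 10:00"]),
    "event_type": ["logon", "usb_connect", "logon"],
})
scores = pd.Series([4.2, -1.3], index=["U1", "U2"])  # placeholder scores

fact_events, dim_user, dim_date = build_star_schema(events, scores)
print(dim_date)
```

In Power BI, each dimension would relate one-to-many to the fact table on `user` and `date_key`, which is what lets a visual filtered on a single user drill through to that user's raw events.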


The Investigation (Findings)

The analysis successfully differentiated between "Automated Anomalies" and "Human Threats."

The False Positive:

User HSB0196 had the highest file volume. However, the dashboard revealed machine-like, perfectly periodic logon patterns, indicating a scheduled backup script rather than a human threat.
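The periodicity the dashboard exposed can also be quantified. One simple heuristic, offered here as an assumption rather than the project's method, is the coefficient of variation (CV) of the gaps between consecutive logons: a scheduled job produces near-identical gaps (CV close to 0), while human logon gaps vary widely. The function name `gap_variation` and the sample timestamps are hypothetical.

```python
import pandas as pd


def gap_variation(timestamps) -> float:
    """Coefficient of variation (std / mean) of gaps between consecutive logons."""
    ts = pd.Series(pd.to_datetime(timestamps)).sort_values()
    gaps = ts.diff().dropna().dt.total_seconds()
    if len(gaps) < 2 or gaps.mean() == 0:
        return float("nan")  # too few gaps to judge regularity
    return float(gaps.std(ddof=0) / gaps.mean())


# A job that logs on at 02:00 every day vs. an irregular human schedule
script_like = pd.date_range("2010-01-01 02:00", periods=10, freq="D")
human_like = ["2010-01-04 08:55", "2010-01-04 13:10", "2010-01-05 07:30", "2010-01-07 22:45"]

print(gap_variation(script_like), gap_variation(human_like))
```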

The True Threat:

User AJF0370 had the highest composite Risk Score. The dashboard drill-down confirmed a clear pattern of working late at night and on weekends, correlating directly with high-volume USB exfiltration events.

Filtered investigation view of the primary suspect
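A rough way to check the link between late-night work and USB activity for a single user is to count both per day and correlate the daily totals. This is a hypothetical sketch: `offhours_usb_correlation`, the 7 AM to 6 PM window, and the sample timestamps are assumptions, and only days with at least one after-hours logon or USB connect enter the calculation.

```python
import pandas as pd


def offhours_usb_correlation(logon_times, usb_times, start=7, end=18) -> float:
    """Pearson correlation between daily after-hours logons and daily USB connects."""
    logons = pd.Series(pd.to_datetime(logon_times))
    usb = pd.Series(pd.to_datetime(usb_times))

    # Keep only logons outside the start..end working window, then count per day
    off = logons[(logons.dt.hour < start) | (logons.dt.hour >= end)]
    daily_off = off.dt.normalize().value_counts()
    daily_usb = usb.dt.normalize().value_counts()

    # Align both counts on calendar day; a day missing from one side counts as 0
    daily = pd.concat({"offhours": daily_off, "usb": daily_usb}, axis=1).fillna(0)
    return float(daily["offhours"].corr(daily["usb"]))


logon_times = ["2010-01-04 22:00", "2010-01-04 23:00", "2010-01-05 09:00", "2010-01-06 21:00"]
usb_times = ["2010-01-04 22:30", "2010-01-04 22:40", "2010-01-04 23:05",
             "2010-01-06 21:15", "2010-01-07 10:00"]

r = offhours_usb_correlation(logon_times, usb_times)
print(round(r, 3))
```

With real data, a handful of days says little on its own; the value of this kind of check is in spotting a consistent pattern across weeks of activity.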


Technical Stack

| Component | Technology Used |
|---|---|
| Data Processing | Python, Pandas, Jupyter Notebook |
| Visualization | Microsoft Power BI, DAX |
| ETL | Power Query |
| Math | Z-Score Statistical Modeling |

Reproduction Steps

  1. Clone the repository:

    git clone https://github.com/sohankanna/Insider-Threat-Detection-Dashboard.git
    
  2. Data Setup: Download the CERT Insider Threat Dataset (r4.2) and place the .csv files inside a data/ folder.

  3. Run Analysis: Execute insider_threat_analysis.ipynb to generate the risk scores.

  4. Visualize: Open Insider_Threat_Dashboard.pbix and refresh the data source to point to your local processed files.
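For steps 2 and 3, a small loader like the one below can confirm that the `data/` folder holds the CSVs the analysis needs before the notebook runs. It assumes the r4.2 file names `logon.csv`, `device.csv`, and `file.csv`, each with a `date` column in `MM/DD/YYYY HH:MM:SS` form; `load_cert_logs` is a hypothetical helper, not a function from the repository. The demo writes throwaway placeholder files to a temporary directory so the snippet runs without the real dataset.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Assumed CERT r4.2 file names; all three are assumed to share a 'date' column
CERT_FILES = ("logon.csv", "device.csv", "file.csv")
DATE_FORMAT = "%m/%d/%Y %H:%M:%S"  # e.g. 01/04/2010 07:15:00


def load_cert_logs(data_dir="data"):
    """Hypothetical helper: load the CERT CSVs into a dict keyed by file stem."""
    data_dir = Path(data_dir)
    missing = [name for name in CERT_FILES if not (data_dir / name).exists()]
    if missing:
        raise FileNotFoundError(f"Missing from {data_dir}: {', '.join(missing)}")
    logs = {}
    for name in CERT_FILES:
        df = pd.read_csv(data_dir / name)
        df["date"] = pd.to_datetime(df["date"], format=DATE_FORMAT)
        logs[Path(name).stem] = df
    return logs


# Demo with throwaway placeholder files (not real CERT data)
with tempfile.TemporaryDirectory() as tmp:
    for name in CERT_FILES:
        (Path(tmp) / name).write_text(
            "id,date,user,pc,activity\n"
            "x1,01/04/2010 07:15:00,AAA0001,PC-0001,Logon\n"
        )
    logs = load_cert_logs(tmp)

print({key: len(df) for key, df in logs.items()})
```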