2024-12-03 00:28:17 -07:00
2024-11-02 22:15:39 -07:00
2024-12-03 00:28:17 -07:00
2024-12-03 00:28:17 -07:00
2024-10-30 11:08:09 -07:00
2024-12-03 00:28:17 -07:00
2024-10-30 11:08:09 -07:00
2024-12-03 00:28:17 -07:00

Real-time analytics of Internet traffic flow data

Architecture diagram

Download the dataset

  • The full preprocessed dataset is hosted here - 1.4GB
  • Place this file in the preprocessing directory
  • For testing purposes, you can use the sample CSV that has 10k records from each day instead, change the bind path in the Compose file

To run the project

  • From the scripts directory:
    • Run deploy.ps1 -M for Windows
    • Run deploy.sh -M for Linux/macOS (add -S if sudo needed for docker)
    • See the README in scripts for more
  • This sets up the whole stack

Access the UI

  • The Grafana web interface is located at http://localhost:7602
  • Login:
    • Username: thewebfarm
    • Password: mrafbeweht
  • Go to Dashboards > Internet traffic capture analysis

To run the shard creation and scaling script

  • From the scripts directory:
    • Install dependencies: python3 -r ../clickhouse/update_config_scripts/requirements.txt
    • Run python3 ../clickhouse/update_config_scripts/update_trigger.py
  • This checks every 2 minutes and creates a new shard and two server nodes for it based on resource utilization

Limitations

  • For multi-node deployments using Docker Swarm, the manager node needs to be running on Linux (outside Docker Desktop i.e. standalone Docker installation) due to limitations in the Docker Swarm engine
Description
Real-time analytics of Internet traffic flow data, sourced from MAWI traffic archives, with Kafka, Clickhouse and Grafana, orchestrated with Docker Swarm
Readme 23 MiB
Languages
Python 64.3%
Jinja 15.3%
Shell 15.2%
PowerShell 4.7%
Dockerfile 0.5%