Mirror of https://github.com/20kaushik02/real-time-traffic-analysis-clickhouse.git (synced 2025-12-06 06:34:06 +00:00)
Real-time analytics of Internet traffic flow data
Download the dataset
- The full preprocessed dataset is hosted here (1.4 GB)
- Place this file in the preprocessing directory
- For testing purposes, you can instead use the sample CSV, which has 10k records from each day; change the bind path in the Compose file accordingly
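To build a similar reduced file from the full dataset yourself, here is a minimal sketch. It is not the project's own preprocessing code. It assumes the CSV has a header row and a timestamp column (the column name is a hypothetical parameter) whose first 10 characters are an ISO date such as 2023-01-01. It keeps the first 10k rows seen for each day rather than drawing a random sample.

```python
import csv
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator


def sample_per_day(
    rows: Iterable[Dict[str, str]],
    day_of: Callable[[Dict[str, str]], str],
    limit: int = 10_000,
) -> Iterator[Dict[str, str]]:
    """Yield at most `limit` rows for each distinct day, preserving input order."""
    seen: Dict[str, int] = defaultdict(int)
    for row in rows:
        day = day_of(row)
        if seen[day] < limit:
            seen[day] += 1
            yield row


def write_sample(src_path: str, dst_path: str, time_column: str, limit: int = 10_000) -> None:
    """Stream a CSV from src_path and write the per-day sample to dst_path.

    `time_column` is a hypothetical column name; its first 10 characters are
    assumed to hold an ISO date (YYYY-MM-DD).
    """
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames or [])
        writer.writeheader()
        for row in sample_per_day(reader, lambda r: r[time_column][:10], limit):
            writer.writerow(row)
```

Because rows are streamed, the 1.4 GB file never has to fit in memory; only one counter per day is kept.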
To run the project
- From the scripts directory:
  - Run deploy.ps1 -M for Windows
  - Run deploy.sh -M for Linux/macOS (add -S if docker requires sudo)
  - See the README in scripts for more
- This sets up the whole stack
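The platform choice above can be captured in a small helper that only assembles the command line. This is a sketch: the helper name is hypothetical, the meaning of -M and -S is defined by the scripts themselves (see the README in scripts), and invoking PowerShell via -File and running ./deploy.sh directly (which needs the executable bit) are assumptions.

```python
import platform
from typing import List, Optional


def deploy_command(system: Optional[str] = None, use_sudo: bool = False) -> List[str]:
    """Return the argv for the deploy script on the given OS.

    Follows the README: deploy.ps1 -M on Windows, deploy.sh -M on Linux/macOS,
    with -S appended when docker needs sudo (Linux/macOS only).
    """
    system = system or platform.system()  # "Windows", "Linux" or "Darwin" (macOS)
    if system == "Windows":
        # Assumption: run through Windows PowerShell; -File forwards -M to the script.
        return ["powershell", "-File", "deploy.ps1", "-M"]
    if system in ("Linux", "Darwin"):
        cmd = ["./deploy.sh", "-M"]
        if use_sudo:
            cmd.append("-S")
        return cmd
    raise ValueError(f"unsupported platform: {system}")
```

From the repository root, something like subprocess.run(deploy_command(), cwd="scripts", check=True) would then run the script from the scripts directory, as the steps require.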
Access the UI
- The Grafana web interface is located at http://localhost:7602
- Login:
  - Username: thewebfarm
  - Password: mrafbeweht
- Go to Dashboards > Internet traffic capture analysis
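The same login also works against Grafana's HTTP API, which is handy for checking from a script that the dashboard has been provisioned. Here is a minimal standard-library sketch. It assumes basic auth has not been disabled in this Grafana instance and that the credentials above are unchanged. It calls Grafana's /api/search endpoint, and the helper names are illustrative.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import List

GRAFANA_URL = "http://localhost:7602"  # from the README
USERNAME = "thewebfarm"
PASSWORD = "mrafbeweht"


def search_request(query: str, base_url: str = GRAFANA_URL,
                   user: str = USERNAME, password: str = PASSWORD) -> urllib.request.Request:
    """Build a GET /api/search request for dashboards, using HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    params = urllib.parse.urlencode({"query": query, "type": "dash-db"})
    return urllib.request.Request(f"{base_url}/api/search?{params}",
                                  headers={"Authorization": f"Basic {token}"})


def find_dashboards(query: str = "Internet traffic capture analysis") -> List[str]:
    """Return the titles of matching dashboards (requires the stack to be running)."""
    with urllib.request.urlopen(search_request(query), timeout=5) as resp:
        return [hit["title"] for hit in json.load(resp)]
```

Once the stack is up, find_dashboards() should list the dashboard's title; an empty list suggests provisioning has not finished.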
To run the shard creation and scaling script
- From the scripts directory:
  - Install dependencies: python3 -m pip install -r ../clickhouse/update_config_scripts/requirements.txt
  - Run python3 ../clickhouse/update_config_scripts/update_trigger.py
- This checks every 2 minutes and, based on resource utilization, creates a new shard and two server nodes for it
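The trigger's behaviour can be pictured as a simple polling loop. The sketch below is hypothetical and is not taken from update_trigger.py. How utilization is measured, the 0.8 threshold, and the get_utilization/create_shard callbacks are all assumptions. Only the 2-minute interval and the two server nodes per shard come from the description above.

```python
import time
from typing import Callable, Optional

CHECK_INTERVAL_S = 120  # "checks every 2 minutes"
NODES_PER_SHARD = 2     # "a new shard and two server nodes for it"


def scaling_loop(
    get_utilization: Callable[[], float],
    create_shard: Callable[[int], None],
    threshold: float = 0.8,          # hypothetical trigger level
    interval_s: float = CHECK_INTERVAL_S,
    max_checks: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll utilization and add a shard each time it exceeds `threshold`.

    Returns the number of shards created. Runs forever when max_checks is None.
    """
    created = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        if get_utilization() > threshold:
            create_shard(NODES_PER_SHARD)  # one new shard backed by two server nodes
            created += 1
        checks += 1
        sleep(interval_s)  # wait before the next check
    return created
```

Injecting sleep and the two callbacks keeps the loop testable without waiting two minutes or touching a real cluster.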
Limitations
- For multi-node deployments using Docker Swarm, the manager node must run on Linux (outside Docker Desktop, i.e. a standalone Docker installation) due to limitations in the Docker Swarm engine
Description
Real-time analytics of Internet traffic flow data, sourced from MAWI traffic archives, with Kafka, ClickHouse and Grafana, orchestrated with Docker Swarm
Languages: Python 64.3%, Jinja 15.3%, Shell 15.2%, PowerShell 4.7%, Dockerfile 0.5%
