
# Real-time analytics of Internet traffic flow data
![Architecture diagram](./architecture_diagram.png)
## Download the dataset
- The full preprocessed dataset is hosted [here](https://tmp.knravish.me/512_proj/1M_sample_2023_10_01-2023_10_31.csv) - 1.4GB
- Place this file in the `preprocessing` directory
- For testing purposes, you can instead use the sample CSV, which has 10k records from each day; change the bind path in the Compose file to point to it
## To run the project
- From the `scripts` directory:
  - Run `deploy.ps1 -M` on Windows
  - Run `deploy.sh -M` on Linux/macOS (add `-S` if `sudo` is needed for Docker)
  - See the `README` in `scripts` for more
- This sets up the whole stack
### Access the UI
- The Grafana web interface is located at `http://localhost:7602`
- Login:
  - Username: `thewebfarm`
  - Password: `mrafbeweht`
- Go to `Dashboards` > `Internet traffic capture analysis`
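
The stack can take a little while to come up after deployment, so it may help to confirm that Grafana is answering before logging in. The following is a minimal sketch, not part of the project: it assumes Grafana's standard `/api/health` endpoint is reachable on the port given above, and the function names `fetch_health` and `grafana_ready` are hypothetical.

```python
import json
import urllib.request
from typing import Callable

GRAFANA_URL = "http://localhost:7602"  # port taken from this README


def fetch_health(base_url: str = GRAFANA_URL, timeout: float = 5.0) -> dict:
    """GET Grafana's /api/health endpoint and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/health", timeout=timeout) as resp:
        return json.load(resp)


def grafana_ready(fetch: Callable[[], dict] = fetch_health) -> bool:
    """Return True if Grafana answers and reports a healthy database."""
    try:
        return fetch().get("database") == "ok"
    except (OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply: not ready yet.
        return False
```

Calling `grafana_ready()` with no arguments queries the local instance; the `fetch` parameter exists only so the check can be exercised without a running stack.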
### To run the shard creation and scaling script
- From the `scripts` directory:
  - Install dependencies: `python3 -m pip install -r ../clickhouse/update_config_scripts/requirements.txt`
  - Run `python3 ../clickhouse/update_config_scripts/update_trigger.py`
- Every 2 minutes, the script checks resource utilization and, depending on the result, creates a new shard with two server nodes for it
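
The check-and-scale behaviour described above can be pictured as a simple polling loop. This is an illustrative sketch only, not the actual contents of `update_trigger.py`: the callbacks `get_utilization` and `add_shard`, the `monitor` function, and the `0.8` threshold are all assumptions made for the example. Only the 2-minute interval and the two nodes per shard come from this README.

```python
import time
from typing import Callable, Optional

CHECK_INTERVAL_S = 120  # "every 2 minutes", per this README
NODES_PER_SHARD = 2     # "two server nodes" for each new shard


def monitor(
    get_utilization: Callable[[], float],   # hypothetical: returns a 0.0-1.0 load figure
    add_shard: Callable[[int], None],       # hypothetical: provisions a shard with N nodes
    threshold: float = 0.8,                 # assumed value, not taken from the project
    interval_s: float = CHECK_INTERVAL_S,
    max_checks: Optional[int] = None,       # None means run until interrupted
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll utilization and request a new shard whenever it exceeds the threshold.

    Returns the number of shards requested.
    """
    shards_added = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        if get_utilization() > threshold:
            add_shard(NODES_PER_SHARD)
            shards_added += 1
        checks += 1
        sleep(interval_s)
    return shards_added
```

Passing `sleep` and `max_checks` as parameters keeps the loop easy to dry-run with canned utilization readings before pointing it at a real cluster.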
## Limitations
- For multi-node deployments using Docker Swarm, the manager node must run on Linux with a standalone Docker installation (i.e. outside Docker Desktop), due to limitations in the Docker Swarm engine