mirror of
https://github.com/20kaushik02/real-time-traffic-analysis-clickhouse.git
synced 2025-12-06 08:04:06 +00:00
Data filtering, preprocessing and selection for further use
- IP packet traces are taken from here, specifically from 2023/10/01 to 2023/10/31 (date range yet to be confirmed)
- Filtering - TODO
  - L3 - IPv6 is only around 10%, so drop it
  - L4 - Limit to TCP and UDP
    - maybe GRE for VPN usage?
- Selection (of fields):
  - Timestamp - note: capture window is 05:00 to 05:15 UTC
  - IP
    - addresses - src, dst
    - protocol - 6 (TCP) or 17 (UDP); could use a boolean to save space
  - TCP/UDP
    - ports - sport, dport
  - Packet size - in bytes - could exclude the L2 header?
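
The filtering and field selection above could be sketched roughly as follows. This is only an illustration, not the project's actual pipeline: it assumes each packet is already available as raw IPv4 bytes with the L2 header stripped, and the names `PacketRecord`, `select_fields` and `is_udp` are hypothetical. Reading the trace files themselves is not shown.

```python
import socket
import struct
from typing import NamedTuple, Optional

TCP, UDP = 6, 17  # IANA protocol numbers


class PacketRecord(NamedTuple):  # hypothetical record type
    timestamp: float  # capture time (assumed: seconds since the epoch)
    src: str
    dst: str
    is_udp: bool      # boolean stand-in for protocol 6 (TCP) / 17 (UDP)
    sport: int
    dport: int
    size: int         # IPv4 total length in bytes, so L2 framing is excluded


def select_fields(timestamp: float, ip_packet: bytes) -> Optional[PacketRecord]:
    """Return the selected fields, or None if the packet is filtered out.

    Assumption: ip_packet starts at the IP header. Fragmentation is ignored,
    so non-first fragments would yield meaningless port values.
    """
    if len(ip_packet) < 20:
        return None                      # too short for an IPv4 header
    if ip_packet[0] >> 4 != 4:
        return None                      # L3 filter: drop IPv6
    protocol = ip_packet[9]
    if protocol not in (TCP, UDP):
        return None                      # L4 filter: TCP and UDP only
    header_len = (ip_packet[0] & 0x0F) * 4
    if header_len < 20 or len(ip_packet) < header_len + 4:
        return None                      # malformed or truncated packet
    total_length = struct.unpack_from("!H", ip_packet, 2)[0]
    src = socket.inet_ntoa(ip_packet[12:16])
    dst = socket.inet_ntoa(ip_packet[16:20])
    # TCP and UDP both start with source port, then destination port.
    sport, dport = struct.unpack_from("!HH", ip_packet, header_len)
    return PacketRecord(timestamp, src, dst, protocol == UDP,
                        sport, dport, total_length)
```

Storing `is_udp` instead of the protocol number only works while the filter admits exactly two protocols; if GRE is added later, the numeric protocol field would have to come back.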