Mirror of https://github.com/20kaushik02/real-time-traffic-analysis-clickhouse.git (synced 2025-12-06 07:54:07 +00:00)
Data filtering, preprocessing and selection for further use
- IP packet traces are taken from here
- Filtering
  - L4 - Limit to TCP and UDP
  - L3 - IPv6 is only around 10%, let's drop it
- Selection of fields:
  - Timestamp
    - capture window is 0500-0515 UTC
    - nanosecond precision; use the `DateTime64` data type in ClickHouse
  - IP
    - addresses - src, dst
    - L4 protocol - TCP, UDP; use the `Enum` data type in ClickHouse
  - TCP/UDP - ports - sport, dport
  - Packet size - in bytes
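
The filtering and field selection listed above can be sketched in Python roughly as follows. This is a minimal illustration, not the repository's actual preprocessing code. It assumes untagged Ethernet II frames, takes the packet size from the IPv4 total-length field, and ignores IP fragmentation. The names `PacketRow` and `select_fields` are hypothetical.

```python
import socket
import struct
from typing import NamedTuple, Optional

ETHERTYPE_IPV4 = 0x0800                  # IPv6 (0x86DD) is dropped
L4_PROTOCOLS = {6: "TCP", 17: "UDP"}     # IANA protocol numbers to keep


class PacketRow(NamedTuple):
    """Hypothetical row layout holding the selected fields."""
    time_ns: int
    l4_protocol: str
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int
    pkt_len: int


def select_fields(time_ns: int, frame: bytes) -> Optional[PacketRow]:
    """Return the selected fields, or None if the frame is filtered out."""
    eth_len, min_ip_len = 14, 20
    if len(frame) < eth_len + min_ip_len:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:      # L3 filter: IPv4 only
        return None

    ip = eth_len
    if frame[ip] >> 4 != 4:
        return None
    header_len = (frame[ip] & 0x0F) * 4  # IHL is counted in 32-bit words
    if header_len < min_ip_len:
        return None
    (total_len,) = struct.unpack_from("!H", frame, ip + 2)

    protocol = L4_PROTOCOLS.get(frame[ip + 9])
    if protocol is None:                 # L4 filter: TCP and UDP only
        return None

    l4 = ip + header_len
    if len(frame) < l4 + 4:
        return None
    # TCP and UDP both start with source port, then destination port.
    src_port, dst_port = struct.unpack_from("!HH", frame, l4)

    return PacketRow(
        time_ns=time_ns,
        l4_protocol=protocol,
        src_addr=socket.inet_ntoa(frame[ip + 12:ip + 16]),
        dst_addr=socket.inet_ntoa(frame[ip + 16:ip + 20]),
        src_port=src_port,
        dst_port=dst_port,
        pkt_len=total_len,               # assumption: IP total length, in bytes
    )
```

Returning `None` for filtered packets keeps the caller simple: it can skip those frames and write only the remaining rows.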
`sample_output.csv` contains a partial subset of `202310081400.pcap`, ~600K packets
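
One detail worth illustrating is the nanosecond timestamp precision noted above. Python's `datetime` only resolves microseconds, so the sketch below keeps the timestamp as an integer count of nanoseconds and formats the fractional part separately. The function name `format_datetime64_ns` is hypothetical, and the output format (nine fractional digits, UTC) is an assumption about what a `DateTime64(9)` column would be given, not something taken from the repository.

```python
from datetime import datetime, timezone

NANOS_PER_SECOND = 1_000_000_000


def format_datetime64_ns(epoch_ns: int) -> str:
    """Format integer nanoseconds since the Unix epoch as a UTC string
    with nine fractional digits, without losing precision."""
    seconds, nanos = divmod(epoch_ns, NANOS_PER_SECOND)
    # datetime handles the whole seconds; the nanoseconds are appended as
    # zero-padded text because datetime cannot represent them.
    whole = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return f"{whole:%Y-%m-%d %H:%M:%S}.{nanos:09d}"


if __name__ == "__main__":
    print(format_datetime64_ns(1))  # 1970-01-01 00:00:00.000000001
```

Going through a `float` timestamp instead would round away the last few digits, which is why the integer split is used here.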