mirror of
https://github.com/20kaushik02/real-time-traffic-analysis-clickhouse.git
synced 2025-12-06 11:54:07 +00:00
update readme
parent 59ea030790
commit 3bf8565a16
@@ -1,17 +1,16 @@
 # Data filtering, preprocessing and selection for further use
 
-- IP packet traces are taken [from here](https://mawi.wide.ad.jp/mawi/samplepoint-F/2023/), specifically from 2023/10/01-2023/10/31 (yet to confirm)
-- Filtering - TODO
+- IP packet traces are taken [from here](https://mawi.wide.ad.jp/mawi/samplepoint-F/2023/)
+- Filtering
   - L4 - Limit to TCP and UDP
-    - maybe GRE for VPN usage?
   - L3 - IPv6 is only around 10%, let's drop it
-- Selection (of fields):
+- Selection of fields:
   - Timestamp
     - capture window is from 0500-0515 UTC
-    - nanosecond precision, use DateTime64 data type in ClickHouse
+    - nanosecond precision, use `DateTime64` data type in ClickHouse
   - IP
     - addresses - src, dst
-    - protocol - TCP or UDP. cld go for boolean in ClickHouse to save space
-  - TCP/UDP
-    - ports - sport, dport
+    - L4 protocol - TCP, UDP. use `Enum` data type in ClickHouse
+  - TCP/UDP - ports - sport, dport
   - Packet size - in bytes
+- `sample_output.csv` contains a partial subset of `202310081400.pcap`, ~600K packets
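The filtering and field selection described in the README could be sketched roughly as below. This is not the repository's code: `PacketRecord`, `format_ts` and `extract_record` are hypothetical names, and the sketch assumes Ethernet II framing without VLAN tags, ignores IP fragmentation, and takes "packet size" to mean the IPv4 total-length field. The timestamp string carries nine fractional digits, which should be loadable into a ClickHouse `DateTime64(9)` column, and the protocol name is meant to line up with an `Enum` holding `TCP` and `UDP`.

```python
import ipaddress
import struct
from datetime import datetime, timezone
from typing import NamedTuple, Optional

ETH_HEADER_LEN = 14
ETHERTYPE_IPV4 = 0x0800
L4_NAMES = {6: "TCP", 17: "UDP"}  # IANA protocol numbers kept by the filter


class PacketRecord(NamedTuple):
    """Hypothetical row layout mirroring the fields selected in the README."""
    timestamp: str    # UTC, nanosecond precision
    src_ip: str
    dst_ip: str
    l4_protocol: str  # "TCP" or "UDP"
    sport: int
    dport: int
    size: int         # bytes (assumed: IPv4 total length)


def format_ts(sec: int, nsec: int) -> str:
    """Render a (seconds, nanoseconds) capture time with 9 fractional digits."""
    base = datetime.fromtimestamp(sec, tz=timezone.utc)
    return base.strftime("%Y-%m-%d %H:%M:%S") + f".{nsec:09d}"


def extract_record(frame: bytes, sec: int, nsec: int) -> Optional[PacketRecord]:
    """Return the selected fields, or None if the frame is filtered out."""
    if len(frame) < ETH_HEADER_LEN + 20:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:        # L3 filter: drops IPv6 and others
        return None
    ver_ihl, _tos, total_len = struct.unpack_from("!BBH", frame, ETH_HEADER_LEN)
    ihl = (ver_ihl & 0x0F) * 4             # IPv4 header length in bytes
    if ver_ihl >> 4 != 4 or ihl < 20:
        return None
    proto = frame[ETH_HEADER_LEN + 9]
    if proto not in L4_NAMES:              # L4 filter: keep TCP and UDP only
        return None
    l4_off = ETH_HEADER_LEN + ihl
    if len(frame) < l4_off + 4:            # need both port fields
        return None
    sport, dport = struct.unpack_from("!HH", frame, l4_off)
    src = ipaddress.IPv4Address(frame[ETH_HEADER_LEN + 12:ETH_HEADER_LEN + 16])
    dst = ipaddress.IPv4Address(frame[ETH_HEADER_LEN + 16:ETH_HEADER_LEN + 20])
    return PacketRecord(format_ts(sec, nsec), str(src), str(dst),
                        L4_NAMES[proto], sport, dport, total_len)
```

Each returned `PacketRecord` maps to one CSV row; frames that yield `None` are simply skipped. Reading the frames and their timestamps out of the pcap file is left to whichever capture library is in use.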