mirror of
https://github.com/20kaushik02/real-time-traffic-analysis-clickhouse.git
synced 2026-01-25 08:04:04 +00:00
preprocessing data
15
preprocessing/README.md
Normal file
@@ -0,0 +1,15 @@
# Data filtering, preprocessing and selection for further use

- IP packet traces are taken [from here](https://mawi.wide.ad.jp/mawi/samplepoint-F/2023/), specifically from 2023/10/01-2023/10/31 (date range yet to be confirmed)
- Filtering - TODO
  - L4 - limit to TCP and UDP
    - maybe GRE for VPN usage?
  - L3 - IPv6 accounts for only around 10%, so drop it
- Selection (of fields):
  - Timestamp - note: capture window is 0500-0515 UTC
  - IP
    - addresses - src, dst
    - protocol - 6 (TCP) or 17 (UDP); could use a boolean to save space
  - TCP
    - ports - sport, dport
  - Packet size - in bytes - could exclude L2?
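
The filtering and selection steps above could be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: `PacketRecord` and `select_fields` are hypothetical names, the input is assumed to be the raw bytes of one packet starting at the IP header (L2 already stripped), and the size is taken from the IPv4 total-length field, which is one way of excluding L2. Ports are read for UDP as well as TCP, since both headers begin with the source and destination ports.

```python
import ipaddress
import struct
from typing import NamedTuple, Optional

TCP, UDP = 6, 17  # IANA protocol numbers


class PacketRecord(NamedTuple):
    """Hypothetical record holding the selected fields."""
    timestamp: float
    src: str
    dst: str
    is_udp: bool   # boolean instead of the protocol number, to save space
    sport: int
    dport: int
    size: int      # IPv4 total length in bytes, i.e. without the L2 header


def select_fields(timestamp: float, ip_bytes: bytes) -> Optional[PacketRecord]:
    """Return the selected fields, or None if the packet is filtered out.

    Assumes ip_bytes starts at the IP header. Non-first IP fragments are
    not handled specially here, so their port values would be meaningless.
    """
    if len(ip_bytes) < 20:
        return None
    # L3 filter: keep IPv4 only (drops IPv6)
    if ip_bytes[0] >> 4 != 4:
        return None
    header_len = (ip_bytes[0] & 0x0F) * 4
    if header_len < 20 or len(ip_bytes) < header_len + 4:
        return None
    # L4 filter: keep TCP and UDP only
    protocol = ip_bytes[9]
    if protocol not in (TCP, UDP):
        return None
    total_length = struct.unpack_from("!H", ip_bytes, 2)[0]
    src = str(ipaddress.IPv4Address(ip_bytes[12:16]))
    dst = str(ipaddress.IPv4Address(ip_bytes[16:20]))
    # Source and destination ports are the first 4 bytes of both TCP and UDP
    sport, dport = struct.unpack_from("!HH", ip_bytes, header_len)
    return PacketRecord(timestamp, src, dst, protocol == UDP,
                        sport, dport, total_length)
```

Storing `is_udp` rather than the protocol number only works while the filter keeps exactly two protocols; if GRE is added later, the numeric field would be needed again.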