ip2location data

2024-11-04 18:02:12 -07:00
parent b4a777c368
commit 38afa8d9fd
2 changed files with 56 additions and 0 deletions

@@ -1,5 +1,7 @@
# Data filtering, preprocessing and selection for further use
## Traffic data
- IP packet traces are taken from the [MAWI archive, samplepoint-F, 2023](https://mawi.wide.ad.jp/mawi/samplepoint-F/2023/)
- Filtering
- L4 - Limit to TCP and UDP
@@ -15,6 +17,11 @@
- Packet size - in bytes
- `sample_output.csv` contains a subset of `202310081400.pcap` (~600K packets)
## IP geolocation database
- This project uses the IP2Location LITE database for [IP geolocation](https://lite.ip2location.com)
- A bit of preprocessing is applied: the country code is left out, and IP addresses are converted from decimal (integer) format to dotted string format
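
The preprocessing step above could look roughly like the following. This is a minimal sketch, not the project's actual script: the function names `decimal_to_dotted` and `preprocess_row` are hypothetical, and the assumed row layout (`ip_from`, `ip_to`, `country_code`, `country_name`) should be checked against the downloaded CSV.

```python
import ipaddress


def decimal_to_dotted(ip_decimal: int) -> str:
    """Convert an IPv4 address stored as an integer into dotted string form."""
    return str(ipaddress.IPv4Address(ip_decimal))


def preprocess_row(row):
    """Drop the country code and convert both range bounds to dotted strings.

    Assumed (hypothetical) layout: ip_from, ip_to, country_code, country_name.
    """
    ip_from, ip_to, _country_code, country_name = row
    return [
        decimal_to_dotted(int(ip_from)),
        decimal_to_dotted(int(ip_to)),
        country_name,
    ]


if __name__ == "__main__":
    # Example row in the assumed layout; values are illustrative only.
    print(preprocess_row(["16777216", "16777471", "AU", "Australia"]))
```

Using the standard-library `ipaddress` module avoids hand-written bit shifting and rejects integers outside the IPv4 range.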
# Setting up Kafka
- Download and install Kafka [from here](https://kafka.apache.org/downloads)
- Run each command in a separate terminal from the Kafka installation directory