2024-11-03 18:47:39 -07:00

2.1 KiB

Data filtering, preprocessing and selection for further use

  • IP packet traces are taken from here
  • Filtering
    • L4 - Limit to TCP and UDP
    • L3 - IPv6 is only around 10%, let's drop it
  • Selection of fields:
    • Timestamp
      • capture window is from 0500-0515 UTC
      • nanosecond precision, use DateTime64 data type in ClickHouse
    • IP
      • addresses - src, dst
      • L4 protocol - TCP, UDP. use Enum data type in ClickHouse
    • TCP/UDP - ports - sport, dport
    • Packet size - in bytes
  • sample_output.csv contains a partial subset of 202310081400.pcap, ~600K packets

Setting up Kafka

  • Download and install kafka from here
  • Run all commands in separate terminals from installation location
  • Zookeeper:
    • Windows: .\bin\windows\zookeeper-server-start.bat .\config\zookeeper.properties
    • Mac: bin/zookeeper-server-start.sh config/zookeeper.properties
  • Kafka Broker:
    • Windows: .\bin\windows\kafka-server-start.bat .\config\server.properties
    • Mac: bin/kafka-server-start.sh config/server.properties
  • Creating a Kafka topic:
    • Windows: .\bin\windows\kafka-topics.bat --create --topic %topicname% --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
    • Mac: bin/kafka-topics.sh --create --topic %topicname% --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1

Streaming from pcap file using Kafka

  • Start zookeeper and Kafka broker whenever python code is run after machine reboot
  • Run pcap_processor.py file
  • Arguments
    • -f or --pcap_file: pcap file path, mandatory argument
    • -o or --out_file: output csv file path
    • -x or --sample: boolean value indicating if data has to be sampled
    • -s or --stream: boolean value indicating if kafka streaming should happen
    • --stream_size: integer indicating number of sampled packets
    • -d or --debug: boolean value indicating if program is run in debug mode

python pcap_processor.py -f C:/Users/akash/storage/Asu/sem3/dds/project/202310081400.pcap -s --sample-size 1000