VPC FLOW LOGS


VPC Flow Logs is an AWS (Amazon Web Services) feature that records details about the IP traffic going to and from network interfaces in your Virtual Private Cloud (VPC). You can create a flow log at the VPC, subnet, or individual network interface level, and publish its data to Amazon CloudWatch Logs, Amazon S3, or Amazon Data Firehose. Once a flow log has been created, you can retrieve and view its records in the chosen destination.



Use cases: VPC Flow Logs can be used for various purposes, including troubleshooting network connectivity issues, monitoring and analyzing traffic patterns, detecting and investigating security incidents, and complying with regulatory requirements.


Flow log data is collected outside the path of your network traffic, so enabling VPC Flow Logs does not affect network throughput or latency. However, capturing and storing a large volume of flow log data increases your ingestion and storage costs.

Overall, VPC Flow Logs provide valuable insights into the network traffic within your VPC, allowing you to monitor and analyze network behavior and security events in your AWS environment.



You can use flow logs to assist with a wide range of tasks, including:

  • Diagnosing overly restrictive security group rules
  • Monitoring the traffic that is reaching your AWS resources
  • Determining the direction of the traffic to and from the network interfaces

Each flow log record is a space-separated line whose fields describe one network flow, whether the log is delivered to CloudWatch Logs or to S3. Which fields appear depends on the log record format, not on whether the traffic was accepted or rejected: the default format contains the version 2 fields, and a custom format can add fields introduced in versions 3 to 5. The available fields are:


For Version 2 logs (the default format, for both accepted and rejected traffic):

  • version: The version number of the log format.
  • account-id: The AWS account ID associated with the flow log.
  • interface-id: The ID of the network interface for which the traffic is recorded.
  • srcaddr: The source IP (Internet Protocol) address of the traffic.
  • dstaddr: The destination IP address of the traffic.
  • srcport: The source port number of the traffic.
  • dstport: The destination port number of the traffic.
  • protocol: The IANA protocol number of the traffic (e.g., 6 for TCP, 17 for UDP, 1 for ICMP).
  • packets: The number of packets transferred in the flow.
  • bytes: The number of bytes transferred in the flow.
  • start: The start time of the flow, in Unix seconds.
  • end: The end time of the flow, in Unix seconds.
  • action: The action taken on the traffic (ACCEPT or REJECT).
  • log-status: The logging status of the flow log (OK, NODATA, or SKIPDATA).
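
To make the field layout concrete, here is a minimal Python sketch that splits one default-format record into named fields. It assumes the 14-field default layout, which ends with a log-status field; the function name and the sample record are made up for illustration.

```python
# Field names of the default (version 2) flow log format, in record order.
DEFAULT_FIELDS = [
    "version", "account-id", "interface-id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log-status",
]

# Fields that hold integers when data is present.
INT_FIELDS = {"version", "srcport", "dstport", "protocol",
              "packets", "bytes", "start", "end"}


def parse_flow_log_record(line):
    """Split one space-separated record into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(DEFAULT_FIELDS):
        raise ValueError(
            f"expected {len(DEFAULT_FIELDS)} fields, got {len(values)}")
    record = {}
    for name, value in zip(DEFAULT_FIELDS, values):
        if value == "-":
            record[name] = None      # "-" marks a field with no data
        elif name in INT_FIELDS:
            record[name] = int(value)
        else:
            record[name] = value
    return record


# Made-up sample record, for illustration only.
sample = ("2 123456789012 eni-0123456789abcdef0 10.0.1.39 10.0.2.15 "
          "49152 443 6 10 840 1700000000 1700000060 ACCEPT OK")
record = parse_flow_log_record(sample)
print(record["srcaddr"], record["dstport"], record["action"])
```

Records written with a custom format would need a different field list, in the order chosen when the flow log was created.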

A custom log record format can include additional fields, introduced in versions 3 to 5, alongside the version 2 fields (version through log-status):

Version 3:

  • vpc-id: The ID of the VPC that contains the network interface.
  • subnet-id: The ID of the subnet that contains the network interface.
  • instance-id: The ID of the instance associated with the network interface, if you own the instance.
  • tcp-flags: A bitmask of the TCP flags seen in the flow (for example, 2 for SYN and 18 for SYN-ACK).
  • type: The type of traffic (IPv4, IPv6, or EFA).
  • pkt-srcaddr: The packet-level (original) source IP address of the traffic.
  • pkt-dstaddr: The packet-level (original) destination IP address of the traffic.

Version 4:

  • region: The Region that contains the network interface.
  • az-id: The ID of the Availability Zone that contains the network interface.
  • sublocation-type: The type of sublocation (wavelength, outpost, or localzone).
  • sublocation-id: The ID of the sublocation that contains the network interface.

Version 5:

  • pkt-src-aws-service: The AWS service that owns the packet-level source address, if any.
  • pkt-dst-aws-service: The AWS service that owns the packet-level destination address, if any.
  • flow-direction: The direction of the flow relative to the network interface (ingress or egress).
  • traffic-path: A numeric code for the path that egress traffic takes to its destination.

HOW TO SET UP VPC FLOW LOGS AND INTEGRATE THEM WITH ATHENA



  1. Set up an S3 bucket. It will receive the flow log data, and Athena also needs an S3 location to keep the results of queries.

After creating the bucket, go to its Permissions section and copy the bucket's ARN. You will need it when creating the flow log for the VPC.



  2. Create VPC Flow Logs



  1. Go to the AWS Management Console and open the Amazon VPC console.

  2. Select the VPC, subnet, or network interface for which you want to create a flow log, then choose Actions, Create flow log.

  3. For Destination, choose to send the flow log data to an Amazon S3 bucket and enter the ARN of the bucket you created.

  4. For Filter, specify the type of traffic that you want to log. You can choose to log all traffic, accepted traffic, or rejected traffic.

  5. For Log record format, choose the AWS default format (the version 2 fields) or a custom format in which you select the fields yourself. For an S3 destination you can also choose the log file format, Text or Parquet; the Athena tables in this guide assume Text.

  6. For Maximum aggregation interval, choose the maximum period over which a flow is captured and aggregated into a single flow log record: 1 minute or 10 minutes. The default is 10 minutes.

  7. For Tags, add tags to the flow log. Tags can help you organize your flow logs and control access to them.

  8. Choose Create flow log.
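
The same settings can also be supplied programmatically. The sketch below only assembles the arguments for the EC2 CreateFlowLogs API call and does not contact AWS; the helper name, VPC ID and bucket ARN are placeholders invented for illustration.

```python
def build_flow_log_request(vpc_id, bucket_arn, traffic_type="ALL",
                           max_aggregation_interval=60):
    """Return keyword arguments for the EC2 CreateFlowLogs API call."""
    if traffic_type not in ("ALL", "ACCEPT", "REJECT"):
        raise ValueError("traffic_type must be ALL, ACCEPT or REJECT")
    # The API accepts an aggregation interval of 60 or 600 seconds.
    if max_aggregation_interval not in (60, 600):
        raise ValueError("max_aggregation_interval must be 60 or 600")
    return {
        "ResourceIds": [vpc_id],
        "ResourceType": "VPC",
        "TrafficType": traffic_type,
        "LogDestinationType": "s3",
        "LogDestination": bucket_arn,
        "MaxAggregationInterval": max_aggregation_interval,
    }


# Placeholder identifiers, for illustration only.
params = build_flow_log_request("vpc-0123456789abcdef0",
                                "arn:aws:s3:::my-flow-log-bucket")
print(params)
```

With boto3 installed and credentials configured, these arguments could be passed on as boto3.client("ec2").create_flow_logs(**params).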




  • Now go to Athena and create a table, then add a partition for each date you want to be able to query.


CREATE EXTERNAL TABLE IF NOT EXISTS default.vpc_flow_logs (
  version int,
  account string,
  interfaceid string,
  sourceaddress string,
  destinationaddress string,
  sourceport int,
  destinationport int,
  protocol int,
  numpackets int,
  numbytes bigint,
  starttime bigint,
  endtime bigint,
  action string,
  logstatus string
)
PARTITIONED BY (dt string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
LOCATION 's3://{your_log_bucket}/AWSLogs/{account_id}/vpcflowlogs/us-east-1/'
TBLPROPERTIES ("skip.header.line.count"="1");


Note: In the statement above, replace the LOCATION value with your own S3 bucket URI.



ALTER TABLE default.vpc_flow_logs
ADD PARTITION (dt='{Year}-{Month}-{Day}')
LOCATION 's3://{your_log_bucket}/AWSLogs/{account_id}/vpcflowlogs/us-east-1/{Year}/{Month}/{Day}';
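
Adding one partition per day by hand becomes tedious for longer periods. The following Python sketch generates the ALTER TABLE statements for a date range, following the S3 layout used above. The function name, bucket and account ID are placeholders, and IF NOT EXISTS is added so the statements can be re-run safely.

```python
from datetime import date, timedelta


def add_partition_statements(bucket, account_id, region, first_day, last_day,
                             table="default.vpc_flow_logs"):
    """Build one ALTER TABLE ... ADD PARTITION statement per day (inclusive)."""
    prefix = f"s3://{bucket}/AWSLogs/{account_id}/vpcflowlogs/{region}"
    statements = []
    day = first_day
    while day <= last_day:
        statements.append(
            f"ALTER TABLE {table} "
            f"ADD IF NOT EXISTS PARTITION (dt='{day:%Y-%m-%d}') "
            f"LOCATION '{prefix}/{day:%Y/%m/%d}';"
        )
        day += timedelta(days=1)
    return statements


# Placeholder bucket and account ID, for illustration only.
statements = add_partition_statements(
    "my-flow-log-bucket", "123456789012", "us-east-1",
    date(2024, 2, 28), date(2024, 3, 1))
for statement in statements:
    print(statement)
```

Each printed statement can be run in the Athena query editor, one at a time.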


After creating the table and its partitions, you can run your queries:


# Get all REJECTED logs

SELECT *
FROM vpc_flow_logs
WHERE action = 'REJECT';


# Get all logs whose source port is 22

SELECT *
FROM vpc_flow_logs
WHERE sourceport = 22;


# Get all logs from the source IP address 10.0.1.39

SELECT *
FROM vpc_flow_logs
WHERE sourceaddress = '10.0.1.39';


# Get all logs whose destination port is 443

SELECT *
FROM vpc_flow_logs
WHERE destinationport = 443;


# Aggregate the source IP addresses by the number of packets
# (SUM(numpackets) adds up packets; COUNT(*) would only count log records)

SELECT sourceaddress, SUM(numpackets) AS packet_count
FROM vpc_flow_logs
GROUP BY sourceaddress
ORDER BY packet_count DESC;


# Count all the records in the table

SELECT COUNT(*) AS total_records
FROM vpc_flow_logs;


# Top 10 source IP addresses by number of packets

SELECT sourceaddress, SUM(numpackets) AS packet_count
FROM vpc_flow_logs
GROUP BY sourceaddress
ORDER BY packet_count DESC
LIMIT 10;


# Total data transferred through each protocol

SELECT protocol, SUM(numbytes) AS total_bytes
FROM vpc_flow_logs
GROUP BY protocol;


# Top 10 flow records with the longest duration, with their source and destination addresses

SELECT sourceaddress, destinationaddress, starttime, endtime, (endtime - starttime) AS duration
FROM vpc_flow_logs
ORDER BY duration DESC
LIMIT 10;



PROTOCOLS

1   ➖  ICMP (Internet Control Message Protocol)

6   ➖  TCP (Transmission Control Protocol)

17 ➖  UDP (User Datagram Protocol)
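
Query results report the protocol as a number, so it can help to translate it when presenting them. A small Python sketch, limited to the three protocols listed above; the helper name and the sample rows are invented for illustration.

```python
# Protocol numbers listed above, mapped to their names.
PROTOCOL_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}


def protocol_name(number):
    """Return the protocol name, or a labelled fallback for other numbers."""
    return PROTOCOL_NAMES.get(number, f"OTHER({number})")


# Made-up (protocol, total_bytes) rows, shaped like the output of the
# "total data transferred through each protocol" query.
rows = [(6, 1200), (17, 300), (47, 50)]
labelled = {protocol_name(number): total for number, total in rows}
print(labelled)
```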


USE CASES


1. Question: You need to extract VPC flow logs for all traffic to a specific EC2 instance within the last hour. How can you do this using Athena?


SELECT *
FROM vpc_flow_logs
WHERE destinationaddress = '172.31.87.229'
  AND date_diff('minute', from_unixtime(starttime), current_timestamp) <= 60
ORDER BY starttime DESC;


2. Question: You need to extract VPC flow logs for all traffic from a specific security group within the last 6 hours. How can you do this using Athena?


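First, identify the network interfaces (ENIs) that the security group is attached to.

Example: ONLY_HTTP_SG is attached to two network interfaces, so both ENIs are used in the query below. It runs against the vpc_flow_log table defined in the version 5 section later in this document.


SELECT *
FROM vpc_flow_log
WHERE interface_id IN ('{INTERFACE_ID_0}', '{INTERFACE_ID_1}')
  AND date_diff('hour', from_unixtime(start_time), current_timestamp) <= 6;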


3. Question: You need to extract VPC flow logs for all traffic to a specific subnet within the last day. How can you do this using Athena?

SELECT *
FROM vpc_flow_log
WHERE subnet_id = '{SUBNET_ID}'
  AND date_diff('day', from_unixtime(start_time), current_timestamp) <= 1;


4. Question: You need to extract VPC flow logs for all traffic to a specific port within the last 2 hours. How can you do this using Athena?


SELECT *
FROM vpc_flow_logs
WHERE destinationport = 80
  AND date_diff('minute', from_unixtime(starttime), current_timestamp) <= 120
ORDER BY starttime;

  

5. Question: You need to extract VPC flow logs for all traffic from a specific VPC to a different VPC within the last 12 hours. How can you do this using Athena?


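Flow log records only identify the VPC of the network interface that recorded them, so the other VPC has to be matched by its IP range. The query below is a sketch that assumes the destination VPC uses an octet-aligned CIDR such as 10.1.0.0/16, written as the prefix '10.1.':

SELECT *
FROM vpc_flow_log
WHERE vpc_id = '{YOUR_VPC_ID}'
  AND dstaddr LIKE '{DESTINATION_VPC_PREFIX}%'
  AND date_diff('hour', from_unixtime(start_time), current_timestamp) <= 12;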


7. Question: You need to extract VPC flow logs for all traffic to a specific Elastic Load Balancer (ELB) within the last 6 hours. How can you do this using Athena?

SELECT *
FROM vpc_flow_logs
WHERE interfaceid = '{LOADBALANCER_ENI}'
  AND date_diff('hour', from_unixtime(starttime), current_timestamp) <= 6
ORDER BY starttime;


8. Question: You need to extract VPC flow logs for all traffic from a specific IP address to a range of IP addresses within the last day. How can you do this using Athena?


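Addresses are stored as strings, and comparing strings with BETWEEN is lexicographic (for example, '20.0.0.0' would sort between '101.0.0.0' and '201.0.0.0'). Casting to Athena's IPADDRESS type compares the addresses numerically; TRY_CAST returns NULL for records that have '-' instead of an address:

SELECT *
FROM vpc_flow_logs
WHERE sourceaddress = 'SOURCE_IP'
  AND TRY_CAST(destinationaddress AS IPADDRESS)
      BETWEEN IPADDRESS '101.0.0.0' AND IPADDRESS '201.0.0.0'
  AND date_diff('day', from_unixtime(starttime), current_timestamp) <= 1;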
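
The same range check can be reproduced outside Athena, for example to spot-check query results. A minimal Python sketch using the standard ipaddress module, which compares addresses numerically rather than as text; the helper name is invented, and the addresses reuse the range from the query.

```python
import ipaddress


def in_ip_range(address, low, high):
    """Return True if address lies numerically between low and high."""
    addr = ipaddress.ip_address(address)
    return ipaddress.ip_address(low) <= addr <= ipaddress.ip_address(high)


# Plain string comparison is lexicographic and accepts 20.0.0.0 here.
as_text = "101.0.0.0" <= "20.0.0.0" <= "201.0.0.0"
print(as_text)

# Numeric comparison rejects it and accepts an address inside the range.
print(in_ip_range("20.0.0.0", "101.0.0.0", "201.0.0.0"))
print(in_ip_range("150.10.0.1", "101.0.0.0", "201.0.0.0"))
```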


9. Question: You need to extract VPC flow logs for all traffic using a specific protocol within the last 2 hours. How can you do this using Athena?


SELECT *
FROM vpc_flow_logs
WHERE protocol = 6
  AND date_diff('hour', from_unixtime(starttime), current_timestamp) <= 2;


10. Question: You need to extract VPC flow logs for all traffic from a specific network interface (ENI) within the last 12 hours. How can you do this using Athena?


SELECT *
FROM vpc_flow_logs
WHERE interfaceid = 'eni-0f0813f70f568ff53'
  AND date_diff('hour', from_unixtime(starttime), current_timestamp) <= 12;


11. Question: You need to extract VPC flow logs for all traffic with a specific action (accept or reject) within the last 4 hours. How can you do this using Athena?


SELECT *
FROM vpc_flow_logs
WHERE action = 'ACCEPT'
  AND date_diff('hour', from_unixtime(starttime), current_timestamp) <= 4;


OR


SELECT *
FROM vpc_flow_logs
WHERE action = 'REJECT'
  AND date_diff('hour', from_unixtime(starttime), current_timestamp) <= 4;

 


12. Question: You need to extract VPC flow logs for all traffic from a specific source IP range to a specific destination IP range within the last 8 hours. How can you do this using Athena?


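The IPADDRESS casts make BETWEEN compare the addresses numerically instead of as strings:

SELECT *
FROM vpc_flow_logs
WHERE TRY_CAST(sourceaddress AS IPADDRESS)
      BETWEEN IPADDRESS '172.0.0.0' AND IPADDRESS '172.255.255.255'
  AND TRY_CAST(destinationaddress AS IPADDRESS)
      BETWEEN IPADDRESS '101.0.0.0' AND IPADDRESS '201.0.0.0'
  AND date_diff('hour', from_unixtime(starttime), current_timestamp) <= 8;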


13. Question: Count all traffic that uses the TCP protocol and the UDP protocol


SELECT COUNT(*) AS tcp_records
FROM vpc_flow_logs
WHERE protocol = 6;    -- TCP


SELECT COUNT(*) AS udp_records
FROM vpc_flow_logs
WHERE protocol = 17;   -- UDP



14. Question: For each source address, count the distinct destination addresses it sent traffic to


SELECT sourceaddress, COUNT(DISTINCT destinationaddress) AS distinct_destinations
FROM vpc_flow_logs
GROUP BY sourceaddress
ORDER BY distinct_destinations DESC;


15. Question: Number of packets sent to each destination address

SELECT destinationaddress, SUM(numpackets) AS packets
FROM vpc_flow_logs
GROUP BY destinationaddress
ORDER BY packets DESC;

ENABLING VERSION 5 FOR VPC DETAILED INFO


  1. Go to the AWS Management Console and open the Amazon VPC console.

  2. Select the VPC, subnet, or network interface for which you want to create a flow log, then choose Actions, Create flow log.

  3. For Maximum aggregation interval, choose the maximum period over which a flow is captured and aggregated into a single flow log record: 1 minute or 10 minutes. The default is 10 minutes.

  4. For Destination, choose to send the flow log data to an Amazon S3 bucket, and enter your bucket ARN.

  5. For Filter, specify the type of traffic that you want to log. You can choose to log all traffic, accepted traffic, or rejected traffic.

  6. For Log record format, choose Custom format and select the fields you want to record. The version of each record is the highest version among the selected fields, so selecting any version 5 field (for example flow-direction or traffic-path) produces version 5 records.

  7. For Tags, add tags to the flow log. Tags can help you organize your flow logs and control access to them.

  8. Choose Create flow log.


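With all 29 fields selected, the Athena table can be defined as follows. Because the files are space-delimited text, the columns must be declared in the same order as the fields appear in the log files (the header line of each file lists them):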


CREATE EXTERNAL TABLE IF NOT EXISTS default.vpc_flow_log (
  account_id string,
  action string,
  az_id string,
  bytes bigint,
  dstaddr string,
  dstport int,
  end_time bigint,
  flow_direction string,
  instance_id string,
  interface_id string,
  logs_status string,
  packets int,
  pkt_dst_aws_service string,
  pkt_dstaddr string,
  pkt_src_aws_service string,
  pkt_srcaddr string,
  protocol int,
  region string,
  srcaddr string,
  srcport int,
  start_time bigint,
  sublocation_id string,
  sublocation_type string,
  subnet_id string,
  tcp_flags string,
  traffic_path string,
  type string,
  version int,
  vpc_id string
)
PARTITIONED BY (dt string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
LOCATION 's3://{S3_BUCKET_NAME}/AWSLogs/{ACCOUNT_ID}/vpcflowlogs/us-east-1/'
TBLPROPERTIES ("skip.header.line.count"="1");
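
Since the table columns have to follow the field order of the custom format, one way to keep the two in sync is to derive both from a single list. The Python sketch below builds a custom format string in the ${field} syntax and the matching column names; the helper names are hypothetical, and the start_time and end_time renames mirror the table above.

```python
# Column names used by the table above where they differ from the
# field name with hyphens replaced by underscores.
RENAMES = {"start": "start_time", "end": "end_time",
           "log-status": "logs_status"}


def log_format_string(fields):
    """Build a custom log format such as '${version} ${vpc-id}'."""
    return " ".join("${" + field + "}" for field in fields)


def athena_column_names(fields):
    """Return Athena column names in the same order as the log fields."""
    return [RENAMES.get(field, field.replace("-", "_")) for field in fields]


# A short, arbitrary selection of fields, for illustration only.
fields = ["version", "vpc-id", "srcaddr", "start", "end", "log-status"]
print(log_format_string(fields))
print(athena_column_names(fields))
```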



The columns of the VPC flow logs are as follows:


  1. account_id: The AWS account ID associated with the flow log.

  2. action: The action performed on the traffic: "ACCEPT" if it was allowed, "REJECT" if it was blocked by the security group or network ACL rules.

  3. az_id: The ID of the Availability Zone that contains the network interface for which traffic is recorded.

  4. bytes: The number of bytes transferred in the network flow.

  5. dstaddr: The destination IP address of the network flow.

  6. dstport: The destination port number of the network flow.

  7. end_time: The end time of the network flow in Unix timestamp format (seconds).

  8. flow_direction: The direction of the network flow relative to the network interface where it is captured. It can be "ingress" or "egress".

  9. instance_id: The ID of the EC2 instance associated with the network interface, if you own the instance.

  10. interface_id: The ID of the network interface associated with the network flow.

  11. logs_status: The status of the flow log: "OK" (data is logging normally), "NODATA" (no traffic during the aggregation interval), or "SKIPDATA" (some records were skipped during the aggregation interval).

  12. packets: The number of packets transferred in the network flow.

  13. pkt_dst_aws_service: The AWS service (for example, S3 or EC2) that owns the pkt_dstaddr address, if the destination is an AWS service.

  14. pkt_dstaddr: The packet-level (original) destination IP address. It differs from dstaddr when the traffic passes through an intermediate layer such as a NAT gateway.

  15. pkt_src_aws_service: The AWS service that owns the pkt_srcaddr address, if the source is an AWS service.

  16. pkt_srcaddr: The packet-level (original) source IP address. It differs from srcaddr when the traffic passes through an intermediate layer such as a NAT gateway.

  17. protocol: The IANA protocol number of the network flow (e.g., TCP=6, UDP=17).

  18. region: The AWS Region that contains the network interface for which traffic is recorded.

  19. srcaddr: The source IP address of the network flow.

  20. srcport: The source port number of the network flow.

  21. start_time: The start time of the network flow in Unix timestamp format (seconds).

  22. sublocation_id: The ID of the sublocation that contains the network interface. It is "-" if the interface is not in a sublocation.

  23. sublocation_type: The type of sublocation: wavelength, outpost, or localzone.

  24. subnet_id: The ID of the subnet associated with the network flow.

  25. tcp_flags: A bitmask of the TCP flags seen in the network flow during the aggregation interval (FIN = 1, SYN = 2, RST = 4, SYN-ACK = 18).

  26. traffic_path: A numeric code for the path that egress traffic takes to its destination, for example through an internet gateway, a virtual private gateway, or a VPC peering connection. It is "-" when none of the defined paths applies.

  27. type: The type of traffic: IPv4, IPv6, or EFA.

  28. version: The version of the flow log record format.

  29. vpc_id: The ID of the VPC associated with the network flow.
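
The tcp_flags column is easier to read once the bitmask is expanded into flag names. A minimal Python sketch, assuming the standard TCP header bit values; the function name is invented for illustration.

```python
# Standard TCP flag bits, lowest bit first.
TCP_FLAG_BITS = [(1, "FIN"), (2, "SYN"), (4, "RST"),
                 (8, "PSH"), (16, "ACK"), (32, "URG")]


def decode_tcp_flags(value):
    """Return the names of the flags set in a tcp_flags bitmask."""
    return [name for bit, name in TCP_FLAG_BITS if value & bit]


print(decode_tcp_flags(2))    # a SYN on its own
print(decode_tcp_flags(18))   # SYN and ACK together (SYN-ACK)
print(decode_tcp_flags(3))    # FIN and SYN seen in the same interval
```

Because the table above declares tcp_flags as a string, a value read from Athena would need int(...) applied before decoding.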