In this article we explain a shell script that analyzes an Apache web server’s access log file to find the 5 IP addresses that made the most requests.
Shell script:
#!/bin/bash
# Check that exactly one argument (the log file path) has been provided
if [ $# -ne 1 ]; then
    echo "Usage: $0 <access_log>" >&2
    exit 1
fi
# Use the path given on the command line (not a hard-coded directory)
LOG_FILE="$1"
# Check that the provided log file exists and is a regular file
if [ ! -f "$LOG_FILE" ]; then
    echo "Log file $LOG_FILE does not exist." >&2
    exit 1
fi
# Extract the client IP (field 1), count each address, and print the top 5
awk '{print $1}' "$LOG_FILE" | sort | uniq -c | sort -nr | head -n 5
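To try the pipeline without access to a real server log, the sketch below writes a handful of made-up request lines in Common Log Format to a temporary file and runs the same awk | sort | uniq -c | sort -nr | head -n 5 pipeline on it. The IP addresses (taken from the reserved documentation ranges), timestamps and paths are invented purely for illustration.

```shell
#!/bin/bash
# Sketch: run the top-IP pipeline on a small, made-up sample log.
# All log lines below are invented for illustration (documentation IP ranges).
set -euo pipefail

sample_log=$(mktemp)
trap 'rm -f "$sample_log"' EXIT   # clean up the temporary file on exit

cat > "$sample_log" <<'EOF'
192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512
198.51.100.7 - - [10/Oct/2023:13:55:37 +0000] "GET /about HTTP/1.1" 200 1024
192.0.2.10 - - [10/Oct/2023:13:55:38 +0000] "GET /style.css HTTP/1.1" 200 256
203.0.113.5 - - [10/Oct/2023:13:55:39 +0000] "GET / HTTP/1.1" 200 512
192.0.2.10 - - [10/Oct/2023:13:55:40 +0000] "GET /logo.png HTTP/1.1" 200 2048
198.51.100.7 - - [10/Oct/2023:13:55:41 +0000] "GET / HTTP/1.1" 304 0
EOF

# Same pipeline as the script: first field -> sort -> count -> rank -> top 5
top_ips=$(awk '{print $1}' "$sample_log" | sort | uniq -c | sort -nr | head -n 5)
echo "$top_ips"
```

With this sample, the output lists 192.0.2.10 with a count of 3, then 198.51.100.7 with 2 and 203.0.113.5 with 1. Only three lines appear because the sample contains just three distinct addresses.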
Now, let’s break down how this script works:
- The script expects exactly one command-line argument: the path to an Apache access log file. If it is run with no arguments (or with more than one), it prints a usage message and exits with status 1.
- The script then uses the -f test to check that the provided file exists and is a regular file. If it isn’t, the script prints an error message and exits.
- The awk command '{print $1}' prints the first whitespace-separated field of each line in the log file. In Apache’s Common and Combined Log Formats, each line represents a single request, and the first field is the remote host that made it: normally the client’s IP address, or a hostname if HostnameLookups is enabled.
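- The sort command sorts these IP addresses, which places identical addresses on adjacent lines. This step is required because uniq only compares neighboring lines.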
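- The uniq -c command collapses each run of identical adjacent lines into a single line, and the -c option prefixes that line with the number of consecutive occurrences. Because the input is already sorted, this count is the total number of requests from each unique IP address.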
- The sort -nr command sorts these counted lines in descending numerical order. The -n option compares the leading counts as numbers rather than as text (so 10 ranks above 9), and -r reverses the order so the largest counts come first.
- Finally, head -n 5 keeps only the first five lines of that output, i.e., the 5 IP addresses with the most requests (or fewer, if the log contains fewer than five distinct addresses).
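The reason for sorting before uniq -c is easy to overlook. The following sketch feeds the same unsorted list of made-up addresses through uniq -c once without sorting and once through the full sort | uniq -c | sort -nr chain, which shows that uniq only merges adjacent duplicates.

```shell
#!/bin/bash
# Sketch: why the pipeline sorts before running uniq -c.
# The addresses are made-up documentation-range IPs, deliberately interleaved.
set -euo pipefail

ips=$'192.0.2.10\n198.51.100.7\n192.0.2.10\n198.51.100.7\n192.0.2.10'

# Without sort: uniq -c only merges *adjacent* duplicates, so each address
# is split across several partial counts (here, five lines with a count of 1).
unsorted_counts=$(printf '%s\n' "$ips" | uniq -c)

# With sort: identical addresses become adjacent, so each gets one total count,
# and sort -nr then ranks them from most to fewest requests.
ranked_counts=$(printf '%s\n' "$ips" | sort | uniq -c | sort -nr)

echo "uniq -c without sort:"
echo "$unsorted_counts"
echo "sort | uniq -c | sort -nr:"
echo "$ranked_counts"
```

Without the first sort, the five interleaved lines produce five separate counts of 1. With it, the result is two lines: 192.0.2.10 with 3 requests, followed by 198.51.100.7 with 2.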