Unix shell script for log file analysis: find the top 5 IP addresses

In this article, we explain a shell script that analyzes an Apache web server’s access log file to find the top 5 IP addresses by number of requests made.
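For reference, the Apache HTTP Server documentation gives the Common Log Format entry below as an example. Both the Common and Combined log formats begin with %h, the client's address, and Apache writes the IP address there as long as HostnameLookups is Off, which is the default. This snippet only illustrates that field layout by pulling out the first field the same way the script does; the variable names are illustrative.

```shell
# Example Common Log Format entry, copied from the Apache HTTP Server log-file documentation.
line='127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'

# awk splits on runs of whitespace by default, so $1 is the client address (%h).
ip=$(printf '%s\n' "$line" | awk '{print $1}')
echo "$ip"   # prints 127.0.0.1
```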

Shell script:


#!/bin/bash

# Check that exactly one argument (the log file) has been provided
if [ $# -ne 1 ]; then
    echo "Usage: $0 <access_log>" >&2
    exit 1
fi

LOG_FILE="$1"

# Check that the provided log file exists and is a regular file
if [ ! -f "$LOG_FILE" ]; then
    echo "Log file $LOG_FILE does not exist." >&2
    exit 1
fi

# Parse the log file and print the top 5 IP addresses
awk '{print $1}' "$LOG_FILE" | sort | uniq -c | sort -nr | head -n 5
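To try the pipeline without a real server log, the sketch below writes a small, made-up access log to a temporary file and runs the same command chain on it. The addresses come from the RFC 5737 documentation ranges, the request counts are arbitrary, and mktemp is assumed to be available (it is on typical Linux and macOS systems).

```shell
# Write a small, made-up access log to a temporary file (removed on exit).
log=$(mktemp)
trap 'rm -f "$log"' EXIT

# Each entry is address:number-of-requests; addresses are documentation-range
# examples and the counts are arbitrary sample values.
for entry in 203.0.113.7:4 198.51.100.23:3 192.0.2.10:6 \
             203.0.113.99:1 198.51.100.5:2 192.0.2.44:5; do
    ip=${entry%%:*}   # part before the colon
    n=${entry##*:}    # part after the colon
    i=0
    while [ "$i" -lt "$n" ]; do
        printf '%s - - [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 2326\n' "$ip" >> "$log"
        i=$((i + 1))
    done
done

# The same pipeline as in the script: count requests per address, keep the top 5.
top=$(awk '{print $1}' "$log" | sort | uniq -c | sort -nr | head -n 5)
printf '%s\n' "$top"
```

With this data the first line reports 192.0.2.10 with 6 requests, and 203.0.113.99, with only 1 request, is cut off by head -n 5. The exact column padding in front of each count depends on the uniq implementation.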

Now, let’s break down how this script works:

  1. The script expects exactly one command-line argument: the path to an Apache access log file. If it receives no arguments, or more than one, it prints a usage message and exits with status 1.
  2. The script then checks that the provided path exists and is a regular file (the -f test). If it is not, it prints an error message and exits.
  3. The awk command extracts the first whitespace-separated field (the IP address) from each line of the log file. In an Apache access log, each line records a single request, and the first field is the address of the client that made it.
  4. The sort command sorts these IP addresses so that identical addresses end up on adjacent lines, which uniq requires.
  5. The uniq -c command collapses each run of identical adjacent lines into one and, because of the -c option, prefixes it with the number of occurrences in that run. Since the input is sorted, this is the total number of requests from each IP address.
  6. The sort -nr command sorts these counts in descending numeric order. The -n option compares lines by their leading numeric value (the count) instead of as text, and -r reverses the order so the largest counts come first.
  7. Finally, head -n 5 prints the first 5 lines, i.e., the 5 IP addresses with the most requests.
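Steps 4 and 5 depend on each other: uniq only merges duplicates that sit on adjacent lines. The short sketch below uses three hypothetical addresses to compare uniq -c with and without the preceding sort.

```shell
# Three hypothetical requests in log order: 192.0.2.1 appears twice, but not on adjacent lines.
ips='192.0.2.1
198.51.100.2
192.0.2.1'

# Without sort, uniq -c only merges adjacent duplicates,
# so 192.0.2.1 is reported twice, each time with a count of 1.
unsorted=$(printf '%s\n' "$ips" | uniq -c)

# With sort first, identical addresses become adjacent and are counted together (2 for 192.0.2.1).
sorted=$(printf '%s\n' "$ips" | sort | uniq -c)

printf 'without sort:\n%s\n' "$unsorted"
printf 'with sort:\n%s\n' "$sorted"
```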