To copy files from Hadoop’s HDFS (Hadoop Distributed File System) to your local machine, you can use the hadoop fs or hdfs dfs command, which provides a simple way to interact with HDFS. Here’s a step-by-step guide with examples:
Step 1: Open a terminal
Open a terminal window on your local machine (it needs the Hadoop client installed and configured so it can reach the cluster). This is where you’ll run the HDFS commands.
Step 2: Use the hadoop fs or hdfs dfs Command
You can use either the hadoop fs or the hdfs dfs command to interact with HDFS. The two are equivalent for HDFS operations, so choose whichever you prefer; the examples below use hdfs dfs.
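For example, both of the following list the contents of an HDFS directory (the path is illustrative):
hadoop fs -ls /user/freshers_in/
hdfs dfs -ls /user/freshers_in/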
Step 3: Copying files from HDFS to local
To copy files from HDFS to your local machine, use the -copyToLocal or -get option, passing the HDFS path and the local destination path.
Here’s the basic syntax:
hdfs dfs -copyToLocal <HDFS_PATH> <LOCAL_DESTINATION_PATH>
Or using -get:
hdfs dfs -get <HDFS_PATH> <LOCAL_DESTINATION_PATH>
<HDFS_PATH> is the file or directory in HDFS that you want to copy, and <LOCAL_DESTINATION_PATH> is the local directory where you want to copy the file(s).
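Before copying, you can confirm that the source path actually exists in HDFS (the path below is just an example):
hdfs dfs -ls /user/freshers_in/myfile.txt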
Examples
Let’s look at a few examples:
To copy a file named myfile.txt from HDFS to your current local directory:
hdfs dfs -copyToLocal /user/freshers_in/myfile.txt .
The . at the end specifies the current directory as the destination.
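You can then verify the copy on the local side with a standard shell listing:
ls -l ./myfile.txt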
To copy a file named mydata.csv from HDFS to a specific local directory:
hdfs dfs -copyToLocal /user/freshers_in/data/mydata.csv /path/to/local/directory/
Replace /path/to/local/directory/ with the actual path to your desired local directory.
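If the destination directory does not exist yet, create it first with a standard shell command (the path is illustrative):
mkdir -p /path/to/local/directory/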
To copy an entire directory from HDFS to your local machine:
hdfs dfs -copyToLocal /user/freshers_in/mydirectory/ /path/to/local/directory/
This will copy all files and subdirectories from the HDFS directory to your local directory; a folder named mydirectory will be created inside the local destination.
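Two related variations are worth knowing. The -get option accepts wildcards if you only want files matching a pattern, and -getmerge concatenates all files in an HDFS directory into a single local file (paths below are illustrative):
hdfs dfs -get '/user/freshers_in/mydirectory/*.csv' /path/to/local/directory/
hdfs dfs -getmerge /user/freshers_in/mydirectory/ /path/to/local/combined.txt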