Bigdata – Knowledge Base

HDFS – Commands

Introduction to HDFS

Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It is highly fault-tolerant and provides high throughput access to application data. HDFS is designed to support large files, streaming data access, and large-scale clusters.

Basic HDFS Commands

These are the fundamental commands for interacting with HDFS, such as navigating the file system and managing files.

  1. hdfs dfs -ls: List Directory Contents
    • Usage: hdfs dfs -ls /path/to/directory
    • Explanation: Lists the contents of a directory in HDFS. Each entry shows the permissions, replication factor (shown as - for directories), owner, group, size in bytes, modification date and time, and path.
    • Example:
      • hdfs dfs -ls /user/hadoop
    • Output:
      • -rw-r--r-- 1 hadoop hadoop 134 2024-01-01 12:00 /user/hadoop/file1.txt
      • drwxr-xr-x - hadoop hadoop 0 2024-01-01 12:01 /user/hadoop/input
  2. hdfs dfs -mkdir: Make Directory
    • Usage: hdfs dfs -mkdir /path/to/new_directory
    • Explanation: Creates a new directory in HDFS. By default the parent directory must already exist; add -p to create any missing parent directories along the path.
    • Example:
      • hdfs dfs -mkdir /user/hadoop/input
    • Output: The directory /user/hadoop/input is created in HDFS.
  3. hdfs dfs -put: Copy Files from Local Filesystem to HDFS
    • Usage: hdfs dfs -put /local/path/to/file /hdfs/destination/path
    • Explanation: Copies files from the local filesystem to HDFS. This command is commonly used to upload data for processing. By default it fails if the destination file already exists; add -f to overwrite it.
    • Example:
      • hdfs dfs -put /home/hadoop/data.txt /user/hadoop/input
    • Output: The file data.txt is copied to /user/hadoop/input in HDFS.
  4. hdfs dfs -get: Copy Files from HDFS to Local Filesystem
    • Usage: hdfs dfs -get /hdfs/source/path /local/destination/path
    • Explanation: Downloads files from HDFS to the local filesystem. This is useful for retrieving processed results or backups.
    • Example:
      • hdfs dfs -get /user/hadoop/output/result.txt /home/hadoop/
    • Output: The file result.txt is copied from HDFS to the local directory /home/hadoop/.
  5. hdfs dfs -rm: Remove Files or Directories from HDFS
    • Usage: hdfs dfs -rm /hdfs/path/to/file
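    • Explanation: Deletes files from HDFS. Use -r to remove directories recursively. If the trash feature is enabled, deleted files are moved to the user's trash directory rather than removed immediately; add -skipTrash to bypass the trash.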
    • Example:
      • hdfs dfs -rm /user/hadoop/input/data.txt
    • Output: The file data.txt is deleted from the /user/hadoop/input directory in HDFS.
  6. hdfs dfs -rmdir: Remove Empty Directories
    • Usage: hdfs dfs -rmdir /hdfs/path/to/directory
    • Explanation: Removes an empty directory from HDFS. If the directory is not empty, this command will fail.
    • Example:
      • hdfs dfs -rmdir /user/hadoop/input
    • Output: The empty directory /user/hadoop/input is removed from HDFS.
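
The basic commands above are often used together. The following is a minimal sketch of one such sequence, not a definitive workflow: the helper names run and hdfs_roundtrip, the RUN variable, and the /tmp/ download directory are assumptions made for illustration. By default the script only prints the commands it would run, so it can be reviewed without a cluster.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints each command unless RUN=1 is set,
# so the sequence can be reviewed before touching a real cluster.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$@"
  fi
}

# Upload a local file, list it, download a copy, then clean up.
# Arguments: HDFS directory, local file, local download directory.
# The && chain stops at the first command that fails.
hdfs_roundtrip() {
  dir="$1"
  file="$2"
  dest="$3"
  name="$(basename "$file")"

  run hdfs dfs -mkdir -p "$dir" &&          # create the directory and any missing parents
  run hdfs dfs -put "$file" "$dir" &&       # upload the local file
  run hdfs dfs -ls "$dir" &&                # confirm the upload
  run hdfs dfs -get "$dir/$name" "$dest" && # download a copy
  run hdfs dfs -rm "$dir/$name" &&          # delete the file from HDFS
  run hdfs dfs -rmdir "$dir"                # remove the now-empty directory
}

# Dry run with the paths used in the examples above.
hdfs_roundtrip /user/hadoop/input /home/hadoop/data.txt /tmp/
```

Running the script with RUN=1 set in the environment executes the commands instead of printing them. Note that the final -rmdir only succeeds if nothing else is stored in the directory.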

File Viewing and Manipulation Commands

These commands allow you to view and manipulate the contents of files stored in HDFS.

  1. hdfs dfs -cat: Display File Contents
    • Usage: hdfs dfs -cat /hdfs/path/to/file
    • Explanation: Displays the contents of a file in HDFS. This is useful for quick checks of file data without downloading it.
    • Example:
      • hdfs dfs -cat /user/hadoop/input/data.txt
    • Output: Displays the content of data.txt on the console.
  2. hdfs dfs -tail: Display Last Part of a File
    • Usage: hdfs dfs -tail /hdfs/path/to/file
    • Explanation: Displays the last 1KB of a file in HDFS. Useful for checking log files or seeing recent data appended to a file.
    • Example:
      • hdfs dfs -tail /user/hadoop/logs/application.log
    • Output: Displays the last kilobyte of application.log.
  3. hdfs dfs -head: Display the First Part of a File
    • Usage: hdfs dfs -head /hdfs/path/to/file
    • Explanation: Displays the first kilobyte of a file in HDFS. Useful for previewing file content.
    • Example:
      • hdfs dfs -head /user/hadoop/input/data.txt
    • Output: Displays the first kilobyte of data.txt.
  4. hdfs dfs -appendToFile: Append Local File to HDFS File
    • Usage: hdfs dfs -appendToFile /local/path/to/file /hdfs/destination/path
    • Explanation: Appends the content of a local file to an existing file in HDFS.
    • Example:
      • hdfs dfs -appendToFile /home/hadoop/new_data.txt /user/hadoop/input/data.txt
    • Output: Appends new_data.txt to data.txt in HDFS.
  5. hdfs dfs -cp: Copy Files within HDFS
    • Usage: hdfs dfs -cp /hdfs/source/path /hdfs/destination/path
    • Explanation: Copies files or directories from one location to another within HDFS.
    • Example:
      • hdfs dfs -cp /user/hadoop/input/data.txt /user/hadoop/archive/data_backup.txt
    • Output: Copies data.txt to data_backup.txt within HDFS.
  6. hdfs dfs -mv: Move Files within HDFS
    • Usage: hdfs dfs -mv /hdfs/source/path /hdfs/destination/path
    • Explanation: Moves files or directories from one location to another within HDFS.
    • Example:
      • hdfs dfs -mv /user/hadoop/input/data.txt /user/hadoop/processed/
    • Output: Moves data.txt to the /user/hadoop/processed directory.
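
Because -tail prints a fixed amount of data (the last kilobyte) rather than a chosen number of lines, a common workaround is to stream the file with -cat and pipe it through the local head or tail. The sketch below shows one way to do that; the function names and the HDFS_CAT override are assumptions introduced here so the helpers can be tried on local files.

```shell
#!/bin/sh
# HDFS_CAT is a hypothetical override: set HDFS_CAT=cat to try these helpers
# on local files. By default they stream from HDFS with "hdfs dfs -cat".

# Print the first N lines of a file. Arguments: line count, path.
hdfs_first_lines() {
  ${HDFS_CAT:-hdfs dfs -cat} "$2" | head -n "$1"
}

# Print the last N lines of a file. Arguments: line count, path.
# The whole file is streamed to the client, so this can be slow
# for very large files.
hdfs_last_lines() {
  ${HDFS_CAT:-hdfs dfs -cat} "$2" | tail -n "$1"
}
```

For example, hdfs_first_lines 5 /user/hadoop/input/data.txt would print the first five lines of data.txt.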

File and Directory Management Commands

These commands are used for advanced management tasks like setting permissions and checking disk usage.

  1. hdfs dfs -chmod: Change File or Directory Permissions
    • Usage: hdfs dfs -chmod [permissions] /hdfs/path/to/file
    • Explanation: Changes the file or directory permissions in HDFS. Uses the same syntax as Linux chmod.
    • Example:
      • hdfs dfs -chmod 755 /user/hadoop/input
    • Output: Sets the permissions of /user/hadoop/input to 755 (rwxr-xr-x).
  2. hdfs dfs -chown: Change File or Directory Ownership
    • Usage: hdfs dfs -chown [owner:group] /hdfs/path/to/file
    • Explanation: Changes the owner and group of files or directories in HDFS. Changing the owner requires HDFS superuser privileges.
    • Example:
      • hdfs dfs -chown hadoop:supergroup /user/hadoop/input/data.txt
    • Output: Changes the owner to hadoop and group to supergroup for data.txt.
  3. hdfs dfs -chgrp: Change Group Ownership
    • Usage: hdfs dfs -chgrp [group] /hdfs/path/to/file
    • Explanation: Changes the group of files or directories in HDFS.
    • Example:
      • hdfs dfs -chgrp analytics /user/hadoop/input/data.txt
    • Output: Changes the group of data.txt to analytics.
  4. hdfs dfs -du: Display Disk Usage
    • Usage: hdfs dfs -du /hdfs/path/to/directory
    • Explanation: Displays the space used by files and directories in HDFS. Add -s for a single summary total instead of per-entry sizes, and -h for human-readable units.
    • Example:
      • hdfs dfs -du -h /user/hadoop/input
    • Output: Displays the size of each entry under /user/hadoop/input in human-readable units.
  5. hdfs dfs -df: Show Filesystem Disk Space Usage
    • Usage: hdfs dfs -df /hdfs/path
    • Explanation: Displays the total, used, and available space on the filesystem where the given HDFS path is located. This is useful for monitoring disk usage and ensuring there is enough space for new files.
    • Example:
      • hdfs dfs -df -h /user/hadoop
    • Output: Shows the disk space usage in a human-readable format for the filesystem containing /user/hadoop.
  6. hdfs dfs -stat: Display File or Directory Statistics
    • Usage: hdfs dfs -stat [format] /hdfs/path/to/file
    • Explanation: Displays statistics of a file or directory in HDFS. You can use format flags like %b for file size in bytes, %n for file name, etc.
    • Example:
      • hdfs dfs -stat "%n %b" /user/hadoop/input/data.txt
    • Output: Displays the name and size in bytes of the file data.txt.
  7. hdfs dfs -checksum: Get Checksum of a File
    • Usage: hdfs dfs -checksum /hdfs/path/to/file
    • Explanation: Calculates and displays the checksum of a file in HDFS. This is useful for verifying data integrity.
    • Example:
      • hdfs dfs -checksum /user/hadoop/input/data.txt
    • Output: Displays the checksum information for data.txt.
  8. hdfs dfs -copyFromLocal: Copy File from Local Filesystem to HDFS (Alternative to -put)
    • Usage: hdfs dfs -copyFromLocal /local/path/to/file /hdfs/destination/path
    • Explanation: Another command to copy files from the local filesystem to HDFS. It works like -put, except that the source must be on the local filesystem.
    • Example:
      • hdfs dfs -copyFromLocal /home/hadoop/localfile.txt /user/hadoop/input/
    • Output: The file localfile.txt is copied to /user/hadoop/input in HDFS.
  9. hdfs dfs -copyToLocal: Copy File from HDFS to Local Filesystem (Alternative to -get)
    • Usage: hdfs dfs -copyToLocal /hdfs/source/path /local/destination/path
    • Explanation: Another command to copy files from HDFS to the local filesystem. It works like -get, except that the destination must be on the local filesystem.
    • Example:
      • hdfs dfs -copyToLocal /user/hadoop/input/data.txt /home/hadoop/
    • Output: The file data.txt is copied from HDFS to /home/hadoop/.
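
The -chmod example above relies on knowing that the octal mode 755 corresponds to rwxr-xr-x. As a small illustration of that mapping, the sketch below converts an octal mode into its symbolic form; octal_to_symbolic is a hypothetical helper written for this article, not part of the HDFS command-line tools.

```shell
#!/bin/sh
# Convert an octal permission mode (for example 755) to its symbolic
# form (rwxr-xr-x), one digit at a time. Returns 1 on an invalid digit.
octal_to_symbolic() {
  mode="$1"
  out=""
  while [ -n "$mode" ]; do
    rest="${mode#?}"          # everything after the first character
    digit="${mode%"$rest"}"   # the first character
    case "$digit" in
      0) out="${out}---" ;;
      1) out="${out}--x" ;;
      2) out="${out}-w-" ;;
      3) out="${out}-wx" ;;
      4) out="${out}r--" ;;
      5) out="${out}r-x" ;;
      6) out="${out}rw-" ;;
      7) out="${out}rwx" ;;
      *) echo "invalid octal digit: $digit" >&2; return 1 ;;
    esac
    mode="$rest"
  done
  echo "$out"
}

octal_to_symbolic 755   # prints rwxr-xr-x
```

The symbolic form is the same string that appears in the first column of hdfs dfs -ls output after the file-type character, which makes it easy to check that a -chmod call had the intended effect.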

Advanced HDFS Commands

These commands provide more control and options for managing HDFS storage, file replication, and access.

  1. hdfs dfsadmin -report: HDFS Cluster Report
    • Usage: hdfs dfsadmin -report
    • Explanation: Provides a summary of the HDFS cluster status, including the number of data nodes, total and used storage capacity, and more.
    • Example:
      • hdfs dfsadmin -report
    • Output: Displays a detailed report of the HDFS cluster status.
  2. hdfs dfs -setrep: Set Replication Factor
    • Usage: hdfs dfs -setrep -w [replication] /hdfs/path/to/file
    • Explanation: Changes the replication factor of a file in HDFS; if the path is a directory, the change applies to all files under it. The -w flag makes the command wait until replication completes, which can take a long time on large files.
    • Example:
      • hdfs dfs -setrep -w 3 /user/hadoop/input/data.txt
    • Output: Sets the replication factor of data.txt to 3.
  3. hdfs dfsadmin -safemode: Manage Safe Mode
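    • Usage: hdfs dfsadmin -safemode [enter | leave | get | wait]
    • Explanation: Manages HDFS safe mode, a read-only state in which the NameNode rejects changes to the namespace and does not replicate or delete blocks. The NameNode enters safe mode automatically at startup, and administrators can enter it manually for maintenance. Use enter and leave to switch the state, get to report it, and wait to block until safe mode is off.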
    • Example:
      • hdfs dfsadmin -safemode get
    • Output: Displays the current safe mode status of HDFS.
  4. hdfs dfsadmin -finalizeUpgrade: Finalize HDFS Upgrade
    • Usage: hdfs dfsadmin -finalizeUpgrade
    • Explanation: Finalizes the HDFS upgrade after a version upgrade. Finalizing deletes the backup of the previous version's state kept by the NameNode and DataNodes, after which the cluster can no longer be rolled back.
    • Example:
      • hdfs dfsadmin -finalizeUpgrade
    • Output: Finalizes the HDFS upgrade process.
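
Since HDFS is read-only while in safe mode, scripts that write to HDFS sometimes check the safe mode status first. The following is a rough sketch of such a guard. It assumes that the output of hdfs dfsadmin -safemode get contains the text "Safe mode is ON" or "Safe mode is OFF", and the function names and the SAFEMODE_CMD override are hypothetical names introduced for this example.

```shell
#!/bin/sh
# Succeeds (exit status 0) if the text on stdin reports that safe mode is on.
# Assumption: the status line contains "Safe mode is ON" or "Safe mode is OFF".
safemode_is_on() {
  grep -q 'Safe mode is ON'
}

# Run the given command only when HDFS is not in safe mode.
# SAFEMODE_CMD is a hypothetical override so the guard can be tried
# without a cluster; by default the status comes from hdfs dfsadmin.
write_if_writable() {
  if ${SAFEMODE_CMD:-hdfs dfsadmin -safemode get} | safemode_is_on; then
    echo "HDFS is in safe mode; skipping: $*" >&2
    return 1
  fi
  "$@"
}
```

A call such as write_if_writable hdfs dfs -put /home/hadoop/data.txt /user/hadoop/input would then upload the file only when safe mode is off. This checks the status once rather than waiting for safe mode to end, so a job that should wait would need a different approach.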

Conclusion

HDFS commands provide powerful and flexible options for managing data in a distributed environment. Understanding these commands can help you effectively manage storage, monitor system health, and ensure data integrity within a Hadoop ecosystem.

Updated on September 3, 2024