Tar in Linux – Tar GZ, Tar File, Tar Directory, and Tar Compress Command Examples

If you've spent any time at all working in Linux, you've likely encountered the ubiquitous "tar" command. Short for "tape archive", tar is the go-to utility for combining multiple files and directories into a single archive file for easy storage and transfer. In this deep dive, we'll explore the inner workings of tar, review best practices, and look at examples of using tar effectively in real-world scenarios.

Table of Contents

  1. What is Tar?
  2. How Tar Works
  3. Basic Tar Command Syntax
  4. Creating and Extracting Archives
  5. Compressing Archives
  6. Listing and Updating Archives
  7. Excluding Files
  8. Scripting with Tar
  9. Advanced Tar Usage
  10. Tar Performance Benchmarks
  11. Tar vs Other Archiving Tools
  12. Tar Best Practices and Pitfalls
  13. Conclusion

What is Tar?

At its core, tar is a utility for storing and extracting files from an archive known as a tarball. A tarball is simply a collection of files and directories bundled into a single file for convenient storage and transfer.

While tar was originally developed for writing data to sequential I/O devices like tape drives, today it's most commonly used for distributing software source code, transmitting large numbers of files over networks, and backing up data.

Tarballs preserve the directory structure and file metadata like permissions and timestamps. By default tar does not perform any compression on the files added to the archive. Compression is typically done as a separate step using a utility like gzip or bzip2, resulting in a compressed file with a .tar.gz, .tgz, .tar.bz2, or .tbz extension.

How Tar Works

To understand how to use tar effectively, it helps to know a bit about how it works under the hood.

A tar archive consists of a series of file objects, each representing a file in the archive. Each file object contains metadata about the file (path, permissions, owner, size, etc.) as well as the file data itself.

When creating an archive, tar reads each specified file or directory in turn, creates a file object in the archive, and copies the file data into the object. For directories, it recursively processes all subdirectories and files.

On extraction, tar reads each file object from the archive, creates a corresponding file on disk with the stored metadata, and writes the file data to the new file.
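
The layout described above can be observed on a tiny archive. The following is a rough sketch, assuming GNU tar or bsdtar with their default archive formats; the scratch directory and hello.txt are made-up names. It relies on two properties of the tar format: archives are written in 512-byte blocks, and each member's header block begins with the member name.

```shell
# Sketch: peek at the raw structure of a one-file archive.
# The scratch directory and hello.txt are illustrative names.
demo=$(mktemp -d)
printf 'hello\n' > "$demo/hello.txt"
tar -cf "$demo/one.tar" -C "$demo" hello.txt

# Archive size in bytes; tar pads its output to whole 512-byte blocks.
size=$(wc -c < "$demo/one.tar" | tr -d ' ')
echo "archive size: $size bytes (remainder mod 512: $((size % 512)))"

# The first 100 bytes of a header hold the member name, padded with NULs.
first_name=$(head -c 100 "$demo/one.tar" | tr -d '\0')
echo "first header name field: $first_name"

rm -rf "$demo"
```

Even a six-byte file yields an archive of several kilobytes, because the header, the padded data, and the end-of-archive marker each occupy whole blocks.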

Some key things to understand about how tar handles different file types:

  • Symbolic Links: By default, tar archives symlinks as links, without following them. The -h (--dereference) option makes tar follow each symlink and archive the file it points to instead.

  • Hard Links: Tar detects hard links and archives each hard-linked file only once. On extraction, the additional hard links will be recreated.

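  • Sparse Files: For sparse files (files with large unallocated regions that read back as zero bytes), GNU tar can store only the populated regions to save space, but only when the -S (--sparse) option is given. Without it, the holes are written out as literal zeros.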

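  • Extended Attributes: Linux filesystems support storing extended attributes (xattrs) on files, which is also where ACLs and SELinux contexts live. GNU tar does not save these by default; pass --xattrs, --acls, or --selinux when creating and extracting to preserve them.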

  • Pipes and Devices: Special file types like named pipes and device nodes can be stored in tar archives, but restoring them may require root privileges.
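
How a symlink ends up in an archive is easy to check on a scratch directory. A minimal sketch, assuming GNU tar or bsdtar; the directory and file names are invented for the demo.

```shell
# Sketch: compare how tar stores a symlink with and without -h.
# All paths below are illustrative.
demo=$(mktemp -d)
mkdir "$demo/src"
echo "hello" > "$demo/src/target.txt"
ln -s target.txt "$demo/src/link.txt"

# Without -h the archive records link.txt as a symlink.
tar -cf "$demo/links.tar" -C "$demo/src" .

# With -h (--dereference) tar follows the link and stores a regular file.
tar -chf "$demo/deref.tar" -C "$demo/src" .

# Verbose listings show symlinks as "name -> target".
default_listing=$(tar -tvf "$demo/links.tar")
deref_listing=$(tar -tvf "$demo/deref.tar")
printf '%s\n\n%s\n' "$default_listing" "$deref_listing"

rm -rf "$demo"
```

Comparing the two listings before a backup run is a quick way to confirm which behavior a given command line produces.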

Basic Tar Command Syntax

The basic syntax of the tar command is:

tar [options] [archive-file] [file or directory to be archived]

The three main components are:

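  1. Options that control tar's behavior, written as single letters. The leading dash is optional for the first group of letters (tar cvf and tar -cvf are equivalent), and the examples in this article use the dashless form. Some common ones: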

    • c: Create an archive
    • x: Extract an archive
    • f: Specify the archive file name
    • z: Compress the archive with gzip
    • j: Compress the archive with bzip2
    • v: Verbose output
  2. The name of the tar archive file to create or extract.

  3. For creating an archive, the list of files and directories to include. For extracting, this list is optional: it names the archive members to extract, and when it is omitted tar extracts everything.
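
The same operation can be spelled in several ways. Below is a small sketch, assuming GNU tar or bsdtar, that builds one archive with each option style; notes.txt and the scratch directory are placeholder names.

```shell
# Sketch: three equivalent spellings of "create, verbose, file".
# The scratch directory and notes.txt are illustrative names.
demo=$(mktemp -d)
echo "some notes" > "$demo/notes.txt"

# Old style: no dash; the argument for f is the next word on the line.
tar cvf "$demo/old-style.tar" -C "$demo" notes.txt

# Short options: with a dash, f should come last in the cluster, because
# the text after it is taken as the archive name (-cfv would create "v").
tar -cvf "$demo/short.tar" -C "$demo" notes.txt

# Long options spell everything out.
tar --create --verbose --file="$demo/long.tar" -C "$demo" notes.txt

# All three archives list the same single member.
old_list=$(tar tf "$demo/old-style.tar")
short_list=$(tar -tf "$demo/short.tar")
long_list=$(tar --list --file="$demo/long.tar")
echo "$old_list / $short_list / $long_list"

rm -rf "$demo"
```

The -C option used here tells tar to change into the given directory before reading the files, which keeps the member names relative.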

Creating and Extracting Archives

To create a new tar archive:

tar cfv archive.tar file1 file2 directory1

This creates archive.tar containing file1, file2, and the contents of directory1. The "v" option enables verbose output.

To create an archive from a whole directory:

tar cfv archive.tar ./source-directory 

To extract files from an archive:

tar xfv archive.tar

This extracts all files from archive.tar to the current directory. To only extract specific files:

tar xfv archive.tar file1 directory1
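
Extraction does not have to target the current directory. The sketch below, assuming GNU tar or bsdtar, uses the -C option to restore a single member into a separate directory; the restore directory and file contents are hypothetical.

```shell
# Sketch: extract one member into a chosen directory with -C.
# Directory names are illustrative.
demo=$(mktemp -d)
mkdir -p "$demo/src/directory1" "$demo/restore"
echo one > "$demo/src/file1"
echo two > "$demo/src/directory1/file2"

tar -cf "$demo/archive.tar" -C "$demo/src" file1 directory1

# Only directory1 is extracted, and it lands under restore/.
tar -xf "$demo/archive.tar" -C "$demo/restore" directory1

restored=$(cd "$demo/restore" && find . -type f | sort)
echo "$restored"

rm -rf "$demo"
```

Member names passed on the command line must match the names stored in the archive, so listing the archive first helps when a name is not matched.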

Compressing Archives

Tar archives are frequently compressed to save space, usually with gzip or bzip2.

To create a compressed archive with gzip:

tar czfv archive.tar.gz file1 file2

The "z" option specifies gzip compression. The resulting file has a .tar.gz or .tgz extension.

For bzip2 compression, use the "j" option:

tar cjfv archive.tar.bz2 file1 file2

Bzip2 compresses more than gzip but is slower. The output file has a .tar.bz2 or .tbz extension.

To extract compressed archives, use the appropriate option:

tar xzfv archive.tar.gz
tar xjfv archive.tar.bz2  
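
When reading an archive, current GNU tar and bsdtar detect the compression format on their own, so the z or j letter is only strictly needed when creating. A minimal sketch under that assumption, with made-up file names:

```shell
# Sketch: create a gzip-compressed archive, then extract it without -z.
# Assumes a tar that auto-detects compression on read (GNU tar, bsdtar).
demo=$(mktemp -d)
mkdir "$demo/src" "$demo/out"
seq 1 1000 > "$demo/src/data.txt"

tar -czf "$demo/archive.tar.gz" -C "$demo/src" data.txt

# No z here: tar inspects the file and picks the decompressor itself.
tar -xf "$demo/archive.tar.gz" -C "$demo/out"

lines=$(wc -l < "$demo/out/data.txt" | tr -d ' ')
echo "extracted data.txt with $lines lines"

rm -rf "$demo"
```

Spelling out the compression letter remains a reasonable habit in scripts that may run against older or minimal tar implementations.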

Listing and Updating Archives

To list the contents of an archive without extracting:

tar tfv archive.tar

For compressed archives:

tar tzfv archive.tar.gz
tar tjfv archive.tar.bz2

To add files to an existing uncompressed archive:

tar rfv archive.tar new-file

This isn't possible with compressed archives: you'd need to decompress the archive, append the files, and recompress it.
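
Both cases can be walked through on throwaway files. This is a sketch assuming GNU tar or bsdtar plus the gzip and gunzip utilities; a.txt, b.txt, and c.txt are placeholder names.

```shell
# Sketch: append to a plain archive, then to a gzip-compressed one.
demo=$(mktemp -d)
mkdir "$demo/src"
for name in a b c; do echo "$name" > "$demo/src/$name.txt"; done

# Plain archive: r appends directly.
tar -cf "$demo/archive.tar" -C "$demo/src" a.txt
tar -rf "$demo/archive.tar" -C "$demo/src" b.txt

# Compressed archive: decompress, append, recompress.
gzip "$demo/archive.tar"            # produces archive.tar.gz
gunzip "$demo/archive.tar.gz"       # back to archive.tar
tar -rf "$demo/archive.tar" -C "$demo/src" c.txt
gzip "$demo/archive.tar"

final_members=$(tar -tzf "$demo/archive.tar.gz")
echo "$final_members"

rm -rf "$demo"
```

Appended members are written at the end of the archive, so the listing reflects the order in which files were added.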

Excluding Files

To exclude specific files when creating an archive:

tar cfv archive.tar --exclude='*.jpg' --exclude='temp-dir' source-dir

This omits .jpg files and the temp-dir directory.

Exclusions can also be read from a file:

tar cfv archive.tar -X exclude-file.txt source-dir

Where exclude-file.txt lists patterns to exclude, one per line.
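
To confirm that exclusion patterns behave as intended, it helps to list the resulting archive. The sketch below, assuming GNU tar or bsdtar, builds a small hypothetical tree and checks that the inline patterns and the pattern file produce the same member list.

```shell
# Sketch: inline --exclude patterns versus a pattern file (-X).
# The tree under source-dir is made up for the demo.
demo=$(mktemp -d)
mkdir -p "$demo/source-dir/photos" "$demo/source-dir/temp-dir"
touch "$demo/source-dir/readme.txt" \
      "$demo/source-dir/photos/a.jpg" \
      "$demo/source-dir/temp-dir/scratch.log"

# Quote the patterns so the shell passes them to tar unexpanded.
tar -cf "$demo/inline.tar" --exclude='*.jpg' --exclude='temp-dir' \
    -C "$demo" source-dir

# Same patterns, one per line, read from a file.
printf '%s\n' '*.jpg' 'temp-dir' > "$demo/exclude-file.txt"
tar -cf "$demo/fromfile.tar" -X "$demo/exclude-file.txt" \
    -C "$demo" source-dir

inline_list=$(tar -tf "$demo/inline.tar" | sort)
file_list=$(tar -tf "$demo/fromfile.tar" | sort)
echo "$inline_list"

rm -rf "$demo"
```

Note that the photos directory itself is still archived; only the files matching the pattern are left out.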

Scripting with Tar

Tar is often used in scripts for automating deployments, backups, and installations. Here are a couple of examples.

A simple backup script:

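#!/bin/bash
set -euo pipefail  # stop on the first failed command or unset variable

SOURCE_DIR="/var/www/html"
BACKUP_DIR="/backups"
TIMESTAMP=$(date +%F-%H%M)

# Quote the variables so paths containing spaces survive word splitting.
tar czfv "$BACKUP_DIR/www-$TIMESTAMP.tar.gz" "$SOURCE_DIR"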

This script backs up the /var/www/html directory to a timestamped, compressed archive in /backups.

Using tar in a Dockerfile to package an application:

FROM node:14-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN tar czfv app.tar.gz *
CMD ["node", "server.js"]  

This Dockerfile uses tar to bundle the application source into a compressed archive during the Docker image build.

Advanced Tar Usage

Some more advanced tar features for special situations:

  • Multi-volume archives: For archives split across multiple files/devices
  • Incremental archives: For efficiently archiving only changed files
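  • Remote archives: GNU tar can read or write an archive on another machine when the archive name has the form host:path, using rsh or ssh as the transport (selectable with --rsh-command). Tar does not fetch FTP/HTTP URLs itself; pipe the download from a tool such as curl into tar instead.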
  • Streaming to stdout: Sending archive data to another command
  • Setting block size: Tuning tar's I/O block size for better performance
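
Two of these features can be tried locally. The sketch below streams an archive through a pipe and then takes an incremental backup; the incremental part assumes GNU tar, since --listed-incremental is a GNU extension, and every path and the data.snar snapshot name are hypothetical.

```shell
# Sketch: streaming through a pipe, then a GNU tar incremental backup.
demo=$(mktemp -d)
mkdir -p "$demo/src/data" "$demo/copy"
echo v1 > "$demo/src/data/old.txt"

# Streaming: "-f -" writes the archive to stdout or reads it from stdin.
tar -cf - -C "$demo/src" data | tar -xf - -C "$demo/copy"
copied=$(cat "$demo/copy/data/old.txt")
# The same pattern works across machines, for example (hypothetical host):
#   tar -czf - -C /srv data | ssh backup-host 'cat > data.tar.gz'

# Incremental: the snapshot file records what earlier runs archived.
tar -cf "$demo/full.tar" --listed-incremental="$demo/data.snar" \
    -C "$demo/src" data
echo v2 > "$demo/src/data/new.txt"
tar -cf "$demo/incr.tar" --listed-incremental="$demo/data.snar" \
    -C "$demo/src" data

# The second archive picks up the file added after the full backup.
incr_members=$(tar -tf "$demo/incr.tar")
echo "$incr_members"

rm -rf "$demo"
```

Restoring from incremental archives means extracting the full archive first and then each incremental one in order.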

Tar Performance Benchmarks

The choice of compression algorithm and level can significantly impact the size and creation speed of compressed archives. Here are some benchmarks comparing common options:

Algorithm | Level | Compress Time (s) | Archive Size (MB)
gzip      | 1     | 10.2              | 23.5
gzip      | 6     | 23.8              | 20.1
gzip      | 9     | 34.1              | 19.8
bzip2     | 1     | 42.3              | 17.3
bzip2     | 9     | 58.7              | 16.9
xz        | 1     | 53.4              | 15.6
xz        | 6     | 117.9             | 11.2

(Benchmarks run on an Intel Core i7-8700K compressing a 100MB directory)

In general, higher compression levels result in smaller archives but longer compression times. Gzip is fastest, while xz achieves the best compression at the cost of speed. Bzip2 falls in-between.
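
Results like these depend heavily on the data, so it is worth measuring on your own files. The following is a rough sketch rather than a rigorous benchmark: it assumes gzip and seq are installed, generates a synthetic text file as sample data, and compares archive sizes at three gzip levels.

```shell
# Sketch: compare archive sizes across gzip levels on sample data.
# The sample directory and its contents are synthetic.
demo=$(mktemp -d)
mkdir "$demo/sample"
seq 1 200000 > "$demo/sample/numbers.txt"

# Size of the uncompressed tar stream, for reference.
raw_bytes=$(tar -cf - -C "$demo" sample | wc -c | tr -d ' ')
echo "uncompressed: $raw_bytes bytes"

# Pipe tar's output through gzip so the level can be chosen explicitly.
for level in 1 6 9; do
  tar -cf - -C "$demo" sample | gzip "-$level" > "$demo/sample-$level.tar.gz"
  bytes=$(wc -c < "$demo/sample-$level.tar.gz" | tr -d ' ')
  echo "gzip -$level: $bytes bytes"
done

size_fast=$(wc -c < "$demo/sample-1.tar.gz" | tr -d ' ')
size_best=$(wc -c < "$demo/sample-9.tar.gz" | tr -d ' ')

rm -rf "$demo"
```

To capture timings as well, each pipeline can be wrapped with the shell's time keyword in bash; replacing gzip with bzip2 or xz in the pipeline extends the comparison to those tools.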

Tar vs Other Archiving Tools

Here's how tar compares to some other common archiving utilities:

Tool | Strengths | Weaknesses | Best Used For
tar  | Widely available, supports compression | No built-in encryption | Linux system backups and software distribution
cpio | More archive formats, better for backups | Less user-friendly, no compression | initramfs, RPM packages
zip  | Cross-platform, widely supported | No standard Unix metadata | Sending archives to Windows users
7zip | Highest compression, encryption support | Slower, less Unix/Linux support | Highly compressed archives, sensitive data
rar  | Very high compression | Closed format, patented | Proprietary software distribution

Tar Best Practices and Pitfalls

Tips for using tar effectively in production:

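  • Use --verify (-W) for critical archives so tar rereads what it wrote; GNU tar cannot verify compressed archives, so this applies to plain .tar files
  • Be mindful of leading slashes: GNU tar strips the leading / from member names by default, and overriding that with -P (--absolute-names) lets an archive write to absolute paths on extraction
  • Use tar over ssh for secure remote transfer (nc is unencrypted, so keep it to trusted networks)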
  • Automate testing of your backup/restore process
  • Combine tar with rsync for efficient network backups
  • Consider volume management for very large archives
  • Steer clear of proprietary/patent-encumbered formats

Common tar mistakes to avoid:

  • Forgetting to check the exit code after creating an archive
  • Accidentally clobbering files with an incorrect extract path
  • Trying to modify compressed archives in-place
  • Distributing software with insecure permissions/ownership
  • Not budgeting enough time/space for large, highly-compressed tarballs
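
Several of these points can be combined into a short create, check, and restore routine. This is only a sketch, assuming GNU tar or bsdtar; the site directory, archive name, and restore-test directory are hypothetical.

```shell
# Sketch: check the exit status, read the archive back, test a restore.
demo=$(mktemp -d)
mkdir -p "$demo/site" "$demo/restore-test"
echo '<h1>ok</h1>' > "$demo/site/index.html"
archive="$demo/site-backup.tar.gz"

# 1. Check tar's exit status instead of assuming success.
if tar -czf "$archive" -C "$demo" site; then
  create_status=ok
else
  create_status=failed
fi

# 2. Listing forces tar to read the whole archive, which exposes
#    truncation or corruption; it also shows the stored member paths.
if tar -tzf "$archive" > /dev/null; then
  verify_status=ok
else
  verify_status=failed
fi

# 3. Restore into a scratch directory and compare with the source.
tar -xzf "$archive" -C "$demo/restore-test"
if diff -r "$demo/site" "$demo/restore-test/site" > /dev/null; then
  restore_status=ok
else
  restore_status=failed
fi

echo "create=$create_status verify=$verify_status restore=$restore_status"
rm -rf "$demo"
```

Extracting into an empty scratch directory first also guards against clobbering existing files with an unexpected extract path.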

Conclusion

We've taken a comprehensive look at the tar command and its role in the Linux ecosystem. While there are many archiving tools available, tar remains the de-facto standard for its wide availability, Unix heritage, and ability to preserve file metadata.

Whether you're a developer bundling source code, a sysadmin performing backups, or a devops engineer packaging applications for deployment, tar is an indispensable tool to master. By understanding its strengths, quirks, and best practices, you can wield tar to solve problems and automate workflows like a pro.

Now it's your turn! Try out tar for your next archiving task. Experiment with different compression options, craft some bash scripts, and see how much time and hassle tar can save over manually bundling files. You just might find that tar really sticks with you.
