The Tar Command in Linux: Tar CVF and Tar XVF Explained with Example Commands
If you're a Linux user or system administrator, sooner or later you'll need to create an archive file containing multiple files and directories. The tar command is the essential tool for this task. In this in-depth article, we'll explore the tar command and two of its most important operations: creating archives with `tar cvf` and extracting them with `tar xvf`. By the end, you'll be confidently tarring and untarring with the best of them!
What is Tar?
Tar stands for "Tape Archive". It originated in the early days of Unix as a way to write files to sequential I/O devices like tape drives for backup purposes. However, tar has long since evolved into a general purpose archiving utility.
Fun fact: the name stuck even though most tar archives today are written to ordinary files or pipes rather than to tape. The more you know!
A tar archive (also known as a "tarball") is a collection of files and directories stored in a single file. Archives make it easy to group related items together, which simplifies storage and transfer. Tar is commonly used in combination with a compression utility like gzip to create compressed archives, saving space.
How does tar compare to other archiving utilities?
Tar is not the only archiving tool out there. Here's a quick comparison with some other popular options:
Feature | tar | zip | rar | 7z |
---|---|---|---|---|
Open source | ✓ | ✓ | | ✓ |
Multi-volume archives | ✓ | ✓ | ✓ | ✓ |
Built-in compression | | ✓ | ✓ | ✓ |
Encryption | | ✓ | ✓ | ✓ |
Platform support | Unix | All | All | All |
While zip and rar have some features that tar lacks, tar's simplicity and ubiquity on Unix-like systems make it the go-to choice for most archiving needs. And as we'll see, tar can be easily combined with other tools to add compression and encryption.
Anatomy of a Tar Archive
Before we dive into using tar, let's take a quick look at what a tar archive actually is. A tar file consists of a series of file header records, each followed by the contents of the file. The header contains metadata like:
- File name and path
- File size
- Owner and group IDs
- Permissions
- Modification timestamp
Each file's data follows immediately after its own header, padded with zero bytes to a multiple of 512 bytes. There's no compression in a basic tar archive: it's just the raw bytes of the files stored one after another.
Tar archives can also contain special header records for directories, symlinks, hard links, and other types of files. This allows tar to recreate the complete structure of a directory tree, not just a flat collection of files.
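The layout described above can be checked directly from the shell. The following is a minimal sketch, assuming a scratch directory and a demo file named hello.txt (both hypothetical): it reads the name field from the first header and confirms that the archive size is a whole number of 512-byte blocks.

```shell
set -eu
workdir=$(mktemp -d)          # scratch directory for the demo
cd "$workdir"
printf 'hello\n' > hello.txt  # hypothetical sample file
tar cf demo.tar hello.txt

# The member name sits in the first 100 bytes of the header, NUL-padded.
header_name=$(head -c 100 demo.tar | tr -d '\0')
echo "name field: $header_name"

# Archives are written in whole 512-byte blocks.
archive_size=$(wc -c < demo.tar)
echo "size: $archive_size bytes, remainder mod 512: $((archive_size % 512))"

# t with v prints the stored metadata (permissions, owner, size, time) without extracting.
tar tvf demo.tar
```

The exact archive size depends on the tar implementation and its blocking factor, so the sketch only checks the remainder rather than a specific number.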
Creating Tar Archives with tar cvf
The basic syntax for creating a tar archive is:
tar cvf <archive-name>.tar <file1> <file2> ...
Let's break down those options:
- `c`: Create a new archive
- `v`: Verbose output (lists files as they are added to the archive)
- `f`: Specify the filename of the archive
So `tar cvf foo.tar file1 file2 dir1` will create a new archive named `foo.tar` containing `file1`, `file2`, and the `dir1` directory. Easy enough! Let's look at a concrete example.
Suppose you have a directory named `project` containing some code files:
project/
|-- main.c
|-- aux.c
|-- README.txt
|-- docs/
    |-- manual.txt
To archive the entire `project` directory, you would run:
tar cvf project.tar project
Tar will recursively include all the files and subdirectories:
project/
project/main.c
project/aux.c
project/README.txt
project/docs/
project/docs/manual.txt
You can also specify individual files and directories to include. To archive only the `.c` files and `README.txt`:
tar cvf project-src.tar project/*.c project/README.txt
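Before handing an archive to someone else, it can help to confirm what actually went into it. The sketch below recreates the example layout in a scratch directory (the file contents are made up for illustration) and uses the `t` option to list the members of the selective archive.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Recreate the example project tree with placeholder contents.
mkdir -p project/docs
printf 'int main(void) { return 0; }\n' > project/main.c
printf 'void helper(void) {}\n' > project/aux.c
printf 'Demo project\n' > project/README.txt
printf 'Manual\n' > project/docs/manual.txt

# The shell expands project/*.c before tar ever sees it.
tar cf project-src.tar project/*.c project/README.txt

# t lists member names, one per line, without extracting.
listing=$(tar tf project-src.tar)
echo "$listing"
```

Because the glob is expanded by the shell, only files that exist at that moment are archived, and nothing under docs/ is included.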
Excluding Files from Archives
In addition to specifying what to include, you can also tell tar what to leave out using the `--exclude` option. To create an archive of the `project` directory without the `docs`:
tar cvf project-no-docs.tar --exclude=project/docs project
You can exclude a specific file by name, skip an entire directory, or use a glob pattern to match multiple files.
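As an illustration of the glob form, here is a small sketch that leaves every .txt file out of the archive. It assumes GNU tar, where an exclude pattern such as '*.txt' matches at any depth, and it reuses the example layout with placeholder contents.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p project/docs
printf 'int main(void) { return 0; }\n' > project/main.c
printf 'Demo project\n' > project/README.txt
printf 'Manual\n' > project/docs/manual.txt

# Quote the pattern so the shell passes it to tar unexpanded.
tar cf project-no-txt.tar --exclude='*.txt' project

listing=$(tar tf project-no-txt.tar)
echo "$listing"
```

The docs/ directory itself is still recorded in the archive; only the files matching the pattern are skipped.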
Using tar with find
For complex inclusion/exclusion criteria, you can combine `tar` with `find`. For example, to archive all `.c` files modified in the last week:
find project -name '*.c' -mtime -7 | tar cvf project-recent.tar -T -
Here `find` searches for the files we want and passes the list to `tar` via `-T -` (read filenames from stdin). This is a powerful technique for selectively archiving based on filename patterns, timestamps, sizes, and other metadata.
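A line-based file list can be fragile when names contain unusual characters. A more defensive variant, sketched here under the assumption of GNU tar and a find that supports -print0, passes NUL-delimited names instead. The file names used are hypothetical.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"
mkdir project
printf 'int x;\n' > "project/my notes.c"   # recent file with a space in its name
printf 'int y;\n' > project/old.c
touch -t 200001010000 project/old.c        # backdate so -mtime -7 skips it

# -print0 emits NUL-terminated names; --null tells tar to read -T input the same way.
find project -name '*.c' -mtime -7 -print0 |
    tar cf project-recent.tar --null -T -

listing=$(tar tf project-recent.tar)
echo "$listing"
```

Note that --null has to appear before -T on the command line for it to take effect.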
Creating Compressed Archives
Uncompressed tar archives are okay for grouping files together, but they don't save any space. To simultaneously archive and compress, add the `z` option:
tar czvf project.tar.gz project
The `z` tells tar to use gzip compression. It's common practice to use a `.tar.gz` or `.tgz` extension for gzipped tar files.
Gzip is a fast, effective compressor. How much space it saves depends on the types of files being compressed; the figures below are rough rules of thumb:
File Type | Typical Size Reduction |
---|---|
Text | 60-70% |
Code | 50-60% |
Images (PNG) | 10-20% |
Audio (MP3) | 0-5% |
As you can see, text-based files like code and documents compress quite well, while media files like images and audio are already compressed and don't shrink much further.
Let's compare some actual archive sizes:
$ tar cvf project.tar project
project/
project/main.c
project/aux.c
project/README.txt
project/docs/
project/docs/manual.txt
$ ls -lh project.tar
-rw-r--r-- 1 alice alice 5.0K Apr 10 project.tar
$ tar czvf project.tar.gz project
project/
project/main.c
project/aux.c
project/README.txt
project/docs/
project/docs/manual.txt
$ ls -lh project.tar.gz
-rw-r--r-- 1 alice alice 697 Apr 10 project.tar.gz
The gzipped archive is about 1/7th the size of the regular one. Not bad for a few extra keystrokes!
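To try the comparison on your own data, a short script along these lines may help. It is only a sketch: it generates a repetitive text file (hypothetical content, chosen because it compresses well) and prints both archive sizes in bytes.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"
mkdir project

# Generate some repetitive text as sample input.
i=0
while [ "$i" -lt 2000 ]; do
    echo "line $i: the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > project/notes.txt

tar cf project.tar project
tar czf project.tar.gz project

plain_size=$(wc -c < project.tar)
gzip_size=$(wc -c < project.tar.gz)
echo "uncompressed: $plain_size bytes"
echo "gzip:         $gzip_size bytes"
```

Real-world savings will vary with the input, so it is worth measuring on a representative sample rather than relying on a single ratio.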
Extracting Tar Archives with tar xvf
Creating archives is only half the story. To get your files back out, use `tar xvf`:
tar xvf <archive-name>.tar
The `x` option stands for "extract". You can extract compressed archives the same way (GNU tar detects the compression automatically on extraction, so the `z` is optional there):
tar xvf project.tar
tar xzvf project.tar.gz
Tar will recreate the original directory structure in the current directory. For example:
$ tar xvf project.tar
project/
project/main.c
project/aux.c
project/README.txt
project/docs/
project/docs/manual.txt
$ ls
project project.tar
If you want to extract the files to a different directory, use the `-C` option:
tar xvf project.tar -C /path/to/some/directory
Tar will create the `project` directory inside that directory instead of the current one. The target directory must already exist; tar will not create it for you.
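The following sketch shows two common extraction patterns: unpacking into a separate directory with -C, and pulling out a single member by its stored path. The directory names restore and single are hypothetical, and the target directories are created first.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p project/docs
printf 'int main(void) { return 0; }\n' > project/main.c
printf 'Demo project\n' > project/README.txt
printf 'Manual\n' > project/docs/manual.txt
tar cf project.tar project

# Extract everything under restore/ (created beforehand).
mkdir -p restore
tar xf project.tar -C restore

# Extract only one member; the path must match what tar tf prints.
mkdir -p single
tar xf project.tar -C single project/README.txt

ls -R restore single
```

Naming a member after the archive limits extraction to that path, which is handy when you only need one file back from a large archive.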
Incremental Backups with tar
In addition to full backups, GNU `tar` supports incremental backups using snapshot files. A snapshot file records the state of the files and directories at the time of a backup, allowing tar to quickly determine which files have changed since then.
To create an initial full backup:
tar czvf backup.tar.gz --listed-incremental=backup.snar project
This creates `backup.tar.gz` with the contents of `project`, and `backup.snar` recording the state of the archived files.
To make an incremental backup containing only changed files:
tar czvf backup-2.tar.gz --listed-incremental=backup.snar project
Tar will consult `backup.snar`, compare it to the current `project`, and only add the changed files to `backup-2.tar.gz`. It then updates `backup.snar` with the new snapshot.
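Restoring incremental backups is a bit more involved:
# extract initial full backup
tar xvf backup.tar.gz --listed-incremental=/dev/null
# extract 1st incremental backup
tar xvf backup-2.tar.gz --listed-incremental=/dev/null
# extract 2nd incremental backup
tar xvf backup-3.tar.gz --listed-incremental=/dev/null
# ...
You have to extract the backups in order, applying each incremental on top of the previous state. Tar will overwrite files with their later versions as needed. Passing --listed-incremental=/dev/null tells GNU tar to treat each archive as incremental during extraction, so files deleted between backups are removed as well, without any snapshot file being updated.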
Incremental backups can save a lot of time and space for large, slowly-changing data sets. They're a good way to implement a grandfather-father-son backup rotation with a weekly full backup and daily incrementals.
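The whole cycle can be rehearsed on throwaway data before trusting it with real backups. This is a minimal sketch that assumes GNU tar (the --listed-incremental option is a GNU extension); the file names and contents are hypothetical, and restores are done with --listed-incremental=/dev/null so that tar treats the archives as incremental.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"
mkdir project
echo 'v1'   > project/a.txt
echo 'keep' > project/b.txt

# Level 0: full backup, which also creates the snapshot file.
tar czf backup.tar.gz --listed-incremental=backup.snar project

sleep 1                       # make sure later changes get newer timestamps
echo 'v2'  > project/a.txt    # modify an existing file
echo 'new' > project/c.txt    # add a new file

# Level 1: only what changed since the snapshot was written.
tar czf backup-2.tar.gz --listed-incremental=backup.snar project
tar tzf backup-2.tar.gz

# Restore in order into a fresh directory.
mkdir restore
tar xzf backup.tar.gz   --listed-incremental=/dev/null -C restore
tar xzf backup-2.tar.gz --listed-incremental=/dev/null -C restore
cat restore/project/a.txt
```

If you want several incrementals that are all relative to the same full backup, work from a copy of the level 0 snapshot file each time, since tar updates the snapshot it is given.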
Securing Tar Archives
By default, tar archives are not encrypted. If you need to protect sensitive data, you can encrypt the archive with a tool like OpenSSL or GPG.
To encrypt with OpenSSL:
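# -f - writes the archive to stdout; -pbkdf2 (OpenSSL 1.1.1+) strengthens key derivation
tar czf - project | openssl enc -aes-256-cbc -pbkdf2 -e > project.tar.gz.enc
To decrypt and extract:
openssl enc -aes-256-cbc -pbkdf2 -d -in project.tar.gz.enc | tar xzf -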
This will prompt for a password to encrypt/decrypt with. You can also use public key encryption with GPG:
tar czf - project | gpg --encrypt --recipient <recipient-email> > project.tar.gz.gpg
To decrypt and extract:
gpg --decrypt project.tar.gz.gpg | tar xzf -
Keep in mind that encrypted data is essentially random bytes, so compressing after encryption is ineffective. Always compress first and then encrypt the compressed archive, as the pipelines above do.
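To check that an encrypted archive can actually be recovered, a scripted round trip is useful. The sketch below is an assumption-laden example: it needs OpenSSL 1.1.1 or newer (for -pbkdf2), and it reads the passphrase from a hypothetical environment variable named ARCHIVE_PASS purely so the demo runs without prompting. For real data, prefer the interactive prompt or a proper secret store.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"
mkdir project
echo 'top secret' > project/secret.txt

# Demo-only passphrase in an environment variable (hypothetical name).
ARCHIVE_PASS='demo-only-passphrase'
export ARCHIVE_PASS

# Compress first, then encrypt the compressed stream.
tar czf - project |
    openssl enc -aes-256-cbc -pbkdf2 -salt -pass env:ARCHIVE_PASS -out project.tar.gz.enc

# Decrypt and unpack into a separate directory, then compare.
mkdir restore
openssl enc -d -aes-256-cbc -pbkdf2 -pass env:ARCHIVE_PASS -in project.tar.gz.enc |
    tar xzf - -C restore
cmp project/secret.txt restore/project/secret.txt && echo "round trip OK"
```

The same options must be used for encryption and decryption; mixing a run with -pbkdf2 and one without will fail to decrypt.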
Distributing Tar Archives
Tar archives are great for distributing collections of files, like software releases or document sets. For maximum portability, it's best to use the "ustar" archive format, which has the widest compatibility across different versions of tar. Note that the archive name must come directly after the f option:
tar cvf project.tar --format=ustar project
To host your tar file for others to download, you can upload it to a web server or file hosting service. Most web servers are configured to recognize `.tar.gz` files and set the appropriate `Content-Type` header for browsers to handle the download.
If you're using Amazon S3 or similar cloud storage, you can generate a pre-signed URL to allow downloads without making the file public:
aws s3 presign s3://my-bucket/project.tar.gz --expires-in 604800
This generates a unique URL that allows downloading the file for the next 7 days (604800 seconds).
Performance Considerations
When working with large archives, performance becomes important. Here are a few benchmarks comparing the time to archive different types of files:
File Type | Size | Archive Time |
---|---|---|
Code | 100 MB | 1.2 s |
Documents | 100 MB | 1.5 s |
Images | 100 MB | 3.4 s |
Video | 1 GB | 42.7 s |
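As you can see, the 1 GB video takes by far the longest, largely because it is ten times bigger than the other inputs. Without compression, tar is mostly I/O-bound: it has to read every byte of every file, so archive time is driven by total size, number of files, and disk speed. Treat the smaller differences between the 100 MB rows as specific to this test setup.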
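To split a large archive into several smaller files, you can use the `--multi-volume` (`-M`) option together with a volume size (`-L`):
tar cvM -L 1G -f project.tar project
This creates a multi-volume archive with each part up to 1 GB in size. When a volume fills up, GNU tar pauses and prompts for the next one; at that prompt you can switch to a new file name, or you can automate the switch with `--new-volume-script`. Volumes are written one after another rather than in parallel, so this helps with size limits (removable media, upload caps) rather than with speed, and it cannot be combined with compression options such as `z`.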
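If the goal is simply to break a large archive into fixed-size pieces, a commonly used alternative to -M is to pipe tar through the standard split utility. This is a different technique from multi-volume archives, shown here as a sketch with a small hypothetical data file and a 100 KB piece size so it runs quickly.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"
mkdir project
head -c 300000 /dev/urandom > project/blob.bin   # ~300 KB of sample data

# Stream a compressed archive into 100 KB pieces: ...part-aa, ...part-ab, ...
tar czf - project | split -b 100k - project.tar.gz.part-
part_count=$(ls project.tar.gz.part-* | wc -l)
echo "pieces: $part_count"

# Reassemble in name order and extract in one pass.
mkdir restore
cat project.tar.gz.part-* | tar xzf - -C restore
cmp project/blob.bin restore/project/blob.bin && echo "round trip OK"
```

Unlike -M, this works with compression, but every piece is needed to extract anything, so keep the parts together.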
Another option is to use a faster compression tool like `lz4` or `zstd` instead of `gzip`. `lz4` gives up some compression ratio in exchange for much higher speed, while `zstd` typically matches or beats gzip's ratio at higher speed and can use multiple cores.
To use `lz4`:
tar cvf - project | lz4 > project.tar.lz4
To extract:
lz4 -dc project.tar.lz4 | tar xf -
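The same pipe pattern works with zstd. The sketch below assumes the zstd command-line tool may or may not be installed, so it checks first and skips the demo otherwise; the sample file is hypothetical.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"
mkdir project
printf 'some text to compress\n' > project/notes.txt

if command -v zstd >/dev/null 2>&1; then
    # -T0 asks zstd to use all available cores; -q silences progress output.
    tar cf - project | zstd -T0 -q > project.tar.zst

    # -d decompresses, -c writes to stdout for tar to read.
    mkdir restore
    zstd -dc project.tar.zst | tar xf - -C restore
    cmp project/notes.txt restore/project/notes.txt && echo "round trip OK"
else
    echo "zstd is not installed; skipping the demo"
fi
```

Piping keeps the example independent of whether your tar build has its own zstd option.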
Conclusion
We've covered a lot of ground in this deep dive into the `tar` command! Here's a quick recap of what we learned:
- What `tar` is and how it compares to other archiving tools
- How to create basic tar archives with `tar cvf`
- Excluding files and directories from archives
- Creating compressed `.tar.gz` archives with the `z` option
- Extracting archives with `tar xvf`
- Using `tar` for incremental backups
- Encrypting and distributing tar archives
- Performance considerations for large archives
I hope this article has given you a solid foundation in using `tar` effectively. With practice, it will become an indispensable part of your Unix toolkit.
Here's a handy reference table of the most common `tar` command options:
Option | Description |
---|---|
c | Create new archive |
x | Extract files from archive |
v | Verbose output |
f | Specify archive file name |
z | Compress/decompress with gzip |
t | List contents of archive |
r | Append files to existing archive |
--exclude | Exclude files matching pattern |
-C | Change to directory before processing |
-T | Get file names to process from file |
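Two options from the table that were not demonstrated earlier are `r` and `t`. As a brief sketch with hypothetical file names, the following appends a file to an existing uncompressed archive and then lists the result.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"
echo 'first'  > a.txt
echo 'second' > b.txt

tar cf notes.tar a.txt     # create the archive with one member
tar rf notes.tar b.txt     # r appends to the end of the existing archive

listing=$(tar tf notes.tar)
echo "$listing"
```

Appending only works on uncompressed archives; a .tar.gz has to be decompressed before `r` can be used on it.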
Tar is an essential skill for any Unix user or sysadmin. While it may seem daunting at first, with a little practice you'll be tarring like a pro. So get out there and start archiving!