Linux: How to use tar to compress files

James Clark
4 min read · Aug 7, 2020
Photo by Tomas Sobek on Unsplash

In this article, I will talk about the tar command and how to use it to create compressed archive files. If you come from a Windows background, you may be familiar with zip, which takes a set of files and directories and compresses them into a single new file with the .zip extension. tar, on the other hand, is a more flexible Unix/Linux command, which was even added to Windows in 2018, and it can use several different compression algorithms. But first, let’s go through some use cases and why you’d want to use something like tar.

There are many use cases for a compression/archive tool like zip or tar. Imagine you need to send someone hundreds of files over email. You could add each file to the email as an attachment, but this quickly becomes unmanageable. Or imagine you want to take a regular backup of a set of files, maybe from a web server. You might copy all the files into a new directory each time the backup process runs.

I’m sure there are many other use cases too, but one solution to the above would be to archive the files. This means you get one “archive” file that contains all the other files. This archive could also be compressed so that its size is less than the sum of the sizes of all the other files.

tar

tar stands for “tape archive”, as its original purpose was to write data to sequential I/O devices such as a magnetic tape drive. Now the tar command is used simply to bundle up one or more files/directories into a single file, often called a “tarball”. tar can create a tarball with or without compression; the user chooses which compression algorithm to use, if any.
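As a minimal sketch of the no-compression case, the commands below bundle a directory into a plain tarball and then list what is inside it. The httpdocs directory and its index.html file are placeholders created only for this illustration, and backup.tar is an assumed name.

```shell
# Placeholder data so the example is self-contained (assumed names).
mkdir -p httpdocs
echo "hello" > httpdocs/index.html

# -c = create, -f = archive filename.
# No compression flag is given, so the result is a plain .tar file.
tar -cf backup.tar httpdocs

# -t lists the entries in the archive without extracting anything.
tar -tf backup.tar
```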

Creating a compressed archive

The most common compression algorithm I have seen used with tar is gzip, resulting in the creation of a .tar.gz file. bzip2 is another common algorithm that tar supports. It’s usually slower than gzip, but the compressed file it produces is often smaller.

Let’s dive into my go-to command to create one of these gzip-compressed archives:

tar -czvf backup.tar.gz httpdocs

This will create a new gzipped archive called backup.tar.gz of the directory httpdocs.

Those options passed to the tar command are:

-c = create mode
-z = gzip mode
-v = verbose mode (optional; prints the name of each file as it is added to the archive)
-f = filename of the archive

The order of the flags doesn’t matter, except for f, which must come last in the group because it expects a value to follow it (the name of the archive file). The expanded version of the command would be:

tar -c -z -v -f backup.tar.gz httpdocs
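To illustrate the point about flag order, here is a small sketch that builds the same archive twice with the flags shuffled, keeping f last both times. The sample directory and the archive names first.tar.gz and second.tar.gz are made up for the example.

```shell
# Placeholder data so the example is self-contained (assumed names).
mkdir -p httpdocs
echo "hello" > httpdocs/index.html

# Same flags in a different order, with f still last: both commands work.
tar -czvf first.tar.gz httpdocs
tar -vzcf second.tar.gz httpdocs

# By contrast, something like `tar -czfv backup.tar.gz httpdocs` would
# take "v" as the archive name, which is almost never what you want.

# List both archives to compare their contents.
tar -tzf first.tar.gz
tar -tzf second.tar.gz
```

Both listings should be identical, since only the position of f matters.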

The v flag, while useful, could potentially slow down the process if you are working over a slow SSH connection and creating a large archive.

Multiple files and directories can also be passed to the command:

tar -czvf backup.tar.gz httpdocs file2 directory2 file3 directory4
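The same pattern should carry over to bzip2, mentioned earlier. The sketch below assumes the bzip2 program is installed on your system (some minimal installations leave it out) and uses placeholder names for the directory and archive.

```shell
# Placeholder data so the example is self-contained (assumed names).
mkdir -p httpdocs
echo "hello" > httpdocs/index.html

# -j selects bzip2 where -z selected gzip.
# The .tar.bz2 suffix is a naming convention, not a requirement.
tar -cjvf backup.tar.bz2 httpdocs

# -t lists the entries; j tells tar the archive is bzip2-compressed.
tar -tjf backup.tar.bz2
```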

Extracting a compressed archive

The command to extract a tarball is similar, except that instead of create mode (-c) you use extract mode with the -x flag:

tar -xzvf backup.tar.gz

Continuing our example above, when this is run, the original httpdocs directory will be created in the current working directory. Be careful when running this, as the default behaviour when extracting is to overwrite any existing files. Ensure this is the behaviour you require first. It may be safer to move the tar file into a new temporary directory and extract it there before moving the extracted files elsewhere. There are several flags on the tar command to control overwriting, notably:

-k, --keep-old-files
don't replace existing files when extracting, treat them as errors
--keep-newer-files
don't replace existing files that are newer than their archive copies
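The two precautions above can be sketched as follows, assuming GNU tar. The sample directory, the restore-tmp directory name, and the file contents are placeholders for illustration; -C is tar's option for switching to another directory before extracting.

```shell
# Placeholder data and archive so the example is self-contained.
mkdir -p httpdocs
echo "original" > httpdocs/index.html
tar -czf backup.tar.gz httpdocs

# Precaution 1: extract into a separate directory.
# -C makes tar change into restore-tmp before writing any files.
mkdir -p restore-tmp
tar -xzf backup.tar.gz -C restore-tmp

# Precaution 2: extract in place, but refuse to overwrite existing files.
echo "edited after backup" > httpdocs/index.html
tar -xzkf backup.tar.gz || echo "tar reported existing files and left them alone"

# Show that the edited file survived the second extraction.
cat httpdocs/index.html
```

After the second extraction, httpdocs/index.html should still contain the edited text, while restore-tmp holds the archived copy.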

Tar bombs

As mentioned above, when an archive is extracted, its contents are written to the current working directory. So if you need to archive a whole directory, there are two approaches: run tar from within the directory, or run it from outside. If you take the former approach, then upon extraction all the files from the source directory will be dumped directly into the current working directory rather than into a single containing directory. This is called a “tar bomb”. This might be what you want, or you could take precautions and only extract the archive in a new temporary directory. Otherwise, you run the risk of the extraction polluting the current directory and overwriting existing files. It is therefore usually safer, and considered best practice, to take the latter approach and archive the directory from outside the directory itself.
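To make the difference concrete, here is a rough sketch that archives the same directory both ways and lists each result with -t, which prints an archive's entries without extracting them. The directory, file, and archive names are assumptions for the example.

```shell
# Placeholder data so the example is self-contained (assumed names).
mkdir -p httpdocs
echo "hello" > httpdocs/index.html

# Former approach: archive from inside the directory.
# The subshell keeps our own working directory unchanged.
(cd httpdocs && tar -czf ../inside.tar.gz *)

# Latter approach: archive the directory from outside it.
tar -czf outside.tar.gz httpdocs

# List each archive to see where its entries would be written.
tar -tzf inside.tar.gz
tar -tzf outside.tar.gz
```

The first listing shows a bare index.html, which would land directly in whatever directory you extract from; the second keeps everything under httpdocs/. Listing an unfamiliar archive like this before extracting is a cheap way to spot a tar bomb.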
