Let’s learn more about journaling file systems and how they affect everyday computing.
What Is Journaling?
Imagine a library that records every change to its holdings in a catalog. To find out what changed recently, you only have to check the catalog instead of searching the whole library. Journaling in file systems works in a similar way. Its purpose is to keep track of changes not yet committed to the file system. After a crash or an unexpected shutdown, the file system reads the journal to finish or discard any half-completed changes, so you can still access the latest committed version of a file with a much lower likelihood of corruption.
The term “journal” comes from the analogy of a diary, where entries are recorded in order by date and time. In the same way, a journal records pending updates one after another in a dedicated area of the disk. The files being updated may be scattered all over the disk, but the journal entries that describe those updates sit together and are read in diary-like sequence. Sequential access is much faster than random access, especially on spinning disks, so both writing to the journal and replaying it after a crash are quick. Without a journal, recovery means scanning the entire file system for inconsistencies.
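The idea of recording a change before applying it can be sketched in a few lines of Python. This is only a toy model, not how any real file system lays out its journal: the `TinyJournal` class, the JSON line format, and the `demo` function are all hypothetical names chosen for illustration.

```python
import json
import os
import tempfile


class TinyJournal:
    """Toy write-ahead journal: record the intended change before applying it."""

    def __init__(self, path):
        self.path = path

    def log(self, key, value):
        # Append one entry and force it to disk before any data is touched.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def replay(self, store):
        # After a crash, re-apply every complete entry in the order it was logged.
        if not os.path.exists(self.path):
            return store
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn final entry: stop, earlier entries are intact
                store[entry["key"]] = entry["value"]
        return store


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        journal = TinyJournal(os.path.join(tmp, "journal.log"))
        journal.log("a.txt", "version 1")
        journal.log("a.txt", "version 2")
        # Simulate a crash in the middle of writing a third entry.
        with open(journal.path, "a", encoding="utf-8") as f:
            f.write('{"key": "b.txt", "val')
        return journal.replay({})
```

Calling `demo()` recovers the two complete entries and ignores the half-written one, which is the behavior a journal is meant to guarantee.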
Definitions
Depending on the operating system, there are different kinds of journaling file systems, which we will discuss below. Before we do that, we need to be clear about a few numerical terms.
Tebibyte (TiB): we all know how much a gigabyte is. A tebibyte (TiB) is equal to 1024 (= 2^10) gibibytes (GiB), the binary counterpart of the gigabyte. TiB is one of the default units for expressing large values in file storage. Also, 1 TiB = 1.09951 terabytes (TB).
Pebibyte (PiB): a pebibyte (PiB) is equal to 1024 TiB, or around a million gibibytes, which is a very large value indeed.
Cluster: a data cluster is the smallest unit of disk space that can be used to store a file. A cluster can range from 512 bytes (a single sector) to 64 KB (128 sectors).
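The relationships between these units are easy to check with a little arithmetic. The constant names and the `clusters_needed` helper below are made up for this sketch.

```python
KIB = 2**10   # kibibyte
GIB = 2**30   # gibibyte
TIB = 2**40   # tebibyte
PIB = 2**50   # pebibyte
TB = 10**12   # decimal terabyte


def clusters_needed(file_size_bytes, cluster_size_bytes):
    # A file always occupies whole clusters, so round up.
    return -(-file_size_bytes // cluster_size_bytes)


print(TIB // GIB)                         # 1024
print(round(TIB / TB, 5))                 # 1.09951
print(PIB // GIB)                         # 1048576
print(clusters_needed(10_000, 4 * KIB))   # 3
```

The last line shows why cluster size matters: a 10,000-byte file on a volume with 4 KB clusters still takes up three full clusters.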
1. NTFS
New Technology File System (NTFS) is Microsoft’s default journaling file system for Windows and Windows Server. It uses log files and checkpoint information to restore the file system to a stable state after a restart. NTFS supports large volumes: with a 4 KB cluster size, it can accommodate 16 TiB of data. With a 64 KB cluster size (the maximum), it can accommodate 256 TiB, which is also the maximum file size. Nowadays, NTFS fixes many kinds of corruption while the volume is online through what is known as “self-healing NTFS.” Windows users might remember the downtime caused by Chkdsk, which used to plague older Windows versions because the volume had to be taken offline while it was checked. With self-healing NTFS, most such repairs happen online, and no downtime occurs.
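The two volume sizes quoted above are consistent with a limit of roughly 2^32 clusters per volume. That cluster limit is an assumption inferred from the figures in the text, and the `max_volume_tib` helper is hypothetical; the sketch only shows that the arithmetic works out.

```python
MAX_CLUSTERS = 2**32  # assumed limit, inferred from the 16 TiB and 256 TiB figures


def max_volume_tib(cluster_size_kib):
    # Volume size = number of clusters x cluster size, converted to TiB.
    total_bytes = MAX_CLUSTERS * cluster_size_kib * 2**10
    return total_bytes // 2**40


print(max_volume_tib(4))    # 16
print(max_volume_tib(64))   # 256
```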
2. Ext
The extended file system (ext) family has been Linux’s native file system since the early ’90s. It was inspired by the Unix File System (UFS) and has gone through three further iterations since the original ext. Journaling arrived with ext3; the original ext and ext2 are not journaled.
ext2: originally used in Debian and Red Hat Linux, ext2 is still used on flash media such as SD cards and USB drives. It can accommodate 2 to 32 TiB of data with a maximum cluster size of 8 KB.
ext3: as the third extended file system, ext3 has been used with Linux, BSD, and ReactOS. The size limits are similar to ext2.
ext4: the latest version of the extended file system, it is used by Google file storage, BSD, PowerPC, and most current Linux distributions. The size limit is 1024 PiB, or around a million TiB. The biggest cluster size is 64 KB.
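ext4 uses checksums in the journal to improve reliability. Because a checksum lets the file system detect an incomplete or corrupted journal entry when the journal is replayed, ext4 can safely avoid a disk I/O wait during journaling, which slightly improves disk performance.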
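How a checksum exposes a damaged journal entry can be illustrated with a small sketch. It uses CRC32 from Python’s `zlib` module as a stand-in and a made-up record layout; it is not ext4’s on-disk format, and `pack_record` and `unpack_record` are hypothetical names.

```python
import struct
import zlib


def pack_record(payload: bytes) -> bytes:
    # Hypothetical layout: 4-byte length, 4-byte CRC32, then the payload.
    return struct.pack("<II", len(payload), zlib.crc32(payload)) + payload


def unpack_record(record: bytes):
    # Return the payload if the record is intact, otherwise None.
    if len(record) < 8:
        return None
    length, crc = struct.unpack("<II", record[:8])
    payload = record[8:8 + length]
    if len(payload) != length or zlib.crc32(payload) != crc:
        return None  # truncated or corrupted: do not replay this entry
    return payload
```

A record that was cut short by a power loss, or whose bytes were altered, fails the check and is skipped during replay instead of being applied to the file system.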
3. APFS
The Apple File System (APFS) is used with macOS High Sierra and later, iOS 10.3 and later, and a few other Apple systems. It supports up to about 8192 PiB (2^63 bytes), which is approximately eight times the limit of ext4. The core capabilities of APFS are many: they include creating “snapshots,” which are like a photocopy of the system at a particular point in time. Like ext4, it uses checksums to help ensure data integrity, and it protects against system crashes using an approach called “copy-on-write”: new data is written to a fresh location instead of overwriting the old copy, so an interrupted write leaves the previous version intact. APFS also supports full disk encryption.
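Snapshots and copy-on-write fit together naturally, as the toy model below suggests. It is a conceptual sketch only: the `CowVolume` class is hypothetical, and copying a whole dictionary on every write stands in for the tree structures a real file system would share between versions.

```python
class CowVolume:
    """Toy copy-on-write volume with named snapshots."""

    def __init__(self):
        self._blocks = {}      # block number -> bytes; never modified in place
        self._snapshots = {}   # snapshot name -> block mapping at that moment

    def write(self, block_no, data):
        # Build a new mapping instead of overwriting the current one.
        new_blocks = dict(self._blocks)
        new_blocks[block_no] = data
        self._blocks = new_blocks  # switch over only once the new version is complete

    def snapshot(self, name):
        # A snapshot is just a reference to the current, immutable mapping.
        self._snapshots[name] = self._blocks

    def read(self, block_no, snapshot=None):
        blocks = self._blocks if snapshot is None else self._snapshots[snapshot]
        return blocks.get(block_no)
```

Because old versions are never overwritten, taking a snapshot costs almost nothing, and a write that is interrupted before the final switch leaves the previous version untouched.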
Conclusion
Journaling in file systems is basic insurance against system crashes and unexpected shutdowns. By writing changes to a journal first, a file system can ensure that changes to files are recorded and not lost during power failures or computer crashes. There are many journaling file systems apart from the ones discussed here. Oracle, VMware, BSD, Cisco, Solaris, and many others have their own journaling file systems.