I try to understand why fragmentation is a problem for NTFS and FAT but not when using inodes. In all cases, files are not necessarily stored in a contiguous fashion so I don't see the problem for the former two.
So where is the crucial difference?
I think the term inode is misplaced here. I am unaware of any way in which inodes have anything to do with reducing disk fragmentation. The term inode goes back to Unix Version 1 from 1971, and it was not until the Berkeley Fast File System in BSD 4.2 in 1983 that anything significant was done to combat disk fragmentation. In Unix systems the term inode refers to the data structure that holds all the metadata for a file: its length, permissions, owner, and the pointers to its data blocks (arranged as a tree in most Unix file systems since 1983). The file name is not stored in the inode; rather, directory entries contain the name and an inode number.
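To make that layout concrete, here is a rough sketch in C of what a classic Unix-style inode and directory entry hold. The struct and field names are mine for illustration; real on-disk formats (ext2's `struct ext2_inode`, for instance) have more fields and different widths.

```c
#include <stdint.h>
#include <stdio.h>

#define NDIRECT 12   /* direct block pointers; a common but arbitrary choice here */

/* Simplified sketch of a Unix-style inode (illustrative only). */
struct inode_sketch {
    uint16_t mode;                 /* file type and permission bits                */
    uint16_t uid, gid;             /* owner and group                              */
    uint32_t size;                 /* file length in bytes                         */
    uint32_t atime, mtime, ctime;  /* timestamps                                   */
    uint16_t links;                /* number of directory entries pointing here    */
    uint32_t direct[NDIRECT];      /* block numbers of the first data blocks       */
    uint32_t indirect;             /* block holding more block numbers (the tree)  */
    uint32_t double_indirect;
    /* Note: no file name field -- names live only in directory entries. */
};

/* A directory entry maps a name to an inode number. */
struct dirent_sketch {
    uint32_t inode_number;
    char     name[255];
};

int main(void) {
    printf("sketch inode size: %zu bytes\n", sizeof(struct inode_sketch));
    return 0;
}
```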
NTFS has a data structure analogous to the inode that contains similar information and similarly does not contain the file name. In NTFS this structure is called a Master File Table (MFT) entry. Given the similarities between an inode and NTFS's MFT entries, I can't see any impact on any kind of fragmentation. FAT also has a similar data structure, the directory entry. Unlike NTFS and Unix inodes, the FAT directory entry contains the file name along with all the other file information. Putting the file name with the rest of the file metadata has both positive and negative consequences, but I can't figure out how this difference would have any bearing whatsoever on disk fragmentation.
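For comparison, a classic FAT short (8.3) directory entry packs the name and the metadata into a single 32-byte record. This is a simplified rendering from memory, with several timestamp and reserved fields collapsed; it is meant to show the layout, not to be used for parsing real volumes.

```c
#include <stdint.h>
#include <stdio.h>

#pragma pack(push, 1)
struct fat_dirent {
    char     name[11];       /* 8.3 file name, padded with spaces                  */
    uint8_t  attributes;     /* read-only, hidden, directory, ...                  */
    uint8_t  reserved[10];   /* creation/access times etc., collapsed for brevity  */
    uint16_t write_time;
    uint16_t write_date;
    uint16_t first_cluster;  /* head of the file's cluster chain (low half on FAT32) */
    uint32_t file_size;      /* length in bytes                                    */
};
#pragma pack(pop)

int main(void) {
    /* The whole record, name included, fits in 32 bytes. */
    printf("FAT directory entry: %zu bytes\n", sizeof(struct fat_dirent));
    return 0;
}
```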
In fact FAT has a subtle (and not very important) advantage over NTFS and most Unix file systems with inodes. In systems with inodes (or NTFS's Master File Table), the inodes are allocated in inode tables, and in some systems (like Linux's ext2) you have to choose the ratio of inode-table space to data blocks when the file system is created. This means that if you have more small files than you expected you can run out of inode entries even though data space is still available, and if you have larger files than expected you can run out of data space even though there is empty space sitting in unused inode tables.
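A quick back-of-the-envelope program shows how a fixed inode-to-data ratio can bite. The numbers are invented for the example, not actual mkfs defaults.

```c
#include <stdio.h>

int main(void) {
    long long disk_bytes      = 100LL * 1024 * 1024 * 1024;  /* 100 GiB disk (example)      */
    long long block_size      = 4096;                        /* 4 KiB data blocks           */
    long long bytes_per_inode = 16384;                       /* one inode per 16 KiB, fixed
                                                                when the filesystem is made */

    long long data_blocks = disk_bytes / block_size;
    long long inodes      = disk_bytes / bytes_per_inode;

    printf("data blocks: %lld, inodes: %lld\n", data_blocks, inodes);
    /* With many files smaller than 16 KiB you exhaust the inodes while data
     * blocks remain free; with mostly huge files the opposite happens. */
    return 0;
}
```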
Every filesystem has disk fragmentation. Your perception of which ones actually suffer from real problems may not be completely accurate. For example, NTFS has an interface specifically designed to make it easier to write user-level defragmentation utilities that can keep serving file requests while defragmentation is going on, while ext4 (the most commonly used file system on recent Linux systems) does not. That doesn't necessarily mean that ext4 has less disk fragmentation than NTFS; it just means that if an ext4 file system gets fragmented there is not much you can do about it. Both ext4 and NTFS have a number of optimizations to reduce the probability of serious disk fragmentation problems.
You should think of a filesystem as a data structure with a set of algorithms for modifying the data structure. As with any other data structure the designer needs to trade off a variety of different desirable and undesirable traits.
One of the decisions a filesystem implementation needs to make is block size. With larger blocks, larger chunks of data are guaranteed to be adjacent. On the other hand, it is rarely the case that a filesystem allows a block to be shared by more than one file at a time (FreeBSD's UFS2, c. 2003, being an exception), so if you choose larger blocks but then have many small files, each small file wastes the unused tail of its last block and the files end up spread further apart on the disk. That wasted space inside allocated blocks is so-called internal fragmentation.
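A tiny program makes the trade-off concrete; the block sizes and the 500-byte file are arbitrary examples.

```c
#include <stdio.h>

/* Internal fragmentation: space wasted inside the last, partially used
 * block of a file. */
static long long wasted_bytes(long long file_size, long long block_size) {
    long long tail = file_size % block_size;
    return tail == 0 ? 0 : block_size - tail;
}

int main(void) {
    long long small_file = 500;   /* a 500-byte file */
    printf("4 KiB blocks waste %lld bytes\n",  wasted_bytes(small_file, 4096));
    printf("64 KiB blocks waste %lld bytes\n", wasted_bytes(small_file, 65536));
    /* Larger blocks keep big files more contiguous, but small files waste
     * more of each block and end up spread further apart. */
    return 0;
}
```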
Almost every modern file system (including NTFS, HFS+ (Mac), UFS2 (FreeBSD), ZFS (Solaris) and ext4 (Linux)) includes a variety of heuristics to help reduce the number of disk-head movements required during a complete file access. These include allocating a file's data near its inode, locating the inode near the directory that contains it, and allocating adjacent blocks for data writes that are known to be longer than a block. Journaling file systems try to combine many small writes into larger chunks that can be written to adjacent blocks.
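The flavor of these locality heuristics can be sketched with a toy allocator that places a new block as close as possible to a "goal" block (the file's previous block, or its inode). This is not any real filesystem's allocator, just an illustration of the idea.

```c
#include <stdbool.h>
#include <stdio.h>

#define NBLOCKS 1024

static bool block_free[NBLOCKS];   /* toy free-block bitmap */

/* Return the free block closest to `goal`, searching outward in both
 * directions, and mark it used.  Returns -1 if the disk is full. */
static long alloc_near(long goal) {
    for (long dist = 0; dist < NBLOCKS; dist++) {
        long candidates[2] = { goal + dist, goal - dist };
        for (int i = 0; i < 2; i++) {
            long b = candidates[i];
            if (b >= 0 && b < NBLOCKS && block_free[b]) {
                block_free[b] = false;
                return b;
            }
        }
    }
    return -1;
}

int main(void) {
    for (long b = 0; b < NBLOCKS; b++) block_free[b] = true;
    block_free[101] = false;            /* pretend another file owns block 101   */
    long prev = 100;                    /* last block of the file being extended */
    long next = alloc_near(prev + 1);   /* ideally the very next block           */
    printf("allocated block %ld (goal was %ld)\n", next, prev + 1);
    return 0;
}
```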
Some file systems implement delayed allocation, which permits them to decrement the free-space counter for the disk when the space is requested, but delay choosing which blocks to write the data to until the data needs to be flushed from memory to disk. This requires coordination between the file system and other parts of the operating system. The Wikipedia article on delayed allocation does not list NTFS as performing this optimization (and FAT systems almost certainly don't bother with anything that sophisticated).
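Here is a minimal sketch of the idea, assuming a single free run of blocks and made-up bookkeeping: space is reserved as writes arrive, but block numbers are chosen only at flush time, so the whole pending batch can land contiguously.

```c
#include <stdio.h>

static long free_blocks = 1000;   /* free-space counter                   */
static long next_free   = 0;      /* pretend the disk is one long free run */
static long pending     = 0;      /* blocks written but not yet placed    */

static int delayed_write(long nblocks) {
    if (free_blocks < nblocks) return -1;   /* fail early: space is reserved now */
    free_blocks -= nblocks;
    pending     += nblocks;                 /* ...but no block numbers chosen yet */
    return 0;
}

static void flush(void) {
    /* All pending blocks are placed at once, so they come out adjacent. */
    printf("placing %ld blocks at %ld..%ld\n",
           pending, next_free, next_free + pending - 1);
    next_free += pending;
    pending = 0;
}

int main(void) {
    delayed_write(3);   /* three small writes to the same file... */
    delayed_write(2);
    delayed_write(1);
    flush();            /* ...land in one contiguous run of 6 blocks */
    return 0;
}
```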
Another thing you should take into account is whether the file system is going to be used on a hard disk or on a solid-state drive (or flash drive). Disk fragmentation matters much less for solid-state drives, because there are no mechanical delays, so every block is roughly equally fast to access. On the other hand, for solid-state/flash drives wear leveling matters a lot.