Comments on "3 TB disks are Here" from Linux Magazine

Linux Magazine published an article last week, 3 TB Drives are Here. On Twitter, I originally said it was wrong, but that’s a bit harsh. Parts of it, however, are very misleading, and parts of it are unnecessarily confusing.

The “2.199 TB” limit describes Logical Block Addressing (aka LBA), a scheme for addressing sectors on modern disks. Sectors are numbered 0 to n, where n is a number dependent on the disk’s size (i.e. disk size in bytes divided by sector size). There’s nothing intrinsically limiting about LBA, other than how many bits you can devote to store such an address. With this in mind, the sentence:

The LBA scheme uses 32-bit addressing under the MBR partitions.

is very misleading. I hate to be a grammar nazi, but it’s a misuse of active versus passive voice. This phrasing makes it seem as if LBA is the limitation; it’s not. Master Boot Record (MBR) partition tables are what limit LBA addresses to 32 bits, and with 512-byte sectors that is what limits partitions to 2.199 TB.
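To make the arithmetic behind that number concrete, here is a minimal sketch. The function name and constants are mine, chosen for illustration; the only facts assumed are the 32-bit address width and the 512-byte sector size discussed above.

```python
SECTOR_BYTES = 512   # logical sector size the disk reports
MBR_LBA_BITS = 32    # width of the sector address fields in an MBR partition entry


def mbr_limit_bytes(sector_bytes=SECTOR_BYTES, lba_bits=MBR_LBA_BITS):
    """Bytes addressable when sector addresses are lba_bits wide."""
    return (2 ** lba_bits) * sector_bytes


limit = mbr_limit_bytes()
print(limit)             # 2199023255552
print(limit / 10 ** 12)  # roughly 2.199 decimal terabytes
```

The limit scales with the reported sector size, not the physical one, which is why it matters what sector size a disk reports to the operating system.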

The article then moves on to discuss 4 KB sectors. While nothing here is wrong, it ignores the fact that current “4 KB sector disks” on the market (i.e. those marketed as “Advanced Format”) do not work in the way described.

Most Advanced Format disks continue to report that their sectors are 512 bytes, a mode called 512e. Because of this, your “4 KB sector” disk is still limited to 2.199 TB when using MBR partition tables (the article, confusingly, implies otherwise).

However, they do use 4 KB sectors internally, so eight 512-byte logical sectors share each physical sector. That is, requests for sectors 0 and 3 both hit the same 4 KB sector internally. There are significant performance problems here: if you request sectors 7 and 8, these internally map to two different 4 KB sectors. This becomes a problem when your filesystem uses 4 KB blocks (i.e. most modern filesystems, including NTFS, ext4, XFS, etc.) that are not aligned to these boundaries: a 4 KB read may cause the drive to unnecessarily read 8 KB. The article does not mention anything about this sector alignment problem.
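A small sketch of that mapping, assuming 512-byte logical sectors on top of 4 KB physical sectors. The helper names are hypothetical, and the start sectors 63 and 64 are illustrative values only (63 being the traditional first-partition start used by older DOS-style partitioning tools).

```python
LOGICAL = 512                # bytes per logical sector reported in 512e mode
PHYSICAL = 4096              # bytes per physical sector inside the drive
RATIO = PHYSICAL // LOGICAL  # 8 logical sectors per physical sector


def physical_sector(lba):
    """Physical sector that holds the given logical sector."""
    return lba // RATIO


def physical_sectors_touched(start_lba, length_bytes):
    """How many physical sectors a read starting at start_lba must touch."""
    count = (length_bytes + LOGICAL - 1) // LOGICAL  # logical sectors covered
    last_lba = start_lba + count - 1
    return physical_sector(last_lba) - physical_sector(start_lba) + 1


print(physical_sectors_touched(64, 4096))  # 1: aligned 4 KB block
print(physical_sectors_touched(63, 4096))  # 2: misaligned, the drive reads 8 KB
```

A filesystem whose 4 KB blocks begin at a logical sector that is not a multiple of 8 pays this penalty on every block, not just the first one.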

Discussing other operating systems, the article vaguely mentions “several operating systems” have switched to GPT (GUID Partition Tables). I really hate how vague the article is here: as far as I know, the only OS that does this by default is Apple’s Mac OS X. The article sells Linux short when it says:

In the consumer world this is a downside since most motherboards don’t have a BIOS that is GPT capable. This can affect all operating systems including Linux.

because, in fact, most motherboards do have a BIOS that can boot from GPT, especially when you use a hybrid MBR. And Linux, with GRUB 2, works fantastically with them. Unfortunately, compatibility is a crapshoot, and is not advertised. However, all the systems I’ve experimented on, some as old as 2005, worked fine booting from GPT. Where Linux definitely falls short is that no distribution (AFAIK) will set up a GPT for you.

With that in mind, it’s difficult to say:

Linux is ready for 4KB drive sectors with 64-bit LBA addressing

when it really isn’t. The largest obstacle is the sector alignment problem that the article glosses over, best explained by Theodore Ts’o’s Aligning filesystems to an SSD’s erase block size. His post, in short:

  • Linux partitioning utilities are hard-coded to assume 512-byte sectors, which creates problems for 4 KB-sector disks and disks with larger block sizes (e.g. SSDs)
  • Various filesystem structures are not aligned to 4 KB boundaries (Ts’o points out LVM)

All of which kill performance, and in the case of SSDs, shorten lifespan.
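The alignment fix itself is simple arithmetic: round each partition's starting sector up to a boundary that is a multiple of the underlying block size. The sketch below is my own illustration, not something from Ts’o’s post; the 1 MiB default is an assumption, picked because it is a multiple of 4 KB and of many larger power-of-two block sizes.

```python
def is_aligned(lba, alignment_bytes=1024 * 1024, logical_bytes=512):
    """True if the byte offset of lba is a multiple of alignment_bytes."""
    return (lba * logical_bytes) % alignment_bytes == 0


def align_up(lba, alignment_bytes=1024 * 1024, logical_bytes=512):
    """Smallest logical sector >= lba that sits on an alignment boundary."""
    step = alignment_bytes // logical_bytes  # logical sectors per boundary
    return -(-lba // step) * step            # ceiling division, then scale back


print(is_aligned(63))      # False: the traditional start sector is misaligned
print(align_up(63))        # 2048
print(align_up(63, 4096))  # 64: the nearest 4 KB boundary
```

Aligning the partition start only helps if everything layered on top of it (LVM, the filesystem's own structures) also keeps to the same boundaries.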

One thing that bothers me about this article is that while it tries to explain the issues involved with 4 KB sector disks, it does nothing to tell you how to mitigate or avoid any of them. In the next couple of weeks, stay tuned for a few articles from me explaining how you can get around them with Linux.

Comments

Nice, readable, and thorough response, Samat. I'm looking forward to your follow-up on possible solutions.