Blanket statements blaming DMA for large buffer alignment restrictions are wrong.
Hardware DMA transfers are usually aligned on 4- or 8-byte boundaries, since the PCI bus physically transfers 32 or 64 bits at a time. Beyond that basic alignment, hardware DMA is designed to work with whatever address it is given.
However, the hardware deals with physical addresses, while the OS deals with virtual memory addresses (which are a protected-mode construct on x86 CPUs). This means that a buffer which is contiguous in process address space may not be contiguous in physical RAM. Unless care is taken to create physically contiguous buffers, the DMA transfer has to be broken up at VM page boundaries (typically 4K, possibly 2M with large pages).
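To make that concrete, here's a rough user-space sketch (not kernel code; the printing stands in for building a scatter-gather list) of why a virtually contiguous buffer has to be described to the DMA engine one page at a time, since each 4K page can sit at an unrelated physical address:

```c
/* Rough illustration only: split a virtually contiguous buffer into
 * page-sized chunks, the way a driver would when each page may map to
 * a different physical frame. */
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL

static void split_into_page_chunks(const void *buf, size_t len)
{
    uintptr_t addr = (uintptr_t)buf;
    size_t remaining = len;

    while (remaining > 0) {
        /* Bytes left before the next page boundary. */
        size_t chunk = PAGE_SIZE - (addr % PAGE_SIZE);
        if (chunk > remaining)
            chunk = remaining;

        /* A real driver would look up the physical frame for this page
         * and append (phys_addr, chunk) to a scatter-gather list here. */
        printf("segment: virt=%#lx len=%zu\n", (unsigned long)addr, chunk);

        addr += chunk;
        remaining -= chunk;
    }
}

int main(void)
{
    char buf[10000];
    /* Deliberately unaligned start to show the short first segment. */
    split_into_page_chunks(buf + 100, sizeof(buf) - 100);
    return 0;
}
```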
As for buffers needing to be aligned to the disk sector size, that is simply untrue; the DMA hardware is completely oblivious to the physical sector size of the drive.
Under Linux 2.4, O_DIRECT required 4K alignment; under 2.6 this has been relaxed to 512B. In either case, it was probably a design decision to prevent single-sector updates from crossing VM page boundaries and therefore requiring split DMA transfers. (A 512B buffer placed at an arbitrary address has roughly a 1-in-8 chance (511/4096) of crossing a 4K page boundary; a 512B-aligned one never does.)
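For illustration, here's a minimal sketch of how a program typically satisfies those O_DIRECT rules on Linux, using posix_memalign to get an aligned buffer. The file name and buffer size are arbitrary, and error handling is abbreviated:

```c
#define _GNU_SOURCE          /* for O_DIRECT */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const size_t buf_size = 4096;   /* one VM page = eight 512B sectors */
    void *buf = NULL;

    /* posix_memalign guarantees the requested alignment; 4096 satisfies
     * both the old 4K rule (2.4) and the relaxed 512B rule (2.6). */
    if (posix_memalign(&buf, 4096, buf_size) != 0)
        return 1;
    memset(buf, 0xAB, buf_size);

    int fd = open("testfile", O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0)
        return 1;

    /* With O_DIRECT the length and file offset must also meet the
     * alignment rule, not just the buffer address. */
    ssize_t n = write(fd, buf, buf_size);

    close(fd);
    free(buf);
    return n == (ssize_t)buf_size ? 0 : 1;
}
```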
So, while the OS rather than the hardware is to blame, we can see why page-aligned buffers are more efficient.
Edit: Of course, if we're writing large buffers anyway (100KB), then the number of VM page boundaries crossed will be practically the same whether we've aligned to 512B or not: a 100KB transfer spans about 25 pages, so it crosses 24 or 25 boundaries no matter where it starts.
So the main case being optimized by 512B alignment is single-sector transfers.
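A quick back-of-the-envelope check of that claim (the start addresses below are made-up examples, chosen only to show aligned vs. unaligned placement):

```c
/* Count how many 4K page boundaries a transfer crosses.  A 512B
 * transfer crosses 0 or 1 depending on alignment; a 100KB transfer
 * crosses ~25 either way. */
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL

static size_t page_crossings(uintptr_t start, size_t len)
{
    if (len == 0)
        return 0;
    uintptr_t first_page = start / PAGE_SIZE;
    uintptr_t last_page  = (start + len - 1) / PAGE_SIZE;
    return (size_t)(last_page - first_page);
}

int main(void)
{
    /* Single sector: a 512B-aligned start never crosses; a worst-case
     * unaligned start does. */
    printf("512B  @ 512B-aligned: %zu\n", page_crossings(4096 - 512, 512));
    printf("512B  @ unaligned   : %zu\n", page_crossings(4096 - 100, 512));

    /* 100KB: ~25 boundaries regardless of the start offset. */
    printf("100KB @ aligned     : %zu\n", page_crossings(0, 100 * 1024));
    printf("100KB @ unaligned   : %zu\n", page_crossings(300, 100 * 1024));
    return 0;
}
```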