My SSD is rated at 3.4 GB/s, and synthetic disk benchmarks confirm it reaches the full 3.4 GB/s, yet reading NumPy arrays of varying sizes peaks at 1.3 GB/s, a mere 38% of that. I had better results with my previous (now dead) SSD. The arrays are 4 GB or 8 GB; the application lets me change the size.
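For reference, the numbers above come from a timing harness along these lines (simplified sketch; `big_array.npy` is a placeholder for one of the real files):

```python
import time
import numpy as np

# Timing sketch; "big_array.npy" stands in for one of the 4-8 GB files.
# Measured on a cold cache (first load after a reboot); otherwise Windows
# serves the file from RAM and the number reflects memcpy, not the SSD.
path = "big_array.npy"

t0 = time.perf_counter()
arr = np.load(path)
elapsed = time.perf_counter() - t0

gb = arr.nbytes / 1e9
print(f"{gb:.2f} GB in {elapsed:.2f} s -> {gb / elapsed:.2f} GB/s")
```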
The problem with memmap is that it relies on the pagefile, and compression is a net loss in my case. AFAIK NumPy's file I/O is single-threaded; could parallelism be the solution? If so, are there any libraries with API-side support for it?
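To make the question concrete, below is the kind of chunked parallel read I have in mind. It is only a sketch, not an existing API: `parallel_load` and `n_threads` are my own names. It uses `numpy.lib.format`'s header readers to find the start of the raw payload, then has several threads fill disjoint slices of one shared buffer (file reads release the GIL, so threads should be enough in principle):

```python
import numpy as np
from numpy.lib import format as npformat
from concurrent.futures import ThreadPoolExecutor

def parallel_load(path, n_threads=8):
    """Read a .npy file with several threads (sketch, not production code)."""
    with open(path, "rb") as f:
        version = npformat.read_magic(f)
        if version == (1, 0):
            shape, fortran_order, dtype = npformat.read_array_header_1_0(f)
        else:
            shape, fortran_order, dtype = npformat.read_array_header_2_0(f)
        offset = f.tell()  # start of the raw array payload

    total = int(np.prod(shape)) * dtype.itemsize
    raw = np.empty(total, dtype=np.uint8)  # shared destination buffer
    chunk = -(-total // n_threads)         # ceil division

    def read_chunk(i):
        start, stop = i * chunk, min((i + 1) * chunk, total)
        # Each thread gets its own unbuffered handle, seeks to its slice,
        # and fills its own region of the shared buffer.
        with open(path, "rb", buffering=0) as f:
            f.seek(offset + start)
            view = memoryview(raw)[start:stop]
            while view:                # readinto may return short reads
                n = f.readinto(view)
                view = view[n:]

    with ThreadPoolExecutor(n_threads) as pool:
        list(pool.map(read_chunk, range(n_threads)))

    order = "F" if fortran_order else "C"
    return raw.view(dtype).reshape(shape, order=order)
```

Whether something like this can actually saturate the drive on Windows is exactly what I'm unsure about.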
I've looked at joblib, but it has no explicit numpy.load examples. I tried multiprocessing a while back but didn't get far, and I couldn't find a worked example anywhere either. Note that the arrays are N-dimensional, with N >= 4, and dtype 'float32' or 'float16'.
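For what it's worth, the closest joblib pattern I could put together loads *separate* files in parallel (paths are placeholders), which doesn't obviously help with one monolithic 4-8 GB array:

```python
import numpy as np
from joblib import Parallel, delayed

# Parallel loads of separate .npy files (placeholder paths). prefer="threads"
# avoids pickling the resulting arrays back from worker processes.
paths = ["part0.npy", "part1.npy", "part2.npy", "part3.npy"]
arrays = Parallel(n_jobs=4, prefer="threads")(
    delayed(np.load)(p) for p in paths
)
```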
Windows 10 x64, 24 GB RAM, Intel i7-7700HQ (4 cores/8 threads), Samsung 970 EVO Plus or Sabrent Rocket NVMe SSD.