Question up front:
Is there a Pythonic way in the standard library to parse raw binary files using for ... in ... syntax (i.e., __iter__/__next__) so that each iteration yields a block whose size respects the buffering (buffer size) argument of open(), without having to subclass IOBase or its child classes?
Detailed explanation
I'd like to open a raw binary file for parsing using for ... in ... syntax, and I'd like each iteration to yield predictably sized blocks. That wasn't happening in a problem I was working on, so I tried the following test (requires import numpy as np):
In [271]: with open('tinytest.dat', 'wb') as f:
     ...:     f.write(np.random.randint(0, 256, 16384, dtype=np.uint8).tobytes())
     ...:
In [272]: np.array([len(b) for b in open('tinytest.dat', 'rb', 16)])
Out[272]:
array([ 13, 138, 196, 263, 719, 98, 476, 3, 266, 63, 51,
241, 472, 75, 120, 137, 14, 342, 148, 399, 366, 360,
41, 9, 141, 282, 7, 159, 341, 355, 470, 427, 214,
42, 1095, 84, 284, 366, 117, 187, 188, 54, 611, 246,
743, 194, 11, 38, 196, 1368, 4, 21, 442, 169, 22,
207, 226, 227, 193, 677, 174, 110, 273, 52, 357])
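For anyone without NumPy, the same experiment can be reproduced with the standard library alone. This is a sketch that assumes os.urandom as a stand-in for np.random.randint and a temporary file in place of tinytest.dat; collect_blocks is a hypothetical helper. Besides printing the erratic lengths, it confirms that no bytes are lost and checks which byte each block stops on.

```python
import os
import tempfile

def collect_blocks(path, buffering):
    """Hypothetical helper: collect the blocks that plain iteration yields."""
    with open(path, 'rb', buffering) as f:
        return list(f)

# 16 KiB of random bytes, standing in for the NumPy-generated tinytest.dat.
data = os.urandom(16384)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

try:
    blocks = collect_blocks(path, 16)
    print([len(b) for b in blocks][:10])  # varies from run to run
    print(sum(len(b) for b in blocks))    # 16384: nothing is lost
    # Every block except possibly the last ends with the byte value 10.
    print(all(b.endswith(b'\n') for b in blocks[:-1]))
finally:
    os.remove(path)
```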
I couldn't understand why the block sizes were so erratic, or why they ignored the buffering argument. Using read1 did return the expected number of bytes:
In [273]: with open('tinytest.dat', 'rb', 16) as f:
     ...:     b = f.read1()
     ...:     print(len(b))
     ...:     print(b)
     ...:
16
b'M\xfb\xea\xc0X\xd4U%3\xad\xc9u\n\x0f8}'
And there it is: a newline (b'\n') near the end of the first block.
In [274]: with open('tinytest.dat', 'rb', 2048) as f:
     ...:     print(f.readline())
     ...:
b'M\xfb\xea\xc0X\xd4U%3\xad\xc9u\n'
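The connection between that newline and the block sizes can be checked without the file at all. This sketch replays the 16 bytes printed by read1() through an in-memory io.BytesIO (used here only for convenience; a buffered binary file should behave the same way) and shows that readline() stops right after the byte with value 10, returning 13 bytes, which matches the first entry of Out[272].

```python
import io

# The first 16 bytes of the test file, as printed by read1() above.
first_block = b'M\xfb\xea\xc0X\xd4U%3\xad\xc9u\n\x0f8}'

# In a binary stream, readline() stops after the first byte equal to 10 (b'\n').
line = io.BytesIO(first_block).readline()

print(len(line))              # 13, the first length in Out[272]
print(first_block.index(10))  # 12, the position of the newline byte
print(b'\n'[0])               # 10
```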
Sure enough, iterating over the file calls readline to produce each block, so every block ends at the next byte with value 10 (b'\n'). I verified this by reading the code; these are the relevant lines in the definition of IOBase:
def __next__(self):
    line = self.readline()
    if not line:
        raise StopIteration
    return line
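Because __next__ just delegates to readline(), looping over a binary file amounts to the two-argument form of the built-in iter(), with readline as the callable and b'' as the end-of-file sentinel. A small check of that equivalence, assuming io.BytesIO as a stand-in for a file opened in 'rb' mode and some arbitrary sample bytes:

```python
import io

data = b'abc\ndef\n\x00\x01\nxyz'  # arbitrary bytes containing a few b'\n' (value 10)

# Plain iteration: the stream is split after each b'\n', just like readline().
via_for = [b for b in io.BytesIO(data)]

# The same loop written explicitly: call readline() until it returns b''.
f = io.BytesIO(data)
via_readline = list(iter(f.readline, b''))

print(via_for)                  # [b'abc\n', b'def\n', b'\x00\x01\n', b'xyz']
print(via_for == via_readline)  # True
```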
So my question is this: is there a more Pythonic way to get raw file behavior that respects the buffer size and still allows for ... in ... syntax, without subclassing IOBase or its child classes (which would take the solution outside the standard library)? If not, does this unexpected behavior warrant a PEP? (Or should I just learn to expect it? :)
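For what it's worth, one standard-library idiom that appears to give fixed-size blocks without any subclassing is the same two-argument iter(), wrapped around read instead of readline: iter(functools.partial(f.read, n), b'') calls f.read(n) repeatedly and stops at the first empty result. The sketch below assumes a block size of 16 to mirror the buffering value used above, a temporary file of random bytes, and a hypothetical helper name, read_blocks. It also shows iter(f.read1, b'') as a variant, since read1() with no argument returns at most one buffer's worth per call.

```python
import functools
import os
import tempfile

def read_blocks(f, size):
    """Hypothetical helper: return an iterator over blocks of `size` bytes.

    iter(callable, sentinel) calls f.read(size) until it returns b'' at EOF;
    every block has exactly `size` bytes except possibly the last.
    """
    return iter(functools.partial(f.read, size), b'')

data = os.urandom(16384)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

try:
    with open(path, 'rb', 16) as f:
        sizes = [len(b) for b in read_blocks(f, 16)]
    print(set(sizes), len(sizes))  # {16} 1024

    # Variant: read1() with no argument returns at most one buffer (16 bytes here).
    with open(path, 'rb', 16) as f:
        sizes1 = [len(b) for b in iter(f.read1, b'')]
    print(max(sizes1), sum(sizes1))  # at most 16 per block, 16384 in total
finally:
    os.remove(path)
```

The buffering argument only controls how much the stream fetches from the operating system at a time, while the size passed to read() determines what each loop iteration receives, so the two can be chosen independently.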