21

I'm working on a sparse list implementation and recently implemented assignment via a slice. This led me to discover some behaviour in Python's built-in list implementation that I find surprising.

Given an empty list and an assignment via a slice:

>>> l = []
>>> l[100:] = ['foo']

I would have expected an IndexError from list here, because the way this is implemented means that an item can't be retrieved from the specified index:

>>> l[100]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

'foo' cannot even be retrieved from the specified slice:

>>> l = []
>>> l[100:] = ['foo']
>>> l[100:]
[]

l[100:] = ['foo'] appends to the list (that is, l == ['foo'] after this assignment) and appears to have behaved this way since the BDFL's initial version. I can't find this functionality documented anywhere (*) but both CPython and PyPy behave this way.

Assigning by index raises an error:

>>> l[100] = 'bar'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list assignment index out of range

So why does assigning past the end of a list via a slice not raise an IndexError (or some other error, I guess)?


To clarify following the first two comments, this question is specifically about assignment, not retrieval (cf. Why substring slicing index out of range works in Python?).

Giving in to the temptation to guess and assigning 'foo' to l at index 0 when I had explicitly specified index 100 doesn't follow the usual Zen of Python.

Consider the case where the assignment happens far away from the initialisation and the index is a variable. The caller can no longer retrieve their data from the specified location.
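As a rough sketch of the stricter behaviour described here, a caller could wrap the assignment in a guard. The helper name assign_tail_strict is hypothetical and is not part of list or of any library; it only illustrates raising IndexError when the slice start lies beyond the end.

```python
def assign_tail_strict(items, start, values):
    """Hypothetical helper: items[start:] = values, refusing a start past the end."""
    if start > len(items):
        raise IndexError(
            "slice start %d is beyond the end (len %d)" % (start, len(items))
        )
    items[start:] = values


data = []
try:
    assign_tail_strict(data, 100, ['foo'])
    outcome = 'assigned'
except IndexError:
    outcome = 'IndexError'

# A start equal to len(data) is still accepted and behaves like an append.
assign_tail_strict(data, 0, ['foo'])
print(outcome, data)
```

With this guard, data can always be read back from the slice it was written to, at the cost of diverging from the built-in behaviour.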

Assigning to a slice before the end of a list behaves somewhat differently to the example above:

>>> l = [None, None, None, None]
>>> l[3:] = ['bar']
>>> l[3:]
['bar']

(*) This behaviour is defined in Note 4 of 5.6. Sequence Types in the official documentation (thanks elethan) but it's not explained why it would be considered desirable on assignment.


Note: I understand how retrieval works and can see how it may be desirable to be consistent with this for assignment but am looking for a cited reason as to why assigning to a slice would behave in this way. l[100:] returning [] immediately after l[100:] = ['foo'] but l[3:] returning ['bar'] after l[3:] = ['bar'] is astonishing if you have no knowledge of len(l), particularly if you're following Python's EAFP idiom.

johnsyweb
  • 2
    `a = l[100:]` doesn't cause an error, just `a == []`. It's a reasonable interpretation that, given `100` is beyond the `end`, it just returns the `end`. In fact all slices where `start > stop` return an empty list at `start` or `end`, whichever is less. – AChampion Nov 12 '16 at 01:23
  • A slice is not an index, so why would you expect an `IndexError`? – ekhumoro Nov 12 '16 at 02:07
  • @ekhumoro: That is a good question! My (naive) thinking is that I have specified the _index_ `slice.start` as 100. – johnsyweb Nov 12 '16 at 02:18
  • My 2 cents: slicing is slicing, whether it happens on the LHS or the RHS of the equals sign. As mentioned in [this answer](http://stackoverflow.com/a/35632876/4014959) `a[1:4]=[1,2,3]` is equivalent to `a.__setitem__(slice(1,4,None), [1,2,3])`. So the unexpected behaviour is primarily due to the way an out of bounds slice is interpreted, not which side of the assignment it occurs on. – PM 2Ring Nov 12 '16 at 03:02
  • 3
    @Johnsyweb. An index refers to a specific element of a sequence, whereas a slice refers to a structural section of the sequence. In a list with three elements, the slice `[1:1]` refers to an empty section *between* elements. It does not refer to any specific, indexed element - and yet it is still possible to assign to it (effectively performing an insert operation). – ekhumoro Nov 12 '16 at 03:54
  • 1
    Note **4** of this section might help: https://docs.python.org/2/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange – elethan Nov 12 '16 at 04:02
  • @ekhumoro: Thank you for that clarification! – johnsyweb Nov 12 '16 at 05:14
  • @elethan: Thank you. https://docs.python.org/2/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange is the documentation I was looking for. This still seems like unwanted behaviour on assignment, though, for the reasons I've outlined in question. – johnsyweb Nov 12 '16 at 05:19
  • @PM2Ring: Thank you, too, for your commentary. I understand the mechanics of what's happening here. What I don't understand is why it's desirable behaviour to assign to the end of a list when a position *beyond* that has been specified. – johnsyweb Nov 12 '16 at 06:01
  • 1
    I agree that it is a little surprising, but it's consistent with "if the slice location is beyond the end of the list, extend the list". But _why_ it's like this instead of (say) raising IndexError, I have no idea. – PM 2Ring Nov 12 '16 at 06:09
  • 1
    The reasons for assignment to a slice not raising an exception are exactly the same as for accessing a slice without assignment. – Martijn Pieters Nov 14 '16 at 21:58
  • @MartijnPieters: Do you have a citation for these reasons? I've already tried to highlight why this question is different to the one to which you've closed this as a duplicate. – johnsyweb Nov 14 '16 at 22:18
  • 1
    @Johnsyweb: Guido van Rossum [can't quite remember](http://code.activestate.com/lists/python-dev/132599/). – Martijn Pieters Dec 03 '16 at 14:33
  • @MartijnPieters: Ha ha, excellent. That's exactly the kind of thing that I was looking for. Can you please reopen the question and add this as an answer so that I can accept it? – johnsyweb Dec 04 '16 at 21:44

2 Answers

12

Let's see what is actually happening:

>>> l = []
>>> l[100:] = ['foo']
>>> l[100:]
[]
>>> l
['foo']

So the assignment was actually successful, and the item got placed into the list, as the first item.

This happens because 100: in indexing position is converted to a slice object, slice(100, None, None):

>>> class Foo:
...     def __getitem__(self, i):
...         return i
... 
>>> Foo()[100:]
slice(100, None, None)

Now, the slice class has a method indices (documented under slice objects in the Data model chapter of the language reference) that, when given the length of a sequence, returns a (start, stop, stride) tuple adjusted for that length.

>>> slice(100, None, None).indices(0)
(0, 0, 1)

Thus when this slice is applied to a sequence of length 0, it behaves exactly like slice(0, 0, 1) for slice retrievals. That is, instead of foo[100:] throwing an error when foo is an empty sequence, it behaves as if foo[0:0:1] was requested, which results in an empty slice on retrieval.
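To see the adjustment for more than one length, here is a small sketch that calls indices on the same slice object for a few sequence lengths. The variable names are illustrative only.

```python
# The slice object produced by writing [100:] in an indexing position.
past_the_end = slice(100, None, None)

# indices(length) clamps start and stop to the given sequence length.
clamped = {n: past_the_end.indices(n) for n in (0, 3, 100, 150)}
for n, triple in clamped.items():
    print(n, triple)
# 0 (0, 0, 1)
# 3 (3, 3, 1)
# 100 (100, 100, 1)
# 150 (100, 150, 1)

# On an empty list, retrieval with [100:] matches retrieval with [0:0:1].
foo = []
same = foo[100:] == foo[0:0:1] == []
print(same)  # True
```

Only once the sequence has more than 100 elements does the start stay at 100; for anything shorter it is pulled back to the length.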

Now the setter code should work correctly when l[100:] is used and l is a sequence that has more than 100 elements. To make it work there, the easiest approach is to not reinvent the wheel and to just use the indices mechanism above. The downside is that edge cases look a bit peculiar: slice assignments to slices that are "out of bounds" are placed at the end of the current sequence instead. (However, it turns out that there is little code reuse in the CPython code; list_ass_slice essentially duplicates all this index handling, even though it would also be available via the slice object C-API.)

Thus: if the start index of a slice is greater than or equal to the length of a sequence, the resulting slice behaves as if it is a zero-width slice starting from the end of the sequence. I.e.: if a >= len(l), l[a:] behaves like l[len(l):len(l)] on built-in types. This is true for each of assignment, retrieval and deletion.
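The rule can be checked for all three operations with a short sketch. The function apply_all is a made-up name for illustration; it runs retrieval, assignment and deletion on fresh copies of the same three-element list.

```python
def apply_all(start):
    """Return (retrieved, after_assignment, after_deletion) for items[start:]."""
    items = [1, 2, 3]
    retrieved = items[start:]

    assigned = list(items)
    assigned[start:] = ['x']

    deleted = list(items)
    del deleted[start:]

    return retrieved, assigned, deleted


far = apply_all(100)  # start well past the end
end = apply_all(3)    # start exactly at len(items)

print(far)         # ([], [1, 2, 3, 'x'], [1, 2, 3])
print(far == end)  # True
```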

The desirability of this is that it doesn't need any exceptions. The slice.indices method never has to report an out-of-bounds position: for a sequence of length n, slice.indices(n) always results in (start, end, stride) values that can be used for any of assignment, retrieval and deletion, and for a positive stride it is guaranteed that both start and end satisfy 0 <= v <= n.
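A brute-force sketch of that guarantee, assuming the default stride of 1: every combination of a few in-range, negative and far-out-of-range bounds is clamped into 0..length. The last line shows that a negative stride is a separate case, where the adjusted end can be -1.

```python
candidates = (None, -100, -1, 0, 3, 100)
checked = 0

for length in range(6):
    for start in candidates:
        for stop in candidates:
            lo, hi, step = slice(start, stop).indices(length)
            # With the default stride, both bounds land inside 0..length.
            assert 0 <= lo <= length
            assert 0 <= hi <= length
            assert step == 1
            checked += 1

print(checked)  # 216 combinations, none raised

# With a negative stride the adjusted end may be -1 rather than 0..length.
reverse = slice(None, None, -1).indices(5)
print(reverse)  # (4, -1, -1)
```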

  • 3
    This perfectly explains *what* is happening. What I don't understand is *why* it's desirable behaviour to assign to the end of a list when a position beyond that has been specified. – johnsyweb Nov 12 '16 at 08:44
  • 1
    Thanks for the updated answer. Why are exceptions undesirable? Consistency between assignment, retrieval and deletion is certainly desirable. – johnsyweb Nov 13 '16 at 01:20
5

For indexing, an error must be raised if the given index is out-of-bounds, because there is no acceptable default value that could be returned. (It is not acceptable to return None, because None could be a valid element of the sequence).

By contrast, for slicing, raising an error is not necessary if any of the indexes are out-of-bounds, because it is acceptable to return an empty sequence as a default value. And it is also desirable to do this, because it provides a consistent way to refer to subsequences both between elements and beyond the ends of the sequence (thus allowing for insertions).
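As an illustration of the insertions mentioned above, here is a minimal sketch using zero-width slices between elements, at the end, and at the start of a list. The list contents are arbitrary.

```python
letters = ['a', 'b', 'c']

# A zero-width slice between elements: assigning to it inserts.
letters[1:1] = ['X']
after_middle = list(letters)

# A zero-width slice at the end: assigning to it appends.
letters[len(letters):] = ['Y']
after_end = list(letters)

# A zero-width slice at the start: assigning to it prepends.
letters[0:0] = ['Z']

print(after_middle)  # ['a', 'X', 'b', 'c']
print(after_end)     # ['a', 'X', 'b', 'c', 'Y']
print(letters)       # ['Z', 'a', 'X', 'b', 'c', 'Y']
```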

As stated in the Sequence Types Notes, if the start or end value of a slice is greater than len(seq), then len(seq) is used instead.

So given a = [4, 5, 6], the expressions a[3:] and a[100:] both point to the empty subsequence following the last element in the list. However, after a slice assignment using these expressions, they may no longer refer to the same thing, since the length of the list may have been changed.

Thus, after the assignment a[3:] = [7], the slice a[3:] will return [7]. But after the assignment a[100:] = [8], the slice a[100:] will still return [], because len(a) is still less than 100. And given everything else stated above, this is exactly what one should expect if consistency between slice assignment and slice retrieval is to be maintained.
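The sequence of steps above can be run as a short script. The variable names other than a are illustrative.

```python
a = [4, 5, 6]

# Before any assignment, both slices are the empty tail of the list.
before = (a[3:], a[100:])

a[3:] = [7]
after_first = a[3:]     # start 3 is now a valid position

a[100:] = [8]
after_second = a[100:]  # len(a) is 5, so the start is clamped to 5

print(before)        # ([], [])
print(after_first)   # [7]
print(after_second)  # []
print(a)             # [4, 5, 6, 7, 8]
```

The value 8 is stored, but at position len(a) - 1 rather than at 100, which is why reading a[100:] back does not return it.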

ekhumoro