There’s another issue that isn’t pointed out in any of the existing answers. Python is allowed to merge any two immutable values, and pre-created small int values are not the only way this can happen. A Python implementation is never guaranteed to do this, but they all do it for more than just small ints.
For one thing, there are some other pre-created values, such as the empty tuple, str, and bytes, and some short strings (in CPython 3.6, it’s the 256 single-character Latin-1 strings). For example:

    >>> a = ()
    >>> b = ()
    >>> a is b
    True
But also, even non-pre-created values can be identical. Consider these examples:

    >>> c = 257
    >>> d = 257
    >>> c is d
    False
    >>> e, f = 258, 258
    >>> e is f
    True
And this isn’t limited to int values:

    >>> g, h = 42.23e100, 42.23e100
    >>> g is h
    True
Obviously, CPython doesn’t come with a pre-created float value for 42.23e100. So, what’s going on here?
The CPython compiler will merge constant values of some known-immutable types, like int, float, str, and bytes, in the same compilation unit. For a module, the whole module is a compilation unit, but at the interactive interpreter, each statement is a separate compilation unit. Since c and d are defined in separate statements, their values aren’t merged. Since e and f are defined in the same statement, their values are merged.
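The compilation-unit rule can be tried outside the interactive prompt by compiling source strings explicitly. The following is only a sketch: the helper name run_unit and the "<demo>" filename are made up for illustration, and the results describe what one CPython build happens to do, not anything the language promises. On the CPython versions discussed here, the first line typically prints True and the second False.

```python
def run_unit(source, namespace):
    """Compile `source` as a single compilation unit and run it in `namespace`."""
    exec(compile(source, "<demo>", "exec"), namespace)

# Both assignments are compiled together, so the compiler sees both constants.
same_unit = {}
run_unit("e = 258\nf = 258\n", same_unit)
merged_in_one_unit = same_unit["e"] is same_unit["f"]

# Each assignment is compiled on its own, as at the interactive prompt.
separate_units = {}
run_unit("c = 257\n", separate_units)
run_unit("d = 257\n", separate_units)
merged_across_units = separate_units["c"] is separate_units["d"]

print("same unit:", merged_in_one_unit)
print("separate units:", merged_across_units)
```

Either result could change on another implementation or version, which is exactly why code should not depend on it.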
You can see what’s going on by disassembling the bytecode. Try defining a function that does i, j = 128, 128 and then calling dis.dis on it, and you’ll see that there’s a single constant value (128, 128):

    >>> import dis
    >>> def f(): i, j = 128, 128
    ...
    >>> dis.dis(f)
      1           0 LOAD_CONST               2 ((128, 128))
                  2 UNPACK_SEQUENCE          2
                  4 STORE_FAST               0 (i)
                  6 STORE_FAST               1 (j)
                  8 LOAD_CONST               0 (None)
                 10 RETURN_VALUE
    >>> f.__code__.co_consts
    (None, 128, (128, 128))
    >>> consts = f.__code__.co_consts
    >>> id(consts[1]), id(consts[2][0]), id(consts[2][1])
    (4305296480, 4305296480, 4305296480)
You may notice that the compiler has stored 128 as a constant even though it’s not actually used by the bytecode, which gives you an idea of how little optimization CPython’s compiler does. It also means that (non-empty) tuples don’t actually end up merged, at least in CPython 3.6:

    >>> k, l = (1, 2), (1, 2)
    >>> k is l
    False
Put that in a function, disassemble it with dis.dis, and look at the co_consts: there’s a 1 and a 2, two (1, 2) tuples that share the same 1 and 2 but are not identical, and a ((1, 2), (1, 2)) tuple that has the two distinct equal tuples.
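Here is one way to run that experiment as a script rather than at the prompt. It’s a minimal sketch; the function name make_tuples is hypothetical. No expected output is shown, because the bytecode, the contents of co_consts, and the final identity check all depend on the CPython version you run it on.

```python
import dis

def make_tuples():
    # Two equal tuple literals in the same compilation unit.
    k, l = (1, 2), (1, 2)
    return k, l

# Show the bytecode, then every constant the compiler stored with its id.
dis.dis(make_tuples)
for const in make_tuples.__code__.co_consts:
    print(repr(const), id(const))

k, l = make_tuples()
print("equal:", k == l)
print("identical:", k is l)  # implementation- and version-dependent
```

Comparing the printed ids tells you which constants your interpreter merged and which it kept as separate objects.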
There’s one more optimization that CPython does: string interning. Unlike the compiler’s constant merging, this isn’t restricted to source code literals in a single compilation unit:

    >>> m = 'abc'
    >>> n = 'abc'
    >>> m is n
    True
On the other hand, it is limited to the str type, and to strings of internal storage kind “ascii compact”, “compact”, or “legacy ready”, and in many cases only “ascii compact” will get interned.
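Automatic interning is an implementation detail, but the standard library does expose explicit interning through sys.intern, which is not mentioned above and is shown here only as an illustration. In this sketch the strings are built at run time so that no source literal is involved; the variable names are arbitrary.

```python
import sys

prefix = "identity-"
a = "".join([prefix, "check"])  # built at run time, not a source literal
b = "".join([prefix, "check"])

print(a == b)  # True: the contents are equal
print(a is b)  # no guarantee either way; depends on the implementation

# sys.intern returns one shared object for equal strings.
x = sys.intern(a)
y = sys.intern(b)
print(x is y)  # True
```

Even so, explicit interning is a performance tool, not a reason to compare strings with is.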
At any rate, the rules for what values must be, might be, or cannot be distinct vary from implementation to implementation, and between versions of the same implementation, and maybe even between runs of the same code on the same copy of the same implementation.
It can be worth learning the rules for one specific Python for the fun of it. But it’s not worth relying on them in your code. The only safe rule is:
- Do not write code that assumes two equal but separately-created immutable values are identical (don’t use x is y, use x == y)
- Do not write code that assumes two equal but separately-created immutable values are distinct (don’t use x is not y, use x != y)
Or, in other words, only use the is operator to test for the documented singletons (like None) or for objects that are only created in one place in the code (like the _sentinel = object() idiom).
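For reference, the sentinel idiom might look like the sketch below. The lookup function is a hypothetical example, not something from any library. The point is that _sentinel is created in exactly one place, so an identity test against it is reliable, and None remains usable as an ordinary value.

```python
_sentinel = object()  # created once; nothing else can be identical to it

def lookup(mapping, key, default=_sentinel):
    """Return mapping[key], or `default` if given, else raise KeyError."""
    value = mapping.get(key, _sentinel)
    if value is _sentinel:        # identity test against a unique object
        if default is _sentinel:  # the caller did not pass a default
            raise KeyError(key)
        return default
    return value

print(lookup({"a": None}, "a"))    # None is a real stored value here
print(lookup({}, "missing", 0))    # falls back to the explicit default
```

Equality would be the wrong test in this case, since a stored value could define __eq__ to compare equal to anything.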