Parallel assignment: a Python idiom cleverly optimized

Friday, May 15, 2020

Every programmer has had to swap variables. It's common in real programs and it's a frequently used example when people want to show off just how nice and simple Python is. Python handles this case very nicely and efficiently. But how Python handles it efficiently is not always clear, so we'll have to dive into how the runtime works and disassemble some code to see what's happening. For this post, we're going to focus on CPython. This will probably be handled differently in every runtime, so PyPy and Jython will have different behavior, and probably will have similarly cool things going on!

Before we dive into disassembling some Python code (which isn't scary, I promise), let's make sure we're on the same page of what we're talking about. Here's the common example of how you would do it in a language that's Not As Great As Python:

# assume we have variables x, y which we want to swap
temp = x
x = y
y = temp

Okay, so we've all seen that, what's the point, I'm closing the tab now. Well now we get to the part that's people trumpet as evidence of Python's great brevity. Look, Python can do it in one line!

# assume we have variables x, y which we want to swap
x, y = y, x

This method is known as parallel assignment, and is present in languages like Ruby, as well. This method lets you avoid a few lines of code while improving readability, because now we can quickly see that we're doing a swap, rather than having to look through the lines carefully to ensure the swap is ordered correctly. And, we might even save some memory, depending on how this is implemented! If you followed the link to the Wikipedia article about parallel assignment, you'll see the following:

This is shorthand for an operation involving an intermediate data structure: in Python, a tuple; in Ruby, an array.

A very similar statement is made in Effective Python (a great book to read together as a team, by the way!), where the author states that a tuple is made for the right-hand side, then unpacked into the left-hand side.

This makes sense, but it isn't the whole story, which gets far more fascinating. But first, we need to know a little about how the Python runtime works.

Inside the Python runtime (remember that we're talking about CPython specifically, not Python-the-spec), there's a virtual machine and the runtime compiles code into bytecode which is then run on that virtual machine. Python ships with a disassembler you can use, and it provides handy documentation listing all the available bytecode instructions. Another thing to note is that Python's VM is stack based. That means that instead of having fixed registers, it simply has a memory stack. Each time you load a variable, it pushes onto the stack; and you can pop off the stack. Now, let's use the disassembler to take a look at how Python is actually handling this swapping business!

First, let's disassemble the "standard" swap. We define this inside a function, because we have to pass a module or a function into the disassembler.

def swap():
  temp = x
  x = y
  y = temp

This doesn't do anything useful, because it just swaps them in place. We didn't even declare the variables anywhere, so this has no chance of ever actually running. But, because Python is a beautiful language, we can go ahead and disassemble this anyway! If you've defined that in your Python session, you can then import dis and go ahead and disassemble it:

>>> dis.dis(swap)
  2           0 LOAD_FAST                0 (x)
              2 STORE_FAST               1 (temp)

  3           4 LOAD_FAST                2 (y)
              6 STORE_FAST               0 (x)

  4           8 LOAD_FAST                1 (temp)
             10 STORE_FAST               2 (y)
             12 LOAD_CONST               0 (None)
             14 RETURN_VALUE

Stepping through this, you can see that first we have a LOAD_FAST of x which puts x onto the top of the stack. Then STORE_FAST pops the top of the stack and stores it into temp. This general pattern repeats three times, once per line of the swap. Then, at the end, we load in the return value (None) and return it. Okay, so this is about what we'd expect. Barring some really fancy compiler tricks, this is analogous to what I'd expect in any compiled language.

So let's take a look at the version that is idiomatic.

def swap():
  x, y = y, x

Once again, this isn't doing anything useful, and Python miraculously lets us disassemble this thing that would never even run. Let's see what we get this time:

>>> dis.dis(swap)
  2           0 LOAD_FAST                0 (y)
              2 LOAD_FAST                1 (x)
              4 ROT_TWO
              6 STORE_FAST               1 (x)
              8 STORE_FAST               0 (y)
             10 LOAD_CONST               0 (None)
             12 RETURN_VALUE

And here is where we see the magic. First, we LOAD_FAST twice onto the stack. If we just go off the language spec, we'd now expect to form an intermediate tuple (the BUILD_TUPLE command is what does this, and from its absence we know that we aren't building a tuple here the way you would with x = (1,2)). On the contrary, you see... ROT_TWO! This is a cool instruction which takes the top two elements of the stack and "rotates" them (a math term for changing the order of things, kind of shifting everyone along with one moving from the front to the back). Then we STORE_FAST again, twice, to put it back into the variables.

The question now might be, "why do we even need ROT_TWO? Why can't we simply change the order we store them to achieve the same effect?" This is because of how Python has defined its semantics. In Python, variables on the lefthand side of an expression are stored in order from left to right. The righthand side also has these left-to-right semantics. This matters in case like assigning to both an index and a list:

a = [0, 0]
i = 0
i, a[i] = 1, 10

If you didn't define the semantics, the result above would be ambiguous: will a be [0, 10] or [10, 0] after running this? It will be [0, 10] because we assign from left to right. Similar semantics apply on the righthand side for the comma operator, and the end result is that we have to do something in the middle to ensure we adhere to these semantics by changing the order of the stack.

So, at the end of the day, there you have it. Parallel assignment, or swapping without another variable, does not use any extra honest-to-goodness tuples or anything under the hood in Python. It does it through a clever optimization with rotating the top of the stack!

Update 5/16: I made a few edits to make the article clearer and avoid distracting from the content by implying/stating that people were wrong, and making certain things clearer (focus on CPython, focus on implementation vs. spec, etc.).