The course, Deployment of Machine Learning Models is now live on Udemy

Using Python copy

A Basic Guide
07 January 2017

Python copy

Introduction

The background to copy in the docs makes the point:

Assignment statements in Python do not copy objects, they create bindings between a target and an object

This is a super important Python concept. Instead of variables (in the classic sense), Python has names and bindings. When you do:

foo = 1

Then you are binding the name foo to the object 1. To check that 1 really is an object, just enter dir(1) and look all the attributes and methods available. So if you then say:

bar = foo

Then bar is bound to the same object as foo. In practice, this only really matters when it comes to mutable objects, basically lists and classes. With integers, floats, tuples and strings, things behave intuitively:

foo = 1 
bar = foo
print(bar)
# >>> 1
foo = 2
print(foo)
# >>> 2
print(bar)
# >>> 1
# expected behavior - bar behaves as a copy

But this will cause you pain when it comes to to mutable objects (note that strings and integers are not mutable):

foo = [1, 2, 'baz']
bar = foo
print(bar)
# >>> [1, 2, 'baz']

foo[2] = 'pepper steak'
print(bar)
# >>> [1, 2, 'pepper steak']
# GOTCHA - the bar value has *also* changed because it is bound to the same mutable object

Enter the copy library, which allows us to achieve the behavior we want

There are two key types of copying to understanding: “deep” and “shallow”, let’s explore them:

Shallow Copy

By default the copy.copy() method returns a shallow copy.

In our example above, if we made the following adjustment:

from copy import copy
foo = [1, 2, 'baz']
bar = copy(foo)
print(bar)
# >>> [1, 2, 'baz']
foo[2] = 'peper steak'
print(bar)
# >>> [1, 2, 'baz']
# This time bar is a copy as we expect

Deep Copy

At first glance, deepcopy and copy have similar behavior. However, when working with nested structures, you may find yourself in need of a deepcopy. A “shallow” copy of a nested object (or compound object) creates a new copy and inserts references to the original. “References” means that things can go wrong. On the other hand,

A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.

No referencing. Full whack copies.

Time for code!

If you have list of lists:

import copy
nest_list = [1, 2, ['nested1', 'nested2']]
shallow = copy.copy(nest_list)
deep = copy.deepcopy(nest_list)

print(shallow)
# >>> [1, 2, ['nested1', 'nested2']]
print(deep)
# >>> [1, 2, ['nested1', 'nested2']]
nested_list[0] = 9
print(shallow)
# >>> [1, 2, ['nested1', 'nested2']]
print(deep)
# >>> [1, 2, ['nested1', 'nested2']]
nested_list[2][0] = 'nested9'
print(shallow)
# >>> [1, 2, ['nested9', 'nested2']]
# CHANGED
print(deep)
# >>> [1, 2, ['nested1', 'nested2']]
# STAYED THE SAME

So if you need a reliable copy of a compound object (most likely to be a class) with lots of nested attributes, then your best bet is deepcopy

Category