Chapter 9: Dictionaries

The idea behind a dictionary is straightforward. Start by thinking about an actual dictionary book. The words and definitions are indexed together in a single document. In fact, you can probably imagine it as a list. Each word and definition can make up a string, meaning the value at index 0 will be something like "a: the first letter in the English language." However, it would be nice to think of the index as the word itself, and the value as the definition for that word. For example, instead of the 0th index, what about the "a" index? This is how a real dictionary is used. We look up by word instead of by the index of the word. Instead of englishDictionary[0], we could instead refer to englishDictionary["a"], and get definitions based on a much more intuitive indexing system. This is the idea behind Python dictionaries; retrieve values based on keys that have more meaning than a collection of numbers.

This chapter will provide an overview of the situations where you might want to use a dictionary instead of a list, along with some of the more subtle details between lists and dictionaries. These data structures share many similarities, but although they are both extremely useful, you must be aware of the differences between one and the other.

Keys and values

Lists collect data by placing everything inside of a single linear structure and retrieving values by index. If a set of ten pieces of data are stored in a list, we expect that the list will be ten elements long and that the order of the elements will correspond to the order that we inserted them. Of course, this order can be modified by sorting or reversing the list. In essence, a list is just a linear collection of data.

In a dictionary, the index must be explicitly provided when inserting a value. When a value is placed into a dictionary, the index, also known as the key, is specified by which the value can be retrieved later. These index and data pairs are known as key-value pairs, as each value can be retrieved with a certain key. In a real dictionary from the bookstore, each word in the dictionary has an associated definition. In this paradigm, each word is a key, and the definition for that word is the value.

Dictionaries are identified using curly-braces, and key-value pairs are separated using a colon. For example, to set up a simple dictionary about me, I could write the following Python code:

my_dictionary = {
  first_name": "Alexander",
  last_name": "Coder",
  birth_year": 1979,
  birth_month": 12,
  birth_day": 27
}
print(my_dictionary)

{'birth_day': 27, 'first_name': 'Alexander', 'last_name': 'Coder',
'birth_month': 12, 'birth_year': 1979}

print(my_dictionary["first_name"])

Alexander

Unlike a list, the values inside a dictionary aren't referenced by a number index. We would never expect to be asked to look up the 40,000th word in the English dictionary. Each value has an associated key, and to get the value, you use the same syntax as you used with lists, only with the actual key that had been originally associated with the value.

Values in a dictionary can be any type. In a dictionary, strings, numbers, lists, and even other dictionaries can be stored. However, keys are a separate matter. Keys must be immutable values. A key can be a string or a number, but it can't be a list because a list isn't guaranteed to be static. Let's see what that means.

>>> x = {}
>>> a = [0, 1, 2]
>>> x[a] = True
Traceback (most recent call last):
File "<pyshell#2>", line 1, in <module>
    x[a] = True
TypeError: unhashable type: 'list'

An unhashable type error indicates an attempt to use a mutable type, such as a list, as a key for the dictionary. The fact that the list can be modified later means that there is no guarantee of knowing the actual value of the key at any given point in time. If you use the key False, that key will never change. False is immutable, so False will always be False. A list can be changed, and so it is an unacceptable dictionary key.

Adding new elements to a dictionary is a simple matter. To add a new element, just write a statement setting the new dictionary value for a certain key to that value. There isn't an append function, as the append concept doesn't even really make sense when you're working with key-value pairs. For example, when we decide that "blog" is a new word that warrants addition to the English dictionary, we don't append it to the end of the dictionary. The same idea applies here.

>>> print(my_dictionary)
{'first_name': 'Alexander', 'last_name': 'Coder', 'birth_month': 12, 'birth_day': 27, 'birth_year': 1979}
>>> my_dictionary["birth_town"] = "Kingston"
>>> print(my_dictionary)
{'first_name': 'Alexander', 'last_name': 'Coder', 'birth_town': 'Kingston',
'birth_month': 12, 'birth_day': 27, 'birth_year': 1979}

To remove an element, the del function is used along with the key from the key-value pair to delete.

>>> print(my_dictionary)
{'first_name': 'Alexander', 'last_name': 'Coder', 'birth_town': 'Kingston',
'birth_month': 12, 'birth_day': 27, 'birth_year': 1979}
>>> del(my_dictionary["birth_town"])
>>> print(my_dictionary)
{'first_name': 'Alexander', 'last_name': 'Coder', 'birth_month': 12,
'birth_day': 27, 'birth_year': 1979}

A for loop can be used with a dictionary to iterate over all of the elements in the data structure. However, it works differently with a dictionary than with a list, in that the actual elements you end up getting in the loop are the keys. It sounds strange, but let's examine why this is true.

for key in my_dictionary:
    print("{0}: {1}".format(key, my_dictionary[key]))

birth_day: 27
first_name: Alexander
last_name: Coder
birth_month: 12
birth_year: 1979

Instead of getting the dictionary values back in key, you get each of the key index values. The actual value from the dictionary can be obtained by using the key, obviously, but it's an important distinction to be aware of. With a list, you get the opposite effect, and if you recall, we had to go through some hoops to retrieve the indices. A dictionary is a data structure similar to a list, except for the fact that you might actually care more about the keys than the values themselves.

A fact that follows from this is that sorting a dictionary works in a different way than sorting a list. Think back to the analogy of working with an actual English dictionary, and how you'd go about sorting those values. There aren't really any cases where you would want to sort on the definition of the word, since you're never looking words up based on a definition that an editor has provided. For example, the word "python" could be defined as "a large constricting snake" or "name for a large nonvenomous snake" and each definition would show up in a different location after a definition sort. You want a sorted list of words. With a dictionary, that means that we want a sorted list of the keys.

>>> print(my_dictionary)
{'first_name': 'Alexander', 'last_name': 'Coder', 'birth_month': 12,
'birth_day': 27, 'birth_year': 1979}
>>> sorted(my_dictionary)
['birth_day', 'birth_month', 'birth_year', 'first_name', 'last_name']

Note that the value returned from sorted is a list instead of a dictionary. If you ask for a sorted dictionary, you get a sorted list where the values are the keys. Rather than sorting on the elements of my birthdate and my name, the sorted function returns an alphabetized list of the keys that are present in the my_dictionary variable.

for key in sorted(my_dictionary):
    print("{0}: {1}".format(key, my_dictionary[key]))

birth_day: 27
birth_month: 12
birth_year: 1979
first_name: Alexander
last_name: Coder

This is a particularly important point, especially considering the fact that the other large data structures seen so far return sorted data structures of the same type. A dictionary is inherently unsortable, and this is kind of unsatisfying at first glance. It seems to make a lot of sense to sort the keys. This isn't the goal of the dictionary data structure. In Python, a dictionary is designed to allow you to look up values extremely fast. If a programmer is doing thousands of these lookups, the dictionary data structure will operate extremely efficiently.

The individual keys and values can be isolated from a dictionary using functions like sorted. Python has a number of related built-in functions that can be used.

>>> my_dictionary.keys()
['birth_day', 'first_name', 'last_name', 'birth_month', 'birth_year']
>>> my_dictionary.values()
[27, 'Alexander', 'Coder', 12, 1979]
>>> my_dictionary.items()
[('birth_day', 27), ('first_name', 'Alexander'),
    ('last_name', 'Coder'), ('birth_month', 12), ('birth_year', 1979)]

And as we saw with lists, the max and min functions are also available. You might guess (correctly) that the max and min functions operate on the keys instead of the values.

>>> max(my_dictionary)
'last_name'
>>> min(my_dictionary)
'birth_day'

Practical dictionaries

Let's build a more complicated address book application. In the previous chapter, we built an address book program that used a list of lists to keep track of our contacts. This time, we'll use a list to hold our contacts, but each contact will be represented by a dictionary. The benefit is that we know which index is the name and which is the phone number. If we add more elements to the dictionary later, we don't have to fidget around with numeric indices that might have changed because more data was added. For example, let's say we add an element to the address book list that gives us the person's middle name.

>>> old_contact = ["Alexander", "Coder", 5551234, "Kingston"]
>>> new_contact = ["Alexander", "Andrew", "Coder", 5551234, "Kingston"]

Now any code that previously referenced the 2nd index as the phone number will need to be changed! If we miss a single spot, our code won't work properly anymore. Rage.

Let's try it with a dictionary. Each contact will consist of a first name, a last name, a phone number, and a home town. We can add more later, but for now, let's build a list of contact dictionaries.

contacts_obj = [
  {"first_name": "Alexander",
   "last_name": "Coder",
   "phone": 5551763,
   "town": "Kingston"},
  {"first_name": "Michael",
   "last_name": "Jordan",
   "phone": 5558722,
    "town": "Chicago"},
  {"first_name": "Elaine",
   "last_name": "Benes",
   "phone": 5551915,
   "town": "New York"},
  {"first_name": "Tobias",
   "last_name": "Fünke",
   "phone": 5556742,
   "town": "Newport Beach"},
]

In the data structure we've just created, each contact is represented by a dictionary with four elements. The keys are explicitly named, so while the values in each dictionary may change, each dictionary shares the same set of keys. Three of the values are strings, while phone is a number.

The searching examples from the city program in the previous chapter can be reused here, but we'll need to change the index from a number to an actual key.

user_input = input("Enter a last name: ")
for contact in contacts_obj:
    if contact["last_name"].lower() == user_input.lower():
        print("{0} {1} ({2}): {3}".format(
            contact["first_name"], contact["last_name"],
            contact["town"], contact["phone"]))

Enter a last name: Coder
Alexander Coder (Kingston): 5551763

There isn't much that looks different so far, aside from the indices, and you might be asking yourself why we even bothered. If we were using lists, and we wanted to append a new value like the street that the contact lived on, we could just add it to the end of each contact's list and reference the new index. That's true, and for smaller examples, it might not make a huge difference. As you program larger pieces of code, you run into the situation where you've added so many new components that you have something like this:

user = ["Alexander", "Coder", 5551065, "Kingston", "42 Elm St.", 30,
  12, 27, 1979, "alexander@scootah.com", 545, 15, 1, 0, False]

You can probably guess what a few of those values are, but what about the ones on the end? It gets awfully confusing when you don't know that I have one cat, no dogs, and have never been to Antarctica. (Okay, I might be stretching on justifying those last three values, but if your program wanted to track them, you'd have no real way of knowing aside from copious comments.) As programs grow in size, you'll want some way to keep track of things. Even the order can be jumbled. Why is my street address after the city in the list? That's not how an address book usually does it. With a dictionary, the order doesn't matter. The keys are important, and you can change the order of the keys without affecting the final data structure.

Our program shouldn't require the user to change Python code to change the search key. We don't want to deliver a program that needs a code change to switch from first name searches to phone numbers! Let's modify the program we just wrote so that the user can specify the key they'd like to search on, and then let them do the search themselves.

valid_keys = []
for key in contacts_obj[0].keys():
    valid_keys.append(key)
key_choice = ""
while key_choice != "quit":
    print("Available searches: {0}".format(valid_keys))
    key_choice = input("Enter the type of search you would like to perform, or type quit: ")
    if key_choice in valid_keys:
        value_choice = input("Enter the value you would like to search for: ")
        for contact in contacts_obj:
            if str(contact[key_choice]).lower() == value_choice.lower():
                print("{0} {1} ({2}): {3}".format(
                contact["first_name"], contact["last_name"],
                contact["town"], contact["phone"]))

Available searches: ['town', 'phone', 'first_name', 'last_name']
Enter the type of search you would like to perform, or type quit: town
Enter the value you would like to search for: Kingston
Alexander Coder (Kingston): 5551763

Available searches: ['town', 'phone', 'first_name', 'last_name']
Enter the type of search you would like to perform, or type quit: first_name
Enter the value you would like to search for: Michael
Michael Jordan (Chicago): 5558722

Available searches: ['town', 'phone', 'first_name', 'last_name']
Enter the type of search you would like to perform, or type quit: quit

This is a relatively large program compared to the ones that we've seen, so let's break it down. We have a list of dictionaries where each individual dictionary is a contact to keep track of. Each dictionary has a set of keys corresponding to information about the contacts. We need to recall information about each of our contacts using these keys. The first thing our new program does is examine the first entry in our contact list to retrieve the keys one by one. We'll use this later, when the user provides the key that they'd like to search on. The valid_keys variable is a list of strings, where each string is a key in the contact dictionary.

>>> print(valid_keys)
['town', 'phone', 'first_name', 'last_name']

The key_choice variable is used to keep track of either the key that the user wants to do a search on, or the request to quit. If at any time, the user enters the string "quit" for the search parameter, our code knows to exit out of the while loop. Otherwise, key_choice is going to reference a key in the dictionary. If it doesn't, it's no big deal; we have a line of code that checks whether or not the key that the user requests is in valid_keys. If the string isn't in that list, it's not a key that's been defined in the contact data structure, and we don't consider it valid.

The value_choice variable holds the actual value that the user would like to search for, whether it's a city that they live in, or the contact's phone number. Once value_choice has been set, each element in contacts_obj is checked to see if the key's value matches, and if so, it is printed. This part is no different than the previous examples. It only relies on the fact that the key itself can be a string. We can store that string value in a variable, even if it's one that we got from the user.

The lower function is still present in the comparison, but a string conversion to the value in the dictionary has been added. The reason for this is that we're not guaranteed to only be looking at string values any more. For example, consider a phone query.

Available searches: ['town', 'phone', 'first_name', 'last_name']
Enter the type of search you would like to perform, or type quit: phone
Enter the value you would like to search for: 5551763
Alexander Coder (Kingston): 5551763

Searching for a number is actually searching for a string representation of that number, due to the semantics of input. The input function itself doesn't know whether you're asking for a string or a number, so it simplifies things by making everything a string. If you want that string to be a number, you wrap it in an int or a float function. At the simplest state, when you call input, you're asking for a string value to be returned. If we use this knowledge, we can convert any value in the contact dictionary into a string to do the comparison that way. The lower function doesn't affect numbers, so it doesn't cause any problems here.

If your address book is anything like mine, you don't always have a full set of information about every contact in the list. Some people have cell phone numbers, whereas others prefer to stick to a landline. Some of my friends have even moved recently, and I'd rather have no address on file than an incorrect one. So what happens when we have keys in certain dictionaries that don't exist in others? If I know that I have a cell phone number but don't know if other contacts do, how can I make sure that cell phone numbers are still searchable elements in my program? In addition to this, how do I know that the first contact in the list is an accurate representation of the keys that we'll find in contact_obj?

Let's extend valid_keys. What we'd like to do is look at every key in every dictionary, and where there's a key, there's search functionality. There is a function in Python called get that returns the value if it exists for the key that you specify, or None if it doesn't.

>>> a = {}
>>> print(a.get("Test"))
None
>>> a["Test"]
Traceback (most recent call last):
File "<pyshell#42>", line 1, in <module>
            a["Test"]
KeyError: 'Test'

This is useful, because now dictionaries in contact_obj can define keys in certain places and omit them in others. By using get, we don't have to worry about KeyError when asking for keys that don't exist. Let's strip down the contact_obj variable to test this out.

contacts_obj = [
    {"first_name": "Alexander",
     "last_name": "Coder",
     "phone": 5551763,
     "town": "Kingston"},
    {"first_name": "Michael"},
    {"first_name": "Elaine",
     "last_name": "Benes"},
    {"first_name": "Tobias",
     "town": "Newport Beach"},
]

valid_keys = []
for key in contacts_obj[0].keys():
    valid_keys.append(key)
key_choice = ""
while key_choice != "quit":
    print("Available searches: {0}".format(valid_keys))
    key_choice = input("Enter the type of search you would like to perform, or type quit: ")
    if key_choice in valid_keys:
        value_choice = input("Enter the value you would like to search for: ")
        for contact in contacts_obj:
            if str(contact.get(key_choice)).lower() == value_choice.lower():
            print("{0} {1} ({2}): {3}".format(
                contact.get("first_name"), contact.get("last_name"),
                contact.get("town"), contact.get("phone")))

In addition to removing a lot of information from contacts_obj, the dictionary indices in the later parts of the program have been changed from direct index calls to get function calls. This changes the if-statement significantly, and a missing phone key won't prevent the if-statement from being able to make a comparison.

The print statement has also been modified. If we don't have information about the phone number or the last name, our program won't crash. By using the get function, we ensure that our data is always available, even if there isn't any meaningful data to use.

Available searches: ['town', 'phone', 'first_name', 'last_name']
Enter the type of search you would like to perform, or type quit: first_name
Enter the value you would like to search for: Michael
Michael None (None): None

There is still a problem with this code. The valid_keys list doesn't necessarily reflect the state of all available keys in the dictionaries. Let's imagine that we know Michael's cell phone number. The contacts_obj variable changes to this:

contacts_obj = [
    {"first_name": "Alexander",
     "last_name": "Coder",
     "phone": 5551763,
     "town": "Kingston"},
    {"first_name": "Michael",
     "cell_phone": 5559955},
    {"first_name": "Elaine",
     "last_name": "Benes"},
    {"first_name": "Tobias",
     "town": "Newport Beach"},
]

The previous code won't catch the cell_phone key, because it's not defined in contacts_obj[0]. If I don't have a cell phone, valid_keys won't know that a cell_phone key is available to search on. Let's change that subset of the code to look at every contact, picking out each of the keys that can be searched one-by-one.

valid_keys = []
for contact in contacts_obj:
    for key in contact.keys():
        if key not in valid_keys:
            valid_keys.append(key)

Available searches: ['town', 'phone', 'first_name', 'last_name', 'cell_phone']
Enter the type of search you would like to perform, or type quit:

Instead of looking at only the first element in contacts_obj, iterate over each element one-by-one and add every key that hasn't already been added to valid_keys. This new structure becomes our valid key list, and the cell_phone key shows up as an option for searching.

This is exactly the kind of benefit that shows the real power of indexing by key instead of by an abstract numerical index. If we know a new piece of information about a contact, it can just be added. Maybe it's a birthplace, or maybe it's the name of their second child. It doesn't matter because we can add it in through a unique key. In doing so, it becomes an indexable element in the original data structure.

Here's an explicit demonstration of this technique, for our feline friends.

contacts_obj = [
    {"first_name": "Alexander",
     "last_name": "Coder",
     "phone": 5551763,
     "town": "Kingston"},
    {"first_name": "Michael",
     "cell_phone": 5559955},
    {"first_name": "Elaine",
     "last_name": "Benes"},
    {"first_name": "Tobias",
     "town": "Newport Beach"},
    {"first_name": "Ulysses",
     "species": "Cat"},
]

If we simply append a new element with a new key, how does that affect the output of the original program?

Available searches: ['town', 'phone', 'first_name', 'last_name', 'cell_phone', 'species']
Enter the type of search you would like to perform, or type quit: species
Enter the value you would like to search for: cat
Ulysses None (None): None

This new contact dictionary also has an entirely new key, but none of the other expected keys for use with the print statement. Fortunately, this is completely fine, thanks to the robust construction of our program. It won't crash due to a new piece of data and it won't crash if data is missing that might be expected in print. Things can always be modified later to handle new data, but in the meantime, we can rest assured that our program will run without crashing.

Complex dictionaries

By this point, you probably have a fair grasp of the values you can use as keys, and the values you can store in a dictionary using those keys. So far, the only values that we've stored in a dictionary have been strings and numbers. There is no reason that lists and dictionaries can't be placed in there too though, so let's show that by example.

Let's rebuild the contact list so each of the individuals has a nickname. We'd like to be able to index contacts_obj by nickname instead of identifying someone by the opaque contacts_obj[0] notation.

First, contacts_obj will need to be converted to a dictionary. As we have already seen, this allows for indexing by arbitrary keys, which is exactly what we're looking for. Next, we'll define a unique string for each of the contacts we're tracking. For each of the contacts in the list, a key will be specified.

contacts_obj = {
    "alexander": {"first_name": "Alexander",
     "last_name": "Coder",
     "phone": 5551763,
     "town": "Kingston"},
    "mike": {"first_name": "Michael",
     "cell_phone": 5559955},
    "laney": {"first_name": "Elaine",
     "last_name": "Benes"},
    "doc": {"first_name": "Tobias",
     "town": "Newport Beach"},
    "uly": {"first_name": "Ulysses",
     "species": "Cat"},
      }

nickname = input("Enter the contact you would like to see: ")
if nickname in contacts_obj:
    print("Contact: {0}".format(nickname))
    contact = contacts_obj[nickname]
    for key in contact:
        print("    {0}: {1}".format(key, contact[key]))
    else:
        print("Sorry, I don't know {0}!".format(nickname))

Enter the contact you would like to see: alexander
Contact: alexander
    town: Kingston
    phone: 5551763
    first_name: Alexander
    last_name: Coder
Enter the contact you would like to see: steve
Sorry, I don't know steve!

Now see if you can use some dictionaries in your own examples. Are you able to find some creative ways of using this new data structure in your own applications?

Breaking Stuff

Dictionaries differ from lists in many ways. One of the most fundamental ways, despite what might seem obvious from the way they are written, is the fact that the elements are not ordered. For example, if you create a dictionary that uses the numbers from 0 to 9 as keys, they are not necessarily analogous to indexes in a list. Specifically, it doesn't make sense to talk about the 3rd key in a dictionary.

Now, with that said, you are free to impose an ordering on the keys. You can get the list of keys, as we did in this chapter, and sort it in some way. Still, that is a specific sort imposed by the programmer, and not an inherent part of the data structure itself.

d = {1: True, 0: False}
keys = d.keys()
print(keys)
print(sorted(keys))

[0, 1]
[0, 1]

In this case, the order of the keys variable is different from the order that the dictionary was originally written. You might intuitively assume that keys would be [1, 0], since a list is an ordered data structure. Remember that there is no such restriction on the keys of a dictionary; there is no order among keys. Either a key is set, or it is not.

In addition, remember that the keys in a Python dictionary must be immutable values, as stated earlier. However (and there's usually a however!), the keys don't all have to be the same type. They just have to be immutable. For example, take a look at this dictionary.

d = {1: True, 0: False, "True": 1, "False": 0, "1": True, "0": False}
keys = d.keys()
print(keys)
print(sorted(keys))

[0, 1, 'False', '1', '0', 'True']
[0, 1, '0', '1', 'False', 'True']

Although it's possible to sort the list, any assumption that we make about the set of keys other than the fact that they are immutable values may lead us to break stuff. Treating the keys as sequential integers, for example, will give errors when we hit the strings.

Summary

Dictionaries represent the hash table, another fundamental and extremely useful concept in computer programming. While hash tables come with their own set of complex considerations, Python abstracts away a lot of those details to provide a convenient way to store values by associating them with a key.