Chapter 8: Lists

Variables are the memory of Python. Large numbers of variables can be created, and as long as you know what belongs where, you can hold a great amount of data. This becomes unwieldy when we have large sets of related data, though, like a collection of student names enrolled in a class. The list data structure aggregates this information into a single place, so rather than forcing you to define name1, name2, name3, and so on, you can declare a list variable and add each name into that structure one by one. Think about a class list, or even a phone book. A list is a data structure, like a bucket, that lets you add elements to it and remove elements from it as you see fit. This chapter explains how these collections are created and accessed, and describes a number of situations where you might like to put them to good use.

Collections of data

A list is one of the fundamental data types in Python. It exists to manage collections of data that are not limited to a fixed size and that may actually consist of different types. Lists require a minimal amount of management from a programming standpoint. Thanks to their ease of use, they are bound to become one of the most useful tools in your programming toolbox. To begin the introduction, let's start by revisiting strings.

The individual characters in a string are accessed by their index in the string value. For example, we might have code that looks like this:

name = "Alexander"
print("The first letter is {0}.".format(name[0]))
print("The second letter is {0}.".format(name[1]))

The first letter is A.
The second letter is l.

Each position in the string holds a character and is accessed with an index value. By placing the integer index value inside the square brackets after a string, we get the character associated with that index. What if we extend the paradigm to allow us to examine values that are more complicated than a single character? It turns out that this is exactly the type of thing that we can use a list for.

country_names = ["Canada", "USA", "Mexico"]
print("The first country is {0}.".format(country_names[0]))
print("The second country is {0}.".format(country_names[1]))
print("The third country is {0}.".format(country_names[2]))

The first country is Canada.
The second country is USA.
The third country is Mexico.

A list is defined with the square brackets, and individual elements inside the list object are separated by a comma. The actual elements in the list can be any type, including other lists. In the example above, we have three string values in our list, representing three country names. These values could just as easily be numbers, booleans, or a mix of all the types. To keep things easy, you'll often find that a list is made up of all strings, or all numbers, but no such restriction actually exists in the language.
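As a small illustration of that flexibility, the sketch below builds one list that holds several types at once, including another list. The variable name mixed_list and its values are made up for this example, and type(item).__name__ is used only as a convenient way to show the name of each value's type.

```python
# A single list may hold values of different types, even another list.
mixed_list = ["Canada", 3, 2.5, True, ["USA", "Mexico"]]

for item in mixed_list:
    # type(item).__name__ gives the name of the value's type as a string.
    print("{0} is of type {1}.".format(item, type(item).__name__))

# An element of the nested list is reached with a second pair of brackets.
print(mixed_list[4][0])
```

The last line first selects the inner list at index 4, then the element at index 0 inside it.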

We can also use the built-in len function to get the length of our list, just as we can use it to get the length of a string. One way to use this is to iterate over the elements of a list by combining len with a while loop or with the range function. Since len tells us the number of elements inside the list, we'd like the range of numbers starting at zero and ending at the maximum index of the list, which is one less than the length.

country_names = ["Canada", "USA", "Mexico"]
print("There are {0} countries in the list.".format(len(country_names)))

There are 3 countries in the list.

counter = 0
while counter < len(country_names):
    print("The country at index {0} is {1}.".format(counter, country_names[counter]))
    counter = counter + 1

The country at index 0 is Canada.
The country at index 1 is USA.
The country at index 2 is Mexico.

for counter in range(len(country_names)):
    print("The country at index {0} is {1}.".format(counter, country_names[counter]))

The country at index 0 is Canada.
The country at index 1 is USA.
The country at index 2 is Mexico.
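When both the index and the value are needed, Python's built-in enumerate function offers a third option. It has not been introduced in this chapter, so treat the following as an optional sketch of the same loop rather than a required technique.

```python
country_names = ["Canada", "USA", "Mexico"]

# enumerate yields (index, value) pairs, so no manual counter is needed.
for counter, country in enumerate(country_names):
    print("The country at index {0} is {1}.".format(counter, country))
```

The output matches the two loops above; the difference is only that the index and the element arrive together on each pass.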

We can also use for loops to actually iterate directly over the elements of the list, if we're not concerned with the index number. For example, in the same way that range gives us the numbers from a low value up to a high value, a for loop that uses a list can iterate over values from the start of a list up to the end of a list.

country_names = ["Canada", "USA", "Mexico"]
for country in country_names:
    print("This country name is {0}.".format(country))

This country name is Canada.
This country name is USA.
This country name is Mexico.

age_list = [18, 15, 30, 24, 21, 27, 19]
for age in age_list:
    print("This student is {0} years old.".format(age))

This student is 18 years old.
This student is 15 years old.
This student is 30 years old.
This student is 24 years old.
This student is 21 years old.
This student is 27 years old.
This student is 19 years old.

For reference, we can use strings in a similar way. The elements inside a string (the characters themselves) can be visited one by one using a for loop.

for x in "Alexander":
    print("The next letter is {0}.".format(x))

The next letter is A.
The next letter is l.
The next letter is e.
The next letter is x.
The next letter is a.
The next letter is n.
The next letter is d.
The next letter is e.
The next letter is r.

Empty list values are declared in Python using an empty pair of square brackets. These lists have a length of zero, and are similar to the empty string in that even though they are empty and hold no data, they are clearly still a value of that type. Adding elements to a list variable is typically done using the append function. As an example, we can combine append with user input to start collecting larger sets of data.

country_list = []
done = False
while done == False:
    country = input("Enter a country name: ")
    if len(country) == 0:
        done = True
    else:
        country_list.append(country)

for country in country_list:
    print("Country: {0}".format(country))

print("All done!")

Enter a country name: Canada
Enter a country name: USA
Enter a country name: Mexico
Enter a country name:
Country: Canada
Country: USA
Country: Mexico
All done!

In this example, an empty list is stored in country_list. A boolean variable called done starts out as False and is set to True only when the user enters an empty string as a country name, which tells the user input loop to terminate. An empty string has a length of zero, as mentioned above, and the only input that will match is when the user hits enter without typing in a country name. If the input has a length that is greater than zero, we add it to the country_list variable using the append method, and restart the user input loop. Once all of the data is collected, a for loop is used to print out each of the country names in the order that they were received.

Let's take a closer look at the in keyword, used above in the for loops that use lists to iterate over values. The in keyword actually evaluates to a boolean in an expression, and if you have a preexisting list with values in it already, you can use in to test membership. Specifically, in returns True or False depending on whether or not an item is present in the list.

>>> country_list = ["Canada", "USA", "Mexico"]
>>> "Canada" in country_list
True
>>> "Japan" in country_list
False

These expressions can be combined with if-statements. We already know that if-statements use expressions to test whether or not blocks of code should be executed. Let's say that we had a list of acceptable answers from the user, and that we only wanted to go ahead if they gave us some input that was contained in this list.

vowel_list = ["a", "e", "i", "o", "u"]
letter = input("Enter a letter: ")
if letter in vowel_list:
    print("The letter {0} is a vowel.".format(letter))
else:
    print("The letter {0} is a consonant.".format(letter))

Enter a letter: e
The letter e is a vowel.

Enter a letter: h
The letter h is a consonant.

The equivalent if-statement without a list would have to compare the letter against each vowel in turn, which gets a little long. We definitely save some space by using a list alongside the in keyword to handle a range of non-consecutive values. The vowels fit nicely in a list, and while it may be controversial to omit "y", we still manage to encapsulate all the common vowels clearly in a single data structure.
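To make the comparison concrete, here is a minimal sketch that writes the test both ways. The letter is hard-coded instead of read with input so the example runs on its own, and the names long_form and short_form are invented for this illustration.

```python
vowel_list = ["a", "e", "i", "o", "u"]
letter = "e"  # stands in for the user's input

# Without a list, each vowel has to be compared separately.
long_form = (letter == "a" or letter == "e" or letter == "i"
             or letter == "o" or letter == "u")

# With a list, a single membership test does the same job.
short_form = letter in vowel_list

print(long_form, short_form)
```

Both expressions evaluate to the same boolean for any letter, but only the second one stays the same length when more acceptable answers are added to the list.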

Python provides the append function for additions to lists, and has a related remove function to strip items out of the list.

vowel_list = ["a", "e", "i", "o", "u", "y"]
print(vowel_list)
vowel_list.remove("y")
print(vowel_list)

['a', 'e', 'i', 'o', 'u', 'y']
['a', 'e', 'i', 'o', 'u']

When using remove, specifying an item that is not found in the list will cause Python to throw a big red error message. Trying to remove a value that isn't found in the list is impossible. Imagine trying to remove bars of gold from a storage box; if there aren't any bars of gold to be found (as is the unfortunate case with my storage boxes -- too many pairs of socks to fit them, I suppose), it is not possible to get the gold out. When you see a ValueError pop up in red text when using remove, it's usually Python's way of telling you that you tried to remove data that wasn't there.
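There are two common ways to avoid that ValueError, sketched below. The first uses only the in keyword from this chapter. The second uses try and except, which this chapter has not covered, so consider it an optional preview rather than something you are expected to know yet.

```python
vowel_list = ["a", "e", "i", "o", "u"]

# Option 1: test membership first, and only remove when the item exists.
if "y" in vowel_list:
    vowel_list.remove("y")
else:
    print("No y to remove.")

# Option 2: attempt the removal and handle the ValueError if it happens.
try:
    vowel_list.remove("y")
except ValueError:
    print("Tried to remove y, but it was not in the list.")

print(vowel_list)
```

Either way, the list is left untouched when the value is missing, and the program keeps running.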

List features

Imagine that you teach a course on programming, and that you'd like to collect the first letter of each student's first name as an empirical examination on the distribution of letters. You collect the information from your students and store the letters in a list. If you have ten students in your course, you might end up with something that looks like this.

letter_list = ["s", "t", "e", "n", "s", "n", "r", "c", "s", "a"]

That's fine, but how can we actually make some sense of it? To start, we can use the Python sorting functions to see the letters in alphabetical order. There are two options for doing this. First, the sorted function accepts a list as an input parameter and returns a new sorted list.

>>> sorted(letter_list)
['a', 'c', 'e', 'n', 'n', 'r', 's', 's', 's', 't']

Note that we actually get a new list, just like the int, float, and str functions that return a new value instead of changing the old one into the new type. The letter_list variable still has the old jumbled list, and if we wanted to make use of the new sorted list, we could just store it in a new variable with a new name.

>>> print(letter_list)
['s', 't', 'e', 'n', 's', 'n', 'r', 'c', 's', 'a']
>>> sorted_letter_list = sorted(letter_list)
>>> print(sorted_letter_list)
['a', 'c', 'e', 'n', 'n', 'r', 's', 's', 's', 't']

If we wanted to change the old list in-place to a sorted list, we could call the sort function of the list object itself. This tells Python to reorganize the old list into a new sorted one.

>>> print(letter_list)
['s', 't', 'e', 'n', 's', 'n', 'r', 'c', 's', 'a']
>>> letter_list.sort()
>>> print(letter_list)
['a', 'c', 'e', 'n', 'n', 'r', 's', 's', 's', 't']

It's a lot easier to look at the sorted data than to use a jumbled list. We can immediately see that only a single student has a first name starting with "a", while two students have names that begin with "n" and three have names that begin with "s".

What if there are more than ten students? It might be easy to look at a small list to see that there are three letters that are the same, but with larger lists, it can easily become more complicated. Python has a function called count that retrieves the number of instances of a certain value inside a list. For example, if we wanted to know the number of "s" values in letter_list, count should return 3.

>>> print(letter_list.count("s"))
3

This idea can be extended by using all of the techniques we've built up in this chapter so far. Let's write some code that iterates over a sorted list of letters and prints out the total count for each unique letter.

letter_list = ["s", "t", "e", "n", "s", "n", "r", "c", "s", "a"]
# Sort first so that identical letters sit next to each other.
letter_list.sort()
last_letter = ""
for x in letter_list:
    if last_letter != x:
        last_letter = x
        print("Number of times {0} occurs: {1}".format(x, letter_list.count(x)))

Number of times a occurs: 1
Number of times c occurs: 1
Number of times e occurs: 1
Number of times n occurs: 2
Number of times r occurs: 1
Number of times s occurs: 3
Number of times t occurs: 1

In the code above, we keep track of the last letter that was used so that we don't end up printing "Number of times s occurs: 3" three times. If the current letter is the same as the last letter, just skip by and start the next iteration of the loop. If it's a different letter, print out the new letter along with the returned value from count, and move on.

The sorting functions that Python uses are somewhat independent of the type, given one important precondition. As long as it is possible to actually compare all of the values in the list to one another, the list can be sorted. For example, you can't compare str values and int values, and if you try, you'll get something called a TypeError.

>>> "Alexander" < 3
Traceback (most recent call last):
  File "<pyshell#39>", line 1, in <module>
    "Alexander" < 3
TypeError: unorderable types: str() < int()

It doesn't really make any sense to ask whether or not a name is less than the number three. It does make sense to ask if a name is less than the string representation of the number three, and we can do that without any problem.

>>> "Alexander" < "3"
False

Strings can be compared to one another. Integer and floating point values can be compared to one another, and booleans can be compared to one another. Even lists can be compared. You can use this fact to sort many different kinds of lists, as long as the data is reasonably consistent inside the list itself.
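The short sketch below shows both sides of that rule: lists being compared to one another, and a list that cannot be sorted because its members are not comparable. The try and except statements are used here only to keep the example running after the error, and the variable names are made up for illustration.

```python
# Lists are compared element by element, from left to right.
print([1, 2, 3] < [1, 2, 4])      # the first difference, 3 versus 4, decides
print(["Canada"] < ["Mexico"])    # strings inside the lists are compared

# A list whose members cannot be compared to each other cannot be sorted.
mixed = ["Alexander", 3]
try:
    sorted(mixed)
    sortable = True
except TypeError:
    sortable = False
print("Can the mixed list be sorted? {0}".format(sortable))
```

The first two comparisons succeed because every pair of elements is of a comparable type; the mixed list fails for the same reason that comparing a name to the number three failed earlier.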

We can use this information to experiment with the max and min functions when used with lists. If we have a list of numbers, a list of strings, or a list of any values that can be compared against each other, the max and min functions will return the maximum and minimum values found in the list as expected.

>>> letter_list = ["s", "t", "e", "n", "s", "n", "r", "c", "s", "a"]
>>> max(letter_list)
't'
>>> min(letter_list)
'a'

>>> number_list = [3, 6, 7.4, 2, 1, 5.2, 5.45, 9]
>>> max(number_list)
9
>>> min(number_list)
1

The max, min, and sorted functions can all be used on lists that have comparable members.

>>> sorted([3, 6, 7.4, 2, 1, 5.2, 5.45, 9])
[1, 2, 3, 5.2, 5.45, 6, 7.4, 9]
>>> sorted(["Canada", "USA", "Mexico"])
['Canada', 'Mexico', 'USA']
>>> sorted([True, True, False, True, False, False, False])
[False, False, False, False, True, True, True]

There are also times when you might be interested in the opposite ordering that a sorted list gives you. If you wanted a list of countries sorted in reverse order, you can use the reverse function in the same way that you'd use the sort function.

>>> country_list = ["Canada", "USA", "Mexico"]
>>> country_list.sort()
>>> country_list
['Canada', 'Mexico', 'USA']
>>> country_list.reverse()
>>> country_list
['USA', 'Mexico', 'Canada']
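Sorting and reversing can also be done in one step. Both sorted and sort accept a reverse keyword argument, which is not used elsewhere in this chapter; the sketch below shows it as an alternative to calling sort followed by reverse.

```python
country_list = ["Canada", "USA", "Mexico"]

# sorted with reverse=True returns a new list in descending order.
descending = sorted(country_list, reverse=True)
print(descending)

# sort accepts the same keyword and reorders the list in place.
country_list.sort(reverse=True)
print(country_list)
```

Keep in mind that reverse on its own only flips the current order; it gives a descending list only when the list was sorted beforehand.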

And finally, if you've got one list and you'd like to add the values in it to an entirely separate list, the extend function accepts a list as a function parameter and attempts to append each of the values in the source list to the target.

>>> country_list = ["Canada", "USA", "Mexico"]
>>> country_list.extend(["Japan", "China"])
>>> country_list
['Canada', 'USA', 'Mexico', 'Japan', 'China']
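It is easy to confuse extend with append when the thing being added is itself a list. The following sketch, using two made-up lists named first and second, shows the difference.

```python
first = ["Canada", "USA"]
second = ["Canada", "USA"]

# extend adds each element of its argument individually.
first.extend(["Japan", "China"])

# append adds its argument as a single element, here a nested list.
second.append(["Japan", "China"])

print(first)    # four strings
print(second)   # two strings followed by one inner list
print(len(first), len(second))
```

If the goal is one flat list of country names, extend is the one to reach for.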

List operators

Earlier, we saw that it is possible to concatenate (glue) two strings together to get a new string by using the plus operator. It is also possible to get multiple copies of the same string by using multiplication.

>>> "Alexander" + "Coder"
'AlexanderCoder'
>>> "Alexander" * 3
'AlexanderAlexanderAlexander'

The same approach can be taken with lists. The plus and multiplication operators are used to concatenate lists together or to generate copies of a list inside a new list. These operators are reused to serve a similar purpose that fits nicely within the paradigms of addition and multiplication.

>>> ["Alexander"] + ["Coder"]
['Alexander', 'Coder']
>>> ["Alexander"] * 3
['Alexander', 'Alexander', 'Alexander']

You might also recall the slicing operator. With a string, we can specify the starting and ending characters to reference by using square brackets along with a colon and optional indices. The same approach exists with lists. Sub-lists can be obtained by slicing larger lists into smaller pieces.

>>> country_list = ["Canada", "USA", "Mexico"]
>>> country_list[0:2]
['Canada', 'USA']
>>> country_list[1:]
['USA', 'Mexico']
>>> country_list[-1:]
['Mexico']
>>> country_list[-1]
'Mexico'
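One detail of slicing is worth seeing in isolation: a slice that reaches past the end of a list quietly returns whatever is available, possibly nothing, while indexing a single position past the end is an error. A minimal sketch, with try and except used only so the example keeps running:

```python
country_list = ["Canada", "USA", "Mexico"]

# A slice always returns a list, even when it holds one element or none.
print(country_list[:2])     # everything before index 2
print(country_list[-2:])    # the last two elements
print(country_list[3:])     # starts at the length of the list: empty
print(country_list[10:])    # far past the end: still empty, no error

# Indexing a single position past the end is an error, unlike slicing.
try:
    country_list[3]
except IndexError:
    print("Index 3 is out of range.")
```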

Let's use list slicing in a slightly more practical way. It's time to write a program that, when given a source list of numbers, pulls out all the elements of that list that fall between two user-defined inputs. If our list is a collection of phone numbers, we might want all the numbers that start with 555. If our list is a collection of ages, we might want to know everyone between the ages of 20 and 29. Let's also create a random list of phone numbers.

phone_list = [5555593, 5554710, 5554913, 5555772, 5559913]

We're already familiar with getting input from the user, so let's write some basic code to retrieve the starting and ending range values.

start_int = int(input("Enter the starting phone number: "))
end_int = int(input("Enter the ending phone number: "))

To get all of the numbers in a range, let's start by using the simple approach of testing each number one-by-one and printing out the results as they are seen.

for phone in phone_list:
    if phone >= start_int and phone <= end_int:
        print("Phone number in range: {0}".format(phone))

Enter the starting phone number: 5554000
Enter the ending phone number: 5556000
Phone number in range: 5555593
Phone number in range: 5554710
Phone number in range: 5554913
Phone number in range: 5555772

How about a sorted list? These elements are still in a random order, so we have the option of sorting the phone_list variable using sort.

phone_list.sort()
for phone in phone_list:
    if phone >= start_int and phone <= end_int:
        print("Phone number in range: {0}".format(phone))

Enter the starting phone number: 5554000
Enter the ending phone number: 5556000
Phone number in range: 5554710
Phone number in range: 5554913
Phone number in range: 5555593
Phone number in range: 5555772

We don't have information about the indices of these phone numbers. To identify the indices of all elements in the list in the acceptable range, it will be necessary to modify the code to retain this information. In fact, it might even make more sense to view this as two individual problems: find the left-most index that corresponds to the start of the phone numbers in the range, and find the right-most index that corresponds to the last phone number in the range. Let's rewrite the code to perform two searches.

To retrieve the index of the first value, we'll want to define a variable to hold the index value. Let's call it start_index. We can't just use index to get the right value, since our phone number might not actually be in the list. If the starting phone number is 5334000, that number might not exist, but 5333999 and 5334001 could be there. The search will start from the beginning of the sorted list, and steadily move to the right until a value that is larger than or equal to the starting phone number is encountered. This is the first number in our valid range.

start_index = len(phone_list)
for x in range(len(phone_list)):
    if phone_list[x] >= start_int:
        start_index = x
        break

The starting index is initialized to the size of the list for a meaningful reason. Slicing with a starting index equal to or greater than the length of the list will return an empty list. What that actually means in this example is that we asked for a phone number range that begins with a phone number greater than any of the numbers in our list. The returned list should be empty in that case, as there can be no phone numbers that match the request. A for loop with a range is used to iterate over all the values of the list by their index instead of their value. If the loop encounters a phone number that is greater than or equal to the user's request, save the index that is stored in the x variable, and break out of the loop immediately.

end_index = -1
for y in range(len(phone_list) - 1, -1, -1):
    if phone_list[y] <= end_int:
        end_index = y
        break

Getting the ending index is similar to getting the starting index. The major difference is the range of values to iterate over. Since we're moving from right to left, we start at the last position in the list, whose index is one less than the length of the list. The range function proceeds down to index 0, moving by -1 each time (moving along the list to the left); because range stops just before its second argument, that argument has to be -1 for index 0 to be included. The ending index is initialized to -1 so that, if every phone number in the list is larger than the user's ending value, the final slice comes back empty.

Close out the program with a final list slice and a print statement similar to the original example, and a properly sliced list is obtained with the indices of the values instead of just the values themselves.

valid_list = phone_list[start_index:end_index + 1]
for phone in valid_list:
    print("Phone number in range: {0}".format(phone))
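Putting the pieces together, the whole search might look like the sketch below. It wraps the two loops in a function with the hypothetical name numbers_in_range, and it takes the range limits as arguments instead of reading them with input so that it can run unattended. It also makes two choices explicit: the ending index starts at -1 so that an empty list comes back when no number is small enough, and the backward range runs all the way down to index 0.

```python
def numbers_in_range(numbers, low, high):
    """Return the sorted values of numbers that lie between low and high, inclusive."""
    ordered = sorted(numbers)

    # Left-most index whose value is at least low.
    start_index = len(ordered)
    for x in range(len(ordered)):
        if ordered[x] >= low:
            start_index = x
            break

    # Right-most index whose value is at most high.
    end_index = -1
    for y in range(len(ordered) - 1, -1, -1):
        if ordered[y] <= high:
            end_index = y
            break

    return ordered[start_index:end_index + 1]


phone_list = [5555593, 5554710, 5554913, 5555772, 5559913]
print(numbers_in_range(phone_list, 5554000, 5556000))
print(numbers_in_range(phone_list, 5560000, 5570000))
```

The first call returns the four numbers between the limits in sorted order; the second returns an empty list because every phone number falls below the requested range.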

Lists inside of lists

A list is a versatile data structure. In earlier sections, we alluded to the fact that lists are capable of holding lists as elements. Let's explore a reason for doing this, and show some of the real power of these structures.

We'll start by writing a program that keeps track of the population of Canadian cities. Each city consists of a name, a province, and a population as determined by the 2006 census. For every city, we'll have a list like the following:

>>> city = ["Toronto", "Ontario", 5113149]
>>> city[0]
'Toronto'
>>> city[1]
'Ontario'
>>> city[2]
5113149

One way of storing this data is to collect these city data structures inside of one giant list. We aggregate the collection of cities as individual elements, and when we're looking for a piece of data about a city, we iterate over the container list to find the particular city we want. Let's use the largest Canadian cities as a starting point.

city_obj = [
  ["Toronto", "Ontario", 5113149],
  ["Montreal", "Quebec", 3635571],
  ["Vancouver", "British Columbia", 2116581],
  ["Ottawa", "Ontario", 1130761],
  ["Calgary", "Alberta", 1079310],
  ["Edmonton", "Alberta", 1034945],
  ["Quebec City", "Quebec", 715515],
  ["Winnipeg", "Manitoba", 694668],
  ["Hamilton", "Ontario", 692911],
  ["London", "Ontario", 457720]
]
print(city_obj[0])

['Toronto', 'Ontario', 5113149]

Each inner element inside the city_obj list is itself a list, and can be referenced just like any other list value by an index. It is also possible to iterate over the values using a for loop, just like the previous examples.

for city in city_obj:
    print("{0}, {1}, population: {2}".format(city[0], city[1], city[2]))

Toronto, Ontario, population: 5113149
Montreal, Quebec, population: 3635571
Vancouver, British Columbia, population: 2116581
Ottawa, Ontario, population: 1130761
Calgary, Alberta, population: 1079310
Edmonton, Alberta, population: 1034945
Quebec City, Quebec, population: 715515
Winnipeg, Manitoba, population: 694668
Hamilton, Ontario, population: 692911
London, Ontario, population: 457720

The city variable is set to each element of the city_obj list in turn as the for loop progresses, and each index of the city variable corresponds to one piece of data about that individual city. The name, province, and population are all available, once the source list is identified using the for loop.

With all of this data in one place, it is now possible to write programs to access and retrieve information about these Canadian cities. Let's start with a simple test to determine whether or not a city is in the list, and if so, what is known about it.

city_name = input("Enter a city name: ")
for city in city_obj:
    if city[0].lower() == city_name.lower():
        print("{0}, {1}, population: {2}".format(city[0], city[1], city[2]))

Enter a city name: Toronto
Toronto, Ontario, population: 5113149

Enter a city name: QUEBEC CITY
Quebec City, Quebec, population: 715515

Enter a city name: kingston

Each of the cities is accessed one after the other, and the name element in the list is compared to the input retrieved from the user. The code takes advantage of the lower function to ignore capitalization, and uses straightforward string comparison to see if the current city in the list matches what the user asked for. If the city is found, we print out the full set of information about the city, and move on. It would have also been possible to insert a break statement to terminate the loop. Now let's look for substrings in the city name.

city_name = input("Enter part of a city name: ")
for city in city_obj:
    if city[0].lower().find(city_name.lower()) >= 0:
        print("{0}, {1}, population: {2}".format(city[0], city[1], city[2]))

Enter part of a city name: on
Toronto, Ontario, population: 5113149
Montreal, Quebec, population: 3635571
Edmonton, Alberta, population: 1034945
Hamilton, Ontario, population: 692911
London, Ontario, population: 457720

This new version of the code uses the find function and relies on lower to convert all the data to lower case to simplify string comparison. If the substring that was requested by the user is found in the original string, the code will print out the data.

Of course, this code can be modified to find all cities in a particular province by swapping out the city[0] comparison with a city[1] comparison. The value at index 0 in the source list is the city, and since we know that index 1 holds the province, we swap out the index value, change the input text, and run again.

province_name = input("Enter a province name: ")
for city in city_obj:
    if city[1].lower() == province_name.lower():
        print("{0}, {1}, population: {2}".format(city[0], city[1], city[2]))

Enter a province name: ontario
Toronto, Ontario, population: 5113149
Ottawa, Ontario, population: 1130761
Hamilton, Ontario, population: 692911
London, Ontario, population: 457720

Alternately, if we'd like to modify the code to check for cities with a population over a certain size, we can simply change the expression in the if-statement inside the city loop, in conjunction with the data type of the input that we request from the user. You'll find this as a challenge in the exercises section at the end of this chapter.

Have a shot at using some lists in your own examples now. See what data you can stuff inside of there, and set up some loops and inputs to build interesting structures. Try storing different types of data other than cities.
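As one possible starting point, notice that the searches above print nothing at all when there is no match, as happened with kingston. The sketch below collects the matches in a list first and reports when that list is empty. It uses a shortened city list, hard-coded search terms instead of input, and a hypothetical helper function named find_cities; none of these come from the examples above.

```python
city_obj = [
    ["Toronto", "Ontario", 5113149],
    ["Montreal", "Quebec", 3635571],
    ["Vancouver", "British Columbia", 2116581],
]

def find_cities(cities, position, text):
    """Return every city whose element at position matches text, ignoring case."""
    matches = []
    for city in cities:
        if city[position].lower() == text.lower():
            matches.append(city)
    return matches

for name in ["toronto", "kingston"]:
    results = find_cities(city_obj, 0, name)
    if len(results) == 0:
        print("No city named {0} was found.".format(name))
    for city in results:
        print("{0}, {1}, population: {2}".format(city[0], city[1], city[2]))
```

Passing 1 instead of 0 as the position searches by province, which mirrors the index swap described earlier.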

Breaking Stuff

The quickest way to break a list is to reference an element outside the length of the list. For example, if you try to get the fifth element in a list with only four items in it, you're going to get an error.

x = [1, 2, 3, 4]
for i in range(5):
    print(x[i])

1
2
3
4
Traceback (most recent call last):
  File "", line 3, in <module>
IndexError: list index out of range

However, this is easy stuff. It's usually pretty obvious when you've broken something and you get an IndexError that suggests that your list index is out of range. We should be breaking stuff like that all the time now in our pursuit of knowledge.
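For comparison, here is a minimal sketch of the same loop with a guard in front of the lookup. The valid indices of a list run from 0 to one less than its length, so checking the index against len is enough to avoid the error.

```python
x = [1, 2, 3, 4]

# Valid indices run from 0 up to len(x) - 1, so index 4 does not exist.
for i in range(5):
    if i < len(x):
        print(x[i])
    else:
        print("Index {0} is out of range for a list of length {1}.".format(i, len(x)))
```

In practice, looping with range(len(x)) or directly over the list avoids the problem without needing a guard at all.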

To show you something that might look especially wacky at first glance, consider the following example.

Let's say that we have a list of numbers. These numbers are important to us, and we want to hold on to them--we don't want to go about changing the data. However, we want to use those numbers in interesting ways, like determining the squares of those numbers, or applying some other mathematical operation to them. Consider this code:

my_list = [5, 15, 25, 35, 45]

for i in range(len(my_list)):
    print("{0}, {1}".format(my_list[i], my_list[i] * my_list[i]))

5, 25
15, 225
25, 625
35, 1225
45, 2025

In that code, we have an important list of values called my_list that we use to determine some other set of important values later. In the code above, we are printing the data to the screen, which for the moment is fine. However, let's say that someone else insists that we store those values for later use. We propose creating another list called new_list and storing the new values there. To do this, we write the following code:

my_list = [5, 15, 25, 35, 45]
new_list = my_list
for i in range(len(new_list)):
    new_list[i] = new_list[i] * new_list[i]
print(new_list)

[25, 225, 625, 1225, 2025]

Okay, we've got a new list with the squares of the values in my_list. Everything looks great. Or rather, everything looks great until we look at the values in my_list again as a sanity check.

my_list = [5, 15, 25, 35, 45]
new_list = my_list
for i in range(len(new_list)):
    new_list[i] = new_list[i] * new_list[i]
print(new_list)
print(my_list)

[25, 225, 625, 1225, 2025]
[25, 225, 625, 1225, 2025]

What the heck happened there? Why did the values in my_list get changed as well? We only modified the values referenced in new_list, so what caused the other ones to change?

Remember, variables in Python hold references to values rather than the values themselves. What this means is that identifiers--variable names--are just a reference to some value in memory. A list, as a value, exists separately from the identifier.

When we made the assignment to new_list, what we actually said was to make new_list point to the same value that my_list pointed to. Making a change to new_list, in that case, would also make a change to the value pointed to by my_list. For a simpler example, check this out:

a = []
b = a
b.append(1)
print(a)

[1]

To actually get the functionality we're looking for, we have to use a list copy. Just like a string slice from earlier, we can write the code in the following way:

my_list = [5, 15, 25, 35, 45]
new_list = my_list[:]
for i in range(len(new_list)):
    new_list[i] = new_list[i] * new_list[i]
print(new_list)
print(my_list)

[25, 225, 625, 1225, 2025]
[5, 15, 25, 35, 45]

By forcing an explicit copy, we make sure that the identifier new_list points to a copy of the my_list value, and we don't stomp on a value we don't want to change.
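If you are ever unsure whether two names refer to the same list, the is operator can tell you. The sketch below is an illustration under that assumption: it compares an alias with two kinds of copies, and then shows another way to get the squares, building them in a fresh list with append so that my_list is never touched. The names alias, sliced, copied, and squares are invented for this example.

```python
my_list = [5, 15, 25, 35, 45]

alias = my_list           # a second name for the same list
sliced = my_list[:]       # a copy made with a full slice
copied = list(my_list)    # a copy made with the list function

# The is operator tests whether two names refer to the very same object.
print(alias is my_list)    # same object
print(sliced is my_list)   # different object
print(sliced == my_list)   # but equal contents

# Building the squares in a fresh list also leaves my_list untouched.
squares = []
for value in my_list:
    squares.append(value * value)
print(squares)
print(my_list)
```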


Lists are one of the most powerful data structures in Python due to their combination of flexibility and ease of use. They take care of bookkeeping that many lower-level languages leave to the programmer, such as growing and shrinking the underlying storage as elements come and go. Due to the ability to resize the list at will, to add elements in arbitrary places, to remove elements, and to compare, index, and who knows what else, you'll find yourself using these all the time.

Lists aren't the only way to store data, however. In the next section, we'll look at the dictionary, a similar data structure that organizes your values differently. Each data structure has its use, and by trying them out (and of course by breaking them once in a while), you'll come to understand where and why to use each one.


Exercises

1. Take the city_obj example code from the Lists inside of lists section, and change it so that instead of searching by city name, the code searches by city size. Accept an int from the user, and return any city with a population greater than or equal to the population provided.

2. Set up a simple address book program, where you provide a list of individuals with names, phone numbers, and addresses, and allow the user to search based on one of these parameters. Populate the data in the address book yourself, but use a similar program structure to the city_obj example to search through and find the people you're looking for.