Python's Data Types

Sue Evans & David Walser

Press space bar for next page

Learning Outcomes

Understand basic data types used by all languages
Become familiar with Python's data types
Understand that large collections of data require the use of data structures
Be able to write a program using lists
Be able to write a program using sets
Be able to write a program using dictionaries

Data types

Whenever we describe a data type, we not only need to give a definition of what that type is but also the operations that can be done on that type. The definition may include the range of possible values that type can hold.

Example: integer

Definition:
- Integers are positive whole numbers, zero and negative whole numbers in the range of -2³¹ to 2³¹ - 1
Operations:
- Arithmetic operations (addition, subtraction, multiplication, division, exponentiation, and modulus)
- Numerical comparisons (is equal to, is not equal to, less than, less than or equal to, greater than, greater than or equal to)

Python's data types

integers
floats
long ints
strings
lists
sets
dictionaries (associative arrays)

Strings

Definition:
- A string is a group of characters that are thought of as a single object.
Operations:
- concatenation
- comparison
- indexing
- extracting substrings
- finding substrings
- splitting
- stripping
- changing case

Data Structures

Variables work for a small set of distinct values we know we'll need ahead of time.
What about larger collections of data whose size is unknown ahead of time?
For collections of data we'll need to use Data Structures.

In addition to the string, Python has several other built-in data structures:

lists - ordered collections of data
sets - unordered collections of data, no duplicates
associative arrays - name/value pairs, like a dictionary

Why is this important?

Choosing the right data structure to solve a problem affects how easy the problem is to solve, how readable the code will be, and how efficient the resulting program will be.

"ALGORITHMS + DATA STRUCTURES = PROGRAMS" -- Niklaus Wirth, creator of the Pascal programming language
"I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships." -- Linus Torvalds, creator of the Linux operating system
"Show me your flowcharts [algorithms] and conceal your tables [data structures], and I shall continue to be mystified. Show me your tables [data structures], and I won't usually need your flowcharts [algorithms]; they'll be obvious."
-- Fred Brooks, famous software engineer and author of The Mythical Man-Month

Lists

Definition:
- A list is an ordered collection of items, where the items can be of any type.
- Each item has a numerical index, indicating the position in the list, which is what is meant by "ordered". Indices start at zero.
Operations:
- adding items to the list
- removing items from the list
- finding items in the list
- indexing (accessing or replacing an item in the list)

A Review of Lists

There are several ways to create new lists:

>>> # assign the literal empty list
>>> empty = []

>>> # use the list constructor
>>> empty = list()    

>>> # give a literal assignment of items
>>> items = ["bread", "milk", "cheese", "cider"]
>>> items
['bread', 'milk', 'cheese', 'cider']

>>> # use the list constructor with a string
>>> letters = list("hello")
>>> letters
['h', 'e', 'l', 'l', 'o']

>>> # use string.split which returns a list of strings
>>> words = "a bunch of words".split()
>>> words
['a', 'bunch', 'of', 'words']

Operators:

indexing list[index] - get or set an item at the given index in the list

>>> items[2]        
'cheese'
>>> items[3]
'cider' 
>>> items[3] = "lemonade"
>>> items[3]
'lemonade'

containment item in list - see if the item is in the list

>>> "milk" in items
True
>>> "juice" in items       
False

equality list1 == list2 - see if two lists are the same

>>> "two words".split() == ["two", "words"]     
True
>>> ["this", "that"] == ["that", "this"]        
False

Methods:

append(item) - add the item to end of the list
count(item) - count occurences of the item in the list
extend(list) - append multiple items to the end of the list
index(item) - locate the index of an item in the list
insert(index, item) - insert an item before the one at the index
remove(item) - remove the first occurence of the item in the list
reverse() - reverse the order of the items in the list
sort() - sort the list

Built-in functions that operate on lists:

len(list) - count number of items in the list
del(list[index]) - remove the item at index from the list

Examples

Examples Using List Methods

append(item) - add an item to the end of the list

>>> items
['bread', 'milk', 'cheese', 'lemonade']
>>> items.append('ham')
>>> items
['bread', 'milk', 'cheese', 'lemonade', 'ham']
>>> items.append('ham')
>>> items
['bread', 'milk', 'cheese', 'lemonade', 'ham', 'ham']
>>>

count(item) - count occurences of an item in the list

>>> items
['bread', 'milk', 'cheese', 'lemonade', 'ham', 'ham']
>>> items.count('ham')
2
>>> items.count('cheese')
1
>>>

remove(item) - remove the first occurence of an item in the list

>>> items
['bread', 'milk', 'cheese', 'lemonade', 'ham', 'ham']
>>> items.remove('ham')
>>> items
['bread', 'milk', 'cheese', 'lemonade', 'ham']
>>>

extend(list) - append multiple items to end of the list

>>> items.extend(['lettuce', 'tomatoes'])
>>> items
['bread', 'milk', 'cheese', 'lemonade', 'ham', 'lettuce', 'tomatoes']
>>>

index(item) - locate the index of an item in the list

>>> items
['bread', 'milk', 'cheese', 'lemonade', 'ham', 'lettuce', 'tomatoes']
>>> items.index('milk')
1
>>>

insert(index, item) - insert an item before the one at the index

>>> items
['bread', 'milk', 'cheese', 'lemonade', 'ham', 'lettuce', 'tomatoes']
>>> items.insert(2, 'eggs')
>>> items
['bread', 'milk', 'eggs', 'cheese', 'lemonade', 'ham', 'lettuce', 'tomatoes']
>>>

reverse() - reverse the order of the items in the list

>>> items
['bread', 'milk', 'eggs', 'cheese', 'lemonade', 'ham', 'lettuce', 'tomatoes']
>>> items.reverse()
>>> items
['tomatoes', 'lettuce', 'ham', 'lemonade', 'cheese', 'eggs', 'milk', 'bread']
>>>

sort() - sort the list

>>> items
['tomatoes', 'lettuce', 'ham', 'lemonade', 'cheese', 'eggs', 'milk', 'bread']
>>> items.sort()
>>> items
['bread', 'cheese', 'eggs', 'ham', 'lemonade', 'lettuce', 'milk', 'tomatoes']
>>>

Built-In Functions

len(list) - count number of items in the list

>>> items
['bread', 'cheese', 'eggs', 'ham', 'lemonade', 'lettuce', 'milk', 'tomatoes']
>>> len(items)
8
>>>

del(list[index]) - remove the item at the index from the list

>>> items
['bread', 'cheese', 'eggs', 'ham', 'lemonade', 'lettuce', 'milk', 'tomatoes']
>>> del(items[4])
>>> items
['bread', 'cheese', 'eggs', 'ham', 'lettuce', 'milk', 'tomatoes']
>>>

Using lists

Use a list when order matters.

Items stay in the order they're inserted in with append()
A list is indexed in order
A list can also be changed so that the items are in sorted order.

Mutability

An object that is mutable can be changed in-place. Lists are mutable.
Example: list[2] = "foo"

>>> myList = ['cat', 'dog', 'bird', 'hamster']
>>> myList
['cat', 'dog', 'bird', 'hamster']
>>> myList[2] = "foo"
>>> myList
['cat', 'dog', 'foo', 'hamster']
>>>

Strings are immutable (aka, not mutable).
Example: myString[3] = 'f' is not supported

>>> myString = 'fooooo'
>>> myString[3] = 'f'
Traceback (most recent call last):
  File "", line 1, in ?
TypeError: object does not support item assignment
>>>

Lists, and other data structures can be used as return values from functions. They can also be used as arguments. Mutable data structures can be changed by the function without returning them.


# fillList() takes an empty list and fills it with the integers 0 - 9
# Input: aList, an empty list
# Output: None, but the list has been modified

def fillList(aList):

    for i in range(10):
        aList.append(i)


def main():

    print
    print "This program uses a function to fill a list with values"
    print "Notice that since the list is mutable, we need only pass"
    print "it to the function and let the function modify it.  The list"
    print "doesn't need to be returned."
    print

    myList = []
    print "Before calling fillList, myList = ", myList
    fillList(myList)
    print "After calling fillList, myList =", myList


main()

Let's see it work!

This program uses a function to fill a list with values
Notice that since the list is mutable, we need only pass
it to the function and let the function modify it.  The list
doesn't need to be returned.

Before calling fillList, myList =  []
After calling fillList, myList = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

So lists are definitely mutable!

Strings are immutable so let's see the difference:


# fillString() takes an empty string and concatenates letters to the string
# Input: aString, an empty string
# Output: None, and the original string has NOT been modified
def fillString(aString):

    for ch in "letters":
        aString += ch


def main():

    print
    print "This program uses a function to fill a string with characters"
    print "Notice that since the string isn't mutable, the function won't"
    print "be able to modify the original string.  The string would have to"
    print "be returned.  In fact, since strings are immutable, for each"
    print "character that is concatenated onto the string, a new string has"
    print "to be made."
    print

    myString = ""
    print "Before calling fillString, myString = ", myString
    fillString(myString)
    print "After calling fillString, myString =", myString


main()

What happens when we run this:

This program uses a function to fill a string with characters
Notice that since the string isn't mutable, the function won't
be able to modify the original string.  The string would have to
be returned.  In fact, since strings are immutable, for each
character that is concatenated onto the string, a new string has
to be made.

Before calling fillString, myString =  
After calling fillString, myString =

Let's write a program that reads names from a file, sorts them, and prints them out (numbered, and on separate lines) in all-caps.
In this example, we create the list in a function and return it to main(). After that, the other functions simply modify the list without returning it. Since lists are mutable, this works fine.

names.dat

Nicholas, Charles
Finin, Tim
desJardins, Marie
Evans, Sue

Example output:

1    DESJARDINS, MARIE
2    EVANS, SUE
3    FININ, TIM
4    NICHOLAS, CHARLES

How about a top-down design.

Let's figure out what the sequence of events needs to be
1. Print a Greeting
2. Prompt for filename
3. Open the file
4. Read in a name
5. Change to all caps
6. Append the name to the list
7. Repeat steps 4 - 6 until EOF
8. Close the file
9. Sort the list
10. Print the list numbered with each name on a separate line


import string

# printGreeting() explains the program to the user
# Inputs: None
# Output: None
def printGreeting():
    
    print "This program reads in a list of names from a file,"
    print "capitalizes them, sorts them alphabetically, and prints"
    print "them out in a numbered table."


# readNames() reads in the names from a file
# and puts them into a list called names
# Input: filename, the name of the file to open
# Output: names, the list of names 
def readNames(filename):

    names = []

    file = open(filename, 'r')

    for line in file:
        name = line.strip()
        names.append(name)

    file.close()

    return names


# capitalizeNames() capitalizes names (changes the list in-place)
# Inputs: the list of names
# Outputs: None, but the list was modified by this function
def capitalizeNames(names):

    for i in range(len(names)):
        names[i] = names[i].upper()


# printTable() prints the table of numbered names
# one per line
def printTable(names):

    length = len(names)

    for i in range(length):
        print (i + 1), "\t", names[i]


def main():

    printGreeting()

    # get filename from user    
    filename = raw_input("What's the name of the file ? ")

    # process the names
    names = readNames(filename)
    capitalizeNames(names)
    names.sort()

    # print the table
    printTable(names)

main()

Sets

Definition:

Sets are an unordered collection of items, where duplicates are not allowed.
Items must be immutable (such as numbers or strings).

Operations:

adding items to the set
removing items from the set
mathematical set operations (union, intersection, difference)

Sets in Python

Creating new sets

Use the constructor to make an empty set

>>> empty = set()

Use the constructor passing it a list

>>> cards = set(["jack", "queen", "king"])
>>> cards
set(['king', 'queen', 'jack'])
>>>

Use the constructor passing it a list that contains duplicates

>>> cards1 = set(["jack", "jack", "queen", "king", "king"])
>>> cards1
set(['king', 'queen', 'jack'])
>>>

Operators

containment

>>> cards = set(['jack', 'queen', 'king'])
>>> cards
set(['king', 'queen', 'jack'])
>>> 'queen' in cards
True

set equality

>>> cards2 = set(['queen', 'jack', 'king'])
>>> cards2
set(['king', 'queen', 'jack'])
>>> cards == cards2
True

Methods

add(item) - add item to the set

>>> cards
set(['king', 'queen', 'jack'])
>>> cards.add('ace')
>>> cards
set(['king', 'queen', 'jack', 'ace'])
>>>

copy() - create a copy of the set

>>> cards
set(['king', 'queen', 'jack', 'ace'])
>>> cards2 = cards.copy()
>>> cards2
set(['king', 'queen', 'jack', 'ace'])
>>>

remove(item) - remove item from the set

>>> cards2
set(['king', 'queen', 'jack', 'ace'])
>>> cards2.remove('king')
>>> cards2
set(['queen', 'jack', 'ace'])
>>>

difference(set) - create the set difference with another set

>>> cards
set(['king', 'queen', 'jack', 'ace'])
>>> cards2
set(['queen', 'jack', 'ace'])
>>> cards.difference(cards2)
set(['king'])
>>>
>>> cards2.add('deuce')
>>> cards2
set(['deuce', 'queen', 'jack', 'ace'])
>>> cards.difference(cards2)
set(['king'])
>>> cards2.difference(cards)
set(['deuce'])
>>>

intersection(set) - create an intersection with another set

>>> cards
set(['king', 'queen', 'jack', 'ace'])
>>> cards2
set(['deuce', 'queen', 'jack', 'ace'])
>>> cards.intersection(cards2)
set(['queen', 'jack', 'ace'])
>>>

union(set) - create a union with another set

>>> cards
set(['king', 'queen', 'jack', 'ace'])
>>> cards3
set(['trey', 'deuce'])
>>> cards3.add('ace')
>>> cards3
set(['trey', 'deuce', 'ace'])
>>> allcards = cards.union(cards3)
>>> allcards
set(['trey', 'king', 'deuce', 'jack', 'ace', 'queen'])
>>>

issubset(set) - determine if the set is a subset of another one

>>> allcards
set(['trey', 'king', 'queen', 'jack', 'ace', 'deuce'])
>>> cards3
set(['trey', 'deuce', 'ace'])
>>> cards3.issubset(allcards)
True
>>> allcards.issubset(cards3)
False
>>>

issuperset(set) - determine if the set is a superset of another one

>>> allcards
set(['trey', 'king', 'queen', 'jack', 'ace', 'deuce'])
>>> cards3
set(['trey', 'deuce', 'ace'])
>>> allcards.issuperset(cards3)
True
>>> cards3.issuperset(allcards)
False
>>>

clear() - remove everything from the set

>>> cards3
set(['trey', 'deuce', 'ace'])
>>> cards3.clear()
>>> cards3
set([])
>>>

Built-in functions that operate on sets

len(set) - count # of items in the set

>>> allcards
set(['trey', 'king', 'queen', 'jack', 'ace', 'deuce'])
>>> len(allcards)
6
>>> allcards.clear()
>>> len(allcards)
0
>>>

Using sets

When would you use a set?

When order doesn't matter and you don't want to allow duplicates.

Let's write a program to get names from a user, but don't allow duplicates. Print out the list of names at the end.

names = set()
name = raw_input("Enter a name (or . to quit): ")

while name != ".":
    names.add(name)
    name = raw_input("Enter a name (or . to quit): ")

for name in names:
    print name

Let's run it!

linuxserver1.cs.umbc.edu[128] python nameset.py
Enter a name (or . to quit): sue
Enter a name (or . to quit): ben
Enter a name (or . to quit): dan
Enter a name (or . to quit): dawn
Enter a name (or . to quit): tess
Enter a name (or . to quit): dan
Enter a name (or . to quit): dawn
Enter a name (or . to quit): .
sue
ben
dan
dawn
tess
linuxserver1.cs.umbc.edu[129]

Using Sets for HW3

Would sets have been a better choice for the data structures to use for HW3, the Computer Science Requirements. Let's look at the implementation just for parts A and B.


import string

# partAB() prints part A or part B courses the student still needs to take
# Inputs: the set of courses needed for this part, needed
#         the set of courses taken, taken
#         the part being processed, part
# Output: None
def partAB(needed, taken, part):

    print
    if needed.issubset(taken):
        print 'You have satisfied the Part', part, 'requirements.'
    else:
        remaining = needed.difference(taken)
        print 'Part', part, 'Requirements:'
        for course in remaining:
            print "You need to take", course



def main():

    # set up course sets
    requiredSet = set(['CMSC 201', 'CMSC 202', 'CMSC 203', 'CMSC 304'])
    more = set(['CMSC 313', 'CMSC 331', 'CMSC 341', 'CMSC 345', 'CMSC 411'])
    more2 = set(['CMSC 421', 'CMSC 441'])
    requiredSet = requiredSet.union(more)
    requiredSet = requiredSet.union(more2)

    requiredMath = set(['MATH 151', 'MATH 152', 'MATH 221'])

    # start userClasses as the empty set
    userClasses = set() 

    # get the classes the user has taken putting them into a set
    print "You should have a file that contains the names of the courses"
    print "you've taken toward the CS major.  There should be one course"
    print "per line."
    filename = raw_input("Please enter the name of this file : ")

    file = open(filename, 'r')

    # add each course taken to the user's set
    for line in file:
        course = line.strip()
        userClasses.add(course)

    file.close()
    
    # call functions to produce output for each part
    partAB(requiredSet, userClasses, 'A')
    partAB(requiredMath, userClasses, 'B')


main()

This code looks easier to write than the code we wrote using lists. Let's see what's in the input file :


CMSC 201
MATH 151
CMSC 202
CMSC 203
MATH 152

Let's run it!

You should have a file that contains the names of the courses
you've taken toward the CS major.  There should be one course
per line.

Please enter the name of this file : courses.txt

Part A Requirements:
You need to take CMSC 411
You need to take CMSC 313
You need to take CMSC 421
You need to take CMSC 341
You need to take CMSC 345
You need to take CMSC 331
You need to take CMSC 441
You need to take CMSC 304

Part B Requirements:
You need to take MATH 221

Oh, no!
We have the right answer, but our list of courses is out of order!

So after trying the 2 different data structures, lists and sets for HW3, what data structure do you think would be the best for this program ?

Associative Arrays

Definition:

Associative arrays are tuples of key-value pairs. The keys must be immutable (such as numbers or strings). Values can be any type. They associate one value with each key. Keys aren't in any particular order. Associative arrays are implemented with the Python dictionary type.

Operations:

Associating a value with a key
Looking up a key's value
Checking for a key
Removing a key

Associative arrays are built into many languages, but they are called many different things:

Python/ObjC - dictionary
Perl/Ruby - hash
Java/C++ - map
Awk/PHP - array

Associative Arrays in Python

Creating new dictionaries:

# use the constructor to make an empty dictionary
empty = dict()

# use the empty dictionary literal
empty = {}        

# use the constructor with some initial pairings
# in this case the keys are strings
numbers = dict(one=1, two=2)        

# use the dictionary literal with some initial pairings
capitals = {'MD':'Annapolis', 'ME':'Augusta'}

Operators:

indexing
key containment

Methods:

copy() - create a copy of the dictionary
clear() - remove all entries from the dictionary
get(key) - find the value associated with a key
get(key, default) - use a default value when a key isn't found
has_key(key) - another way to test for key containment
keys() - create a list of the keys in the dictionary
values() - create a list of the values in the dictionary

Built-in functions that operate on dictionaries:

len(dict) - count # of keys in the dictionary
del(dict[key]) - remove a key (and its value) from the dictionary

Dictionary Examples

Operators:

indexing

>>> capitals = {'MD':'Annapolis', 'ME':'Augusta'}
>>> capitals
{'ME': 'Augusta', 'MD': 'Annapolis'}
>>> capitals['MI'] = 'Lansing'
>>> capitals
{'ME': 'Augusta', 'MD': 'Annapolis', 'MI': 'Lansing'}
>>> capitals['MD']
'Annapolis'
>>> capitals['MI']
'Lansing'
>>> capitals['VA']
Traceback (most recent call last):
  File "", line 1, in ?
KeyError: 'VA'

key containment

>>> 'ME' in capitals
True
>>> 'VA' in capitals
False

Methods:

copy() - create a copy of the dictionary

>>> cap2 = capitals.copy()
>>> cap2
{'ME': 'Augusta', 'MD': 'Annapolis', 'MI': 'Lansing'}

clear() - remove all entries from the dictionary

>>> cap2.clear()
>>> cap2
{}

get(key) - find the value associated with a key

>>> cap = capitals.get('MD')
>>> cap
'Annapolis'
>>> cap = capitals.get('VA')
>>> cap
>>>

get(key, default) - use a default value when a key isn't found

>>> cap = capitals.get('MD', 'unknown')
>>> cap
'Annapolis'
>>> cap = capitals.get('VA', 'unknown')
>>> cap
'unknown'

has_key(key) - another way to test for key containment

>>> capitals.has_key('ME')
True
>>> capitals.has_key('ND')
False

keys() - create a list of the keys in the dictionary

>>> capitals.keys()
['ME', 'MD', 'MI']

values() - create a list of the values in the dictionary

>>> capitals.values()
['Augusta', 'Annapolis', 'Lansing']

Built-in functions that operate on dictionaries:

len(dict) - count # of keys in the dictionary

>>> len(capitals)
3

del(dict[key]) - remove a key (and its value) from the dictionary

>>> del(capitals['ME'])
>>> capitals
{'MD': 'Annapolis', 'MI': 'Lansing'}
>>>

Using associative arrays

Why use an associative array?

When you want to associate keys with values. Keys don't have to be numbers, or if they are, they don't have to be contiguous and starting from zero (like in a list).

Write a program that simulates rolling pair of a dice 1000 times. Print out a table showing how many times each outcome was rolled.

import random
counts = {}

for i in range(1000):

    # simulates rolling 2 dice
    a = random.randrange(1, 7)
    b = random.randrange(1, 7)
    sum = a + b

    # gets the old count for that roll or
    # 0 if that sum hasn't been rolled yet
    count = counts.get(sum, 0)

    # writes the new count to an existing entry or
    # creates a new dictionary entry with a key of
    # sum and a count of 1
    counts[sum] = count + 1

# sort the keys, sums is a List of keys
sums = counts.keys()
sums.sort()

# print table 
print "Sum\tOccurences"
print "-" * 18
for sum in sums:
    print sum, "\t", counts[sum]

Let's run it!

ite207-pc-01.cs.umbc.edu[126] python dice.py
Sum     Occurences
------------------
2       38
3       60
4       92
5       104
6       141
7       169
8       132
9       120
10      68
11      43
12      33
ite207-pc-01.cs.umbc.edu[127]

Dictionary Exercise

Write a program that reads locations and their zipcodes from a file into a dictionary and then lets a user query locations to find the associated zipcode.

Here's the data file

Plymouth 02360
Bar_Harbor 04609
Mystic 06355
Cape_May 08204
Hyde_Park 12538
Niagara_Falls 14301
Paradise 17562
Rehoboth_Beach 19971
D.C. 20001
UMBC 21250
Key_West 33040
Nashville 37201
Custer 57730
West_Yellowstone 59758
Whitefish 59901
Chicago 60610
St._Louis 63121
Kansas_City 66101
Baton_Rouge 70801
Austin 78701
Boulder 80301
Cody 82414
Ketchum 83340
Moab 84532
Tempe 85281
Taos 87571
Las_Vegas 89110
Big_Sur 93920
Carmel 93923
Monterey 93940
San_Francisco 94110

Here's what the output should look like:

linuxserver1.cs.umbc.edu[149] python zipcodes.py
Enter the filename of the zipcode file : zipcodes.txt
Which location (Q to quit) ? Plymouth
02360
Which location (Q to quit) ? Carmel
93923
Which location (Q to quit) ? UMBC
21250
Which location (Q to quit) ? Oshkosh
unknown
Which location (Q to quit) ? Niagara_Falls
14301
Which location (Q to quit) ? Q
linuxserver1.cs.umbc.edu[150]

Which data structure ?

What data structures would you use for these programs?

A program that reads in a dictionary file and then lets a user query words to see if they are spelled correctly
A program that gets up to 100 numbers from the user and prints out a table showing how many times each number occured. The table doesn't need to be in numeric order. The numbers can range from 0 to 2,000,000,000.
A program that gets 8 scores from the user (range 0-6) and prints the scores in numerical order as well as the average score after throwing out the lowest and highest score.