Sorting II &
Command-line arguments

Sue Evans & Will Murnane

Learning Outcomes

The students will be able to :

explain some problems that can be solved by sorting.
use a key function to sort in some order other than the default order.
use a cmp function to sort with some other criterion.
use command-line arguments to pass information to your program

Sorting -
a problem-solving tool

Problems:

Find the mimimum & maximum of a list of values
- Recall from Lab 10 that an easy way to find the minimum and maximum is to sort the list of values.

>>> values = [7, 6, 9, 3, 4, 1]
>>> values.sort()
>>> minimum = values[0]
>>> size = len(values)
>>> maximum = values[size - 1]
>>> minimum
1
>>> maximum
9

Find your CMSC 201 lab score - Sum of the best 10 of 13 scores.
- Sort the scores.
- lab score = sum of the scores at index 3 - 12

>>> scores = [3, 3, 3, 3.5, 3.5, 0, 3, 2, 3, 3.5, 3.5, 2, 0]
>>> scores.sort()
>>> labScore = 0
>>> size = len(scores)
>>> for i in range(3, size):
...     labScore += scores[i]
... 
>>> labScore
31.0
>>>

Find a gymnast's score - Average after removing the lowest and highest score.
- Sort the scores.
- Find the average of scores at index 1 to len(list) - 2

>>> gymScores = [9, 8, 9, 9, 8, 10, 10, 9, 9, 8]
>>> gymScores.sort()
>>> total = 0
>>> size = len(gymScores)
>>> for i in range(1, size - 1):
...     total += gymScores[i]
... 
>>> average = float (total) / (size - 2) 
>>> average
8.875

uniq - Suppose we had a list of numbers, and we wanted to return only the unique items in it. That is, if we are given [1,2,4,2,3,5,2,1,3], we want to return the list [1,2,3,4,5].
- Sort the scores.
- Add items to the result list only when the item changes.

def uniq(aList):

  aList.sort()

  result = []
  lastItem = None

  for item in aList:
    if item != lastItem:
      result.append(item)
      lastItem = item

  return result


print uniq([1,2,4,2,3,5,2,1,3])
[1, 2, 3, 4, 5]

Decorate-sort-undecorate

Suppose we have a list of things that we want to sort by something other than just the natural ordering—for example, if we had a list of strings that we want to sort by length instead of alphabetically. We could take this list and turn it into a list of things whose natural sort order puts the items of the original list into the order we want them in. Here's what it looks like in code:

>>> stringList = ["a", "apple", "hello", "trucks", "junk"]
>>> decorated = []
>>> for item in stringList:
...   decorated.append((len(item), item))
...
>>> print decorated
[(1, 'a'), (5, 'apple'), (5, 'hello'), (6, 'trucks'), (4, 'junk')]

Now if we sort this list, the strings we started with are in order by length.

>>> decorated.sort()
>>> print decorated
[(1, 'a'), (4, 'junk'), (5, 'apple'), (5, 'hello'), (6, 'trucks')]

Then we can take this list and undecorate it (remove the counts and leave only the original strings) which will still be in sorted order.

>>> undecorated = []
>>> for item in decorated:
...   undecorated.append(item[1])
...
>>> undecorated
['a', 'junk', 'apple', 'hello', 'trucks']

This method, as suggested by the variable names we've used, is called
"decorate-sort-undecorate" or "the Schwartzian transform".
It's a common method of sorting by orders other than the default one.

Using the `key` function to
Sort by non-default fields

The decorate-sort-undecorate method shown in the last slide works, but it's a little messy. It takes up temporary space as it does its work that is bigger than the original list of items. This can be a concern if a lot of items are present in the list. To alleviate this problem, Python (and many other languages) have built-in functionality that lets us do this non-default sort without the extra steps.

Suppose we are tracking how many students get a particular grade, and we want to sort the results in various ways. We've got a dictionary built like this:

grades = {'A': 13, 'B': 27, 'C': 11, 'D': 4, 'F': 0}

Now we want to display this data in two ways: first, sorted by grade. We can extract the elements of the dictionary to a list by using the items method, and then sort it:

>>> grades.items()
[('A', 13), ('C', 11), ('B', 27), ('D', 4), ('F', 0)]
>>> sorted(grades.items())
[('A', 13), ('B', 27), ('C', 11), ('D', 4), ('F', 0)]

So, we can easily sort by letter grade. But, suppose we wanted to see which grades were most common?
In other words, we want to sort by the second part of each item. To do this, we could write a function that gets the second part of an item, and pass it to the sort function:

>>> def getSortKey(item):
...   return item[1]
...
>>> sorted(grades.items(), key = getSortKey)
[('F', 0), ('D', 4), ('C', 11), ('A', 13), ('B', 27)]

Now the list is sorted by the second attribute of the items. This is a common thing to do, so Python has a function that does the same thing as getSortKey. We can load and use it like this:

>>> from operator import itemgetter
>>> sorted(grades.items(), key = itemgetter(1))
[('F', 0), ('D', 4), ('C', 11), ('A', 13), ('B', 27)]

As you can see, the itemgetter function takes an argument that specifies which attribute of the items it should return for comparison.

What would the earlier "sort strings by length" example look like with a key function?

>>> stringList = ["apple", "a", "hello", "trucks", "junk"]
>>> sorted(stringList, key = len)
['a', 'junk', 'apple', 'hello', 'trucks']

Using the `cmp` function

Sometimes it's not easy to come up with a function that generates a sort key for a particular sorted order that we want. We can also use a cmp function that will be given two items to compare. As an example, here's what the default cmp function does with integers:

>>> cmp(5, 3)
1
>>> cmp(5, 5)
0
>>> cmp(5, 7)
-1

So our function returns a positive number if the first item is larger,
zero if they're equal, or a negative number if the first item is smaller.

Suppose we wanted to sort the grades dictionary as before, but this time using a comparison function:

>>> def myCmp(a, b):
...   if a[1] > b[1]:
...     return 1
...   if a[1] < b[1]:
...     return -1
...   return 0
...
>>> sorted(grades.items(), cmp = myCmp)
[('A', 13), ('B', 27), ('C', 11), ('D', 4), ('F', 0)]

This may seem like a trivial example, since we could use a key function. But some languages don't have an equivalent to the key function, so we want to introduce many ways of accomplishing this task.

Let's try a more complicated example. Suppose we have a list of lists: each item in the list is itself a list of the answers a student gave on the midterm exam. We have a list of the correct answers, and we want to sort the students by the grade they got.

ANSWER_KEY = [ ... ]

def byGrade(student1, student2):

  score1 = score2 = 0

  for question in range(len(student1)):
    if student1[question] == ANSWER_KEY[question]:
      score1 += 1
    if student2[question] == ANSWER_KEY[question]:
      score2 += 1

  return cmp(score1, score2)

`reverse`

Three ways to get the list sorted in the opposite order:

Suppose we want to sort a list in descending order: the big items come first, then the smaller ones. The default sort in Python (and many other languages) is an ascending-order sort, how could we go about getting the items sorted in the opposite order? We could sort the list and then use the reverse method:
```
>>> aList = [4,1,6,3,8,2,5]
>>> aList.sort()
>>> aList
[1, 2, 3, 4, 5, 6, 8]
>>> aList.reverse()
>>> aList
[8, 6, 5, 4, 3, 2, 1]
```
This works, but it doesn't feel quite right; we just spent a lot of time putting it in one order, then we go back and swap everything around.

Instead, we can define a comparator function that puts things in backwards order:

>>> def reverseCmp(a, b):
...   if a[1] > b[1]:
...     return -1
...   if a[1] < b[1]:
...     return 1
...   return 0
>>> aList.sort(cmp = reverseCmp)
>>> aList
[8, 6, 5, 4, 3, 2, 1]

This, too, works, but there's one more thing we could do to optimize it.

Python can take another argument reverse to its sort function. If we give True as an argument, then the list is sorted in descending order.
```
>>> aList.sort(reverse = True)
>>> aList
[8, 6, 5, 4, 3, 2, 1]
```

Command-line arguments

There are times when it would be convenient for your program to be able to get information from the operating system's command line. This allows programs to be run in batch mode where the output from one program can be the input for another, etc.

Any number of arguments can be passed from the command line into your program.

Here's an example of code that uses command-line arguments:

import sys

num = len(sys.argv)

for i in range(num):

    print "sys.argv[%d] = %s" % (i, sys.argv[i])

Here's the output:

linuxserver1.cs.umbc.edu[160] python commandLine.py 2 foo 7.5 bar snoopy jazz
sys.argv[0] = commandLine.py
sys.argv[1] = 2
sys.argv[2] = foo
sys.argv[3] = 7.5
sys.argv[4] = bar
sys.argv[5] = snoopy
sys.argv[6] = jazz
linuxserver1.cs.umbc.edu[161]

In the sys module there is a list of strings known as argv. This list will consist of one or more strings.
sys.argv[0] always contains the name of the executable
As you see from the example above, each item entered on the command line after that becomes another item in the list of strings. These are known as command-line arguments.
Everything entered on the command line is a string, so you'll need to cast items that are to be used as numbers.

Using command-line arguments

This example uses command line arguments to give the program the name of the input file to use and the name of the output file to write during processing. This is a very common use of command line arguments.

If your program needs command line arguments in order to run, then you should have a clearly marked usage instructions in your file header comment to explain how to run the program.

# commandLine.py
# Sue Evans
# 11/17/09
# All sections
# bogar@cs.umbc.edu
#
# This program illustrates using command-line arguments
# It also uses file-handling, strings, lists & dictionaries
#
# This program requires command line arguments which are the
# filename of the input file and the filename of the output
# file, in that order.

import sys
import string

ANSWER_KEY = ['T','T','F','a','b','c']

def main():

    # make sure there are 3 arguments on the command line
    # exit if not
    if len(sys.argv) != 3:
        print "This program requires command line arguments."
        print "The first argument is the filename of the input file."
        print "The second argument is the filename of the output file."    
        sys.exit()

    grades = {}

    # open file for input
    infile = open(sys.argv[1], "r")

    # for each student read in a line, strip it and
    # split it into the student's name and a string of his answers
    for line in infile:
        student = string.strip(line)
        student = string.split(student)
        size = len(student[1])

        # change the string of answers to a list of answers
        answers = []
        for i in range(2, size, 4):
            answers.append(student[1][i])

        # make the student's name the key and his list of answers the value
        grades[student[0]] = answers

        # calculate the student's score
        score = 0
        size = len(answers)
        for question in range(size):
            if answers[question] == ANSWER_KEY[question]:
                score += 1

        # change the value to being just the score instead of a
        # list of answers
        grades[student[0]] = score

    # close the infile and open the outfile
    infile.close()
    outfile = open(sys.argv[2], "w")

    # make a list of the keys and sort them
    names = grades.keys()
    names.sort()

    # write the sorted student's names and their scores to the outfile
    for name in names:
        outfile.write(name + "\t" + str(grades[name]) + "\n")

    # close the output file
    outfile.close()


main()

Here's the input file, answers.txt :

Barnes,Beth ['T','T','F','a','b','c']
Carson,Ed ['T','F','T','a','b','b']

Let's run it!

linuxserver1.cs.umbc.edu[205] python commandLine.py answers.txt grades.out
linuxserver1.cs.umbc.edu[206]

Here's grades.out :

Barnes,Beth     6
Carson,Ed       3

Since I chose to use a dictionary, things became out of order immediately. Therefore, getting a list of keys and sorting them was necessary to get the roster back into sorted order by the students' last names.

Command-line argument Exercise

Write a program that will add values passed in as command-line arguments and will print their sum. The user may enter as many values as they choose on the command line.

Sorting II & Command-line arguments