ARSC Hackers Club

Issue 1. May 2007

Theory

Introduction

We had discussed a few basic (and not so basic) things in Python during this month. I discussed lists, strings and dictionaries, Don Bahls also told about lists and strings, and Ed Kornkven covered an exciting topic of polymorphic features of dynamically typed languages like Python. We also had a lot of contributions and suggestions from all attendees.

Below is a summary of what have been discussed. I omitted all “boring” things and try to emphasis only on very specific pythonish features that usually not very evident. By the way, it is not a Python tutorial!

Lists

One of the most important data type in Python is list. The examples of lists:

lst1 = [1,2,3,4,5,6,7] # list of integer numbers
lst2 = ['a','b','cde','fg'# list of strings

It is very important to notice, that lists in Python are not lists! They are dynamical arrays. This means that cost of element access is O(1) as well as length determination. However, an insertion is O(n) operation. This means that many typical functional techniques that are so effective in some other languages like Common Lisp or Haskell cannot be optimized in Python. For example, Python would never has a tail recursion optimization. That means, among other consequences, that in spite of Python does have a recursion, it never really encourages to use it.

List elements can be accessed by their indexes starting from 0. Thus, in the previous example, lst1[3] is 4. There are some important tips and tricks:
lst1[-1] is the last element of the list;
lst1[3:5] is the list with elements number 3 and 4 from the lst1 (the last element is not included);
lst1[1:] a list that has all elements of lst1 but the first one; lst1[:-1] a list that has all elements of lst1 but the last one.

If you send a list to a function, it can modify it. If you do not want such a side effect, copy the list first. However, you should take into account that for lists a and b

b = a # it is not copying, b will be another name for a
b = a[::] # this is real copying. b is a copy of a now
b = [ x for x in a ] # b is a copy of a as well as above

List comprehensions

Lists are often the most convent way to deal with data. One of the reason are list comprehensions. This is a Haskell idea. The only difference is that Haskell has a mathematical notation while Python has a more human language notation. Personally, I like Haskell syntax better... There are some examples of list comprehensions below.

lst = [ 2*x for x in range(0,101) ] # it is [0,2,4,...,200]
lst = [ x for x in range(0,101) if x % 2 == 0 ] # it is [0,2,4,...,100]
lst = [ (x,y) for x in range(0,3) for y in range(0,3) if x != y ]
# it is [(0,1), (0,2), (1,0), (1,2), (2,0), (2,1)]

Very often, it is much clearer and shorter to write a list comprehension than a loop. More about this later, see problems.

Relations between strings, lists and dictionaries will be described in the next issue. Polymorphism and dynamic typing by Ed Kornkven will be also included into the next issue. Let us consider a couple of neat problems now.

Problems

Digits

Let x and y be 3-digit numbers (like 123, or 325). Find all couples (x,y) such that

  1. x+y is a 4-digit number;
  2. digits of x, y, and x+y are all different (that means they cover the whole range from 0 to 9).

Write a standalone program that would print all such pairs and their total number.

An example of such numbers: 357 + 849 = 1206

Origin: somewhere in Internet mathematical forums.

The contest was to write a standalone program that prints all pairs and the total number of pairs on screen as fast as possible (1) in Python (2) in anything (free style).

  1. Starightforward solution

    The problem can be solved by a very clear approach using python list comprehensions:

    Program Digits-01
    def digits (x):
        "Returns the list of all digits in natural number x (x>0)"
        if x < 10:
            return [x]
        else:
            return digits(x/10) + [x%10]

    def are_all_digits_different (lst):
        "Returns True if all numbers are different in the list lst"
        if lst == []: return True
        else:
            if lst.pop() in lst:
                return False
            else:
                return are_all_digits_different(lst)

    def list_of_all_pairs():
        '''Returns the list of all pairs of 3-digit numbers such that sum of
        these numbers is 4-digit number and all 10 digits together are
        different. That means they cover the whole range from 0 to 9'''

        return [ (x,y,x+y) for x in range(101,877) for y in range (501,988)
                 if x < y and x+y>1022 and
                 are_all_digits_different( digits(x)
                                           + digits(y) + digits(x+y) ) ]

    if __name__ == "__main__":
        lst = list_of_all_pairs()
        for el in lst:
            print "%d + %d = %d" % (el[0],el[1],el[0]+el[1])
        print len(lst)

    This is a brute force approach like just do, do not think way even in spite of (or especially) functional spirit of the whole program. The most interesting is list_of_all_pairs function. It does not describe how to get the numbers but rather describe what we do need to get. This is the list of (x,y,x+y) where x<y, all digits are different and number of digits of x+y is equal to 4 (x+y>1022). Looks neat, works slow.

    You might noticed that x range is ended with 877 (that means 876, actually). That is because for bigger numbers we have either repeting digits in x or there is 9 present. However, y>x; then y will start from 9. That means we will have digit 9 at least twice. This simple consideration allow us to shorten the range.

    I believe that other functions are pretty clear and the code describes itself better than words.

  2. Regular expressions solution

    There is a neat idea by James Halliday to use regular expressions for defining if all digits are different and strings for numbers when it is more convenient. In this case we do not need digits function at all:

    Program Digits-02
    import re
    expr = re.compile(r'(.).*\1')

    def are_all_digits_different(x,y,z):
        return not expr.search( "%d%d%d" % (x,y,z) )

    def list_of_all_pairs():
        return [ (x,y,x+y) for x in range(101,877) for y in range (501,988)
                 if x < y and x+y > 1022
                 and are_all_digits_different( x,y,x+y ) ]

    if __name__ == "__main__":
        lst = list_of_all_pairs()
        for el in lst:
            print "%d + %d = %d" % (el[0],el[1],el[0]+el[1])
        print len(lst)

    The key is a regular expression '(.).*\1' which means take first character and find some same character somewhere farther in the string (\1). That is, the subgroup submatching. If there is any of such an expression in the string, then the string contains two same digits. It is interesting to note, that '(\d)\d*\1' expression where we match digits instead of any characters seems to work slower. This solution still a kind of brute force. Now it is time to think better.

  3. Mathematical trick

    Python, as well as many other languages, evaluates logical expressions lazily. That means that if there are expressions connected by 'and' statement, then Python calculates the first expression and if that meaning is false, it does not calculate the rest. It just returns False. We could use it to filter numbers by additional filter in regular expression. That is the key idea: make a filter which is faster than some of condition and use it first to cut most of numbers. Thus, we want to find a necessary condition which would cut most of numbers before they hit 'are_all_digits_different' function. This idea is useful in many other problems as well as here. Can we find such a condition here? Let us try.

    Assume x contains digits ax, bx, and cx; y contains digits ay, by, and cy; and z = x + y contains digits dz, az, cb, and cz. Then we have,

    x = 100ax + 10bx + cx ,
    y = 100ay + 10by + cy ,
    z = 1000dz + 100az + 10bz+ cz .

    We have

    x+y+z = 1000dz + 100(az + ax + ay) + 10(bz+ bx + by)+ cz + cx + cy
    = 999dz + 99(az + ax + ay) + 9(bz+ bx + by)
    + ax + bx + cx+ ay + by + cy+ az + bz + cz+ dz
    = 9M + 45,
    where M is some natural number and 45 = 0 + 1 + 2... + 9 is the sum of all digits from 0 to 9. That is always true if we consider only such numbers as described in the problem condition.

    That is it! We found an excellent filter that will remove 8/9 of all numbers before they hit a time consuming 'are_all_digits_different' function. x + y + z = 2(x + y) must devide 9 if all digits are different. That gives us a much smarter version of Program Digits-02:

    Program Digits-03
    import re
    expr = re.compile(r'(.).*\1')

    def are_all_digits_different(x,y,z):
        return not expr.search( "%d%d%d" % (x,y,z) )

    def list_of_all_pairs():
        return [ (x,y,x+y) for x in range(101,877) for y in range (501,988)
                 if x < y and x+y > 1022 and (2*(x+y))%9 == 0
                 and are_all_digits_different( x,y,x+y ) ]

    if __name__ == "__main__":
        lst = list_of_all_pairs()
        for el in lst:
            print "%d + %d = %d" % (el[0],el[1],el[0]+el[1])
        print len(lst)

    We may try to improve even this version by using some tricks with numbers instead of using pretty slow regular expressions and conversions from digits to strings. We also will play with ranges to improve the usage of our knowledge that x < y. Our last version of this approach looks as following:

    Program Digits-04
    def digits10(x,y,z):
        return [x//100, (x%100)//10, x%10, y//100, (y%100)//10,
                y%10, z//1000, (z%1000)//100, (z%100)//10, z%10]

    def are_all_digits_different(lst):
        while lst:
            e = lst.pop()
            if e in lst: return False
        return True

    def list_of_all_pairs():
        return [ (x,y,x+y) for x in range(103,876)
                 for y in range ( max(501,x,1023-x), 988 )
                 if (2*(x+y)) % 9 == 0
                 and are_all_digits_different( digits10(x,y,x+y) ) ]

    if __name__ == "__main__":
        lst = list_of_all_pairs()
        for el in lst:
            print "%d + %d = %d" % (el[0],el[1],el[0]+el[1])
        print len(lst)
  4. "See from another side" approach

    However, the winner is Don Bahls. He decided to see from another side: why should we check thousand of thousand number if we have just 10 digits to compose those numbers? He also find a neat application of Python sets for this problem, to remove all repetitions of digits and consider only necessary numbers from the beginning. Well, the code looks ugly and you probably would prefer something easier to read. However, the code below is the surest winner of our small contest!

    Program Digits-05

    def find_them_fast_elimination():
        # The Process:
        # ====================
        # a2 a1 a0 + b2 b1 b0 = c3 c2 c1 c0
        #
        # c3 is 1 (two three digit numbers cannot add up to more than 1998=999+999)
        #
        # a0 is in set([0,2,3,4,5,6,7,8,9])
        # b0 is in set([0,2,3,4,5,6,7,8,9]) - set([a0])
        # c0 = (a0 + b0) % 10
        # r0 = (a0 + b0 ) // 10
        # if c0 is in set([0,2,3,4,5,6,7,8,9]) - set([a0, b0]) then continue...
        #
        all=set([0,2,3,4,5,6,7,8,9])
        checks=0
        found=0
        for a0 in all:
          for b0 in all - set([a0]):
            c0=(a0+b0)%10
            r0=(a0+b0)//10
            if (all - set([a0, b0])) & set([c0]) == set([c0]):
              for a1 in all - set([a0, b0, c0]):
                for b1 in all - set([a0, b0, c0, a1]):       
                  c1=(a1+b1+r0)%10
                  r1=(a1+b1+r0)//10
                  if (all - set([a0, b0, c0, a1, b1])) \
                         & set([c1]) == set([c1]):
                    for a2 in all - set([a0, b0, c0, a1, b1,c1, 0]):
                      for b2 in all - set([a0, b0, c0, a1, b1, c1, a2, 0]):
                        c2=(a2+b2+r1)%10
                        car2=(a2+b2+r1)//10
                        if (all - \
                            set([a0, b0, c0, a1, b1, c1, a2, b2, c2])) \
                            == set([]):
                          a=             100 * a2 + 10 * a1 + a0
                          b=             100 * b2 + 10 * b1 + b0
                          checks+=1
                          if car2 == 1 and a<b:
                            print a,b, a+b
                            found+=1     
        print "total:", found

    if __name__ == '__main__':
        find_them_fast_elimination()

I made some profiling on my computer. The results are below in the table:

 
Program   Time on my machine, seconds
Digits-01   2.03
Digits-02   1.02
Digits-03   0.29
Digits-04   0.20
Digits-05   0.05

Sorting by Suffixes

Make a sorting function which takes a list of files and sorts them alphabetically by file extension so that all similar types of files are grouped together and alphabetical (sorted by name in same suffix groups).

l=[ 'main.f', 'calc_com.f', 'input.nc', 'output.nc', 'driver.c', 'getusage.c', 'rotate.f', 'invert.f' ]
do something... which returns the following list.
[ 'driver.c', 'getusage.c', 'calc_com.f', 'invert.f', 'main.f', 'rotate.f', 'input.nc', 'output.nc' ]
Hint: http://wiki.python.org/moin/HowTo/Sorting

Origin: Don Bahls

The solution to Don's problem of sorting lines by suffixes and then by names within suffix groups can be solved by the following one line of Python code:

lst.sort( key = lambda line: ( line.split('.')[-1],
                               line.split('.')[:-1] ) )

The key is the key keyword. This is the function that will be applied to the list before sorting (I believe the concept is borrowed from Common Lisp where keywords like this are widely used). In our case the function make a tuple of suffix (which is line.split('.')[-1], the last element of list of parts of line, splitted by dots) and the line without suffix. Python will sort first by suffix and then, if suffixes are equal, by the line itself. That how tuples are sorted. For example, key function applied for the string "driver.c" will return tuple ( "c", ["driver"] ). You might find some better key function. Unfortunately, it is not very easy to create one. For example, the following functions are not good:

key = lambda line: line.split('.')[-1] + line
key = lambda line: ( line.split('.')[-1], line )

Note: lambda is a keyword that defines anonymous function: for example (lambda x: x*x) is a function that gets x and returns x*x.

Well, that key key keyword appeared only in Python 2.4 and higher (thanks Don for investigation). Thus, this solution is not applicable if you use older versions of Python (like default version on midnight and nelchina, use an appropriate module there if you would like newer python). In this case you need to write a longer function. Don Bahls showed how to write your own comparison function and then use it for sorting: you need to send this comparison function to sort. There is Don's code below:

# !/usr/bin/env python
def mycmp(a,b):
   ''' Comparison function which sorts first on 'extension' then
   on prefix '''

   #make a list from 'a' using '.' as the separator
   at=a.split('.')
   #make a list from 'b' using '.' as the separator
   bt=b.split('.')
   # if the extensions are different, use them for comparison.
   if at[-1] > bt[-1]:
      return -1
   # if the extensions are same, use the prefix for comparison
   if at[-1] == bt[-1]:
      print '.'.join(at[:-1])
      print '.'.join(bt[:-1])
      return -1*cmp('.'.join(at[:-1]),'.'.join(bt[:-1]))
   # otherwise swap the other way
   else:
      return 1
   
   l=[ 'main.f''calc_com.f''input.nc''output.nc''driver.c', \
       'getusage.c''rotate.f''invert.f' ]
   print l
   l.sort(mycmp)
   print l

All Issues

Issue 1. May 2007
Issue 2. June 2007