ARSC Hackers Club
Issue 1. May 2007
Theory
Introduction
We had discussed a few basic (and not so basic) things in Python during this month. I discussed lists, strings and dictionaries, Don Bahls also told about lists and strings, and Ed Kornkven covered an exciting topic of polymorphic features of dynamically typed languages like Python. We also had a lot of contributions and suggestions from all attendees.
Below is a summary of what have been discussed. I omitted all “boring” things and try to emphasis only on very specific pythonish features that usually not very evident. By the way, it is not a Python tutorial!
Lists
One of the most important data type in Python is list. The examples of lists:
lst2 = ['a','b','cde','fg'] # list of strings
It is very important to notice, that lists in Python are not lists! They are dynamical arrays. This means that cost of element access is O(1) as well as length determination. However, an insertion is O(n) operation. This means that many typical functional techniques that are so effective in some other languages like Common Lisp or Haskell cannot be optimized in Python. For example, Python would never has a tail recursion optimization. That means, among other consequences, that in spite of Python does have a recursion, it never really encourages to use it.
List elements can be accessed by their indexes starting from
0. Thus, in the previous example, lst1[3] is 4. There are some
important tips and tricks:
lst1[-1] is the last element of the
list;
lst1[3:5] is the list with elements number 3 and 4 from
the lst1 (the last element is not included);
lst1[1:] a list that has all elements of lst1 but the first
one; lst1[:-1] a list that has all elements of lst1 but the last one.
If you send a list to a function, it can modify it. If you do not want such a side effect, copy the list first. However, you should take into account that for lists a and b
b = a[::] # this is real copying. b is a copy of a now
b = [ x for x in a ] # b is a copy of a as well as above
List comprehensions
Lists are often the most convent way to deal with data. One of the reason are list comprehensions. This is a Haskell idea. The only difference is that Haskell has a mathematical notation while Python has a more human language notation. Personally, I like Haskell syntax better... There are some examples of list comprehensions below.
lst = [ x for x in range(0,101) if x % 2 == 0 ] # it is [0,2,4,...,100]
lst = [ (x,y) for x in range(0,3) for y in range(0,3) if x != y ]
# it is [(0,1), (0,2), (1,0), (1,2), (2,0), (2,1)]
Very often, it is much clearer and shorter to write a list comprehension than a loop. More about this later, see problems.
Relations between strings, lists and dictionaries will be described in the next issue. Polymorphism and dynamic typing by Ed Kornkven will be also included into the next issue. Let us consider a couple of neat problems now.
Problems
Digits
Let x and y be 3-digit numbers (like 123, or 325). Find all couples (x,y) such that
- x+y is a 4-digit number;
- digits of x, y, and x+y are all different (that means they cover the whole range from 0 to 9).
Write a standalone program that would print all such pairs and their total number.
An example of such numbers: 357 + 849 = 1206
Origin: somewhere in Internet mathematical forums.
The contest was to write a standalone program that prints all pairs and the total number of pairs on screen as fast as possible (1) in Python (2) in anything (free style).
- Starightforward solution
The problem can be solved by a very clear approach using python list comprehensions:
Program Digits-01def digits (x):
"Returns the list of all digits in natural number x (x>0)"
if x < 10:
return [x]
else:
return digits(x/10) + [x%10]
def are_all_digits_different (lst):
"Returns True if all numbers are different in the list lst"
if lst == []: return True
else:
if lst.pop() in lst:
return False
else:
return are_all_digits_different(lst)
def list_of_all_pairs():
'''Returns the list of all pairs of 3-digit numbers such that sum of
these numbers is 4-digit number and all 10 digits together are
different. That means they cover the whole range from 0 to 9'''
return [ (x,y,x+y) for x in range(101,877) for y in range (501,988)
if x < y and x+y>1022 and
are_all_digits_different( digits(x)
+ digits(y) + digits(x+y) ) ]
if __name__ == "__main__":
lst = list_of_all_pairs()
for el in lst:
print "%d + %d = %d" % (el[0],el[1],el[0]+el[1])
print len(lst)
This is a brute force approach like just do, do not think way even in spite of (or especially) functional spirit of the whole program. The most interesting is list_of_all_pairs function. It does not describe how to get the numbers but rather describe what we do need to get. This is the list of (x,y,x+y) where x<y, all digits are different and number of digits of x+y is equal to 4 (x+y>1022). Looks neat, works slow.
You might noticed that x range is ended with 877 (that means 876, actually). That is because for bigger numbers we have either repeting digits in x or there is 9 present. However, y>x; then y will start from 9. That means we will have digit 9 at least twice. This simple consideration allow us to shorten the range.
I believe that other functions are pretty clear and the code describes itself better than words.
- Regular expressions solution
There is a neat idea by James Halliday to use regular expressions for defining if all digits are different and strings for numbers when it is more convenient. In this case we do not need digits function at all:
Program Digits-02import re
expr = re.compile(r'(.).*\1')
def are_all_digits_different(x,y,z):
return not expr.search( "%d%d%d" % (x,y,z) )
def list_of_all_pairs():
return [ (x,y,x+y) for x in range(101,877) for y in range (501,988)
if x < y and x+y > 1022
and are_all_digits_different( x,y,x+y ) ]
if __name__ == "__main__":
lst = list_of_all_pairs()
for el in lst:
print "%d + %d = %d" % (el[0],el[1],el[0]+el[1])
print len(lst)
The key is a regular expression '(.).*\1' which means take first character and find some same character somewhere farther in the string (\1). That is, the subgroup submatching. If there is any of such an expression in the string, then the string contains two same digits. It is interesting to note, that '(\d)\d*\1' expression where we match digits instead of any characters seems to work slower. This solution still a kind of brute force. Now it is time to think better.
- Mathematical trick
Python, as well as many other languages, evaluates logical expressions lazily. That means that if there are expressions connected by 'and' statement, then Python calculates the first expression and if that meaning is false, it does not calculate the rest. It just returns False. We could use it to filter numbers by additional filter in regular expression. That is the key idea: make a filter which is faster than some of condition and use it first to cut most of numbers. Thus, we want to find a necessary condition which would cut most of numbers before they hit 'are_all_digits_different' function. This idea is useful in many other problems as well as here. Can we find such a condition here? Let us try.
Assume x contains digits ax, bx, and cx; y contains digits ay, by, and cy; and z = x + y contains digits dz, az, cb, and cz. Then we have,
x = 100ax + 10bx + cx ,y = 100ay + 10by + cy ,z = 1000dz + 100az + 10bz+ cz .We have
x+y+z = 1000dz + 100(az + ax + ay) + 10(bz+ bx + by)+ cz + cx + cy= 999dz + 99(az + ax + ay) + 9(bz+ bx + by)+ ax + bx + cx+ ay + by + cy+ az + bz + cz+ dz= 9M + 45,where M is some natural number and 45 = 0 + 1 + 2... + 9 is the sum of all digits from 0 to 9. That is always true if we consider only such numbers as described in the problem condition.That is it! We found an excellent filter that will remove 8/9 of all numbers before they hit a time consuming 'are_all_digits_different' function. x + y + z = 2(x + y) must devide 9 if all digits are different. That gives us a much smarter version of Program Digits-02:
Program Digits-03import re
expr = re.compile(r'(.).*\1')
def are_all_digits_different(x,y,z):
return not expr.search( "%d%d%d" % (x,y,z) )
def list_of_all_pairs():
return [ (x,y,x+y) for x in range(101,877) for y in range (501,988)
if x < y and x+y > 1022 and (2*(x+y))%9 == 0
and are_all_digits_different( x,y,x+y ) ]
if __name__ == "__main__":
lst = list_of_all_pairs()
for el in lst:
print "%d + %d = %d" % (el[0],el[1],el[0]+el[1])
print len(lst)
We may try to improve even this version by using some tricks with numbers instead of using pretty slow regular expressions and conversions from digits to strings. We also will play with ranges to improve the usage of our knowledge that x < y. Our last version of this approach looks as following:
Program Digits-04def digits10(x,y,z):
return [x//100, (x%100)//10, x%10, y//100, (y%100)//10,
y%10, z//1000, (z%1000)//100, (z%100)//10, z%10]
def are_all_digits_different(lst):
while lst:
e = lst.pop()
if e in lst: return False
return True
def list_of_all_pairs():
return [ (x,y,x+y) for x in range(103,876)
for y in range ( max(501,x,1023-x), 988 )
if (2*(x+y)) % 9 == 0
and are_all_digits_different( digits10(x,y,x+y) ) ]
if __name__ == "__main__":
lst = list_of_all_pairs()
for el in lst:
print "%d + %d = %d" % (el[0],el[1],el[0]+el[1])
print len(lst)
- "See from another side" approach
However, the winner is Don Bahls. He decided to see from another side: why should we check thousand of thousand number if we have just 10 digits to compose those numbers? He also find a neat application of Python sets for this problem, to remove all repetitions of digits and consider only necessary numbers from the beginning. Well, the code looks ugly and you probably would prefer something easier to read. However, the code below is the surest winner of our small contest!
Program Digits-05
def find_them_fast_elimination():
# The Process:
# ====================
# a2 a1 a0 + b2 b1 b0 = c3 c2 c1 c0
#
# c3 is 1 (two three digit numbers cannot add up to more than 1998=999+999)
#
# a0 is in set([0,2,3,4,5,6,7,8,9])
# b0 is in set([0,2,3,4,5,6,7,8,9]) - set([a0])
# c0 = (a0 + b0) % 10
# r0 = (a0 + b0 ) // 10
# if c0 is in set([0,2,3,4,5,6,7,8,9]) - set([a0, b0]) then continue...
#
all=set([0,2,3,4,5,6,7,8,9])
checks=0
found=0
for a0 in all:
for b0 in all - set([a0]):
c0=(a0+b0)%10
r0=(a0+b0)//10
if (all - set([a0, b0])) & set([c0]) == set([c0]):
for a1 in all - set([a0, b0, c0]):
for b1 in all - set([a0, b0, c0, a1]):
c1=(a1+b1+r0)%10
r1=(a1+b1+r0)//10
if (all - set([a0, b0, c0, a1, b1])) \
& set([c1]) == set([c1]):
for a2 in all - set([a0, b0, c0, a1, b1,c1, 0]):
for b2 in all - set([a0, b0, c0, a1, b1, c1, a2, 0]):
c2=(a2+b2+r1)%10
car2=(a2+b2+r1)//10
if (all - \
set([a0, b0, c0, a1, b1, c1, a2, b2, c2])) \
== set([]):
a= 100 * a2 + 10 * a1 + a0
b= 100 * b2 + 10 * b1 + b0
checks+=1
if car2 == 1 and a<b:
print a,b, a+b
found+=1
print "total:", found
if __name__ == '__main__':
find_them_fast_elimination()
I made some profiling on my computer. The results are below in the table:
| Program | Time on my machine, seconds | |
| Digits-01 | 2.03 | |
| Digits-02 | 1.02 | |
| Digits-03 | 0.29 | |
| Digits-04 | 0.20 | |
| Digits-05 | 0.05 |
Sorting by Suffixes
Make a sorting function which takes a list of files and sorts them alphabetically by file extension so that all similar types of files are grouped together and alphabetical (sorted by name in same suffix groups).
l=[ 'main.f', 'calc_com.f', 'input.nc', 'output.nc', 'driver.c', 'getusage.c', 'rotate.f', 'invert.f' ]do something... which returns the following list.
[ 'driver.c', 'getusage.c', 'calc_com.f', 'invert.f', 'main.f', 'rotate.f', 'input.nc', 'output.nc' ]
Hint: http://wiki.python.org/moin/HowTo/Sorting
Origin: Don Bahls
The solution to Don's problem of sorting lines by suffixes and then by names within suffix groups can be solved by the following one line of Python code:
line.split('.')[:-1] ) )
The key is the key keyword. This is the function that will be applied to the list before sorting (I believe the concept is borrowed from Common Lisp where keywords like this are widely used). In our case the function make a tuple of suffix (which is line.split('.')[-1], the last element of list of parts of line, splitted by dots) and the line without suffix. Python will sort first by suffix and then, if suffixes are equal, by the line itself. That how tuples are sorted. For example, key function applied for the string "driver.c" will return tuple ( "c", ["driver"] ). You might find some better key function. Unfortunately, it is not very easy to create one. For example, the following functions are not good:
key = lambda line: ( line.split('.')[-1], line )
Note: lambda is a keyword that defines anonymous function: for example (lambda x: x*x) is a function that gets x and returns x*x.
Well, that key key keyword appeared only in Python 2.4 and higher (thanks Don for investigation). Thus, this solution is not applicable if you use older versions of Python (like default version on midnight and nelchina, use an appropriate module there if you would like newer python). In this case you need to write a longer function. Don Bahls showed how to write your own comparison function and then use it for sorting: you need to send this comparison function to sort. There is Don's code below:
def mycmp(a,b):
''' Comparison function which sorts first on 'extension' then
on prefix '''
#make a list from 'a' using '.' as the separator
at=a.split('.')
#make a list from 'b' using '.' as the separator
bt=b.split('.')
# if the extensions are different, use them for comparison.
if at[-1] > bt[-1]:
return -1
# if the extensions are same, use the prefix for comparison
if at[-1] == bt[-1]:
print '.'.join(at[:-1])
print '.'.join(bt[:-1])
return -1*cmp('.'.join(at[:-1]),'.'.join(bt[:-1]))
# otherwise swap the other way
else:
return 1
l=[ 'main.f', 'calc_com.f', 'input.nc', 'output.nc', 'driver.c', \
'getusage.c', 'rotate.f', 'invert.f' ]
print l
l.sort(mycmp)
print l
All Issues
Issue 1. May 2007Issue 2. June 2007
