
2011-09-02

Guess which Python string find method is faster?

I came across a question about which of two simple string find methods was faster. So, let's play a game. All we are trying to determine is whether a single character ('ch') passed to our function is lowercase or not. Can you guess which of these four methods will be the fastest?


import string

# check the result of the string.find function
def is_lower1(ch):
    return string.find(string.lowercase, ch) != -1

# compare the character to its lowercase version
def is_lower2(ch):
    return ch.lower() == ch

# check the character against all lowercase characters
def is_lower3(ch):
    return ch in string.lowercase

# check the character against the lowercase boundaries
def is_lower4(ch):
    return 'a' <= ch <= 'z'

You can probably guess that the first one will be the sore loser. It uses a string function (string.find) to search all the possible lowercase characters (string.lowercase) for the passed character. find returns -1 if it cannot find the passed character, which is why the result is compared against -1. OK, but how about the rest?
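To make the -1 check concrete, here is a small sketch. It assumes Python 3, where the string.find function and string.lowercase used above are no longer available, so it swaps in the str.find method and string.ascii_lowercase. The name is_lower1_py3 is made up for this illustration.

```python
import string

def is_lower1_py3(ch):
    # str.find returns the index of the first match, or -1 when there is none
    return string.ascii_lowercase.find(ch) != -1

print(string.ascii_lowercase.find('c'))        # 2
print(string.ascii_lowercase.find('A'))        # -1
print(is_lower1_py3('c'), is_lower1_py3('A'))  # True False
```

The search has to walk through the lowercase string to find (or fail to find) the character, which is the iteration mentioned above.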

is_lower2 also uses a string method (lower), but only to lowercase the passed character, which it then compares against the original value. So there are basically two operations here, but no iteration as in find.
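One thing worth keeping in mind is that this check is not quite equivalent to the others: a character that has no case, such as a digit or a space, is left unchanged by lower(), so the comparison comes out true. A quick sketch, with sample inputs picked purely for illustration:

```python
def is_lower2(ch):
    # True whenever lowercasing leaves the character unchanged
    return ch.lower() == ch

for ch in ['a', 'A', '1', ' ']:
    print(repr(ch), is_lower2(ch))
# 'a' True
# 'A' False
# '1' True
# ' ' True
```

This does not matter for the timing game, since every function is timed with the same input, but it would matter if the function were used as a real lowercase test.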

is_lower3 uses the 'in' operator against all possible lowercase values. So the only string operation here is looking up the list of possible values (string.lowercase). Is this faster than is_lower2?
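The 'in' operator asks the same question as the find version, but it returns a boolean directly instead of an index. The sketch below assumes Python 3 and uses string.ascii_lowercase in place of string.lowercase; is_lower3_py3 is a made-up name.

```python
import string

LOWER = string.ascii_lowercase

def is_lower3_py3(ch):
    # membership test: True or False, no index to compare against -1
    return ch in LOWER

# Both checks agree for every single ASCII character
agree = all((LOWER.find(chr(i)) != -1) == is_lower3_py3(chr(i))
            for i in range(128))
print(agree)  # True
```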

is_lower4 compares the passed character against the boundaries of the lowercase letters. There is no iteration and no string function call as before, just two comparison operations. That should be fast, right? Note that we are relying on ASCII character ordering for the comparison. If you print string.lowercase, 'z' is not the last character; on my PC it's '\xff', which looks like a 'y' with two dots over it. But rest assured that the results are not affected in any noticeable way.
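To see what the boundary check does and does not cover, here is a short sketch. It assumes Python 3, where '\xff' is the character with code point 255 (the 'y' with two dots) and str.islower() is Unicode-aware.

```python
def is_lower4(ch):
    # two comparisons on code points: 97 <= ord(ch) <= 122
    return 'a' <= ch <= 'z'

print(ord('a'), ord('z'))   # 97 122
print(is_lower4('m'))       # True
print(is_lower4('A'))       # False
print(is_lower4('\xff'))    # False: code point 255 is above 'z'
print('\xff'.islower())     # True
```

So the range check only recognizes the 26 ASCII letters, which is fine for the benchmark but narrower than a locale-aware or Unicode-aware test.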

So, let's timeit:

if __name__ == '__main__':
    import string
    from timeit import Timer

    # timeit() runs each statement 1,000,000 times by default
    # and returns the total time in seconds
    t = Timer("is_lower1('A')", "from __main__ import is_lower1")
    print "is_lower1 result: %f" % t.timeit()

    t = Timer("is_lower2('A')", "from __main__ import is_lower2")
    print "is_lower2 result: %f" % t.timeit()

    t = Timer("is_lower3('A')", "from __main__ import is_lower3")
    print "is_lower3 result: %f" % t.timeit()

    t = Timer("is_lower4('A')", "from __main__ import is_lower4")
    print "is_lower4 result: %f" % t.timeit()


You probably guessed it, but here are the results to back up our hunch about which string search method is faster (total seconds for timeit's default of 1,000,000 calls each):

is_lower1 result: 0.957694
is_lower2 result: 0.322355
is_lower3 result: 0.256491
is_lower4 result: 0.201267

Did you guess it right?
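If you want to repeat the game on Python 3, the sketch below is one way to do it. It is not the code that produced the numbers above: string.ascii_lowercase and str.find stand in for the Python 2 names, run_benchmark and CANDIDATES are made-up helpers, and the lambda wrapper adds a small, equal overhead to every call. Your timings will differ, and the ranking is not guaranteed to match.

```python
import string
from timeit import timeit

LOWER = string.ascii_lowercase  # Python 3 stand-in for string.lowercase

def is_lower1(ch):
    return LOWER.find(ch) != -1

def is_lower2(ch):
    return ch.lower() == ch

def is_lower3(ch):
    return ch in LOWER

def is_lower4(ch):
    return 'a' <= ch <= 'z'

CANDIDATES = [is_lower1, is_lower2, is_lower3, is_lower4]

def run_benchmark(number=100000):
    # Returns {function name: total seconds for `number` calls with 'A'}
    return {f.__name__: timeit(lambda f=f: f('A'), number=number)
            for f in CANDIDATES}

for name, seconds in run_benchmark().items():
    print("%s result: %f" % (name, seconds))
```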
