Porting Python Code

Introduction

As of January 2015, WindRiver python code has been made compliant with both python2 and python3 interpreters. This document lists the main problems we faced during the migration.

Many sites already describe what to do, depending on the chosen migration strategy.

We decided to keep a single python code base compatible with both python2 and python3, using the 2to3 utility.

Before using 2to3

2to3 itself introduced some errors, which can be avoided by adjusting the code before running it:

print

When print is used as a statement, the output file is empty. For example, a simple file containing:

def myprint(message):
    print '** ' + str(message) + '**'

is emptied by the 2to3 tool, and then only contains:

None

To fix that, first turn every print statement into a print function call:

def myprint(message):
    print('** ' + str(message) + '**')

This should be enough.
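Turning print statements into print(...) calls is safe as long as each call has a single argument. With several arguments, python 2 still parses print('**', message, '**') as a statement printing a tuple, i.e. ('**', 'hello', '**'). Adding from __future__ import print_function at the top of each module (python 2.6 and later) makes print a real function on python 2 too. A minimal sketch; the stream parameter is a hypothetical addition for illustration:

```python
from __future__ import print_function  # must be the first statement of the module


def myprint(message, stream=None):
    # With the future import, print is a function on python 2 as well, so
    # several arguments and the file= keyword behave exactly as on python 3.
    # file=None makes the built-in print write to sys.stdout.
    print('**', message, '**', file=stream)
```

On python 3 the future import is accepted and has no effect, so the same module runs unchanged on both interpreters.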

raise

The issue introduced by 2to3 with raise is roughly the opposite of the print problem: raise is a statement, not a function, so it should not be followed by parentheses. If the parentheses are kept, the following code:

raise(MyException.message('My message'))

is truncated to:

raise MyException

by 2to3. To avoid this, write instead:

raise MyException.message('My message')
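Catching exceptions also has a form that only python 2 accepts: except MyException, exc: is a syntax error in python 3, while except MyException as exc: works from python 2.6 onward. A small sketch that raises and catches with syntax valid on both interpreters; MyException, check_positive and safe_check are hypothetical names:

```python
class MyException(Exception):
    """Hypothetical exception type used only for this illustration."""


def check_positive(value):
    if value <= 0:
        # raise as a statement, applied to an exception instance
        raise MyException('value must be positive, got %r' % (value,))
    return value


def safe_check(value):
    try:
        return check_positive(value)
    except MyException as exc:  # 'as' form: valid on python 2.6+ and 3
        return str(exc)
```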

Running 2to3

After running 2to3 on the python tree, the major changes are:

Old code             New code             Potential issue
unicode()            str()                Unicode disappeared
dict.keys()          list(dict.keys())    Line too long
cStringIO            io.StringIO          String encoding
StandardError        Exception            -
long                 int                  Data type check
basestring           str                  Data type check
xrange               range                -
__nonzero__(self):   __bool__(self):      NonZero disappeared

Issues

Unicode disappeared

The unicode type no longer exists in python 3, so neither

>>> unicode(x)

nor

>>> isinstance(x, unicode)

works with python 3. One of the main issues this raises concerns StringIO objects: io.StringIO only accepts unicode text in python 2, and str in python 3.

Python 2:

>>> import io
>>> f = io.StringIO()
>>> f.write('toto')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unicode argument expected, got 'str'
>>> f.write('toto'.decode('utf-8'))
4L

Python 3:

>>> import io
>>> f = io.StringIO()
>>> f.write('toto')
4
>>> f.write('toto'.decode('utf-8'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'

To write compatible code, a subclass of io.StringIO can be created that decodes byte strings before writing them:

>>> import io
>>> class CompatStringIO(io.StringIO):
...     def write(self, s):
...         if isinstance(s, bytes):
...             # Byte string (a plain str in python 2): decode it first
...             s = s.decode('utf-8')
...         # int() turns the python 2 long return value into a plain int
...         return int(super(CompatStringIO, self).write(s))
...     def getvalue(self):
...         return str(super(CompatStringIO, self).getvalue())
...
>>> f = CompatStringIO()
>>> f.write('mine')
4
>>> f.getvalue()
'mine'
>>> f.close()
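StringIO aside, direct unicode(x) calls also need a replacement that runs on both interpreters. A minimal sketch, assuming text should be decoded as UTF-8; text_type and to_text are hypothetical names chosen for this illustration:

```python
import sys

if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # name only exists on python 2, never evaluated on 3


def to_text(value, encoding='utf-8'):
    """Hypothetical replacement for unicode(value) on python 2 and 3."""
    if isinstance(value, bytes):
        # python 3 bytes, or a python 2 str: decode with an explicit encoding
        return value.decode(encoding)
    # Anything else (numbers, text, objects) goes through the text type
    return text_type(value)
```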

Line too long

After the conversion, some lines, for example those where 2to3 wrapped a call in list(), may become longer than the 79 characters allowed by PEP8, so some code may need to be reformatted to remain PEP8 compliant.
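2to3 wraps keys() calls in list() because python 3 returns a view rather than a list. The copy is only needed when the loop modifies the dictionary; otherwise iterating over the dictionary directly is shorter and behaves the same on both interpreters. A sketch with hypothetical helper names:

```python
def drop_empty(d):
    """Remove keys whose value is falsy; works on python 2 and 3."""
    # list() takes a snapshot of the keys: on python 3, keys() is a live view
    # and deleting entries while iterating over it raises RuntimeError.
    for key in list(d.keys()):
        if not d[key]:
            del d[key]
    return d


def total_key_length(d):
    """Sum of key lengths; no list() wrapper needed since d is not modified."""
    return sum(len(key) for key in d)
```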

String encoding

This was the biggest threat. In short:

Python 2:

>>> e = bytearray([55, 56, 57])
>>> e
bytearray(b'789')
>>> str(e)
'789'

Python 3:

>>> e = bytearray([55, 56, 57])
>>> e
bytearray(b'789')
>>> str(e)
"bytearray(b'789')"

Compatible code:

>>> e = bytearray([55, 56, 57])
>>> e
bytearray(b'789')
>>> str(e.decode())
'789'

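To get the same behaviour, decode() must be called explicitly; this works with both python 2 and 3. Note that the default encoding used by decode() differs (ASCII in python 2, UTF-8 in python 3), so passing it explicitly, as in e.decode('utf-8'), is safer for non-ASCII data.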
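Rather than sprinkling decode() calls through the code, the conversion to the interpreter's native str type can be kept in one helper. A minimal sketch, assuming UTF-8 data; to_native_str is a hypothetical name:

```python
def to_native_str(data, encoding='utf-8'):
    """Hypothetical helper: return bytes-like or text data as the native str."""
    if isinstance(data, (bytes, bytearray)):
        # Decode with an explicit encoding instead of the interpreter default
        data = data.decode(encoding)
    if str is bytes:
        # python 2: the native str is a byte string, so encode the text back
        return data.encode(encoding)
    return data
```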

Data type check

In some cases, data manipulation may depend on the data type. As some types have disappeared in python 3 (long, unicode, basestring …), checking for the data type needs some attention.

2to3 converts the following python 2 code into python 3 code:

Python 2:

>>> isinstance(u'mystr', basestring)

Python 3 (after 2to3):

>>> isinstance(u'mystr', str)

But the converted code behaves differently when run with a python 2 interpreter:

Python 2:

>>> isinstance(u'mystr', basestring)
True
>>> isinstance(u'mystr', str)
False

Python 3:

>>> isinstance(u'mystr', basestring)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'basestring' is not defined
>>> isinstance(u'mystr', str)
True

To make this work with both python 2 and 3 and keep the same behavior, a small check of the interpreter version may be needed:

>>> import sys
>>> if sys.version_info[0] >= 3:
...     strings = (str,)
...     ints = (int,)
... else:
...     ints = (int, long)
...     strings = (basestring,)
...
>>> isinstance(u'mystr', strings)
True

The same problem applies to int and long values, which may be grouped in an ints tuple, as above.
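For integers, an alternative that avoids the version check is the numbers.Integral abstract base class (python 2.6 and later), with which both int and long are registered. Since bool is a subclass of int, it also passes that check and may need to be excluded. A sketch with a hypothetical helper name:

```python
import numbers


def is_integer(value):
    """True for int (and long on python 2), without checking the version."""
    # bool is an int subclass, so exclude it explicitly
    return isinstance(value, numbers.Integral) and not isinstance(value, bool)
```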

NonZero disappeared

With python 2, the bool() function calls the object.__nonzero__() method. This method is no longer used in python 3, where bool() calls object.__bool__() instead.

Running 2to3 replaces the object.__nonzero__() with object.__bool__(), which is a problem when using a python 2 interpreter.

Conversely, keeping only __nonzero__() breaks python 3, which ignores that method, so bool() falls back to the default and returns True:

Python 2:

>>> class Test(object):
...     def __init__(self):
...         self._value = 0
...     def __nonzero__(self):
...         return self._value != 0
...
>>> bool(Test())
False

Python 3:

>>> class Test(object):
...     def __init__(self):
...         self._value = 0
...     def __nonzero__(self):
...         return self._value != 0
...
>>> bool(Test())
True

In order to write code compatible with both python 2 and 3, both __nonzero__() and __bool__() should be present.

>>> class Test(object):
...     def __init__(self):
...         self._value = 0
...     def __nonzero__(self):
...         return self._value != 0
...     def __bool__(self):
...         return self.__nonzero__()
...
>>> bool(Test())
False
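Instead of forwarding __bool__() to __nonzero__(), the method can also be aliased in the class body, so that a single implementation serves both interpreters. A sketch with a hypothetical Counter class:

```python
class Counter(object):
    """Hypothetical class: truthy once its value is non-zero."""

    def __init__(self, value=0):
        self._value = value

    def __nonzero__(self):  # looked up by bool() on python 2
        return self._value != 0

    # python 3 looks up __bool__; aliasing avoids a second method body
    __bool__ = __nonzero__
```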

Cmp disappeared

In python 3, the cmp() built-in function has been removed, and the __cmp__(self, other) method is no longer used by the comparison operators. Rich comparison methods such as __lt__(self, other) must be defined instead. Objects that only implement __cmp__(self, other) can therefore no longer be compared, as shown below:

Python 2:

>>> class Test(object):
...     def __init__(self):
...         self._value = 0
...     def __cmp__(self, other):
...         if self._value < other._value:
...             return -1
...         elif self._value > other._value:
...             return 1
...         return 0
...
>>> a = Test()
>>> # comparison tests
... b = Test()
>>> b._value = 1
>>> a < b
True

Python 3:

>>> class Test(object):
...     def __init__(self):
...         self._value = 0
...     def __cmp__(self, other):
...         if self._value < other._value:
...             return -1
...         elif self._value > other._value:
...             return 1
...         return 0
...
>>> a = Test()
>>> # comparison tests
... b = Test()
>>> b._value = 1
>>> a < b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: Test() < Test()

A possible fix, so that the class can be compared with both python 2 and 3, is to also define the __lt__(self, other) method:

>>> class Test(object):
...     def __init__(self):
...         self._value = 0
...     def __cmp__(self, other):
...         if self._value < other._value:
...             return -1
...         elif self._value > other._value:
...             return 1
...         return 0
...     def __lt__(self, other):
...         return self.__cmp__(other) == -1
...
>>> a = Test()
>>> # comparison tests
... b = Test()
>>> b._value = 1
>>> a < b
True
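Defining only __lt__() is enough for < and for sorting, and python 3 also answers > by reflecting __lt__(), but <= and >= still raise TypeError and == falls back to identity. functools.total_ordering (python 2.7 and 3.2+) derives the missing operators from __eq__() and __lt__(). A sketch with a hypothetical Version class, plus a stand-in for the removed cmp() built-in using the (a > b) - (a < b) idiom:

```python
import functools


@functools.total_ordering
class Version(object):
    """Hypothetical comparable class; __le__, __gt__ and __ge__ are derived."""

    def __init__(self, value=0):
        self._value = value

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._value == other._value

    def __ne__(self, other):
        # python 2 does not derive != from ==, so define it explicitly
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._value < other._value

    def __hash__(self):
        # python 3 sets __hash__ to None when only __eq__ is defined
        return hash(self._value)


def compat_cmp(a, b):
    """Stand-in for the removed cmp() built-in: returns -1, 0 or 1."""
    return (a > b) - (a < b)
```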

Division returns float values

In python 2, dividing an integer by an integer with / returns an integer. This is no longer true in python 3, where / always returns a float. This may be a problem when the result of a division is used as an index:

Python 2 Python 3
>>> l = [1, 2, 3]
>>> a = 3/2
>>> a
1
>>> l[a]
2
>>> l = [1, 2, 3]
>>> a = 3/2
>>> a
1.5
>>> l[a]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not float

For non-negative operands, a simple way to get the same result with both python 2 and 3 is to call the int function:

>>> l = [1, 2, 3]
>>> a = int(3/2)
>>> a
1
>>> l[a]
2
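int() truncates toward zero, while python 2 integer division rounds toward negative infinity, so the two interpreters disagree for negative operands: -3/2 is -2 in python 2, but int(-3/2) is -1 in python 3. The floor division operator //, available in both python 2 and 3, returns the same integer on both. Alternatively, from __future__ import division makes / perform true division on python 2 as well. A sketch with a hypothetical helper:

```python
def middle_index(seq):
    """Index of the middle element of seq, identical on python 2 and 3."""
    # // floors on both interpreters and returns an int for int operands,
    # so -3 // 2 == -2 everywhere, unlike int(-3 / 2) on python 3.
    return len(seq) // 2
```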

Conclusion

Writing code that is compatible with both python 2 and 3 interpreters is not that simple: it reveals many issues around string conversions. Those issues may hide deeper problems, so the effort is time consuming, but it can unveil problems that would otherwise take ages to investigate.

A common way to address the biggest threats is to write a compat.py module that defines the common types in one place:

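import io
import sys


class CompatStringIO(io.StringIO):
    """io.StringIO accepting native str on both python 2 and 3."""

    def write(self, s):
        if isinstance(s, bytes):
            # Byte string (a plain str in python 2): decode it first
            s = s.decode('utf-8')
        # int() turns the python 2 long return value into a plain int
        return int(super(CompatStringIO, self).write(s))

    def getvalue(self):
        # Return the native str type of the running interpreter
        return str(super(CompatStringIO, self).getvalue())


if sys.version_info[0] >= 3:
    import builtins
    import queue as queuemodule
    strings = (str,)
    ints = (int,)
else:
    import __builtin__ as builtins
    import Queue as queuemodule
    ints = (int, long)
    strings = (basestring,)


def bytes2str(data):
    """Convert a byte sequence to a native str, one character per byte."""
    if not data:
        return ''
    if isinstance(data[0], int):
        # python 3 bytes, or bytearray: items are integers
        return ''.join(chr(b) for b in data)
    # python 2 str: items are already characters
    return str(data)


StringIO = CompatStringIO
builtinlist = builtins.list
queue = queuemodule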

Using this compat.py module may then look like:

from compat import StringIO

b = StringIO()
b.write('mine')