Best way to convert string to bytes in Python 3?
There appear to be two different ways to convert a string to bytes, as seen in the answers to "TypeError: 'str' does not support the buffer interface".
Which of these methods would be better or more Pythonic? Or is it just personal preference?
b = bytes(mystring, 'utf-8')
b = mystring.encode('utf-8')
If you look at the docs for bytes, it points you to bytearray:
bytearray([source[, encoding[, errors]]])
Return a new array of bytes. The bytearray type is a mutable sequence of integers in the range 0 <= x < 256. It has most of the usual methods of mutable sequences, described in Mutable Sequence Types, as well as most methods that the bytes type has, see Bytes and Byte Array Methods.
The optional source parameter can be used to initialize the array in a few different ways:
If it is a string, you must also give the encoding (and optionally, errors) parameters; bytearray() then converts the string to bytes using str.encode().
If it is an integer, the array will have that size and will be initialized with null bytes.
If it is an object conforming to the buffer interface, a read-only buffer of the object will be used to initialize the bytes array.
If it is an iterable, it must be an iterable of integers in the range 0 <= x < 256, which are used as the initial contents of the array.
Without an argument, an array of size 0 is created.
So bytes can do much more than just encode a string. It is Pythonic for the constructor to accept any type of source argument that makes sense.
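As a rough illustration of the quoted list, here is a minimal sketch that calls the bytes constructor with each kind of source. The variable names are made up for this example; the behavior shown is what the documentation above describes for Python 3.

```python
# One call per kind of source argument described in the docs.
from_text = bytes("héllo", "utf-8")       # string: an encoding is required
zero_filled = bytes(3)                     # integer: that many null bytes
from_buffer = bytes(bytearray(b"abc"))     # object supporting the buffer interface
from_ints = bytes([104, 105])              # iterable of ints in range(256)
empty = bytes()                            # no argument: zero-length result

print(from_text)     # b'h\xc3\xa9llo'
print(zero_filled)   # b'\x00\x00\x00'
print(from_ints)     # b'hi'

# A string without an encoding is rejected rather than guessed at.
try:
    bytes("hello")
except TypeError as exc:
    missing_encoding_error = exc
    print("TypeError:", exc)
```

Only the first call has anything to do with text; the rest build bytes from numbers or existing binary data.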
For encoding a string, I think that some_string.encode(encoding) is more Pythonic than using the constructor, because it is the most self-documenting: "take this string and encode it with this encoding" is clearer than bytes(some_string, encoding), where there is no explicit verb.
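To make the comparison concrete, here is a small sketch showing that the two spellings produce the same result, including when an errors handler is passed. The sample text and variable names are only for illustration.

```python
text = "naïve café"

# Both spellings yield identical bytes for the same encoding.
via_method = text.encode("utf-8")
via_constructor = bytes(text, "utf-8")
print(via_method == via_constructor)   # True

# The optional errors argument is accepted by both forms as well.
lossy_method = text.encode("ascii", errors="replace")
lossy_constructor = bytes(text, "ascii", errors="replace")
print(lossy_method)                    # b'na?ve caf?'
print(lossy_method == lossy_constructor)   # True

# str.encode() defaults to UTF-8 in Python 3, so the encoding can be omitted;
# the constructor has no such default when the source is a string.
default_method = text.encode()
print(default_method == via_method)    # True
```

Since the output is the same either way, the choice comes down to readability.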
Edit: I checked the Python source. If you pass a unicode string to bytes using CPython, it calls PyUnicode_AsEncodedString, which is the implementation of encode; so you're just skipping a level of indirection if you call encode yourself.
Also, see Serdalis' comment: unicode_string.encode(encoding) is also more Pythonic because its inverse is byte_string.decode(encoding), and symmetry is nice.
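The symmetry can be seen in a short round trip. This is a sketch with an arbitrary sample string; the constructor-based counterpart str(byte_string, encoding) is included only for comparison.

```python
original = "Grüße, 世界"

# encode() and decode() mirror each other.
encoded = original.encode("utf-8")
decoded = encoded.decode("utf-8")
print(decoded == original)        # True

# The constructor-style equivalents work too, but the pairing is less obvious.
encoded_ctor = bytes(original, "utf-8")
decoded_ctor = str(encoded_ctor, "utf-8")
print(decoded_ctor == original)   # True
```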