Sending emails with Unicode address headers

Unicode characters in address headers can cause email send failures. Here's how to fix it.

2020-06-18

Preface

Recently I needed to add email features to a product I built for a client. The client and the product's users are Norwegian, so some users' names contain letters that aren't in the ASCII character set. During testing, I discovered that emails to these people would fail to send. The quick fix was to omit names from the address headers, after which everything worked.

However, I was really curious about what was causing the problem. I did some research and found a few others with the same problem, but no solution. I was nerd sniped and dug into it a bit further on my own time.

I'm sharing my solution here in case it helps someone. I've generalised it to handle both names and email addresses in any address header. If you only want the code and not the explanation, it's at the bottom.

Introduction

Let's say we want to send an email from an imaginary organisation named Blåbærsyltetøy, which is Norwegian for "blueberry jam" and includes all three of Norway's non-ASCII letters. Let's give it the unimaginative but instructive email address blåbærsyltetøy@blåbærsyltetøy.no. Email address headers are in the form name <address>, so we want to construct Blåbærsyltetøy <blåbærsyltetøy@blåbærsyltetøy.no>. But how do we do that?

We could try to normalise Blåbærsyltetøy to "Blabaersyltetoy", but Unicode normalisation only helps with å, which decomposes into "a" plus a combining ring. The characters æ and ø have no decomposition in the Unicode standard, so they have no ASCII equivalents to fall back on. Plus, that's not actually the correct name! We shouldn't ignore Unicode characters just because they're inconvenient.
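To see why normalisation falls short, here's a small sketch using Python's unicodedata module (my choice for illustration; it isn't part of the original solution). It decomposes the name and then drops whatever still can't be represented in ASCII:

```python
import unicodedata

name = 'Blåbærsyltetøy'

# NFKD splits "å" into "a" plus a combining ring, but "æ" and "ø"
# have no decomposition, so they stay as single non-ASCII characters.
decomposed = unicodedata.normalize('NFKD', name)

# Dropping everything non-ASCII removes the ring, "æ" and "ø" entirely.
ascii_only = decomposed.encode('ascii', 'ignore').decode('ascii')
print(ascii_only)  # Blabrsyltety
```

The result has lost two letters outright, which is arguably worse than no name at all.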

We could also drop the name altogether and just use the address. But that's a little impersonal and unpolished, and it can be confusing if a name isn't given for the "from" or "reply-to" fields. Users will want to know where emails are coming from and where they are sending emails to.

On top of that, we still might have Unicode characters in the email address to deal with. Unlike names, those really need to be exact or else the email won't reach its destination. Ultimately, trying to avoid the problem is pointless. We're going to have to encode the address headers.

Solving the problem

I encountered this issue while building MIME email objects in Python and sending them with the Gmail API. Unicode characters worked fine in the email body and subject, but the Gmail send would fail whenever any of the address headers contained Unicode. I tried a bunch of suggested methods for marking the characters as Unicode, but they all resulted in invalid header errors, send failures, or "successful" sends that actually failed.

I realised that if Unicode was working for the email subject and body, that might mean that the address headers were not being encoded correctly. I inspected the resulting MIME objects to see what the final encoded Unicode text looked like. Then, I wrote a small Python function that would build a string in the right encoding so that I could assign that string to the MIME object directly.
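As a rough illustration of that kind of inspection (a sketch, not the exact code I ran), the standard library's MIMEText class can build a message with a Unicode subject, and flattening it shows the ASCII-only form that the subject ends up in. The message text here is made up for the example:

```python
from email.mime.text import MIMEText

# Build a small message with Unicode in the body and subject.
msg = MIMEText('Hei fra Blåbærsyltetøy!', 'plain', 'utf-8')
msg['Subject'] = 'Blåbærsyltetøy'

# Flatten the message to text and pick out the Subject header line.
flattened = msg.as_string()
subject_line = next(
    line for line in flattened.splitlines() if line.startswith('Subject:')
)
print(subject_line)
```

The printed line contains only ASCII, with the subject wrapped in =?utf-8?...?= markers. That wrapper is the pattern to reproduce for the address headers.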

Encoding names

Thankfully, the MIME standard defines a method for representing 8-bit data using only ASCII characters. It's called quoted-printable (QP) encoding. Because UTF-8 text is just a sequence of bytes, we can use QP encoding to turn our Unicode names into MIME-safe ASCII.

In Python we can use the encodestring function from the quopri module. It operates on bytes, so first we have to encode our name string into a byte string, specifying UTF-8 as the encoding. We then decode the resulting byte string into a regular string, specifying ASCII. Finally, we wrap this string in the encoded-word markers defined by RFC 2047, =?charset?q?text?=, which tell the mail client that the text is QP-encoded ("q") and which character set it uses.

Here's the code for the name:

>>> import quopri
>>> name = 'Blåbærsyltetøy'
>>> name = quopri.encodestring(name.encode('utf-8')).decode('ascii')
>>> name
'Bl=C3=A5b=C3=A6rsyltet=C3=B8y'
>>> f'=?utf-8?q?{name}?='
'=?utf-8?q?Bl=C3=A5b=C3=A6rsyltet=C3=B8y?='

As you can see, each non-ASCII character in the name has been converted into a sequence of =HH groups, where HH is the hexadecimal value of one byte of that character's UTF-8 encoding. Each of these letters needs two groups because å, æ and ø each take two bytes in UTF-8. Other characters can take anywhere from one to four bytes.
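To make the relationship between UTF-8 bytes and =HH groups concrete, here's a short sketch. The helper name qp_groups is hypothetical and exists only for this example:

```python
import quopri

def qp_groups(char):
    """Return the quoted-printable form of a character's UTF-8 bytes."""
    return quopri.encodestring(char.encode('utf-8')).decode('ascii')

# One ASCII letter, then characters that need 2, 3 and 4 bytes in UTF-8.
for char in ['a', 'å', '€', '😀']:
    utf8_len = len(char.encode('utf-8'))
    print(repr(char), utf8_len, qp_groups(char))
```

The plain "a" passes through untouched, while the others produce two, three and four =HH groups respectively, one per byte.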

Encoding addresses

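Email addresses have the form local@domain. Here, Unicode characters in the local part are handled the same way as names. The domain part is handled differently, but luckily there's a standard for that too: RFC 3490 defines the Internationalizing Domain Names in Applications (IDNA) mechanism, which converts such domains to an ASCII form. One caveat: RFC 2047 says that an encoded-word must not appear inside the address itself, so encoding the local part this way is a pragmatic workaround rather than strictly standard behaviour.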

First, we split the email address into its local and domain parts so that we can handle them separately. Next, we QP encode the local part the same way we encoded the name. To encode the domain part, we encode it with Python's built-in idna codec, then decode the resulting byte string as ASCII. Finally, we join the two parts back together to get the encoded email address.

Here's the code for the domain:

>>> address = 'blåbærsyltetøy@blåbærsyltetøy.no'
>>> local, domain = address.split('@')
>>> domain = domain.encode('idna').decode('ascii')
>>> domain
'xn--blbrsyltety-y8ao3x.no'
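As a sanity check, the idna codec also works in reverse, so the encoded domain can be round-tripped back to the original. The two helper names below are hypothetical and only wrap the built-in codec:

```python
def idna_encode(domain):
    """Encode a domain to its ASCII (Punycode) form using the built-in codec."""
    return domain.encode('idna').decode('ascii')

def idna_decode(domain):
    """Reverse idna_encode, turning 'xn--' labels back into Unicode."""
    return domain.encode('ascii').decode('idna')

encoded = idna_encode('blåbærsyltetøy.no')
print(encoded)
print(idna_decode(encoded))        # round-trips to the original domain
print(idna_encode('example.com'))  # ASCII domains pass through unchanged
```

Only labels that contain non-ASCII characters get the xn-- prefix, which is why the .no suffix is left alone.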

Solution

Here's the complete function:

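import quopri

def build_address_header(name, address, charset='utf-8'):
    """Build an ASCII-only 'name <local@domain>' header value."""
    def qp_encode(chars):
        # header=True writes spaces as "_", which encoded-words require
        chars = quopri.encodestring(chars.encode(charset), header=True).decode('ascii')
        # Wrap in the RFC 2047 encoded-word markers: =?charset?q?text?=
        return f'=?{charset}?q?{chars}?='
    name = qp_encode(name)
    # Split on the last "@" so the domain is always the final part
    local, domain = address.rsplit('@', 1)
    local = qp_encode(local)
    # Domains use IDNA (Punycode) rather than QP
    domain = domain.encode('idna').decode('ascii')
    header = f'{name} <{local}@{domain}>'
    return header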

Here's what it returns:

>>> build_address_header('Blåbærsyltetøy', 'blåbærsyltetøy@blåbærsyltetøy.no')
'=?utf-8?q?Bl=C3=A5b=C3=A6rsyltet=C3=B8y?= <=?utf-8?q?bl=C3=A5b=C3=A6rsyltet=C3=B8y?=@xn--blbrsyltety-y8ao3x.no>'
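One way to check the name part of that output is to decode it again with the standard library's email.header module. This is a verification sketch rather than part of the solution itself:

```python
from email.header import decode_header, make_header

encoded_name = '=?utf-8?q?Bl=C3=A5b=C3=A6rsyltet=C3=B8y?='

# decode_header returns (bytes, charset) pairs; make_header joins them
# back into a Header object whose str() is the decoded text.
decoded_name = str(make_header(decode_header(encoded_name)))
print(decoded_name)  # Blåbærsyltetøy
```

If the decoded text matches the original name, a mail client that understands encoded-words should display it the same way.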

The resulting address header can be assigned directly to the MIME object. The name and address will display as regular Unicode in clients that support it, like Gmail.
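Here's a minimal sketch of that last step, with a compact variant of build_address_header repeated so that the example runs on its own. The message body is made up, and the payload shape at the end is an assumption about what the Gmail API's send call expects (the whole message as a base64url string under a "raw" key), so check it against the API documentation before relying on it:

```python
import base64
import quopri
from email.mime.text import MIMEText

def build_address_header(name, address, charset='utf-8'):
    # Compact variant of the function above, repeated for self-containment.
    def qp_encode(chars):
        chars = quopri.encodestring(chars.encode(charset)).decode('ascii')
        return f'=?{charset}?q?{chars}?='
    local, domain = address.split('@')
    domain = domain.encode('idna').decode('ascii')
    return f'{qp_encode(name)} <{qp_encode(local)}@{domain}>'

msg = MIMEText('Hei!', 'plain', 'utf-8')
# The header value is already plain ASCII, so it is stored unchanged.
msg['To'] = build_address_header(
    'Blåbærsyltetøy', 'blåbærsyltetøy@blåbærsyltetøy.no'
)

# Assumption: the send call takes the full message as base64url text.
payload = {'raw': base64.urlsafe_b64encode(msg.as_bytes()).decode('ascii')}
print(msg['To'])
```

Because the header is built by hand, the email library has nothing left to encode in it and passes it through as-is.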