A robust email address syntax and deliverability validation library for Python 2.7/3.4+ by Joshua Tauberer.
This library validates that a string is of the form
firstname.lastname@example.org. This is
the sort of validation you would want for an email-based login form on a website.
The library is NOT for validation of the To: line in an email message
(e.g. My Name <email@example.com>), which
flanker is more appropriate for.
And this library does NOT permit obsolete forms of email addresses, so
if you need strict validation against the email specs exactly, use a different library.
This library was first published in 2015. The current version is 1.1.1
(posted May 19, 2020). Starting in version 1.1.0, the type of the value returned
by validate_email has changed, but dict-style access to the validated
address information still works, so it is backwards compatible.
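One way a result object can offer attribute access while keeping older dict-style lookups working is to forward `__getitem__` to `getattr`. The sketch below only illustrates that general pattern. The class name `Result` and its fields are hypothetical, and this is not necessarily how the library implements it.

```python
class Result(object):
    """Hypothetical result object; not the library's actual class."""

    def __init__(self, **fields):
        # Store each keyword argument as an attribute.
        for name, value in fields.items():
            setattr(self, name, value)

    def __getitem__(self, key):
        # Dict-style access falls back to attribute lookup.
        try:
            return getattr(self, key)
        except AttributeError:
            raise KeyError(key)


result = Result(email="user@example.com", domain="example.com")
print(result.email)     # attribute access
print(result["email"])  # dict-style access returns the same value
```

With this pattern, older calling code that indexes the result like a dict keeps working unchanged.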
This package is on PyPI, so:
pip install email-validator
pip3 also works.
If you're validating a user's email address before creating a user account, you might do this:
    from email_validator import validate_email, EmailNotValidError

    email = "firstname.lastname@example.org"

    try:
        # Validate.
        valid = validate_email(email)

        # Update with the normalized form.
        email = valid.email
    except EmailNotValidError as e:
        # email is not valid, exception message is human-readable
        print(str(e))
This validates the address and gives you its normalized form. You should put the normalized form in your database and always normalize before checking if an address is in your database.
When validating many email addresses or to control the timeout (the default is 15 seconds), create a caching dns.resolver.Resolver to reuse in each call:
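    from email_validator import validate_email, caching_resolver

    # Build one resolver and reuse it so DNS answers are cached across calls.
    resolver = caching_resolver(timeout=10)

    while True:
        # email is the next address to check, obtained elsewhere.
        valid = validate_email(email, dns_resolver=resolver)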
The validator will accept internationalized email addresses, but not all
mail systems can send email to an address with non-ASCII characters in
the local part of the address (before the @-sign). See the
allow_smtputf8 option below.
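The module provides a function, validate_email, which
takes an email address (either a str or ASCII bytes) and either raises an
EmailNotValidError with a helpful, human-readable error message explaining why the email address is not valid, or
returns an object with a normalized form of the email address and other information about it.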
When an email address is not valid,
validate_email raises either an
EmailSyntaxError if the form of the address is invalid or an
EmailUndeliverableError if the domain name does not resolve. Both
exception classes are subclasses of
EmailNotValidError, which in turn
is a subclass of ValueError.
But when an email address is valid, an object is returned containing a normalized form of the email address (which you should use!) and other information.
The validator doesn't permit obsoleted forms of email addresses that no one uses anymore even though they are still valid and deliverable, since they will probably give you grief if you're using email for login. (See later in the document about that.)
The validator checks that the domain name in the email address resolves. There is nothing to be gained by trying to actually contact an SMTP server, so that's not done here. For privacy, security, and practicality reasons servers are good at not giving away whether an address is deliverable or not: email addresses that appear to accept mail at first can bounce mail after a delay, and bounced mail may indicate a temporary failure of a good email address (sometimes an intentional failure, like greylisting).
The function also accepts the following keyword arguments (default as shown):
allow_smtputf8=True: Set to False to prohibit internationalized addresses that would require the SMTPUTF8 extension.
check_deliverability=True: Set to False to skip the domain name resolution check.
allow_empty_local=False: Set to True to allow an empty local part (i.e. @example.com), e.g. for validating Postfix aliases.
dns_resolver=None: Pass an instance of dns.resolver.Resolver to control the DNS resolver, including setting a timeout and a cache. The caching_resolver function shown above is a helper function to construct a dns.resolver.Resolver with an LRUCache. Reuse the same resolver instance across calls to validate_email to make use of the cache.
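The benefit of reusing one cached resolver can be illustrated without any network access. In this sketch, `lookup_mx` is a hypothetical stand-in for a DNS query and `functools.lru_cache` plays the role of the resolver's cache; neither is part of this library.

```python
from functools import lru_cache

calls = []  # records every simulated DNS query


@lru_cache(maxsize=128)
def lookup_mx(domain):
    # Hypothetical stand-in for a real DNS query; no network is used.
    calls.append(domain)
    return ((10, "mail." + domain),)


for address in ["a@example.com", "b@example.com", "c@example.org"]:
    domain = address.rpartition("@")[2]
    lookup_mx(domain)

print(len(calls))  # 2: the second example.com lookup was served from the cache
```

The same reasoning applies to a shared resolver: repeated domains are answered from the cache only if every call goes through the same instance.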
The email protocol SMTP and the domain name system DNS have historically only allowed ASCII characters in email addresses and domain names, respectively. Each has adapted to internationalization in a separate way, creating two separate aspects to email address internationalization.
The first is internationalized domain names (RFC
5891), a.k.a. IDNA 2008. DNS itself
has not been updated with Unicode support. Instead, internationalized
domain names are converted into a special IDNA ASCII "Punycode"
form starting with xn--. When an email address has non-ASCII
characters in its domain part, the domain part is replaced with its IDNA
ASCII equivalent form in the process of mail transmission. Your mail
submission library probably does this for you transparently. Note that
most web browsers are currently in transition between IDNA 2003 (RFC
3490) and IDNA 2008 (RFC 5891), and compliance around the web is
inconsistent in any case, so be aware that edge cases are handled differently by
different applications and libraries. This library conforms to IDNA 2008
using the idna module by Kim Davies.
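The Punycode conversion described above can be tried with Python's built-in `idna` codec. One assumption to keep in mind: the built-in codec implements the older IDNA 2003 rules, not IDNA 2008 as this library does, so results may differ on edge cases. This is a sketch of the idea rather than of the library's behavior.

```python
domain = "ツ.life"

# Encode each label to its ASCII-compatible "xn--" form.
ascii_domain = domain.encode("idna").decode("ascii")
print(ascii_domain)  # xn--bdk.life

# Decoding restores the Unicode form.
print(ascii_domain.encode("ascii").decode("idna") == domain)  # True
```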
The second sort of internationalization is internationalization in the local part of the address (before the @-sign). These email addresses require that your mail submission library and the mail servers along the route to the destination, including your own outbound mail server, all support the SMTPUTF8 (RFC 6531) extension. Support for SMTPUTF8 varies.
By default all internationalized forms are accepted by the validator.
But if you know ahead of time that SMTPUTF8 is not supported by your
mail submission stack, then you must filter out addresses that require
SMTPUTF8 using the allow_smtputf8=False keyword argument (see above).
This will cause the validation function to raise an EmailNotValidError if
delivery would require SMTPUTF8. That's just in those cases where
non-ASCII characters appear before the @-sign. If you do not set
allow_smtputf8=False, you can also check the value of the smtputf8
field in the returned object.
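The rule above, that SMTPUTF8 is needed only when non-ASCII characters appear before the @-sign, can be sketched as a small check. `needs_smtputf8` is a hypothetical helper for illustration, not a function of this library, and it assumes its input is already a syntactically valid address.

```python
def needs_smtputf8(address):
    # Split on the last @-sign; everything before it is the local part.
    local_part = address.rpartition("@")[0]
    # Any code point above 127 in the local part requires SMTPUTF8.
    return any(ord(ch) > 127 for ch in local_part)


print(needs_smtputf8("example@ツ.life"))      # False: only the domain is internationalized
print(needs_smtputf8("ツ-test@joshdata.me"))  # True: the local part is internationalized
```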
If your mail submission library doesn't support Unicode at all, even
in the domain part of the address, then immediately prior to mail
submission you must replace the email address with its ASCII-ized form.
This library gives you back the ASCII-ized form in the ascii_email
field in the returned object, which you can get like this:
    valid = validate_email(email, allow_smtputf8=False)
    email = valid.ascii_email
The local part is left alone (if it has internationalized characters
allow_smtputf8=False will force validation to fail) and the domain
part is converted to IDNA ASCII.
(You probably should not do this at account creation time so you don't
change the user's login information without telling them.)
Note that when using Python 2.7, the interpreter must have been built with UCS-4 support; otherwise email addresses with Unicode characters outside of the BMP (Basic Multilingual Plane) will not validate correctly.
The use of Unicode in email addresses introduced a normalization
problem. Different Unicode strings can look identical and have the same
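semantic meaning to the user. The object returned on successful validation
provides the correctly normalized form of the given email address: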
    valid = validate_email("me@Ｄｏｍａｉｎ.com")
    email = valid.ascii_email
    print(email)
    # prints: me@domain.com
Because an end-user might type their email address in different (but equivalent) un-normalized forms at different times, you ought to replace what they enter with the normalized form immediately prior to going into your database (during account creation), querying your database (during login), or sending outbound mail. Normalization may also change the length of an email address, and this may affect whether it is valid and acceptable to your SMTP provider.
The normalizations include lowercasing the domain part of the email address (domain names are case-insensitive), Unicode "NFC" normalization of the whole address (which turns characters plus combining characters into precomposed characters where possible), replacement of fullwidth and halfwidth characters in the domain part, possibly other UTS46 mappings on the domain part, and conversion from Punycode to Unicode characters.
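A rough approximation of some of these steps is possible with the standard library alone. The sketch below is a stand-in built on stated assumptions, not the library's algorithm: `rough_normalize` is a hypothetical helper that applies NFC to the local part and uses NFKC plus lowercasing on the domain in place of the UTS46 mapping. It skips Punycode decoding and performs no validation.

```python
import unicodedata


def rough_normalize(address):
    local_part, _, domain = address.rpartition("@")
    # NFC composes base characters with following combining marks.
    local_part = unicodedata.normalize("NFC", local_part)
    # NFKC folds fullwidth and halfwidth characters to their ordinary
    # forms (standing in for UTS46 here); domains are case-insensitive.
    domain = unicodedata.normalize("NFKC", domain).lower()
    return local_part + "@" + domain


print(rough_normalize("me@Ｄｏｍａｉｎ.com"))  # me@domain.com
# "e" followed by U+0301 (combining acute) becomes the single character U+00E9.
print(rough_normalize("e\u0301@example.com") == "\u00e9@example.com")  # True
```

Note that the local part is never lowercased here, since only domain names are case-insensitive.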
For the email address
test@joshdata.me, the returned object is:
    ValidatedEmail(
      email='test@joshdata.me',
      local_part='test',
      domain='joshdata.me',
      ascii_email='test@joshdata.me',
      ascii_local_part='test',
      ascii_domain='joshdata.me',
      smtputf8=False,
      mx=[(10, 'box.occams.info')],
      mx_fallback_type=None)
For the fictitious address
example@ツ.life, which has an
internationalized domain but ASCII local part, the returned object is:
    ValidatedEmail(
      email='example@ツ.life',
      local_part='example',
      domain='ツ.life',
      ascii_email='example@xn--bdk.life',
      ascii_local_part='example',
      ascii_domain='xn--bdk.life',
      smtputf8=False)
Note that smtputf8 is False even though the domain part is
internationalized, because SMTPUTF8 is only needed if the
local part of the address is internationalized (the domain part can be
converted to IDNA ASCII Punycode). Also note that the email and domain
fields provide a normalized form of the email address and domain name
(casefolding and Unicode normalization as required by IDNA 2008).
Calling validate_email with the ASCII form of the above email address,
example@xn--bdk.life, returns the exact same information (i.e., the
email field always contains Unicode characters, not Punycode).
For the fictitious address
ツ-test@joshdata.me, which has an
internationalized local part, the returned object is:
    ValidatedEmail(
      email='ツ-test@joshdata.me',
      local_part='ツ-test',
      domain='joshdata.me',
      ascii_email=None,
      ascii_local_part=None,
      ascii_domain='joshdata.me',
      smtputf8=True)
Now smtputf8 is True and ascii_email is None because the local
part of the address is internationalized. The local_part and email
fields hold the normalized form of the address.
When an email address passes validation, the fields in the returned object are:
| Field | Value |
|---|---|
| email | The normalized form of the email address that you should put in your database. This merely combines the local_part and domain fields. |
| ascii_email | If set, an ASCII-only form of the email address, made by replacing the domain part with IDNA Punycode. This field will be present when an ASCII-only form of the email address exists (including if the email address is already ASCII). If the local part of the email address contains internationalized characters, this is None. |
| local_part | The local part of the given email address (before the @-sign) with Unicode NFC normalization applied. |
| ascii_local_part | If set, the local part, which is composed of ASCII characters only. |
| domain | The canonical internationalized Unicode form of the domain part of the email address. If the returned string contains non-ASCII characters, either the SMTPUTF8 feature of your mail relay will be required to transmit the message or else the email address's domain part must be converted to IDNA ASCII first: use ascii_domain. |
| ascii_domain | The IDNA Punycode-encoded form of the domain part of the given email address, as it would be transmitted on the wire. |
| smtputf8 | A boolean indicating that the SMTPUTF8 feature of your mail relay will be required to transmit messages to this address because the local part of the address has non-ASCII characters (the local part cannot be IDNA-encoded). If allow_smtputf8=False was passed, this field is always False, because validation fails for addresses that would require SMTPUTF8. |
| mx | A list of (priority, domain) tuples of MX records specified in the DNS for the domain (see RFC 5321 section 5). May be None. |
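Since each MX entry is a (priority, domain) tuple, ordinary tuple sorting arranges the hosts by preference; under RFC 5321, lower preference numbers are tried first. The list below is made-up sample data in the same shape as the MX list, and the host names are hypothetical.

```python
# Hypothetical sample data shaped like the list of MX records.
mx = [(20, "backup.example.net"), (10, "primary.example.net")]

# Tuples compare by their first element, so sorting orders by priority.
for priority, host in sorted(mx):
    print(priority, host)

preferred_host = min(mx)[1]
print(preferred_host)  # primary.example.net
```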
By design, this validator does not pass all email addresses that strictly conform to the standards. Many email address forms are obsolete or likely to cause trouble:
Tests can be run using:
    pip install -r test_requirements.txt
    make test
The package is distributed as a universal wheel and as a source package.
    pip3 install twine
    rm -rf dist
    python3 setup.py sdist
    python3 setup.py bdist_wheel
    twine upload dist/*
    git tag v1.0.XXX # replace with version in setup.py
    git push --tags
Notes: The wheel is specified as universal in the file
setup.cfg by the
universal = 1 key in the
[bdist_wheel] section.