Date: 2022-01-23
Splits a hostname into subdomains, domain and (effective) top-level domains.
Since domain name registrars organize their namespaces in different ways, it's not straightforward to split a hostname into subdomains, the domain and top-level domains. To do that, parse-domain uses a large list of known top-level domains from publicsuffix.org:
import { parseDomain, ParseResultType } from "parse-domain";
const parseResult = parseDomain(
// This should be a string with basic latin letters only.
// More information below.
"www.some.example.co.uk"
);
// Check if the domain is listed in the public suffix list
if (parseResult.type === ParseResultType.Listed) {
const { subDomains, domain, topLevelDomains } = parseResult;
console.log(subDomains); // ["www", "some"]
console.log(domain); // "example"
console.log(topLevelDomains); // ["co", "uk"]
} else {
// Read more about other parseResult types below...
}
This package has been designed for modern Node and browser environments with ECMAScript modules support. It assumes an ES2015 environment with Symbol(), URL() and TextDecoder() globally available. You need to transpile it down to ES5 (e.g. by using Babel) if you need to support older environments.
The list of top-level domains is stored in a trie data structure and serialization format to ensure the fastest lookup and the smallest possible library size.
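To illustrate why a trie suits this kind of lookup, here is a minimal sketch. It is not parse-domain's actual data structure or serialization format: the names TrieNode, insertSuffix and longestSuffix are hypothetical, the suffix list is a tiny hand-picked sample, and wildcard and exception rules of the public suffix list are ignored.

```typescript
// Hypothetical, simplified trie. Labels are stored right to left,
// so "co.uk" becomes the path uk -> co.
type TrieNode = { children: Map<string, TrieNode>; isSuffix: boolean };

const createNode = (): TrieNode => ({ children: new Map(), isSuffix: false });

// Insert a suffix such as "co.uk" label by label, starting from the right.
const insertSuffix = (root: TrieNode, suffix: string): void => {
  let node = root;
  for (const label of suffix.split(".").reverse()) {
    let child = node.children.get(label);
    if (child === undefined) {
      child = createNode();
      node.children.set(label, child);
    }
    node = child;
  }
  node.isSuffix = true;
};

// Return the longest listed suffix of the hostname as an array of labels,
// or an empty array if no suffix matches.
const longestSuffix = (root: TrieNode, hostname: string): string[] => {
  const labels = hostname.split(".");
  let node = root;
  let matched = 0;
  for (let i = labels.length - 1; i >= 0; i--) {
    const child = node.children.get(labels[i]);
    if (child === undefined) break;
    node = child;
    if (node.isSuffix) matched = labels.length - i;
  }
  return labels.slice(labels.length - matched);
};

// Sample data only; the real list has thousands of entries.
const root = createNode();
for (const suffix of ["com", "uk", "co.uk", "io", "github.io"]) {
  insertSuffix(root, suffix);
}

console.log(longestSuffix(root, "www.some.example.co.uk")); // ["co", "uk"]
console.log(longestSuffix(root, "parse-domain.github.io")); // ["github", "io"]
console.log(longestSuffix(root, "this.is.not-listed")); // []
```

Each lookup touches at most one node per label, so the cost depends on the number of labels in the hostname and not on the size of the list.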
npm install parse-domain
💡 Please note: publicsuffix.org is updated several times per month. This package comes with a prebuilt list that has been downloaded at the time of npm publish. In order to get an up-to-date list, you should run npx parse-domain-update every time you start or build your application. This will download the latest list from https://publicsuffix.org/list/public_suffix_list.dat.
⚠️ parseDomain does not parse whole URLs. You should only pass the puny-encoded hostname section of the URL:

| ❌ Wrong | ✅ Correct |
| --- | --- |
| https://user@www.example.com:8080/path?query | www.example.com |
| münchen.de | xn--mnchen-3ya.de |
| 食狮.com.cn?query | xn--85x722f.com.cn |
The utility function fromUrl tries to extract the hostname from a (partial) URL and puny-encodes it:
import { parseDomain, fromUrl } from "parse-domain";
const { subDomains, domain, topLevelDomains } = parseDomain(
fromUrl("https://www.münchen.de?query")
);
console.log(subDomains); // ["www"]
console.log(domain); // "xn--mnchen-3ya"
console.log(topLevelDomains); // ["de"]
// You can use the 'punycode' NPM package to decode the domain again
import { toUnicode } from "punycode";
console.log(toUnicode(domain)); // "münchen"
fromUrl parses the URL using new URL(). Depending on your target environments, you need to make sure that there is a polyfill for it. It's globally available in all modern browsers (no IE) and in Node v10 and later.
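The following sketch shows the general idea of extracting a hostname with the URL constructor. It is not fromUrl's implementation: hostnameFromUrl is a hypothetical helper, it returns undefined instead of the NO_HOSTNAME symbol, and it only handles complete URLs, whereas fromUrl also accepts partial ones.

```typescript
// Hypothetical helper, for illustration only.
const hostnameFromUrl = (input: string): string | undefined => {
  try {
    // The URL constructor throws a TypeError for input it cannot parse.
    const { hostname } = new URL(input);
    // Some URLs (e.g. file:///tmp) parse fine but have no hostname.
    return hostname === "" ? undefined : hostname;
  } catch {
    return undefined;
  }
};

console.log(hostnameFromUrl("https://user@www.example.com:8080/path?query")); // "www.example.com"
console.log(hostnameFromUrl("not a url")); // undefined
```

For http and https URLs, the URL constructor also converts international hostnames to their puny-encoded form, which is why a separate encoding step is not needed in this sketch.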
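When parsing a hostname there are 5 possible results: the hostname is invalid, it is an IP address, it is reserved, it is not listed in the public suffix list, or it is listed in the public suffix list.
parseDomain returns a ParseResult with a type property that lets you distinguish these cases.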
The given input is first validated against RFC 3696 (the domain labels are limited to basic latin letters, numbers and hyphens). If the validation fails, parseResult.type will be ParseResultType.Invalid:
import { parseDomain, ParseResultType } from "parse-domain";
const parseResult = parseDomain("münchen.de");
console.log(parseResult.type === ParseResultType.Invalid); // true
Check out the API if you need more information about the validation error.
If you don't want the characters to be validated (e.g. because you need to allow underscores in hostnames), there's also a more relaxed validation mode (according to RFC 2181).
import { parseDomain, ParseResultType, Validation } from "parse-domain";
const parseResult = parseDomain("_jabber._tcp.gmail.com", {
validation: Validation.Lax,
});
console.log(parseResult.type === ParseResultType.Listed); // true
See also #134 for the discussion.
If the given input is an IP address, parseResult.type will be ParseResultType.Ip:
import { parseDomain, ParseResultType } from "parse-domain";
const parseResult = parseDomain("192.168.2.1");
console.log(parseResult.type === ParseResultType.Ip); // true
console.log(parseResult.ipVersion); // 4
It's debatable if a library for parsing domains should also accept IP addresses. In fact, you could argue that parseDomain should reject an IP address as invalid. While this is true from a technical point of view, we decided to report IP addresses in a special way because we assume that a lot of people are using this library to make sense of an arbitrary hostname (see #102).
There are 5 top-level domains that are not listed in the public suffix list but reserved according to RFC 6761 and RFC 6762:
localhost
local
example
invalid
test
In these cases, parseResult.type will be ParseResultType.Reserved:
import { parseDomain, ParseResultType } from "parse-domain";
const parseResult = parseDomain("pecorino.local");
console.log(parseResult.type === ParseResultType.Reserved); // true
console.log(parseResult.labels); // ["pecorino", "local"]
If the given hostname is valid, but not listed in the downloaded public suffix list, parseResult.type will be ParseResultType.NotListed:
import { parseDomain, ParseResultType } from "parse-domain";
const parseResult = parseDomain("this.is.not-listed");
console.log(parseResult.type === ParseResultType.NotListed); // true
console.log(parseResult.labels); // ["this", "is", "not-listed"]
If a domain is not listed, it can be caused by an outdated list. Make sure to update the list once in a while.
⚠️ Do not treat parseDomain as an authoritative answer. It cannot replace a real DNS lookup to validate if a given domain is known in a certain network.
Technically, the term top-level domain describes the very last domain in a hostname (uk in example.co.uk). Most people, however, use the term top-level domain for the public suffix, which is a namespace "under which Internet users can directly register names".
Some examples of public suffixes:
com in example.com
co.uk in example.co.uk
co in example.co
com.co in example.com.co
If the hostname is listed in the public suffix list, the parseResult.type will be ParseResultType.Listed:
import { parseDomain, ParseResultType } from "parse-domain";
const parseResult = parseDomain("example.co.uk");
console.log(parseResult.type === ParseResultType.Listed); // true
console.log(parseResult.labels); // ["example", "co", "uk"]
Now parseResult will also provide a subDomains, domain and topLevelDomains property:
const { subDomains, domain, topLevelDomains } = parseResult;
console.log(subDomains); // []
console.log(domain); // "example"
console.log(topLevelDomains); // ["co", "uk"]
Switch over parseResult.type to distinguish between different parse results
We recommend switching over the parseResult.type:
switch (parseResult.type) {
  case ParseResultType.Listed: {
    const { hostname, topLevelDomains } = parseResult;
    console.log(`${hostname} belongs to ${topLevelDomains.join(".")}`);
    break;
  }
  case ParseResultType.Reserved:
  case ParseResultType.NotListed: {
    const { hostname } = parseResult;
    console.log(`${hostname} is a reserved or unknown domain`);
    break;
  }
  default:
    // hostname is not in scope here and may be the NO_HOSTNAME symbol,
    // so read it from parseResult and convert it explicitly.
    throw new Error(
      `${String(parseResult.hostname)} is an ip address or invalid domain`
    );
}
What's surprising to a lot of people is that the definition of public suffix means that regular user domains can become effective top-level domains:
const { subDomains, domain, topLevelDomains } = parseDomain(
"parse-domain.github.io"
);
console.log(subDomains); // []
console.log(domain); // "parse-domain"
console.log(topLevelDomains); // ["github", "io"] 🤯
In this case, github.io is nothing else than a private domain name registrar. github.io is the effective top-level domain and browsers are treating it like that (e.g. for setting document.domain).
If you want to deviate from the browser's understanding of a top-level domain and you're only interested in top-level domains acknowledged by ICANN, there's an icann property:
const parseResult = parseDomain("parse-domain.github.io");
const { subDomains, domain, topLevelDomains } = parseResult.icann;
console.log(subDomains); // ["parse-domain"]
console.log(domain); // "github"
console.log(topLevelDomains); // ["io"]
domain can also be undefined
const { subDomains, domain, topLevelDomains } = parseDomain("co.uk");
console.log(subDomains); // []
console.log(domain); // undefined
console.log(topLevelDomains); // ["co", "uk"]
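The relationship between the labels and the three properties can be sketched as follows. This is a minimal illustration, not parse-domain's implementation: splitLabels and suffixLength are hypothetical names, and the number of labels that form the public suffix is assumed to be known already from the list lookup.

```typescript
type Split = {
  subDomains: string[];
  domain: string | undefined;
  topLevelDomains: string[];
};

// Hypothetical helper: split labels given how many trailing labels
// belong to the public suffix (0 <= suffixLength <= labels.length).
const splitLabels = (labels: string[], suffixLength: number): Split => {
  const cut = labels.length - suffixLength;
  const topLevelDomains = labels.slice(cut);
  const subDomains = labels.slice(0, cut);
  // The label directly left of the suffix is the domain.
  // It is undefined if the hostname consists of the suffix only.
  const domain = subDomains.pop();
  return { subDomains, domain, topLevelDomains };
};

const labels = ["parse-domain", "github", "io"];

// Private suffix "github.io" spans two labels.
console.log(splitLabels(labels, 2));
// { subDomains: [], domain: "parse-domain", topLevelDomains: ["github", "io"] }

// ICANN suffix "io" spans one label.
console.log(splitLabels(labels, 1));
// { subDomains: ["parse-domain"], domain: "github", topLevelDomains: ["io"] }

// The hostname is the suffix itself, so there is no domain.
console.log(splitLabels(["co", "uk"], 2));
// { subDomains: [], domain: undefined, topLevelDomains: ["co", "uk"] }
```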
"" is a valid (but reserved) domain
The empty string "" represents the DNS root and is considered to be valid. parseResult.type will be ParseResultType.Reserved in that case:
const { type, subDomains, domain, topLevelDomains } = parseDomain("");
console.log(type === ParseResultType.Reserved); // true
console.log(subDomains); // []
console.log(domain); // undefined
console.log(topLevelDomains); // []
🧩 = JavaScript export
🧬 = TypeScript export
export parseDomain(hostname: string | typeof NO_HOSTNAME, options?: ParseDomainOptions): ParseResult
Takes a hostname (e.g. "www.example.com") and returns a ParseResult. The hostname must only contain basic latin letters, digits, hyphens and dots. International hostnames must be puny-encoded. Does not throw an error, even with invalid input.
import { parseDomain } from "parse-domain";
const parseResult = parseDomain("www.example.com");
Use Validation.Lax if you want to allow all characters:
import { parseDomain, Validation } from "parse-domain";
const parseResult = parseDomain("_jabber._tcp.gmail.com", {
validation: Validation.Lax,
});
export fromUrl(input: string): string | typeof NO_HOSTNAME
Takes a URL-like string and tries to extract the hostname. Requires the global URL constructor to be available on the platform. Returns the NO_HOSTNAME symbol if the input was not a string or the hostname could not be extracted. Take a look at the test suite for some examples. Does not throw an error, even with invalid input.
export NO_HOSTNAME: unique symbol
NO_HOSTNAME is a symbol that is returned by fromUrl when it was not able to extract a hostname from the given string. When passed to parseDomain, it will always yield a ParseResultInvalid.
export type ParseDomainOptions
export type ParseDomainOptions = {
/**
* If no validation is specified, Validation.Strict will be used.
**/
validation?: Validation;
};
export Validation
An object that holds all possible validation values:
export const Validation = {
/**
* Allows any octets as labels
* but still restricts the length of labels and the overall domain.
*
* @see https://www.rfc-editor.org/rfc/rfc2181#section-11
**/
Lax: "LAX",
/**
* Only allows ASCII letters, digits and hyphens (aka LDH),
* forbids hyphens at the beginning or end of a label
* and requires top-level domain names not to be all-numeric.
*
* This is the default if no validation is configured.
*
* @see https://datatracker.ietf.org/doc/html/rfc3696#section-2
*/
Strict: "STRICT",
};
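The strict rules described in the comments above can be approximated with a few checks. This is a rough sketch, not the library's validator: isValidStrictHostname is a hypothetical name, the length limits (63 characters per label, 253 for the whole hostname) are assumptions taken from common DNS practice, and special cases that parseDomain handles, such as the empty string for the DNS root, are left out.

```typescript
// Assumed limits; the sketch counts characters, which equals octets for ASCII input.
const LABEL_MAX_LENGTH = 63;
const DOMAIN_MAX_LENGTH = 253;

// Hypothetical helper that mirrors the LDH rules described above.
const isValidStrictHostname = (hostname: string): boolean => {
  if (hostname.length > DOMAIN_MAX_LENGTH) return false;

  const labels = hostname.split(".");
  const lettersDigitsHyphens = /^[a-z0-9-]+$/i;

  for (const label of labels) {
    // Empty labels fail here, which also rejects "" and ".com" in this sketch.
    if (label.length < 1 || label.length > LABEL_MAX_LENGTH) return false;
    if (!lettersDigitsHyphens.test(label)) return false;
    if (label.startsWith("-") || label.endsWith("-")) return false;
  }

  // The last label (the top-level domain) must not be all-numeric.
  const lastLabel = labels[labels.length - 1];
  return !/^[0-9]+$/.test(lastLabel);
};

console.log(isValidStrictHostname("www.example.com")); // true
console.log(isValidStrictHostname("münchen.de")); // false
console.log(isValidStrictHostname("_jabber._tcp.gmail.com")); // false
console.log(isValidStrictHostname("example.123")); // false
```

A lax mode in the same spirit would keep only the two length checks and drop the character, hyphen and numeric rules.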
export Validation
This type represents all possible validation values.
export ParseResult
A ParseResult is either a ParseResultInvalid, ParseResultIp, ParseResultReserved, ParseResultNotListed or ParseResultListed.
All parse results have a type property that is either "INVALID", "IP", "RESERVED", "NOT_LISTED" or "LISTED". Use the exported ParseResultType to check for the type instead of checking against string literals.
All parse results also have a hostname property that provides access to the sanitized hostname that was passed to parseDomain.
export ParseResultType
An object that holds all possible ParseResult type values:
const ParseResultType = {
Invalid: "INVALID",
Ip: "IP",
Reserved: "RESERVED",
NotListed: "NOT_LISTED",
Listed: "LISTED",
};
export ParseResultType
This type represents all possible ParseResult type values.
export ParseResultInvalid
Describes the shape of the parse result that is returned when the given hostname does not adhere to RFC 1034:
type ParseResultInvalid = {
type: ParseResultType.Invalid;
hostname: string | typeof NO_HOSTNAME;
errors: Array<ValidationError>;
};
// Example
{
type: "INVALID",
hostname: ".com",
errors: [...]
}
export ValidationError
Describes the shape of a validation error as returned by parseDomain:
type ValidationError = {
type: ValidationErrorType;
message: string;
column: number;
};
// Example
{
type: "LABEL_MIN_LENGTH",
message: `Label "" is too short. Label is 0 octets long but should be at least 1.`,
column: 1,
}
export ValidationErrorType
An object that holds all possible ValidationError type values:
const ValidationErrorType = {
NoHostname: "NO_HOSTNAME",
DomainMaxLength: "DOMAIN_MAX_LENGTH",
LabelMinLength: "LABEL_MIN_LENGTH",
LabelMaxLength: "LABEL_MAX_LENGTH",
LabelInvalidCharacter: "LABEL_INVALID_CHARACTER",
LastLabelInvalid: "LAST_LABEL_INVALID",
};
export ValidationErrorType
This type represents all possible type values of a ValidationError.
export ParseResultIp
This type describes the shape of the parse result that is returned when the given hostname was an IPv4 or IPv6 address.
type ParseResultIp = {
type: ParseResultType.Ip;
hostname: string;
ipVersion: 4 | 6;
};
// Example
{
type: "IP",
hostname: "192.168.0.1",
ipVersion: 4
}
According to RFC 3986, IPv6 addresses need to be surrounded by [ and ] in URLs. parseDomain accepts IPv6 addresses both with and without square brackets:
// Recognized as IPv4 address
parseDomain("192.168.0.1");
// Both are recognized as proper IPv6 addresses
parseDomain("::");
parseDomain("[::]");
export ParseResultReserved
This type describes the shape of the parse result that is returned when the given hostname is either the root domain (the empty string "") or belongs to one of the reserved top-level domains localhost, local, example, invalid or test:
type ParseResultReserved = {
type: ParseResultType.Reserved;
hostname: string;
labels: Array<string>;
};
// Example
{
type: "RESERVED",
hostname: "pecorino.local",
labels: ["pecorino", "local"]
}
⚠️ Reserved IPs, such as 127.0.0.1, will not be reported as reserved, but as ParseResultIp. See #117.
export ParseResultNotListed
Describes the shape of the parse result that is returned when the given hostname is valid and does not belong to a reserved top-level domain, but is not listed in the downloaded public suffix list.
type ParseResultNotListed = {
type: ParseResultType.NotListed;
hostname: string;
labels: Array<string>;
};
// Example
{
type: "NOT_LISTED",
hostname: "this.is.not-listed",
labels: ["this", "is", "not-listed"]
}
export ParseResultListed
Describes the shape of the parse result that is returned when the given hostname belongs to a top-level domain that is listed in the public suffix list.
type ParseResultListed = {
type: ParseResultType.Listed;
hostname: string;
labels: Array<string>;
subDomains: Array<string>;
domain: string | undefined;
topLevelDomains: Array<string>;
icann: {
subDomains: Array<string>;
domain: string | undefined;
topLevelDomains: Array<string>;
};
};
// Example
{
type: "LISTED",
hostname: "parse-domain.github.io",
labels: ["parse-domain", "github", "io"],
subDomains: [],
domain: "parse-domain",
topLevelDomains: ["github", "io"],
icann: {
subDomains: ["parse-domain"],
domain: "github",
topLevelDomains: ["io"]
}
}
MIT