PHP 7: Introducing a domain name validator and making the URL validator stricter

PHP Origami

DNS comes with a set of rules defining valid domain names. A domain name cannot exceed 255 octets (RFC 1034) and each label cannot exceed 63 octets (RFC 1035). It can contain any character (RFC 2181), but extra rules apply to hostnames (A and MX records, data of SOA and NS records): only alphanumeric ASCII characters and hyphens are allowed in labels (we’ll talk about IDNs at the end of this post), and a label can neither start nor end with a hyphen.
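
To make these rules concrete, here is a minimal userland sketch of the hostname checks described above. The function name `is_valid_hostname_sketch` is hypothetical (it is not part of PHP), and the sketch makes two simplifying assumptions: the 255-octet limit is applied to the textual form of the name, and a trailing dot is rejected.

```php
<?php

// Hypothetical helper, not part of PHP: applies the hostname rules listed above.
function is_valid_hostname_sketch(string $name): bool
{
    // Assumption: the 255-octet limit is checked on the textual form.
    if ($name === '' || strlen($name) > 255) {
        return false;
    }

    foreach (explode('.', $name) as $label) {
        $length = strlen($label);

        // Each label must be 1 to 63 octets long (an empty label is rejected).
        if ($length < 1 || $length > 63) {
            return false;
        }

        // Only ASCII letters, digits and hyphens are allowed in a label.
        if (!preg_match('/^[A-Za-z0-9-]+$/D', $label)) {
            return false;
        }

        // A label can neither start nor end with a hyphen.
        if ($label[0] === '-' || $label[$length - 1] === '-') {
            return false;
        }
    }

    return true;
}

var_dump(is_valid_hostname_sketch('example.com'));                // bool(true)
var_dump(is_valid_hostname_sketch('a-.bc.com'));                  // bool(false)
var_dump(is_valid_hostname_sketch(str_repeat('a', 64) . '.com')); // bool(false)
```

This is only an illustration of the rules, not the implementation used by PHP’s filter extension.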

Until now, PHP had no filter to validate that a given string is a valid domain name (or hostname). Worse, FILTER_VALIDATE_URL did not fully enforce domain name validity (which is mandatory for schemes such as http and https) and accepted invalid URLs. FILTER_VALIDATE_URL also lacked IPv6 host support.

<?php

// PHP 5.6.3

// Label ends with a hyphen
var_dump(filter_var('http://a-.bc.com', FILTER_VALIDATE_URL));
// string(16) "http://a-.bc.com"

// Label is more than 63 octets
var_dump(filter_var('http://toolongtoolongtoolongtoolongtoolongtoolongtoolongtoolongtoolongtoolong.com', FILTER_VALIDATE_URL));
// string(81) "http://toolongtoolongtoolongtoolongtoolongtoolongtoolongtoolongtoolongtoolong.com"

// Lack of IPv6 support
var_dump(filter_var('http://[2001:0db8:0000:85a3:0000:0000:ac1f:8001]', FILTER_VALIDATE_URL));
// bool(false)

These limitations will be fixed in PHP 7. I’ve introduced a new FILTER_VALIDATE_DOMAIN filter checking domain name and hostname validity. This new filter is now used internally by the URL validator. I also added IPv6 host support in URL validation:

<?php

// PHP 7.0.0-dev

// Validate a domain name
var_dump(filter_var('mandrill._domainkey.mailchimp.com', FILTER_VALIDATE_DOMAIN));
// string(33) "mandrill._domainkey.mailchimp.com"

// Validate a hostname (here, the underscore is invalid)
var_dump(filter_var('mandrill._domainkey.mailchimp.com', FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME));
// bool(false)

// Label ends with a hyphen
var_dump(filter_var('http://a-.bc.com', FILTER_VALIDATE_URL));
// bool(false)

// Label is more than 63 octets
var_dump(filter_var('http://toolongtoolongtoolongtoolongtoolongtoolongtoolongtoolongtoolongtoolong.com', FILTER_VALIDATE_URL));
// bool(false)

// IPv6 hosts are now supported
var_dump(filter_var('http://[2001:0db8:0000:85a3:0000:0000:ac1f:8001]', FILTER_VALIDATE_URL));
// string(48) "http://[2001:0db8:0000:85a3:0000:0000:ac1f:8001]"
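
The new filter can also be combined with `parse_url()` when only the host part of a URL needs checking. The following is a rough sketch under stated assumptions: the helper name `url_has_valid_hostname_sketch` is hypothetical, and it requires PHP 7 or later for FILTER_VALIDATE_DOMAIN. Note that it treats the host strictly as a hostname, so a bracketed IPv6 literal would be rejected by this helper.

```php
<?php

// Hypothetical helper: extracts the host of a URL and validates it as a hostname.
function url_has_valid_hostname_sketch(string $url): bool
{
    // parse_url() returns null when there is no host, or false on a malformed URL.
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false;
    }

    // filter_var() returns the validated string on success, false otherwise.
    return filter_var($host, FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME) !== false;
}

var_dump(url_has_valid_hostname_sketch('http://a-.bc.com'));    // bool(false)
var_dump(url_has_valid_hostname_sketch('http://example.com/')); // bool(true)
```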

There is still a big gap in PHP’s handling of domain names and URLs: internationalized domain names (IDNs) are not supported at all in the core. I’ve already blogged about a userland workaround, but as IDNs become more and more popular, core support in PHP’s streams and validation is necessary. For instance, almost all French registrars support them, and even TLDs (such as the Chinese one) are available in the wild in a non-ASCII form. I’ve started a patch enabling IDN support in PHP’s streams. It works on Unix but still lacks a Windows build system. As it requires making ICU a dependency of PHP, I’ll publish a PHP RFC on this topic soon!
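
In the meantime, one possible userland approach (not necessarily the workaround from the earlier post) is to convert the IDN to its ASCII form before validating it. The sketch below assumes the intl extension is loaded so that `idn_to_ascii()` is available; the helper name `validate_idn_hostname_sketch` is hypothetical, and the source file is assumed to be UTF-8 encoded.

```php
<?php

// Hypothetical helper: converts an IDN to its ASCII (Punycode) form, then
// validates the result as a hostname. Returns the ASCII name, or false.
function validate_idn_hostname_sketch(string $name)
{
    // Assumption: idn_to_ascii() comes from the intl extension.
    if (!function_exists('idn_to_ascii')) {
        return false; // intl is not available, so the name cannot be converted
    }

    $ascii = idn_to_ascii($name, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    if ($ascii === false) {
        return false; // the conversion failed
    }

    return filter_var($ascii, FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME);
}

// With intl loaded, this should print the Punycode form "xn--mnchen-3ya.de".
var_dump(validate_idn_hostname_sketch('münchen.de'));
```

Returning the converted name rather than a boolean mirrors the behavior of `filter_var()`, so the ASCII form can be reused directly, for example in a DNS lookup.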

4 Comments

  1. I would like to mention that the filter works well; however, it also accepts domains without a TLD. So ‘a’ or ‘hello’, without a TLD, is still considered a valid domain/host name.

    I wonder if it’s possible to introduce a flag to require at least a TLD, or perhaps one “subdomain” level.

    1. Hi, indeed it’s intended, because both `a` and `hello` are valid domain names (and useful when configuring a local network). You can check whether the domain has a TLD by checking that it contains a dot. Why not add a flag for that? If you open a PR for it, please ping me!

      1. I was actually thinking of adding a flag for that, but I’m not too sure how to contribute to the PHP source at the moment. I will definitely go check it out and see if I can do it. hahaha!

        I am using this simple check at the moment:

        (bool) (filter_var($value, FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME) && preg_match('@\.(.*[A-Za-z])@', $value))
