This is a very old article. It has been imported from older blogging software, and the formatting, images, etc., may have been lost. Some links may be broken. Some of the information may no longer be correct. Opinions expressed in this article may no longer be held.
On Usenet, a frequently asked question is how to programmatically determine the “domain” of a particular hostname: that is, the name left over once the leading components traditionally thought of as subdomains are removed. For example, groups.google.com and www.google.com both have the domain google.com.
Invariably, one answer comes back saying that you just need to chop everything off the front, leaving only the last two components. But then someone will chime in to point out that this reduces groups.google.co.uk to just co.uk, when what is really wanted is google.co.uk. And the eventual resolution of the argument will be “it just can’t be done”.
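To make the objection concrete, here is the naive “last two components” rule as a tiny PHP function. The name `naive_domain` is only for illustration:

```php
<?php
// The naive approach: keep only the last two dot-separated labels.
function naive_domain(string $host): string
{
    $labels = explode('.', $host);
    return implode('.', array_slice($labels, -2));
}

echo naive_domain('groups.google.com'), "\n";   // google.com
echo naive_domain('groups.google.co.uk'), "\n"; // co.uk, not google.co.uk
```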
The problem is that there’s technically no difference between a domain and a subdomain: it’s simply a matter of convention. Fortunately, the distinction matters a great deal to browser programmers, because it is central to cookie security: browsers must let subdomains of the same domain share cookie data, but must not let cookies pass from one domain to another. And so the Mozilla project has created the Public Suffix List, a codified list of these conventions.
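Roughly speaking, the list is a set of rules with one public suffix per line, such as com, uk or co.uk. It also includes `*.` wildcard rules and `!` exception rules. The sketch below follows the matching procedure described on publicsuffix.org; it is not the class from this article. Among the matching rules, an exception rule wins outright and has its leftmost label dropped. Otherwise the rule with the most labels wins. The public suffix is that many labels taken from the end of the host, and the registrable domain is the suffix plus one more label. The function names and the five-rule `$rules` array are hypothetical stand-ins for the full list, and only ASCII hostnames are considered:

```php
<?php
// Minimal sketch of Public Suffix List matching for ASCII hostnames.
// $rules below is a tiny hand-picked subset; real code would load the full list.

/** True if the rule's labels match the host's labels, compared right to left. */
function rule_matches(array $ruleLabels, array $hostLabels): bool
{
    $r = count($ruleLabels);
    $h = count($hostLabels);
    if ($r > $h) {
        return false;
    }
    for ($i = 1; $i <= $r; $i++) {
        $label = $ruleLabels[$r - $i];
        // "*" matches any single label.
        if ($label !== '*' && $label !== $hostLabels[$h - $i]) {
            return false;
        }
    }
    return true;
}

/** Public suffix (eTLD) of $host under $rules. */
function public_suffix(string $host, array $rules): string
{
    $hostLabels = explode('.', strtolower($host));
    $best = ['*']; // implicit default rule when nothing else matches
    foreach ($rules as $rule) {
        $isException = $rule[0] === '!';
        $labels = explode('.', ltrim($rule, '!'));
        if (!rule_matches($labels, $hostLabels)) {
            continue;
        }
        if ($isException) {
            // A matching exception rule wins; its leftmost label is dropped.
            $best = array_slice($labels, 1);
            break;
        }
        if (count($labels) > count($best)) {
            $best = $labels; // otherwise the rule with the most labels wins
        }
    }
    return implode('.', array_slice($hostLabels, -count($best)));
}

/** Public suffix plus one more label, or null if $host is itself a suffix. */
function registrable_domain(string $host, array $rules): ?string
{
    $hostLabels = explode('.', strtolower($host));
    $suffixCount = substr_count(public_suffix($host, $rules), '.') + 1;
    if (count($hostLabels) <= $suffixCount) {
        return null;
    }
    return implode('.', array_slice($hostLabels, -($suffixCount + 1)));
}

$rules = ['com', 'uk', 'co.uk', '*.ck', '!www.ck'];
echo registrable_domain('groups.google.co.uk', $rules), "\n"; // google.co.uk
echo public_suffix('british-library.uk', $rules), "\n";       // uk
echo registrable_domain('www.ck', $rules), "\n";              // www.ck
```

Preferring the longest match is what makes co.uk beat uk for google.co.uk, while british-library.uk still falls back to uk.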
The following PHP class can be used to download the latest Public Suffix List and store it in your temp directory, and then find the domain name for a particular host. You may use it as follows:
get_reg_domain(); // goddamn.co.uk.
$domain2 = new Domain("british-library.uk");
echo $domain2->get_etld(); // uk.
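The class described above also downloads the list and keeps a copy in the temp directory. Below is a hypothetical sketch of that step. It is not the original class. It assumes PHP 8 (for `str_starts_with`), `allow_url_fopen` enabled, and the list at its usual publicsuffix.org address. The cache file name and the one-day refresh interval are arbitrary choices. Per the list's format, each line is read only up to its first whitespace, and lines beginning with `//` are comments:

```php
<?php
// Hypothetical helpers: download the Public Suffix List, cache it in the
// system temp directory, and parse it into an array of rule strings.

const PSL_URL = 'https://publicsuffix.org/list/public_suffix_list.dat';

/** Parses the .dat format: one rule per line, read up to the first whitespace. */
function parse_psl(string $text): array
{
    $rules = [];
    foreach (explode("\n", $text) as $line) {
        $line = trim($line); // also strips a trailing "\r"
        // Skip blank lines and "//" comment lines.
        if ($line === '' || str_starts_with($line, '//')) {
            continue;
        }
        $rules[] = preg_split('/\s+/', $line)[0];
    }
    return $rules;
}

/** Returns the parsed list, re-downloading when the cache is older than $maxAge seconds. */
function load_psl(int $maxAge = 86400): array
{
    $cache = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'public_suffix_list.dat';
    if (!is_file($cache) || time() - filemtime($cache) > $maxAge) {
        $data = @file_get_contents(PSL_URL);
        if ($data !== false) {
            file_put_contents($cache, $data);
        } elseif (!is_file($cache)) {
            throw new RuntimeException('Could not download ' . PSL_URL);
        }
        // If the download failed but an older copy exists, use the stale copy.
    }
    return parse_psl(file_get_contents($cache));
}
```

Caching in the temp directory avoids fetching the list on every request. The returned array is a flat list of rule strings, with wildcard and exception rules left intact for the matcher.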