Looking Ahead to PHP 6

This is a very old article. It has been imported from older blogging software, and the formatting, images, etc may have been lost. Some links may be broken. Some of the information may no longer be correct. Opinions expressed in this article may no longer be held.

This is my look at what’s planned for the forthcoming revision to the PHP language.

Removal of Deprecated Features

PHP 6 includes a lot of tidying up, removing features of the language that have caused annoyance, confusion and security headaches. Although these changes are too numerous to list here, and the list will probably change before the official release, here are three of the major ones:

PHP has for some time included two different regular expression libraries: POSIX Regex and PCRE. PCRE is both faster and more capable, so in PHP 6, the POSIX Regex library will be removed from the PHP core and exist only as an optional extension.
PHP includes a feature called register_globals, which automatically creates global variables for any data provided to a script through an HTTP GET or POST request, through a cookie or set in a session. Coupled with a failure to properly initialise variables, this can create big security problems. Since PHP 4.2, it has been disabled by default, but is still available as an option. As of PHP 6, it’s gone.
“Magic quotes” is the automatic slash-escaping of incoming data. This is intended as a security feature, and if it could be relied upon to always be switched on, it would work fine, albeit in a slightly ugly and kludgy way. But because it is often disabled, scripts cannot rely on it and the feature becomes an annoyance. Well, in PHP 6 it will annoy no more. The feature is gone.
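
As a rough sketch of what code looks like once it no longer leans on these three features: the pattern, the "name" parameter and the sample string are all hypothetical, and the ereg() call appears only in a comment because it is the old POSIX-style function being replaced.

```php
<?php
// 1. PCRE instead of POSIX regex.
//    Old style (POSIX): ereg('^[a-z]+$', $input)
//    PCRE needs delimiters around the pattern:
$isWord = preg_match('/^[a-z]+$/', 'hello');

// 2. No register_globals: read request data explicitly from the
//    superglobal and always initialise the variable yourself.
//    "name" is a hypothetical request parameter.
$name = isset($_GET['name']) ? $_GET['name'] : 'guest';

// 3. No magic quotes: incoming data arrives unescaped. addslashes()
//    shows the kind of escaping magic quotes used to apply for you,
//    and stripslashes() is how scripts used to undo it.
$escaped  = addslashes("O'Reilly");
$restored = stripslashes($escaped);

echo $isWord, "\n";   // 1
echo $escaped, "\n";  // O\'Reilly
echo $restored, "\n"; // O'Reilly
```

With the automatic behaviour gone, escaping becomes something the script does deliberately at the point where the data is used, rather than something that may or may not have happened already.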

Late Static Binding

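In PHP 5, when an inherited static method calls another static method through self::, the call is bound to the class where the calling method was written, not the class it was invoked on. So if a parent class has a method that prints “Hello world” and a child class overrides it to print “Waaah!”, invoking the inherited method through the child rather oddly still prints “Hello world”. In PHP 6, late static binding gives the more expected output of “Waaah!”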
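
A minimal sketch of the difference, using hypothetical classes Greeter and Baby. It assumes the form in which late static binding was released (PHP 5.3 onwards), where the late-bound call is written with the static keyword and self:: keeps its old behaviour.

```php
<?php
// Hypothetical classes, for illustration only.
class Greeter
{
    public static function message()
    {
        return "Hello world";
    }

    // Early binding: self:: always refers to Greeter,
    // the class in which this method was written.
    public static function speakEarly()
    {
        return self::message();
    }

    // Late static binding: static:: refers to the class
    // the call was actually made on.
    public static function speakLate()
    {
        return static::message();
    }
}

class Baby extends Greeter
{
    public static function message()
    {
        return "Waaah!";
    }
}

echo Baby::speakEarly(), "\n"; // Hello world
echo Baby::speakLate(), "\n";  // Waaah!
```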

Namespaces

Namespaces were developed for PHP 6, but have now been backported to PHP 5.3, so you can start using them already (as long as you don’t need to support earlier versions of PHP!). Namespaces work using two new keywords: namespace and use.

The namespace keyword can be used at most once within each PHP file. It tells the PHP interpreter that all classes, functions and constants defined within that file belong to a particular namespace. This applies to constants introduced with the const keyword; constants introduced with define() are not namespace-aware. The examples in the rest of this write-up refer only to functions, but the same principles apply to constants and classes.
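
The const versus define() distinction can be sketched as follows. The Shop namespace and both constant names are hypothetical, and the example uses the syntax that shipped in PHP 5.3, where the namespace separator is a backslash; the bracketed namespace blocks are used only so that the sketch fits in one file.

```php
<?php
namespace Shop {
    // const is namespace-aware: this defines Shop\TAX_PERCENT.
    const TAX_PERCENT = 20;

    // define() with a plain name is not: this defines a global CURRENCY.
    define('CURRENCY', 'GBP');
}

namespace {
    echo \Shop\TAX_PERCENT, "\n";         // 20
    echo CURRENCY, "\n";                  // GBP
    var_dump(defined('Shop\TAX_PERCENT')); // bool(true)
    var_dump(defined('Shop\CURRENCY'));    // bool(false)
}
```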

Within a namespaced file, any function calls will attempt to call a “local” function before using the “global” function. For example, within a namespace you could define your own function called header() and this will be used in preference to the built-in PHP header() function. You can explicitly call the standard function by calling ::header().

If you want to use the function header() from namespace ACMECorp::Web::HTML then you call ACMECorp::Web::HTML::header(). Obviously this is quite long and unwieldy, so if you need to repeatedly refer to various functions from a particular namespace, you can alias it using the use keyword:

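<?php
// This file is assumed to begin with its own namespace declaration.
// Alias the long namespace name so it can be referred to as TheHTML.
use ACMECorp::Web::HTML as TheHTML;

function header ($x, $y)
{
    // Call the built-in PHP function "header()".
    ::header("$x: $y");

    // Call "header()" from ACMECorp::Web::HTML.
    TheHTML::header($x, $y);
}
?>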
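
For comparison, here is a sketch of the same idea in the syntax that shipped with PHP 5.3, where the namespace separator is a backslash rather than the :: used in this article. The MyApp namespace, the body of header() and the local strlen() override are hypothetical, chosen only to make the example self-contained and safe to run from the command line.

```php
<?php
// Hypothetical library namespace standing in for ACMECorp::Web::HTML.
namespace ACMECorp\Web\HTML {
    function header($x, $y)
    {
        return "<h1>$x: $y</h1>";
    }
}

namespace MyApp {
    // Alias the long namespace name.
    use ACMECorp\Web\HTML as TheHTML;

    // A "local" function with the same name as a built-in.
    function strlen($s)
    {
        return "local strlen";
    }

    echo strlen("abc"), "\n";                 // local strlen
    echo \strlen("abc"), "\n";                // 3 (the built-in)
    echo TheHTML\header("Title", "Hi"), "\n"; // <h1>Title: Hi</h1>
}
```

The unqualified call picks the namespace's own strlen() first, the leading backslash forces the global one, and the alias keeps calls into the library namespace short.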

Namespaces neatly solve two problems with PHP. The first is the tendency, particularly within PEAR, for class names and method names to grow incredibly long. The second is that every version of PHP tends to add even more functions, increasing the chances that one of them will have the same name as one of your own functions.

By putting your functions into a namespace, you’ll virtually eliminate the chances of naming conflicts with built-in PHP functions and third-party libraries.

Unicode

The other eagerly-anticipated feature of PHP 6 is full Unicode support. It has always been possible to use UTF-8 and other Unicode encodings in PHP, but when PHP 6 is put into Unicode mode (enabled by default, but about 25% slower overall), all strings are treated as Unicode. This means:

PHP scripts can be written in non-ASCII character sets, which will be recoded to Unicode on the fly. Functions, classes, variables, constants and so forth can safely use non-ASCII characters in their names.
String processing functions, such as strlen(), will understand multi-byte characters and process them as single characters rather than multiple octets.
PHP’s case conversion functions will not only understand how to convert the characters A to Z between upper and lower cases, but also accented characters and non-Latin characters.
Alphabetical sorting will work much better across different alphabets.
Strings can be cast to a new type called binary, enabling them to be treated as octet-streams for those occasional purposes where that is necessary.
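
The difference between counting octets and counting characters can be illustrated with the separate mbstring extension. This is only a sketch of the distinction, not of PHP 6’s Unicode mode itself, and it assumes mbstring is installed; the sample word is hypothetical and is written with byte escapes so that the file’s own encoding does not matter.

```php
<?php
// "héllo" encoded as UTF-8: the é occupies two octets (0xC3 0xA9).
$word = "h\xC3\xA9llo";

// Byte-oriented: counts octets.
$octets = strlen($word);

// Character-oriented: counts code points, given the encoding.
$chars = mb_strlen($word, 'UTF-8');

// Character-aware case conversion handles the accented letter too.
// É in UTF-8 is 0xC3 0x89.
$upper = mb_strtoupper($word, 'UTF-8');

echo $octets, "\n"; // 6
echo $chars, "\n";  // 5
echo $upper, "\n";  // HÉLLO
```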

What’s Staying the Same

There are still a number of messy bits of PHP that are not being cleaned up. In particular, variables are still case-sensitive while functions and classes are case-insensitive. Although the PHP developers do consider this an annoying inconsistency, they’ve decided not to fix it for PHP 6.

The “$needle, $haystack/$haystack, $needle” question remains unresolved:

Regular expression matching functions use “$needle, $haystack”;
Regular expression replacement functions use “$needle, $replacement, $haystack”;
String search functions use “$haystack, $needle”;
String manipulation functions use “$needle, $replacement, $haystack”;
Array search functions use “$needle, $haystack”.
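
One representative built-in function from each family shows the inconsistency side by side. The sample string and search terms are hypothetical; the functions and their argument orders are the standard ones.

```php
<?php
$haystack = "the quick brown fox";

// Regular expression matching: needle (pattern) first.
$matched = preg_match('/quick/', $haystack);

// Regular expression replacement: needle, replacement, haystack.
$regexReplaced = preg_replace('/quick/', 'slow', $haystack);

// String search: haystack first.
$position = strpos($haystack, 'quick');

// String manipulation: needle, replacement, haystack.
$stringReplaced = str_replace('quick', 'slow', $haystack);

// Array search: needle first.
$found = in_array('fox', explode(' ', $haystack));

var_dump($matched);           // int(1)
echo $regexReplaced, "\n";    // the slow brown fox
var_dump($position);          // int(4)
echo $stringReplaced, "\n";   // the slow brown fox
var_dump($found);             // bool(true)
```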

Recalling the parameter order for the various PHP search and replace functions requires a memory the size of an elephant, or handy access to the PHP manual. It seems that the PHP 6 developers are determined to ignore this problem.

But overall, PHP 6 looks like as much of an improvement over its predecessor as PHP 5 was over PHP 4, so I’m looking forward to being able to use it. It should be thought of as an evolution of the language though, rather than the revolution that Perl 6 will be. (More on Perl 6 in a future blog article!)