Looking Ahead to Perl 6

This is a very old article. It has been imported from older blogging software, and the formatting, images, etc may have been lost. Some links may be broken. Some of the information may no longer be correct. Opinions expressed in this article may no longer be held.

One of the most important changes in Perl 6 over earlier versions is that it started out as a written specification, which may end up with several different implementations. In previous versions of Perl, alternative implementations had to reproduce every quirk of the official Perl interpreter, because the definition of the Perl language was “whatever the Perl interpreter will interpret”. That was, of course, a moving target: each release introduced new features and changed existing behaviour, though usually only at the periphery of the language. The lack of a stable written specification killed off many useful projects, such as the Perl compiler (perlcc).

The written specification is what allows me to write this article right now, as the current implementations of Perl 6 are only partial — indeed, the specification is not yet complete, but I can comment on those parts that have been written. There are way too many changes to touch on them all, but I'll try to write about some of the most important and most interesting.

Unicode

Like PHP 6, Perl 6 introduces full native support for Unicode, which means that source code can be written in UTF-8 or UTF-16 — or recoded to Unicode at compile time — and that identifiers (function names and variable names) can use non-ASCII characters.

In fact, Perl 6 goes even further in that many of its built-in operators include non-ASCII characters, although they have ASCII equivalents for people lacking a decent input mechanism for such characters. For example, the greater-than-or-equal-to operator can be written as ≥ though the traditional >= alternative is still supported.

As another example, while (...), [...], {...} and <...> are still supported as bracketing characters, so too are all the Unicode bracketing characters, such as ⌈...⌉ and ⎩...⎭.
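Here is a small sketch that makes this concrete. It assumes a current Rakudo compiler (today's main Perl 6 implementation) reading UTF-8 source, and $größe is a made-up name. The sketch shows a non-ASCII identifier, the Unicode operators ≥, ≠ and ×, and the 「...」 corner-bracket quotes, which take their contents literally.

```raku
# Assumes a current Rakudo compiler; $größe is an illustrative name.
my $größe = 42;             # identifiers may contain non-ASCII letters

say $größe ≥ 40;            # True  (same as >=)
say $größe ≠ 41;            # True  (same as !=)
say 6 × 7;                  # 42    (same as *)
say 「\n stays literal」;     # \n stays literal  (no escapes inside corner brackets)
```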

Classes

Perl 5 supports object-oriented programming, but it does so in a fairly laissez-faire way: you create a package and put all your methods (public or private, static or dynamic) in there. Any function in the package which returns a blessed hash or array reference can be used as a constructor. Although this takes a bit of getting used to, it's actually an extremely flexible system, supporting not just Java-style object orientation (which many programmers seem to think is the only way that OO can be done, or at least the “One True Way”) but also prototype-based OO (à la ECMAScript) and various approaches in between. Allowing alternative constructors for a class also reduces the need for all those “Factory” classes you see popping up in Java code.

The Perl 5 OO system is actually entirely ripped off from Python. Yet Perl gets lambasted for the ugliness of its code, while Python is praised for its elegance.

Perl 6 still allows programmers to use this flexible system for OO programming, but also introduces a new data structure along the lines of arrays and hashes: the object. Objects are instances of a class, defined through the new class keyword. Classes support multiple inheritance, and roles play the part of interfaces. Classes may have methods, introduced through the method keyword instead of sub. Several methods with the same name but different numbers of arguments can be declared with multi method, allowing method overloading.

Here's an example of using classes in Perl 6, with Dog being defined as a subclass of Mammal and used:


class Mammal { }

class Dog is Mammal
{
	has $.name;

	# BUILD receives the named arguments passed to new().
	submethod BUILD (:$name)
	{
		# $!name is the attribute itself; $.name is its read-only accessor.
		$!name = $name;
	}

	multi method bark ()
	{
		# The "say" function is new in Perl 6. It's like "print"
		# but automatically adds a new line character to the end.
		say "woof!";
	}

	multi method bark ($name)
	{
		if $name eq $.name
		{
			self.bark();
		}
	}
}

my $pet = Dog.new(name => "fido");
$pet.bark();       # say "woof!"
$pet.bark("bob");  # nothing
$pet.bark("fido"); # say "woof!"
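Perl 6 expresses interfaces as roles, so here is a sketch of one. It assumes a current Rakudo compiler, and Speaker, Cat and their methods are hypothetical names. A method whose body is just ... is a stub that any class doing the role must implement. The role's other methods arrive ready-made.

```raku
# Speaker and Cat are hypothetical names for illustration.
role Speaker
{
	# A stub (body "...") must be supplied by any class doing this role,
	# much like a method declared in an interface.
	method sound () { ... }

	# A role can also provide finished methods.
	method speak () { "I say " ~ self.sound }
}

class Mammal { }

class Cat is Mammal does Speaker
{
	method sound () { "miaow" }
}

my $cat = Cat.new;
say $cat.speak;        # I say miaow
say $cat ~~ Speaker;   # True
say $cat ~~ Mammal;    # True
```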

Junctions, Bags and Ranges

Junctions are inspired by Schrödinger's cat. Erwin Schrödinger was a famous quantum physicist who proposed a thought experiment in which a cat, shut in a box with an equal chance of being alive or dead, is in a sense both alive and dead until you open the box and take a peek.

Junctions are a type of Perl value that can hold several different values simultaneously. The operators &, ^ and | are no longer used for bitwise arithmetic and have instead been retasked as junction constructors, building “all”, “one” and “any” junctions respectively.


my $x = 1 & 2 & 3;   # all(1, 2, 3): $x is simultaneously 1, 2, and 3.
my $y = 1 ^ 2 ^ 3;   # one(1, 2, 3): $y is exactly one of 1, 2, or 3, but we don't know which yet.
my $z = 1 | 2 | 3;   # any(1, 2, 3): $z is one of 1, 2, or 3, or a combination of them.

# Note that brackets are no longer required around an "if" clause.
if $z == 1 && $z == 2
{
	# Yes, this will happen! Each test holds for at least one of $z's values.
}
if $x == 1
{
	# This will not happen: an "all" junction needs every value to equal 1.
}
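Junctions are most useful in ordinary comparisons. This sketch assumes a current Rakudo compiler. In that compiler, the all(), one() and any() functions build the same junctions as &, ^ and |, and so collapses a junction to a single True or False. is-affirmative is a made-up helper.

```raku
my @allowed = 'y', 'yes', 'ok';

sub is-affirmative (Str $reply --> Bool)
{
	# any(@allowed) builds the same kind of junction as 'y' | 'yes' | 'ok'.
	return so $reply.lc eq any(@allowed);
}

say is-affirmative('YES');     # True
say is-affirmative('nope');    # False

say so 3 == one(1, 2, 3);      # True  (exactly one value matches)
say so 3 == one(3, 3, 1);      # False (two values match)
say so all(2, 4, 6) %% 2;      # True  (every value is even)
```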

Bags are more of a sensible-sounding idea. They are simply collections with no inherent order, although, unlike sets, they keep track of how many times each element appears.


my $bag1 = bag(1, 2, 3, 4);
my $bag2 = bag(2, 1, 3, 4);
my $bag3 = bag(1, 2, 3, 4, 4);

# Note that brackets are no longer required around an "if" clause.
if $bag1 eqv $bag2
{
	# Yes, this will happen: order doesn't matter.
}
if $bag1 eqv $bag3
{
	# This will not happen: $bag3 contains 4 twice.
}
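A bag counts its elements, so you can also ask how often each one occurs. The following sketch assumes a current Rakudo compiler, its bag() function and its Bag methods. $fruit is an illustrative name.

```raku
my $fruit = bag <apple banana apple cherry apple>;

say $fruit<apple>;      # 3  (times "apple" occurs)
say $fruit<durian>;     # 0  (missing elements count as zero)
say $fruit.total;       # 5  (items, counting repeats)
say $fruit.elems;       # 3  (distinct items)

say bag(1, 2, 3, 4) eqv bag(4, 3, 2, 1);   # True (order is irrelevant)
```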

Ranges are similar to arrays except that you only need to specify the first element and the last element. Perl 6 doesn't bother filling in the middle elements until it really needs to.

my @range1 = 1..20;
my @range2 = 'A'..'Z';     # Ranges are not just numeric
my @range3 = 'α'..'ω';     # Not just Latin either
my @range4 = 1..Inf;       # No, this won't exhaust your memory

# Note that you no longer use $ to reference array elements.
say @range2[4];   # say "E" (indexing starts at 0)

# OK, now we can cause problems...
my $sum = 0;

for @range4 -> $i
{
    $sum += $i;   # this loop never ends
}
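Laziness is what makes infinite ranges safe, as long as you only ever ask for a finite part of them. This is a sketch assuming a current Rakudo compiler, and @naturals is an illustrative name.

```raku
my @naturals = 1..Inf;                 # stored lazily

say @naturals[^5];                     # (1 2 3 4 5)  (^5 means 0..4)
say (1..Inf).map(* ** 2).head(4);      # (1 4 9 16)   squares, computed on demand
say @naturals[^100].sum;               # 5050         a finite slice sums fine
```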

Operators

As you may have noticed above, several of the old Perl operators have been repurposed. The Perl 5 concatenation operator (.) is now used for calling object methods and the bitwise operators now construct junctions. So what do we do to replace them?

The new string concatenation operator is ~
The exponent (raise to power of) operator is still **, as in Perl 5
Bitwise operators (the plus signs can be replaced with a tilde to treat the operands as strings, or a question mark to treat them as booleans):
- Not: +^ (prefix)
- And: +&
- Or: +|
- Xor: +^ (infix)
The regular expression match operator has changed from =~ to ~~, and !~ is now !~~.
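The following sketch puts the replacement operators side by side. It assumes a current Rakudo compiler. The bit patterns in the comments show why each bitwise result comes out as it does.

```raku
say "Perl" ~ " " ~ 6;          # Perl 6
say 2 ** 10;                   # 1024
say 12 +& 10;                  # 8    (1100 AND 1010 = 1000)
say 12 +| 10;                  # 14   (1100 OR  1010 = 1110)
say 12 +^ 10;                  # 6    (1100 XOR 1010 = 0110)
say +^0;                       # -1   (bitwise NOT, two's complement)
say so "hello" ~~ /ell/;       # True
say so "hello" !~~ /xyz/;      # True
```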

Perl 6 introduces some new operators making it easy to deal with entire arrays simultaneously.

Hyper Operators

The Unicode characters » and « are used to operate on whole groupings (arrays, junctions, bags, etc.) simultaneously. They are attached to normal operators such as addition, subtraction, string concatenation, etc. Here are some examples from the Perl 6 spec:

-« (1,2,3);                     # (-1, -2, -3)
(1,1,2,3,5) »+« (1,2,3,5,8);    # (2, 3, 5, 8, 13)
@array »+=» 42;                 # add 42 to each element

An ASCII equivalent is also provided:


-<< (1,2,3);                    # (-1, -2, -3)
(1,1,2,3,5) >>+<< (1,2,3,5,8);  # (2, 3, 5, 8, 13)
@array >>+=>> 42;               # add 42 to each element
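Hyper operators also work between a list and a single value, and ». applies a method call to every element. The following is a sketch assuming a current Rakudo compiler. The variable names are illustrative.

```raku
my @prices   = 10, 20, 30;
my @with-tax = @prices »*» 1.2;          # the arrows point at 1.2, which is reused for every element
say @with-tax.join(' ');                 # 12 24 36

my @words = <perl six rocks>;
say @words».uc.join(' ');                # PERL SIX ROCKS  (». calls .uc on every element)

say ((1, 2, 3) »~« <a b c>).join(',');   # 1a,2b,3c  (both sides must have the same length)
```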

Reduction Operators

Similarly, square brackets around an operator reduce an entire list to a single value. For example:


[+] (1,2,3);                    # 6
my @array = 'hello', 'world';
say [~] @array;                 # say 'helloworld'
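Reduction works with any infix operator, including comparisons. A backslash inside the brackets keeps the intermediate results. This is a sketch assuming a current Rakudo compiler, and factorial is a hypothetical helper.

```raku
sub factorial (Int $n --> Int)
{
	return [*] 1..$n;     # [*] of an empty range is 1, so factorial(0) is 1
}

say factorial(5);         # 120
say [max] 3, 9, 4;        # 9
say [<] 1, 2, 3;          # True   (the list is strictly increasing)
say [<] 1, 3, 2;          # False
say [\+] 1, 2, 3, 4;      # (1 3 6 10)   running totals
```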

Cross Operators

The last of these wonderful operators which deal with whole groupings is the cross operator, which prefixes a regular operator with the capital letter “X” (early drafts of the spec surrounded the operator instead, as in X~X). This takes a group of n elements and a group of m elements and combines them to create a group of n × m elements. For example:


('a', 'b', 'c') X~ (1, 2);      # ('a1', 'a2', 'b1', 'b2', 'c1', 'c2')
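Current Rakudo compilers spell the cross operator with a single leading X (X~, X*). A bare X returns the pairs themselves. The following sketch assumes such a compiler, and the card names are illustrative.

```raku
my @suits = <♠ ♥>;
my @ranks = <A K Q>;

say (@ranks X~ @suits).join(' ');      # A♠ A♥ K♠ K♥ Q♠ Q♥
say ((1, 2) X* (10, 100)).join(' ');   # 10 100 20 200
say ((1, 2) X (3, 4)).elems;           # 4  (n × m = 2 × 2 pairs)
```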

Strongish Typing

By default, Perl 6 variables are still weakly typed, being automatically cast to the appropriate type depending on the operation being carried out on them. But it is possible, if desired, to explicitly type variables when you declare them, and to cast them to explicit types. This allows the compiler to optimise your code a bit better.


my int $foo = 1;
my int @Foo = 1, 2, 3, 4;
my @bar of int;  # Same as: my int @bar
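A subset narrows a type further by adding a run-time constraint to it. The following is a sketch assuming a current Rakudo compiler. Even, half and @scores are made-up names. A value that fails the constraint is rejected at the call, and try turns that failure into Nil here.

```raku
# Even and half are hypothetical names used for illustration.
subset Even of Int where * %% 2;

sub half (Even $n --> Int)
{
	return $n div 2;
}

say half(10);                      # 5
say (try half(7)) // 'rejected';   # rejected  (7 is an Int, but not Even)

my Int @scores = 90, 85, 77;
say @scores.of;                    # (Int)
</code>
```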

Regexes

Perl 6 no longer calls them regular expressions, as they've diverged too far from regular expressions in the formal sense; they are now simply called regexes. Some changes are:

Modifiers now go at the start of the regex: m/foo/gi becomes m:g:i/foo/.
The old /s modifier is gone, because its behaviour (. matching newlines) is now the default.
The old /x modifier is gone because its behaviour (ignoring whitespace in the pattern) is now the default. A new :s modifier is available which treats whitespace as significant (though it treats all whitespace characters as equivalent to each other).
A :P5 modifier is available to use Perl 5 syntax.
A :b (base character) modifier is available which acts like :i, except that it ignores accents instead of case.
Ordinal modifiers exist, so m:5th/FOO/ means “find the fifth occurrence of 'FOO'”.
The {} braces no longer specify repetitions, but are used for embedding Perl code within a regex. To specify repetitions, use ** followed by an integer or a range: m/ [FOO] ** 2 [BAR] ** 3..* / means “find 'FOO' repeated twice, followed by 'BAR' repeated three or more times”. The square brackets group the letters without capturing them; without them the quantifier would apply only to the final letter. The integer or range may be calculated using embedded Perl code, so the following is equivalent: m/ [FOO] ** {2} [BAR] ** {3..Inf} /.
[a-z] no longer specifies a character class. Instead <foo> is used, where “foo” may be the name of a set of characters, such as “alpha”, or a custom set of characters within square brackets, e.g. <[b..dfghj..np..tv..z_]> will match any lowercase consonant or the underscore.
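Several of the points above can be tried directly. The following is a sketch assuming a current Rakudo compiler, using made-up test strings. The so operator collapses each match result to True or False.

```raku
# Repetition uses **; square brackets group "FOO" without capturing it.
say so "FOOFOOBARBARBAR" ~~ m/ [FOO] ** 2 [BAR] ** 3..* /;   # True

# Modifiers come first: :i ignores case. Whitespace in the pattern is ignored.
say so "Hello World" ~~ m:i/ hello \s world /;                # True

# :g returns every match rather than just the first one.
say ("a1 b2 c3" ~~ m:g/ \d /).elems;                          # 3

# A custom character class: the first run of consonants.
say ("aeistrength" ~~ m/ <[b..dfghj..np..tv..z_]>+ /).Str;    # str
```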

There have been tonnes of other changes too. Perl 6 regexes are pretty different from Perl 5 regular expressions, but if you're terrified, you only need to be aware of the :P5 modifier and remember to put your modifiers at the start instead of the end, and you can happily use Perl 5-style expressions!

Subroutines

Declaration

The syntax for declaring a sub in Perl 6 is expanded from Perl 5. The old Perl 5 way is still supported, but new features are available to explicitly declare the number and type of parameters taken by a sub, and to declare the return type:


# Perl 5 way still works
sub addition1
{
	my $arg1 = shift @_;
	my $arg2 = shift @_;
	return $arg1 + $arg2;
}

# Can explicitly declare parameters:
sub addition2 ($arg1, $arg2)
{
	return $arg1 + $arg2;
}

# Declare the types of parameters and return type:
our Int sub addition3 (Int $arg1, Int $arg2)
{
	return $arg1 + $arg2;
}

# Alternative syntax for the same thing:
sub addition4 (Int $arg1, Int $arg2 --> Int)
{
	return $arg1 + $arg2;
}

# The "->" operator acts almost as a synonym for "sub". Useful for declaring
# anonymous subroutines.
my $addition = -> $arg1, $arg2 { $arg1 + $arg2 };
say $addition(1, 2);   # 3
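Signatures can also make parameters optional or collect any number of arguments. The following is a sketch assuming a current Rakudo compiler. total and greet are hypothetical subs.

```raku
# A slurpy parameter (*@numbers) collects any number of arguments.
sub total (*@numbers --> Numeric)
{
	return [+] @numbers;
}

# A trailing ? makes a positional parameter optional.
sub greet (Str $name, Str $greeting?)
{
	return ($greeting // "Hello") ~ ", $name!";
}

say total(1, 2, 3, 4);        # 10
say greet("Bob");             # Hello, Bob!
say greet("Bob", "Howdy");    # Howdy, Bob!
```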

Named Arguments

Arguments can be named and then passed in any order (as in Visual Basic). For example:


sub makeBigNum (:$mantissa, :$exponent, :$radix)
{
	my $r;
	if $radix
		{ $r = $radix; }
	else
		{ $r = 10; }
	return $mantissa * ($r ** $exponent);
}

my $speed1 = makeBigNum(mantissa=>3, exponent=>8);
my $speed2 = makeBigNum(exponent=>8, mantissa=>3);
if $speed1 == $speed2
{
	# Yes, this should happen.
}

As a side note: the if/else clause can be shortened to $r = $radix // 10;. The // (“default” or “defined-or”) operator is useful for assigning default values to arguments that haven't been explicitly provided. Note that $radix = $radix // 10; would not work, as arguments are read-only by default (see “Traits” below).
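Named arguments can also be passed with colon-pair syntax, and a trailing ! makes a named parameter mandatory. The following is a sketch assuming a current Rakudo compiler. make-big-num is a hypothetical variant of makeBigNum.

```raku
# make-big-num is a hypothetical variant of makeBigNum.
sub make-big-num (:$mantissa!, :$exponent!, :$radix)
{
	my $r = $radix // 10;     # fall back to base 10 if no radix was passed
	return $mantissa * $r ** $exponent;
}

say make-big-num(mantissa => 3, exponent => 8);            # 300000000
say make-big-num(:exponent(8), :mantissa(3), :radix(2));   # 768
```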

Overloading

The fact that we can specify the number and types of arguments a sub takes means that Perl can offer us function overloading:


multi sub speak ($speaker, $words)
{
	say "$speaker says, '$words'";
}
multi sub speak ($words)
{
	say "'$words'";
}

speak("Bob", "Hello world!");     # Bob says, 'Hello world!'
speak("Hello Bob");               # 'Hello Bob'
speak("Bob", "Who said that?!");  # Bob says, 'Who said that?!'

As well as the keyword multi, there is a keyword only to explicitly declare that a function may not be overloaded.
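Overloading is not limited to the number of arguments. Multi subs can also be chosen by argument type, and the most specific matching candidate wins. The following is a sketch assuming a current Rakudo compiler. describe is a made-up sub.

```raku
# describe is a made-up sub; the narrowest matching candidate is chosen.
multi sub describe (Int $n)  { "an integer: $n" }
multi sub describe (Str $s)  { "a string: '$s'" }
multi sub describe ($other)  { "something else" }

say describe(42);      # an integer: 42
say describe("hi");    # a string: 'hi'
say describe(3.5);     # something else
```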

Operators

You may define your own operators, or overload existing ones, by defining functions along the lines of one of the following (where 'op' is the symbol to use for the operator; a circumfix operator takes an opening and a closing symbol, separated by a space):

sub prefix:<op> ($arg) { ... }
sub postfix:<op> ($arg) { ... }
sub infix:<op> ($arg1, $arg2) { ... }
sub circumfix:<op1 op2> ($arg) { ... }

Here are some examples:


sub infix:<quoth> ($speaker, $words)
{
	say "$speaker says, '$words'";
}
sub postfix:<isHeard> ($words)
{
	say "'$words'";
}

my $dude = 'Bob';
$dude quoth "Hello world!";     # Bob says, 'Hello world!'
"Hello $dude"\ isHeard;         # 'Hello Bob' (a postfix operator can't follow plain whitespace, hence the "\ " unspace)
"Bob" quoth "Who said that?!";  # Bob says, 'Who said that?!'

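Operators may also use non-ASCII symbols. The following is a sketch assuming a current Rakudo compiler. The three operators are hypothetical examples defined here for illustration.

```raku
# Hypothetical operators, defined here for illustration.

# Postfix: factorial, written directly after its operand.
sub postfix:<!> (Int $n --> Int) { [*] 1..$n }

# Infix: "plus or minus", returning both results.
sub infix:<±> ($x, $delta) { ($x - $delta, $x + $delta) }

# Circumfix: the two symbols surround their operand.
sub circumfix:<⌊ ⌋> ($x) { $x.floor }

say 5!;         # 120
say 10 ± 2;     # (8 12)
say ⌊3.7⌋;      # 3
```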
Traits

Subs and arguments may be declared with particular “traits” that allow them to behave differently from normal. (In fact, so can constants and variables.) For example, the is rw trait marks a parameter as being read-write, so the subroutine is able to modify it, with the modified value then available to the code that called the function.


sub prefix:<double> (Int $num is rw --> Int)
{
	$num *= 2;
}

my $x = 12;
double $x;
say $x;     # 24

As mentioned in the “Named Arguments” section above, parameters are read-only by default. The function can't even change the value of parameters internally. The is copy trait creates a read-write copy of the variable passed as an argument.


sub prefix:<saydouble> (Int $num is copy --> Int)
{
	$num *= 2;
	say $num;
	return $num;
}

my $x = 12;
saydouble $x;  # 24
say $x;        # 12

This allows us to rewrite our makeBigNum function as:


sub makeBigNum (:$mantissa, :$exponent, :$radix is copy)
{
	$radix = $radix // 10;
	return $mantissa * ($radix ** $exponent);
}

A number of other traits are defined.
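The following sketch shows two further traits and assumes a current Rakudo compiler. The is default trait gives a variable a value that it falls back to when assigned Nil. The is rw trait on a class attribute makes its accessor writable. Counter and $greeting are illustrative names.

```raku
my $greeting is default("hi");
say $greeting;        # hi
$greeting = "hello";
say $greeting;        # hello
$greeting = Nil;      # assigning Nil restores the default
say $greeting;        # hi

class Counter
{
	has $.count is rw = 0;   # "is rw" makes the generated accessor writable
}

my $c = Counter.new;
$c.count = 5;
say $c.count;         # 5
```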

Other Non-Syntax Changes

I've taken you through a tour of some of the more interesting and drastic changes to Perl syntax in Perl 6. I've by no means covered them all — I've not even touched upon multiline comments, or the new built-in maths and array functions. If you want to read up on them, check out the current draft of the Perl 6 documentation.

I want to hark back to what I was saying earlier about Perl 6's written specification, and how this allows multiple competing Perl 6 implementations. Most interestingly, it will allow alternative implementors to do something other than “just interpret the language”.

The original intention was that the “official” implementation of Perl would be a tool that parsed Perl 6, compiled it into Parrot (http://www.parrotcode.org/) bytecode and then ran that. Parrot is a virtual machine, similar to the .NET Common Language Runtime and the Java Virtual Machine. It was started as a “sister project” to Perl 6, but it aims to support plenty of other languages too. APL, Tcl, Lua, Brainfuck, Lisp, Scheme, Perl 1.0, Python and Javascript all work, with varying levels of completeness and reliability. An experimental translator is available to convert .NET bytecode to Parrot bytecode. Eventually the hope is that within Parrot, code written in Javascript could instantiate an object written in Python, which called a function written in Perl and another written in Parrot assembly language, with many different languages interacting seamlessly. However, it is still a long way from being usable for day-to-day coding.

However, Pugs, a Perl 6 implementation written in Haskell, is currently more feature-complete. Amongst the features already implemented are overloaded functions and operators, including named arguments; ranges and junctions; classes and objects; packages (including the ability to link to some Perl 5 packages); and many of the new operators. It is likely to remain an experimental implementation, but it is currently very useful for seeing where the language is headed, and may be used to bootstrap an eventual “Perl 6 written in Perl 6”.

Unlike PHP 6, which should be ready some time this year, I doubt we'll see a version of Perl 6 usable before 2012. In the meantime, many of its features are available in Perl 5, Haskell, PHP, ECMAScript and Python, though often with very different syntaxes, and not all together in a single language as Perl 6 should offer.