NAME
perlhack - How to hack at the Perl internals
DESCRIPTION
This document attempts to explain how Perl development takes place,
and ends with some suggestions for people wanting to become bona fide
porters.
The perl5-porters mailing list is where the Perl standard distribution
is maintained and developed. The list can get anywhere from 10 to 150
messages a day, depending on the heatedness of the debate. Most days
there are two or three patches, extensions, features, or bugs being
discussed at a time.
A searchable archive of the list is at either:
http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/
or
http://archive.develooper.com/perl5-porters@perl.org/
List subscribers (the porters themselves) come in several flavours.
Some are quiet curious lurkers, who rarely pitch in and instead watch
the ongoing development to ensure they're forewarned of new changes or
features in Perl. Some are representatives of vendors, who are there
to make sure that Perl continues to compile and work on their
platforms. Some patch any reported bug that they know how to fix,
some are actively patching their pet area (threads, Win32, the regexp
engine), while others seem to do nothing but complain. In other
words, it's your usual mix of technical people.
Over this group of porters presides Larry Wall. He has the final word
in what does and does not change in the Perl language. Various
releases of Perl are shepherded by a pumpking, a porter
responsible for gathering patches, deciding on a patch-by-patch,
feature-by-feature basis what will and will not go into the release.
For instance, Gurusamy Sarathy was the pumpking for the 5.6 release of
Perl, Jarkko Hietaniemi was the pumpking for the 5.8 release, and
Rafael Garcia-Suarez holds the pumpking crown for the 5.10 release.
In addition, various people are pumpkings for different things. For
instance, Andy Dougherty and Jarkko Hietaniemi did a grand job as the
Configure pumpkin up till the 5.8 release. For the 5.10 release
H.Merijn Brand took over.
Larry sees Perl development along the lines of the US government:
there's the Legislature (the porters), the Executive branch (the
pumpkings), and the Supreme Court (Larry). The legislature can
discuss and submit patches to the executive branch all they like, but
the executive branch is free to veto them. Rarely, the Supreme Court
will side with the executive branch over the legislature, or the
legislature over the executive branch. Mostly, however, the
legislature and the executive branch are supposed to get along and
work out their differences without impeachment or court cases.
You might sometimes see reference to Rule 1 and Rule 2. Larry's power
as Supreme Court is expressed in The Rules:
1. Larry is always by definition right about how Perl should behave.
This means he has final veto power on the core functionality.
2. Larry is allowed to change his mind about any matter at a later date,
regardless of whether he previously invoked Rule 1.
Got that? Larry is always right, even when he was wrong. It's rare
to see either Rule exercised, but they are often alluded to.
New features and extensions to the language are contentious, because
the criteria used by the pumpkings, Larry, and other porters to decide
which features should be implemented and incorporated are not codified
in a few small design goals as with some other languages. Instead,
the heuristics are flexible and often difficult to fathom. Here is
one person's list, roughly in decreasing order of importance, of
heuristics that new features have to be weighed against:
"Does
These haven't been written anywhere in stone, but one approximation
is:
1. Keep it fast, simple, and useful.
2. Keep features/concepts as orthogonal as possible.
3. No arbitrary limits (platforms, data sizes, cultures).
4. Keep it open and exciting to use/patch/advocate Perl everywhere.
5. Either assimilate new technologies, or build bridges to them.
"Where
All the talk in the world is useless without an implementation. In
almost every case, the person or people who argue for a new feature
will be expected to be the ones who implement it. Porters capable
of coding new features have their own agendas, and are not available
to implement your (possibly good) idea.
"Backwards
It's a cardinal sin to break existing Perl programs. New warnings are
contentious: some say that a program that emits warnings is not
broken, while others say it is. Adding keywords has the potential to
break programs, and changing the meaning of existing token sequences or
functions might break programs too.
"Could
Perl 5 has extension mechanisms, modules and XS, specifically to avoid
the need to keep changing the Perl interpreter. You can write modules
that export functions, you can give those functions prototypes so they
can be called like built-in functions, you can even write XS code to
mess with the runtime data structures of the Perl interpreter if you
want to implement really complicated things. If it can be done in a
module instead of in the core, it's highly unlikely to be added.
"Is
Is this something that only the submitter wants added to the language,
or would it be broadly useful? Sometimes, instead of adding a feature
with a tight focus, the porters might decide to wait until someone
implements the more generalized feature. For instance, instead of
implementing a delayed evaluation feature, the porters are waiting
for a macro system that would permit delayed evaluation and much more.
"Does
Radical rewrites of large chunks of the Perl interpreter have the
potential to introduce new bugs. The smaller and more localized the
change, the better.
"Does
A patch is likely to be rejected if it closes off future avenues of
development. For instance, a patch that placed a true and final
interpretation on prototypes is likely to be rejected because there
are still options for the future of prototypes that haven't been
addressed.
"Is
Good patches (tight code, complete, correct) stand more chance of
going in. Sloppy or incorrect patches might be placed on the back
burner until the pumpking has time to fix them, or might be discarded
altogether without further notice.
"Is
The worst patches make use of system-specific features. It's highly
unlikely that nonportable additions to the Perl language will be
accepted.
"Is
Patches which change behaviour (fixing bugs or introducing new features)
must include regression tests to verify that everything works as expected.
Without tests provided by the original author, how can anyone else changing
perl in the future be sure that they haven't unwittingly broken the behaviour
the patch implements? And without tests, how can the patch's author be
confident that his/her hard work put into the patch won't be accidentally
thrown away by someone in the future?
"Is
Patches without documentation are probably ill-thought-out or
incomplete. Nothing can be added without documentation, so submitting
a patch for the appropriate manpages as well as the source code is
always a good idea.
"Is
Larry said "Although the Perl Slogan is There's More Than One Way
to Do It, I hesitate to make 10 ways to do something". This is a
tricky heuristic to navigate, though: one man's essential addition is
another man's pointless cruft.
"Does
Work for the pumpking, work for Perl programmers, work for module
authors, ... Perl is supposed to be easy.
"Patches
Working code is always preferred to pie-in-the-sky ideas. A patch to
add a feature stands a much higher chance of making it to the language
than does a random feature request, no matter how fervently argued the
request might be. This ties into "Will it be useful?", as the fact
that someone took the time to make the patch demonstrates a strong
desire for the feature.
If you're on the list, you might hear the word core bandied
around. It refers to the standard distribution. Hacking on the
core means you're changing the C source code to the Perl
interpreter. A core module is one that ships with Perl.
Keeping in sync
The source code to the Perl interpreter, in its different versions, is
kept in a repository managed by a revision control system (which is
currently the Perforce program; see http://perforce.com/). The
pumpkings and a few others have access to the repository to check in
changes. Periodically the pumpking for the development version of Perl
will release a new version, so the rest of the porters can see what's
changed. The current state of the main trunk of the repository, and patches
that describe the individual changes that have happened since the last
public release are available at this location:
http://public.activestate.com/pub/apc/
ftp://public.activestate.com/pub/apc/
If you're looking for a particular change, or a change that affected
a particular set of files, you may find the Perl Repository Browser
useful:
http://public.activestate.com/cgi-bin/perlbrowse
You may also want to subscribe to the perl5-changes mailing list to
receive a copy of each patch that gets submitted to the maintenance
and development branches of the perl repository. See
http://lists.perl.org/ for subscription information.
If you are a member of the perl5-porters mailing list, it is a good
thing to keep in touch with the most recent changes, if only to
verify that what you would have posted as a bug report isn't already
solved in the most recent available perl development branch, also
known as perl-current, bleading edge perl, bleedperl or bleadperl.
Needless to say, the source code in perl-current is usually in a perpetual
state of evolution. You should expect it to be very buggy. Do not use
it for any purpose other than testing and development.
Keeping in sync with the most recent branch can be done in several ways,
but the most convenient and reliable way is using rsync, available at
ftp://rsync.samba.org/pub/rsync/ . (You can also get the most recent
branch by FTP.)
If you choose to keep in sync using rsync, there are two approaches
to doing so:
"rsync'ing
Presuming you are in the directory where your perl source resides
and you have rsync installed and available, you can upgrade to
the bleadperl using:
# rsync -avz rsync://public.activestate.com/perl-current/ .
This takes care of updating every single item in the source tree to
the latest applied patch level, creating files that are new (to your
distribution) and setting date/time stamps of existing files to
reflect the bleadperl status.
Note that this will not delete any files that were in '.' before
the rsync. Once you are sure that the rsync is running correctly,
run it with the --delete and the --dry-run options like this:
# rsync -avz --delete --dry-run rsync://public.activestate.com/perl-current/ .
This will simulate an rsync run that also deletes files not
present in the bleadperl master copy. Observe the results from
this run closely. If you are sure that the actual run would delete
no files precious to you, you could remove the '--dry-run' option.
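To make the dry-run output easier to review, you can reduce it to just the paths that would be removed. The following is a minimal shell sketch, not part of the perl distribution: list_deletions is a hypothetical helper name, and it assumes that rsync's verbose output marks each removal with a line beginning "deleting ".

```shell
# List only the paths that an rsync dry run reports it would delete.
# Reads rsync's verbose output on stdin. Assumption: each removal is
# reported on its own line as "deleting PATH".
list_deletions() {
    sed -n 's/^deleting //p'
}

# Hypothetical usage, run from the top of your perl source tree:
#   rsync -avz --delete --dry-run \
#       rsync://public.activestate.com/perl-current/ . | list_deletions
```

If the helper prints nothing, the dry run reported no deletions.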
You can then check which patch was the latest to be applied by
looking in the file .patch, which will show the number of the
latest patch.
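If you sync several trees, it can help to print the recorded patch level of each one. This is a small shell sketch under stated assumptions: patch_level is a hypothetical helper, and it assumes the patch number is the first whitespace-delimited field on the first line of .patch.

```shell
# Print the patch level recorded in a source tree's .patch file,
# or "unknown" if the file is absent or unreadable.
# Assumption: the patch number is the first field of the first line.
patch_level() {
    if [ -r "$1/.patch" ]; then
        awk 'NR == 1 { print $1 }' "$1/.patch"
    else
        echo unknown
    fi
}

# Hypothetical usage:
#   patch_level .                # tree in the current directory
#   patch_level ../perl-current  # some other tree
```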
If you have more than one machine to keep in sync, and not all of
them have access to the WAN (so you are not able to rsync all the
source trees to the real source), there are some ways to get around
this problem.
"Using
Set up a local rsync server which makes the rsynced source tree
available to the LAN and sync the other machines against this
directory.
From http://rsync.samba.org/README.html :
"Rsync uses rsh or ssh for communication. It does not need to be
setuid and requires no special privileges for installation. It
does not require an inetd entry or a daemon. You must, however,
have a working rsh or ssh system. Using ssh is recommended for
its security features."
"Using
With the other systems mounted over NFS, you can take an
active pushing approach by checking the just-updated tree against
the other not-yet-synced trees. An example would be
    #!/usr/bin/perl -w
    use strict;
    use File::Copy;
    # Map each file named in MANIFEST to its local mode, size and mtime.
    my %MF = map {
        m/(\S+)/;
        $1 => [ (stat $1)[2, 7, 9] ];    # mode, size, mtime
    } `cat MANIFEST`;
    # Remote trees, as seen through their NFS mount points.
    my %remote = map { $_ => "/$_/pro/3gl/CPAN/perl-5.7.1" } qw(host1 host2);
    foreach my $host (keys %remote) {
        unless (-d $remote{$host}) {
            print STDERR "Cannot Xsync for host $host\n";
            next;
        }
        foreach my $file (keys %MF) {
            my $rfile = "$remote{$host}/$file";
            my ($mode, $size, $mtime) = (stat $rfile)[2, 7, 9];
            defined $size or ($mode, $size, $mtime) = (0, 0, 0);
            # Skip files whose size and mtime already match the local copy.
            $size == $MF{$file}[1] && $mtime == $MF{$file}[2] and next;
            printf "%4s %-34s %8d %9d %8d %9d\n",
                $host, $file, $MF{$file}[1], $MF{$file}[2], $size, $mtime;
            unlink $rfile;
            # Report a failed copy and move on to the next file.
            copy ($file, $rfile)
                or do { warn "Cannot copy $file to $rfile: $!\n"; next };
            utime time, $MF{$file}[2], $rfile;
            chmod $MF{$file}[0], $rfile;
        }
    }
though this is not perfect. It could be improved by checking
file checksums before updating. Not all NFS systems support
utime reliably (when used over NFS).
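The checksum check mentioned above could look roughly like the following. It is a shell sketch rather than a change to the Perl script: same_content is a hypothetical helper, and it relies on the POSIX cksum utility, which is a CRC and not a cryptographic hash.

```shell
# Succeed (exit status 0) when two files have identical contents, as
# judged by the POSIX cksum utility (CRC plus byte count). A missing
# or unreadable file counts as "different".
same_content() {
    [ -r "$1" ] && [ -r "$2" ] || return 1
    # Reading from stdin keeps the file name out of cksum's output,
    # so the two results can be compared directly.
    sum_a=$(cksum < "$1") || return 1
    sum_b=$(cksum < "$2") || return 1
    [ "$sum_a" = "$sum_b" ]
}

# Hypothetical usage: copy only when the remote file really differs.
#   same_content perl.c /host1/pro/3gl/CPAN/perl-5.7.1/perl.c ||
#       cp -p perl.c /host1/pro/3gl/CPAN/perl-5.7.1/perl.c
```

Comparing checksums avoids depending on mtime, which matters where utime is unreliable, at the cost of reading every file on both sides.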
"rsync'ing
The source tree is maintained by the pumpking who applies patches to
the files in the tree. These patches are either created by the
pumpking himself using diff -c after updating the file manually or
by applying patches sent in by posters on the perl5-porters list.
These patches are also saved and rsync'able, so you can apply them
yourself to the source files.
Presuming you are in a directory where your patches reside, you can
get them in sync with
# rsync -avz rsync://public.activestate.com/perl-current-diffs/ .
This makes sure the latest available patch is downloaded to your
patch directory.
It's then up to you to apply these patches, using something like
# last=`ls -t *.gz | sed q`
# rsync -avz rsync://public.activestate.com/perl-current-diffs/ .
# find . -name '*.gz' -newer $last -exec gzcat {} \; >blead.patch
# cd ../perl-current
# patch -p1 -N <../perl-current-diffs/blead.patch
or, since this is only a hint towards how it works, use CPAN-patchaperl
from Andreas K