Subroutines

If you have had a go at writing some programs of your own, you’ll probably have discovered that you sometimes repeat the same pieces of code, or that your programs sprawl over several screens in your editor. To make things worse, if you want to edit a piece of repeated code, you need to find every instance of it. These factors make programming errors unnecessarily hard to find, and your programs hard to read. This is where subroutines come into their own.
Subroutines can sound daunting, but you have effectively been using them already, probably without realising it. For example, every time you use the ‘print’ command, you are calling on code whose internal workings you never get to see. ‘Print’, like many other commands, is effectively a ‘black box’ that does all of your printing work for you: you don’t need to know how it does its job, just how to use it. Although it’s early to spring a fundamental concept on you, the ‘black box’ is how programmers write large, complex programs, and how they can work on the details of one part without having to keep the whole program in their heads.
First steps
In practice, you take your problem and break it down into a few broad steps. These might be as broad as: ‘Initialise variables’; ‘Get user input’; ‘Process data’; ‘Produce output’; and ‘Tidy up’. Note that there is no program code at this stage. Each of these steps, or ‘black boxes’, can be broken down into smaller levels of detail. So, if you take ‘Get user input’, you might want to check what the user is typing, or limit what they can input in some way, and then pre-process it so that it’s in a form the program can use. This might consist of checking the ranges of input values, stuffing the entries into an array in a particular way, or checking that an IP address exists and has a specified port open. Each of these processes can be broken down further until you reach something you can turn into program code, such as using a regular expression to check that a string contains only digits and then checking that its value is within an expected range.
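To make the idea concrete, here is a minimal Perl sketch of the ‘Get user input’ step, assuming the input has already been read into a list of strings. The names ‘valid_number’, ‘get_user_input’ and ‘process_data’, and the range 1 to 100, are hypothetical choices for illustration: the first uses a regular expression to check that a string contains only digits and then checks its range, while the other two stand in for broad steps of the plan.

```perl
use strict;
use warnings;

# Hypothetical helper: accept a string only if it contains nothing but
# digits and its value lies within the range $min..$max.
sub valid_number {
    my ($text, $min, $max) = @_;
    return 0 unless defined $text && $text =~ /^\d+\z/;   # digits only
    return $text >= $min && $text <= $max ? 1 : 0;        # within range
}

# Hypothetical 'Get user input' black box: keep only acceptable entries.
# The arguments stand in for whatever the user actually typed.
sub get_user_input {
    my @raw = @_;
    return grep { valid_number($_, 1, 100) } @raw;
}

# Hypothetical 'Process data' black box: add the accepted values together.
sub process_data {
    my $total = 0;
    $total += $_ for @_;
    return $total;
}

# The main program reads like the plan itself.
my @input = get_user_input('42', 'abc', '7', '500');
print "Accepted: @input\n";
print "Total: ", process_data(@input), "\n";
```

Because each step is its own subroutine, the checking rules can later be tightened inside ‘valid_number’ without touching the main program.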
Reducing repetition
If you find that you have to carry out a particular routine many times within a program, such as printing to a log file in a particular format, you can make life a lot easier for yourself by giving that routine a place of its own and calling it when it is needed. Each call then takes just a single command, in the same way that using ‘chomp’ does. This shortens your program overall and, if you need to correct or edit the code at a later date, you only need to look in one place.
So, how do we write and use a subroutine? The content of a subroutine doesn’t have to differ from the code you write anywhere else in your program, but it does have to be wrapped in a special block that starts with ‘sub’ followed by the name of your routine. The name follows the same naming rules as variable names, but without a leading symbol such as ‘$’. After the name comes a block of code within a set of braces, in the normal way.
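As an illustration, a log-writing routine might look like the sketch below. The name ‘log_message’ and the log format are assumptions, and an in-memory filehandle stands in for a real log file so that the example is self-contained; in a real program you would open a file on disk instead.

```perl
use strict;
use warnings;

# Hypothetical logging routine: every log line gets the same format,
# so changing the format later means editing just this one place.
sub log_message {
    my ($fh, $level, $text) = @_;
    printf {$fh} "[%s] %-5s %s\n", scalar(localtime), $level, $text;
}

# An in-memory filehandle (writing into $buffer) stands in for a log file.
open my $log, '>', \my $buffer or die "Cannot open log: $!";
log_message($log, 'INFO', 'Program started');
log_message($log, 'WARN', 'Input out of range');
close $log;

print $buffer;
```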
To use such a subroutine, you just have to use an ampersand, followed by the name of the subroutine and the usual semicolon. So it might look like this:
($x, $y) = (2, 3);
&addxy;
print "$z\n";
sub addxy {
    $z = $x + $y
}
Note that there’s no need for a semicolon after the closing brace of the subroutine block, and that the last statement inside a block doesn’t need one either. This little program assigns the values ‘2’ and ‘3’ to two global variables, calls a subroutine called ‘addxy’ and then prints the value of another global variable, ‘$z’, which the subroutine has set. On its own, this doesn’t save us any code-writing, but if the subroutine were a little longer, and we called it many times from different places within our program, it would save a lot of work and maintenance. One other point worth noting at this stage is that Perl does not run the code inside a subroutine block when it reaches it; the body only runs when the subroutine is called. The main program is therefore made up of the lines outside the subroutine blocks, which means we can put our subroutines anywhere we like in the file, before or after the code that calls them.
Finally, just how much code needs to be repeated to justify turning it into a subroutine? The answer is largely a matter of opinion, but a rough guide is that if a piece of code is repeated more than two or three times and is more than four or five statements long, you could justify turning it into a subroutine. I recently saw code from a company that should have known better (it wasn’t written in Perl) that used subroutines containing only two lines of code, which made things harder to read rather than easier.
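The sketch below, which reuses the global variables from the ‘addxy’ example, calls the subroutine twice before its definition appears in the file: once with the ampersand, and once with parentheses after the name, which also tells Perl that ‘addxy’ is a subroutine. The ‘use strict’ line and the ‘our’ declaration are additions that let the globals pass Perl’s strict checks.

```perl
use strict;
use warnings;

our ($x, $y, $z);    # declare the globals so 'use strict' accepts them

# Main program: both calls come before the subroutine is defined.
($x, $y) = (2, 3);
&addxy;              # ampersand form, as in the text
print "$z\n";        # prints 5

($x, $y) = (10, 20);
addxy();             # parentheses also mark this as a subroutine call
print "$z\n";        # prints 30

# This body only runs when addxy is called.
sub addxy {
    $z = $x + $y
}
```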
Passing values
One of the problems with our ‘addxy’ subroutine is that we need to remember that it takes its input from two variables, one called ‘$x’ and one called ‘$y’. If we already have two such variables in use, we will need to save them temporarily, set them to our values for ‘addxy’, call the routine and then copy ‘$x’ and ‘$y’ back to what they were beforehand. If that sounds inconvenient, consider what happens when you come back to your program in the future: you might have forgotten about ‘$x’ and ‘$y’, set them a few dozen lines before ‘addxy’ is called, and then wonder why the program isn’t working the way you expect.
You can eradicate this problem by passing the values to your subroutine when you call it, instead of setting global variables a few lines beforehand. Calling with an ampersand, or with parentheses after the name, makes it obvious to Perl that you mean a subroutine, even if it is defined later in the file. If you leave both off and the call comes before the definition, you need to declare the subroutine first, with a line such as ‘sub add3;’. In the example below, we pass one value to a routine called ‘add3’:
$x = 6; &add3 ($x);
print "$z\n";
sub add3 {
    $z = (shift) + 3;
}
We could have condensed the first two statements into one:
&add3 (6);
but here you can see that we can pass variables, as well as literals, to subroutines. Note that we have passed just one value, and we access it directly using ‘shift’. Inside a subroutine, ‘shift’ with no argument works on the special array ‘@_’, which holds the values passed to the subroutine: it removes the first value and returns it for us to use however we choose. Here, we add ‘3’ to it and store the result in ‘$z’.
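Here is a small sketch of ‘shift’ and ‘@_’ at work, which also shows the kind of declaration mentioned earlier. Unlike the version in the text, this ‘add3’ hands its result back with ‘return’ rather than setting ‘$z’; ‘add3_peek’ is a hypothetical variant that reads the first argument as ‘$_[0]’, the first element of ‘@_’, without removing it.

```perl
use strict;
use warnings;

sub add3;            # forward declaration: lets us call add3 without & or ()

my $x = 6;
my $z = add3 $x;     # parses as a call only because of the declaration above
print "$z\n";        # prints 9

my $peek = add3_peek(6);     # parentheses mark this as a call
print "$peek\n";             # prints 9

sub add3 {
    my $first = shift;       # bare shift in a sub removes the first value of @_
    return $first + 3;
}

sub add3_peek {              # hypothetical variant
    return $_[0] + 3;        # reads the first argument without removing it
}
```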
If we wanted to pass more than one value, as in our ‘addxy’ routine, the main program would pass a list instead of a single variable, using ‘&addxy (2, 3);’ or even ‘&addxy (@two_numbers);’. In the subroutine, we could shift several times, so that we might end up with ‘$z = (shift) + (shift);’. However, the problem with using shift directly like this is that we can only use each value once, because every shift removes its value from ‘@_’. Have a look at the boxout ‘Passing lists’ for more on different ways of accessing lists that have been passed to a subroutine.
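The sketch below shows what happens to ‘@_’ when several values are passed. This ‘addxy’ also reports how many arguments are left over after two shifts, an extra that is not in the text, and ‘diff_of_squares’ is a hypothetical routine showing one common alternative: copying ‘@_’ into named variables so that each value can be used more than once.

```perl
use strict;
use warnings;

# Each bare shift removes one value from @_, so once shifted,
# a value can't be read from @_ again.
sub addxy {
    my $sum  = (shift) + (shift);
    my $left = scalar @_;          # arguments still unread in @_
    print "Sum: $sum, arguments left over: $left\n";
    return $sum;
}

# Copying @_ into named variables lets each value be used twice.
sub diff_of_squares {              # hypothetical example routine
    my ($p, $q) = @_;
    return ($p + $q) * ($p - $q);
}

my @two_numbers = (2, 3);
addxy(2, 3);                # a list of literals
addxy(@two_numbers);        # the array is flattened into the argument list
addxy(1, @two_numbers);     # adds 1 and 2; the 3 is left over in @_
print diff_of_squares(@two_numbers), "\n";   # (2+3)*(2-3), prints -5
```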
Returning variables
So far, we haven’t got rid of the problem of the result being stored in the global variable ‘$z’. But you’ll be happy to hear that this is quite easy to do. You just have to use ‘return’, as follows:
$z = &addxy(2, 3);
print "$z\n";
sub addxy {
    return (shift) + (shift);
}
This version of ‘addxy’ hands the value back as its result, which we can assign to whatever variable we like, relieving us of the need for a global variable.
In fact, a subroutine returns the value of the last expression it evaluated, so instead of ‘return (shift) + (shift);’ we could just write ‘(shift) + (shift);’. If you did this outside a subroutine with warnings switched on, Perl would warn you about a ‘Useless use of addition (+) in void context’, but as the last statement of a subroutine, it provides the returned value.
In keeping with Perl’s philosophy of not limiting you, you aren’t restricted to returning just one value. You can see how to do this in the boxout ‘Return of the lists’ at the bottom of the page. In addition, I’ve explained how to use variables within a subroutine without affecting those in the rest of your program, even if they have the same name.
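Both forms can be compared side by side in the following sketch. The names ‘addxy_explicit’, ‘addxy_implicit’ and ‘safe_divide’ are hypothetical; the last one shows that an explicit ‘return’ also leaves the subroutine early, so any code after it is skipped.

```perl
use strict;
use warnings;

# Explicit return: leaves the subroutine immediately with this value.
sub addxy_explicit {
    return (shift) + (shift);
}

# Implicit return: the value of the last expression evaluated is returned,
# so no 'void context' warning is given here.
sub addxy_implicit {
    (shift) + (shift);
}

# An early return skips the rest of the subroutine.
sub safe_divide {                  # hypothetical example routine
    my ($num, $den) = @_;
    return 'undefined' if $den == 0;
    return $num / $den;
}

my $z1 = addxy_explicit(2, 3);
my $z2 = addxy_implicit(2, 3);
print "$z1 $z2\n";                 # prints "5 5"
print safe_divide(1, 0), "\n";     # prints "undefined"
```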
Return of the lists
Return lists or scalars, depending upon which has been requested.
We know that you can return a single value either by using an explicit ‘return $x;’, which exits the subroutine immediately, or by letting the subroutine return the result of the last expression it evaluated. However, it is also possible to return a list. Call the subroutine like this:
@a = thisfunction ($x);
The array ‘@a’ now expects to receive a list of values. In the subroutine, you provide this either with an explicit return followed by the list, or as the last expression evaluated. Note that your return list doesn’t have to be a single array, as Perl will flatten out whatever list you provide, just as it does when you assign a list to an array or pass one to a subroutine. If you needed your list output to be made up in a particular way, the last line to be evaluated could be:
('Hello', 5..$x, @op1);
Do check the value of ‘$x’, as the range operator ‘..’ only produces values if the last number is greater than or equal to the first; if ‘$x’ is less than 5, ‘5..$x’ contributes nothing to the list.
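The list-building line above can be tried out with the following sketch. It reuses the names ‘thisfunction’ and ‘@op1’ from the text, but the contents of ‘@op1’ (‘a’ and ‘b’) are assumed, since the text doesn’t give them. The second call shows the empty range you get when ‘$x’ is less than 5.

```perl
use strict;
use warnings;

# Returns one flat list built from a scalar, a range and an array.
sub thisfunction {
    my $x   = shift;
    my @op1 = ('a', 'b');          # assumed contents, for illustration
    ('Hello', 5..$x, @op1);        # last expression: the returned list
}

my @a = thisfunction(7);
print "@a\n";        # prints "Hello 5 6 7 a b"

my @b = thisfunction(3);
print "@b\n";        # prints "Hello a b", because 5..3 is an empty range
```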
This is fine if we know that we need to return a list. But if you look at the screenshot, you will see a function that returns a single scalar value when it is asked for one and, when asked for a list, knows to return a list. One way to do this would be to look at the number of values passed to the subroutine and guess from that what type of return is needed. However, that guess isn’t always right and, no surprises here, there is a better way.
If you look at the subroutine ‘feet_to_metres’, the last line of code (16) contains our favourite ternary operator, the single-line equivalent of an if-then-else block. In it we use a special function called ‘wantarray’, which tells us whether the subroutine has been called in list context (it returns true) or in scalar context (it returns false). We have built our return around this result: the subroutine returns an array if a list is asked for; otherwise, it returns a scalar, which is the first element of the array and the only one processed by the foreach loop.
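The screenshot’s code isn’t reproduced here, so the following is only a sketch of the same idea under our own assumptions: a ‘feet_to_metres’ that converts every value it is given, using the exact factor of 0.3048 metres per foot, and then uses ‘wantarray’ in a ternary to decide what to return. Unlike the version described above, this sketch converts all the values even in scalar context and simply returns the first.

```perl
use strict;
use warnings;

sub feet_to_metres {
    my @metres;
    foreach my $feet (@_) {
        push @metres, $feet * 0.3048;     # 1 foot is exactly 0.3048 metres
    }
    # wantarray is true in list context and false in scalar context
    return wantarray ? @metres : $metres[0];
}

my @many = feet_to_metres(1, 10, 100);    # list context: all three values
my $one  = feet_to_metres(10, 20);        # scalar context: first value only
print "@many\n";
print "$one\n";
```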


