
Input and output

In the final part of this series, Paul Grosse shows us how to enable our programs to communicate with users, files and other programs.

In this series, we've learned about variable types, manipulating data, controlling flow, re-using code and making it easier to maintain by using subroutines. We complete the course by learning about input and output which will allow your programs to communicate with users, files, other programs and so on. Once completed, you should be able to write virtually any character-based program you want.

For programs to have any meaning, they must be able to communicate with the resources that are available to them, such as printing to the screen or a file or perhaps allowing the user to input data or indicate preferences in the way a program executes. In this series so far, we have used some basic communication methods that have allowed our examples to make sense, but there is a lot more to it than that. First of all, we need to look at what happens by default.

Default
There are three basic channels through which information flows: input for the program from the user (STDIN); output from the program, usually to the screen (STDOUT); and errors, which normally go to the screen as well (STDERR). Between them, STDIN, STDOUT and STDERR all work well by default and you shouldn't need to do anything to set them up. We covered STDOUT when we used 'print'. To obtain a line of input from the console, you just need to have a line like the following:

chomp( $line = <STDIN> );

This reads one line of input from STDIN (using the angle operator, also known as the diamond operator - just a < followed by a >, here with the file handle name between them), and then puts it in a variable called '$line'. chomp then removes the newline character from the end.

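As these are the defaults, you do not need to specify them explicitly in order to use them (in the same way that '$_' and '@_' are used). So, instead of having '<STDIN>', you can just use '<>'. Strictly speaking, '<>' reads from any files named on the command line and only falls back to STDIN when there are none, so for a program run without arguments the two are equivalent. This is taken further if you need to have a series of inputs in a loop, such as if you are using your program interactively. This is done as follows: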

while (<>) {
    chomp;
    /^quit/i and do {
        last;
    };
    # put your code in here...
}

As in the previous example, chomp removes the newline at the end but here, because the line input is the only thing in the while condition, Perl stores each line in '$_', which is also what chomp and the pattern match work on by default. There is nothing stopping you from copying $_ to some other variable or pushing it on to an array if you want a log of user input. It is useful to search the input for 'quit' or an equivalent and use that to escape from the loop. If you look at the Perl Masterclass in PC Plus 229, you will see the loop control directives that we used with foreach; they work in a while loop as well, so you can use 'next' and 'redo' in addition to 'last'. You can even use redo after modifying '$_' if you want some food for thought.
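
The loop above can be fleshed out into something like the following. This is only a sketch: the subroutine name command_loop and the sample input are made up for illustration, and the "user" is simulated with a string opened as a file (which needs Perl 5.8 or later) so that the example runs unattended. It also uses a lexical file handle ('my $in') rather than a bareword one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read commands from any file handle until "quit" or end of input.
# Returns a reference to an array holding a log of the lines seen.
sub command_loop {
    my ($in) = @_;
    my @log;
    while (<$in>) {
        chomp;
        next if /^\s*$/;      # skip blank lines
        last if /^quit/i;     # leave the loop on "quit"
        push @log, $_;        # keep a copy of the input
        print "You typed: $_\n";
    }
    return \@log;
}

# Simulated user input held in a string (hypothetical sample data).
# In an interactive program you would pass \*STDIN instead.
my $typed = "hello\n\nlist files\nQuit\nnever reached\n";
open my $fake_stdin, '<', \$typed or die "Cannot open in-memory input: $!";
my $log = command_loop($fake_stdin);
close $fake_stdin;

print scalar(@$log), " lines logged\n";
```

Because the loop stops at 'Quit', the final line of the sample input is never read, and the blank line is skipped by 'next' without being logged.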

Pipes
Using a while loop with the diamond operator enables a program to take input from anything that is prepared to masquerade as STDIN and have all of it processed for as long as data is fed to it. This is because the loop is executed until it reaches EOF (End Of File), at which point the program carries on with the line after the end of the while loop. Just connect it up to a program instead of a user. This is done on the command line using the pipe symbol '|':

netstat -atn | grep LISTEN | myprog

In the above example, the socket information from netstat is piped to grep, which filters out any line that doesn't have LISTEN in it. Any that remain are passed on to 'myprog', which processes them in any way you have chosen (you might be interested in connections with a range of interfaces on a tri-homed firewall for instance). If the output of myprog were then piped to another program that died before it could finish taking in all of the data (if a resource had closed, for instance) and myprog then tried to send it some more data, myprog would receive a SIGPIPE signal. This would cause it to die unless SIGPIPE was prepared for with a handler - "$SIG{'PIPE'} = 'die_gracefully';" where die_gracefully is the name of a subroutine that you have written.
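
A minimal sketch of preparing for SIGPIPE might look like this, assuming a UNIX-like system. The handler body, the $pipe_broken flag and the child process (a second perl that exits without reading anything) are all illustrative assumptions; a real handler would do whatever tidying up your program needs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $pipe_broken = 0;

# Called when we write to a pipe whose reader has gone away.
# Here it only sets a flag; the main code decides what to do next.
sub die_gracefully {
    $pipe_broken = 1;
}
$SIG{PIPE} = \&die_gracefully;

# Start a reader that exits at once without reading anything.
# $^X is the path of the running perl; the child is purely illustrative.
open my $reader, '|-', $^X, '-e', 'exit 0'
    or die "Cannot start child: $!";

# Keep writing until a write fails or the handler has fired.
for my $chunk (1 .. 1000) {
    last if $pipe_broken;
    print {$reader} 'x' x 65536 or last;
}
close $reader;   # may report failure because the pipe broke; ignored here

print STDERR "Reader went away; cleaning up and exiting.\n" if $pipe_broken;
```

Without the handler, the default action for SIGPIPE would end the program part-way through the loop with no chance to tidy up.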

Command-line-based pipes are not the only ones that are available. If you have the xinetd superserver running, you can get it to connect the socket it creates for each connection to your program. The data from the socket comes in through STDIN (so you can read it using a 'while (<>) {' loop) and you send data back using STDOUT, so you can just use print. Plus, you can test such a program by calling it from the command line ('./myprog') and check the input and output without any code rewriting. If you want to test it locally through xinetd, all you need to do is run a telnet session ('# telnet 127.0.0.1 25' where, in this case, 25 is the port number the service is listening on).
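
As a rough illustration of the kind of program that could sit behind xinetd, here is a line-based echo service. The 'protocol' (READY, ECHO and BYE) and the subroutine name serve are invented for this sketch, and the client is simulated with strings opened as files so that it runs without a network connection. The xinetd configuration itself is not shown.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Answer each line from the client until it sends "quit" or disconnects.
sub serve {
    my ($in, $out) = @_;
    print {$out} "READY\n";
    while (my $line = <$in>) {
        $line =~ s/\r?\n\z//;            # telnet ends lines with CR LF
        last if $line =~ /^quit/i;
        print {$out} "ECHO: $line\n";
    }
    print {$out} "BYE\n";
}

# Under xinetd you would call:  $| = 1; serve(\*STDIN, \*STDOUT);
# Here the "client" is a string, so the sketch runs unattended.
my $client = "hello\r\nquit\r\n";
my $reply  = '';
open my $in,  '<', \$client or die "Cannot open input: $!";
open my $out, '>', \$reply  or die "Cannot open output: $!";
serve($in, $out);
close $out;

print $reply;
```

Setting '$| = 1' turns off output buffering on STDOUT, which matters for a network service because the client is usually waiting for each reply before sending its next line.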

Redirection
While we are on the command line, you can redirect STDIN and STDOUT quite easily just by using '<' and '>'. First of all, we need to have the name of the program we want to run and then we can specify where it should read STDIN from and where it should send STDOUT to. Note that you can specify none, either or both, depending upon your needs:

./myprog < data.txt > outputfile.txt

This runs myprog and feeds it data from the file 'data.txt' as it is needed, and the output of myprog is sent to a file called 'outputfile.txt'. If myprog finishes before it has got to the end of data.txt, the rest of the file is simply never read; if it does get to the end, myprog sees an EOF. Note that '>' overwrites any existing 'outputfile.txt'. If you want to append to it, you need to use '>>' instead of '>'.

Opening files
So far, we have seen STDIN, STDOUT and STDERR, which are in effect just special file handles. By convention, file handles are written in capitals so that they are distinguished from keywords. There are three stages to using a file: open it; use it; and close it.

The three most common ways to open files are 'read-only', 'write' and 'append'. These allow you, respectively, to read data from a file, to create a new file (or overwrite an existing one) and write to it from the beginning, and to write more data to the end of an existing file.

open CFG, "prog.conf";
open CFG, "<prog.conf";
open PAGE, ">results.html";
open BIGLOG, ">>/var/log/slurp.log";

In the above examples (note the '<', '>' and '>>' as in command line redirection), the first two open the file read-only, but in the second the mode is explicit. Being explicit matters when the file name comes from a scalar instead of a literal: with 'open CFG, $filename;', a mischievous user could taint the scalar with a value like '>/etc/httpd/httpd.conf' and cause that file to be overwritten. If instead you use 'open CFG, "<".$filename;' and that is tried, Perl looks for a file to read whose name begins with '>', and the open simply fails. So, suppose we want to open a file, print to it and then close it again. All we need to do is:
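
Perl (from version 5.6) also has a three-argument form of open that keeps the mode and the file name apart, so the name is never inspected for '<' or '>' at all. The sketch below shows one way of using it together with a check on whether the open worked. The subroutine name open_for_reading and the hostile file name are hypothetical, chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Open a file for reading only, however odd its name looks.
# Returns a file handle, or nothing if the file cannot be opened.
sub open_for_reading {
    my ($filename) = @_;
    open my $fh, '<', $filename or return;
    return $fh;
}

# A hostile "file name" of the kind described above (hypothetical value).
my $filename = '>no_such_dir/victim.conf';

if (my $fh = open_for_reading($filename)) {
    print "Opened $filename\n";
    close $fh;
}
else {
    print "Refused: cannot read '$filename': $!\n";
}
```

Checking the return value of open is worthwhile in any case, because reading from or printing to a handle that failed to open would otherwise go wrong without explanation.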

open FH, ">testfile.txt";
print FH "Some content\n";
close FH;

To read a file in, line by line, we can do the following:

open FH, "testfile.txt";
foreach (<FH>) {
    print "$_";
}
close FH;

This will print out each line, although you can do more than just this. With inputting files, it is useful to think of <FH> as a list which you can process in the same way as any other.
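
To illustrate treating the file handle as a list, the following sketch applies grep and sort to it directly. The file contents are invented sample data written to a temporary file (using the standard File::Temp module) so that the example is self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small throwaway file so the sketch is self-contained.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "pear\navocado\nbanana\napple\n";
close $tmp or die "Cannot close $path: $!";

open my $fh, '<', $path or die "Cannot open $path: $!";
# In list context <$fh> returns every remaining line, so list
# operators such as grep can be applied to it directly.
my @a_words = grep { /^a/ } <$fh>;
close $fh;

# The result is an ordinary array, so it can be sorted like any other.
@a_words = sort @a_words;

print @a_words;    # each element still ends with its newline
```

Only the lines beginning with 'a' survive the grep, and sort then puts them into alphabetical order.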

Instead of reading a file in line by line, you can do it in one go, storing the file as entries in an array, like this:

open FH, "testfile.txt";
@file_contents = <FH>;
close FH;

Now, we can access any line of the file in any order. The only limitation on this (assuming that the file exists) is that there is enough memory available. Fortunately with Perl, you don't have to worry too much about memory management, as Perl organises it all on your behalf. With modern PCs having hundreds of MBs of RAM, there is rarely a problem anyway. As always, you'll be able to find plenty of great examples on the Internet.
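
Here is a short sketch of what accessing any line in any order can look like once the file is in an array. As before, the file is a temporary one holding made-up sample data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small throwaway file so the sketch is self-contained.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "first\nsecond\nthird\n";
close $tmp or die "Cannot close $path: $!";

# Read the whole file into an array, one line per element.
open my $fh, '<', $path or die "Cannot open $path: $!";
my @file_contents = <$fh>;
close $fh;
chomp @file_contents;      # chomp works on every element of an array

print "Lines:    ", scalar(@file_contents), "\n";
print "Last:     $file_contents[-1]\n";     # negative indexes count from the end
print "Second:   $file_contents[1]\n";      # indexes start at 0
print "Reversed: ", join(', ', reverse @file_contents), "\n";
```

For very large files, processing one line at a time keeps memory use down, since only the current line needs to be held.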

Command line
Winning battles with arguments.

If you have ever used the command line for anything, you will probably have passed a program some command line arguments, even without thinking about it. In DOS, you can type 'dir *.' to get a list of directories (and extensionless files) and in UNIX, you can say 'less mytext.txt' to display the file mytext.txt on the screen. In each case, you are passing command line arguments: '*.' and 'mytext.txt' respectively.

Perl picks these up and stores them in an array called '@ARGV', which you can access and process in the same way as any other array; ie, $ARGV[0] is the first argument and so on. You can either access them in any order, as you can with any array, or work through them destructively using shift. The screenshot shows the command line below:

./test3 single args "all one" 

This shows that the program 'test3' has had three arguments passed to it. Arguments are separated by spaces, but the third one is two words, separated by a space, within quotes, so it is treated as a single argument. The command line shell scans the line when you press Enter and passes the three arguments to the program test3 (the './' says that it is in the current directory). The program then does what we have told it to, which in this case is just to print them out.

If you miss off the closing quote, bash lets you carry on typing on further lines until you enter another quote. In that case, everything between the pair of quotes becomes a single element in @ARGV, including the newline characters. Command line arguments passed to Perl scripts therefore work in the same way as those passed to any other program, because it is bash (or whichever command interpreter you are using) that parses the line before execution and then passes on the arguments.
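
The listing for test3 is not reproduced here, but a program along the following lines would show both ways of working through the arguments. It is a sketch rather than the original: if it is run with no arguments it falls back to the three from the example command line, purely so that it produces some output on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Use the real arguments if there are any; otherwise fall back to the
# three from the example command line so the sketch runs on its own.
my @args = @ARGV ? @ARGV : ('single', 'args', 'all one');

# Non-destructive: index into the array.
for my $i (0 .. $#args) {
    print "Argument $i: [$args[$i]]\n";
}

# Destructive: shift removes arguments from the front one at a time.
my @seen;
while (defined(my $arg = shift @args)) {
    push @seen, $arg;
}
print "Processed ", scalar(@seen), " arguments; ", scalar(@args), " left\n";
```

The square brackets in the output make it easy to see that the quoted argument arrives as one element with its space intact.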

Paul Grosse  
  PC Plus Issue 231 - July 2005