Perl format Primer (1/2) | WebReference


Perl format Primer

By Dan Ragle

Though often expanded with the backronym "Practical Extraction and Report Language," Perl is more often associated with parsing and manipulating files and data than with the formal reporting of that data. A close look at your corporate or Web host's servers (assuming you are privileged enough to look at such things) will often reveal Perl scripts performing a host of critical functions: receiving data from a Web form and storing it in a database, reading a log file a line at a time and producing a separate text file with summarized results, synchronizing specific data elements among multiple servers, sending automated e-mail messages to a list of users, and so on. Less frequent, however, are the reporting tasks assigned to the language: the formal presentation of parsed data in a layout that allows more casual data browsers (i.e., human beings) to make sense of the information.

Though Perl shines when it comes to making data manipulation jobs simple to code and execute, it's not without formal reporting facilities. In this brief tutorial, we'll examine the two core features used to create formatted reports in Perl: format, which declares templates that insert data elements into formatted report lines; and write, which outputs the formatted results to a file or STDOUT for examination.

format

While you are certainly welcome to code up a Perl-based report manually--i.e., tracking line counts, page numbers, and page headers, lining up the output of print statements so that data lands in the appropriate column, developing separate subroutines to print page and column header information, etc.--the core format declaration provides a template-based means to produce simple reports without such manual shenanigans. In a nutshell, you define report line and header templates with format, and they are then filled in and used automatically whenever the write function is called. We'll examine write in more detail later in this tutorial; for now, let's focus on creating the report line and header formats.

To define a line or header template for your report, you use the following syntax:

format [name] = 
      picture line 1
      argument line 1
      picture line 2
      argument line 2
      ...
      picture line n
      argument line n
.

The dot at the end is not a typo; it's the official end of a format definition. It must be the first--and only--character on the line in order for Perl to interpret it as the end of the format template.

The name supplied for the format is important, because it determines which filehandle the format applies to by default. Each filehandle in your Perl script automatically uses the format defined with the same name as the filehandle. In other words:

# open up MYFILE for writing
open(MYFILE,">myfile.txt") or die "Can't open up myfile: $!\n";
my ($name,$salary);
# now this line format will automatically apply to MYFILE
format MYFILE = 
Name: @>>>>>>>>>>>>>>>>>>>>>>>>>>>>        Salary: @###########.##
      $name,                                       $salary
.

For now, ignore the picture and argument lines within the format, and concentrate on the name. Because this format declaration is named MYFILE, it's automatically applied to the MYFILE filehandle: when we write to MYFILE, Perl uses the MYFILE format.
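
As a minimal, self-contained sketch of this name matching, the following script declares a format named after its filehandle and calls write once per record. The REPORT filehandle, the sample rows, and the in-memory output buffer are assumptions made for illustration; they are not part of the example above.

```perl
use strict;
use warnings;

# Assumption for illustration: write to an in-memory file so the finished
# report ends up in $report instead of on disk.
my $report = '';
open(REPORT, '>', \$report) or die "Can't open in-memory file: $!\n";

# Lexical variables used by a format must be declared before it.
my ($name, $salary);

# Same name as the filehandle, so "write REPORT" uses it automatically.
format REPORT =
Name: @<<<<<<<<< Salary: @####.##
      $name,             $salary
.

# Hypothetical sample records.
for my $row (['Ann', 1234.5], ['Bob', 99]) {
    ($name, $salary) = @$row;
    write REPORT;    # fills in the template and emits one report line
}
close(REPORT);

print $report;
```

Each call to write produces one line, so the script prints two report lines, with the salaries shown as 1234.50 and 99.00.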

If no name is supplied to format, then STDOUT is assumed; in which case the format declaration in turn automatically applies to STDOUT. Note that format names have their own namespace in Perl, and you are therefore allowed to create names which may be identical to existing variables or functions (though, for the sake of clarity, you may not want to do this). As we shall see later, it is then possible to assign a specially named format directly to any specific filehandle, even if the format name is not the same as the filehandle itself.

In addition to the default report line format for each filehandle as described above, a special format can be defined as the page header for each filehandle. To define a specially formatted page header for a file, you add the _TOP suffix to the format name:

# open up MYFILE for writing
open(MYFILE,">myfile.txt") or die "Can't open up myfile: $!\n";
my ($name,$salary);
# now this line format will automatically apply to MYFILE
format MYFILE = 
Name: @>>>>>>>>>>>>>>>>>>>>>>>>>>>>        Salary: @###########.##
      $name,                                       $salary
.
# and this page header format will automatically apply to MYFILE
format MYFILE_TOP = 
Employee Names and Salaries                      Page: @>>>>>>>>>>
                                                       $%
------------------------------------------------------------------
.

To assign a page header for STDOUT, you would need to define a format named STDOUT_TOP.

The picture line of a format definition declares the literal text and the field definitions that will appear within the body of your report. Literal text is exactly that: it will appear exactly as you specify it within the written output. Field definitions begin with either a caret or an at sign (^ or @) and denote a field into which a supplied value should be automatically inserted at run time. Immediately following the caret or at sign is a series of characters that denote the justification and length of the data that will be placed within the field:

Character   Meaning
>           right justified
#           right justified (numeric only; can include a decimal point)
<           left justified
|           center justified
*           left justified, fill in all data from value

An argument line must immediately follow each picture line that contains fields, and should contain only the variables or expressions--separated by commas--that will be filled into the field definitions on the previous line. The variables or expressions can appear anywhere on the line, but for readability I suggest you follow the typical convention of visually lining them up with the appropriate format fields (if possible), as in the examples above and below. Data is always justified within the defined length of the field--the field width is never expanded or contracted to fit the data you are plugging in (with the exception of * fields, which we'll discuss later). If the data you provide exceeds the width of the field, it is truncated. You may wrap your arguments onto more than one line if you enclose all of them in curly braces; if you do so, the opening curly brace must be the first token on the first line.
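
A short sketch of two of these points, truncation and arguments wrapped across several lines in curly braces. The REPORT filehandle, the variable names, and the square brackets that mark the edges of each field are assumptions made for illustration.

```perl
use strict;
use warnings;

# Assumption for illustration: collect the output in memory.
my $report = '';
open(REPORT, '>', \$report) or die "Can't open in-memory file: $!\n";

# Hypothetical sample values; $code is longer than its 4-character field.
my ($code, $label, $amount) = ('ABCDEFGH', 'widget', 3.14159);

# The arguments span several lines, so they are wrapped in curly braces
# and the opening brace is the first token on the first argument line.
format REPORT =
[@<<<] [@>>>>>>>] [@#.##]
{
    $code,
    $label,
    $amount
}
.

write REPORT;
close(REPORT);

print $report;    # prints: [ABCD] [  widget] [ 3.14]
```

The first field shows the truncation: only the first four characters of $code fit, and the field is not widened to hold the rest.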

format definitions are processed by the perl interpreter at compile time, not run time, and therefore any variables that you wish to use within your format must be visible to the interpreter at the point where the format is declared--either declared earlier in the case of lexically scoped variables, or within the same routine in the case of dynamically scoped variables. (The values held in those variables can, of course, change throughout the script.) The format definitions themselves are global in nature, so each format name can have only one definition per package. If you define the same format name twice in a package, the last one that the compiler sees is the one that will be applied whenever you write with that format name (even if that definition occurs after the write in question).
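
To illustrate the compile-time behavior, here is a small sketch in which write appears in the source before the format it uses, while the variable's value changes between calls. The REPORT filehandle and the $item variable are hypothetical names chosen for this example.

```perl
use strict;
use warnings;

# Assumption for illustration: collect the output in memory.
my $report = '';
open(REPORT, '>', \$report) or die "Can't open in-memory file: $!\n";

# The lexical must be declared before the format that refers to it.
my $item = 'first';

# The format further down is compiled before any of this code runs, so it
# is already available here even though it appears later in the file.
write REPORT;

# Only the variable's value changes; the template itself stays fixed.
$item = 'second';
write REPORT;

format REPORT =
Item: @<<<<<<<
      $item
.

close(REPORT);
print $report;    # "Item: first" on one line, "Item: second" on the next
```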

Some examples should help you to see how basic format definitions are utilized. For each of the formats, assume that the $name variable has already been defined and it contains John Smith and the $salary variable contains 78293.22.

format MYFILE = 
Name: @>>>>>>>>>>>>>>>>>>>>>>>>>>>>        Salary: @###########.##
      $name,                                       $salary
.

Produces the output:

Name:                    John Smith        Salary:        78293.22

While:

format MYFILE = 
Name: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<        Salary: @###########.##
      $name,                                       $salary
.

Produces the output:

Name: John Smith                           Salary:        78293.22

And:

format MYFILE = 
Name: @||||||||||||||||||||||||||||        Salary: @###########.##
      $name,                                       $salary
.

Produces the output:

Name:          John Smith                  Salary:        78293.22

In each of the above cases, we could also have used the > right justification for the salary variable; for our purposes:

format MYFILE = 
Name: @>>>>>>>>>>>>>>>>>>>>>>>>>>>>        Salary: @>>>>>>>>>>>>>>
      $name,                                       $salary
.

would have worked just as well. Note that in this latter case we couldn't have specified an explicit decimal place, as the . and anything following it would have been treated by Perl as literal text to be displayed. This is fine if you know for certain that your numeric data is all pre-formatted to the exact same precision (if you're using sprintf to format the numbers, for example), but for numbers with potentially differing precisions or formats the results will probably not be what you want. For example, if the salary in the previous example had been 78293.20 instead of 78293.22, our output using the template above would have looked like this:

Name:                    John Smith        Salary:         78293.2

While this is technically accurate, it looks unprofessional in a printed report and can be difficult to read in a columnar format. Using the # based notation avoids this problem, as Perl automatically rounds and formats the numbers to appear within the defined field with the specified decimal places in the correct position. The flip side of this convenience is the fact that # based fields must be filled with numbers; attempting to assign a text string to such a field will produce a nicely formatted zero in place of the actual text you wished to display.
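
The following sketch shows a # based field handling a dropped trailing zero, a value with too many decimal places, and a text string. The REPORT filehandle, the labels, and the sample values are assumptions made for illustration.

```perl
use strict;
use warnings;

# Assumption for illustration: collect the output in memory.
my $report = '';
open(REPORT, '>', \$report) or die "Can't open in-memory file: $!\n";

my ($label, $value);

# With warnings enabled, the text value below would otherwise be reported
# as non-numeric; silence that category for this demonstration only.
no warnings 'numeric';

format REPORT =
@<<<<<<< @######.##
$label,  $value
.

# Hypothetical sample rows: a number that prints as 78293.2 on its own,
# a number with extra decimals, and a string that is not a number.
for my $row (['trailing', 78293.2], ['rounded', 12.3456], ['text', 'n/a']) {
    ($label, $value) = @$row;
    write REPORT;
}
close(REPORT);

print $report;
```

The three lines show 78293.20, 12.35, and 0.00 respectively, each right justified in the same ten-character column.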

For fields defined with an @ sign (as in all the examples above), you may use either plain variables or the result of an expression to fill data into the field. For example:

format MYFILE = 
Name: @>>>>>>>>>>>>>>>>>>>>>>>>>>>>        Salary: @>>>>>>>>>>>>>>
      &last_name_first($name),                     $salary
.

where the last_name_first function is defined elsewhere in your code and returns the desired value. The argument line is evaluated in list context before the results are plugged into the format, which makes the use of arrays legal, too:

my @data_line=("John Smith",78293.22);
format MYFILE = 
Name: @>>>>>>>>>>>>>>>>>>>>>>>>>>>>        Salary: @>>>>>>>>>>>>>>
      @data_line
.

Thus far, we have discussed only the use of @ type fields in a format definition. A ^ field can be used to denote special types of processing for the defined field. With # based fields, the field will automatically be cleared (blanked) if the data value is undefined. For all other justifications, the data is "filled" into the field; as much data as can fit in the field is placed into it, and the data variable is then reset such that the data that was actually placed into the fill-in field is removed from the variable. Due to this special type of processing, the values supplied to caret fields must be scalar variables that contain a text string.

Exactly how much data is placed into the field (and subsequently removed from the variable) depends on the setting of the special internal variable $:, which by default contains a space, a newline, and a dash. In other words, Perl fills the field with as much of the text as will fit, breaking at the last space, newline, or dash that falls within the field width, and then removes that much data from the variable. The following before-and-after example should help to explain the concept:

my $my_string = "This is center justified text, longer than the format.";
open(MYFILE,">myfile.txt") or die "Can't open up myfile: $!\n";
format MYFILE = 
^|||||||||||||||||||||||||||||||||||
$my_string
.
write MYFILE;
close MYFILE; # flush the buffered output to disk
# myfile.txt now contains:
#    This is center justified text,
print "$my_string\n"; # prints "longer than the format."
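
The effect of $: can be seen by writing the same text twice with different settings. This is a sketch under assumptions: the REPORT filehandle and the sample text are hypothetical, and the ~~ marker (covered further down) is used only to repeat the line until the text runs out.

```perl
use strict;
use warnings;

# Assumption for illustration: collect the output in memory.
my $report = '';
open(REPORT, '>', \$report) or die "Can't open in-memory file: $!\n";

my $text;

# An 8-character fill-in field, repeated until $text is used up.
format REPORT =
^<<<<<<< ~~
$text
.

# Default $: (space, newline, dash): the line may break after the dash.
$text = 'well-known fact';
write REPORT;

# Space only: the dash is no longer a legal break point, so the first
# line is simply cut off at the field width.
$: = ' ';
$text = 'well-known fact';
write REPORT;

close(REPORT);
print $report;
# prints:
# well-
# known
# fact
# well-kno
# wn fact
```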

Thus, it's important to note that the variables supplied to caret-based fill-in fields will be modified each time write is processed.
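
One way to keep the original value intact is to hand the format a scratch copy and let write consume that instead. This is only a sketch; $original, $buffer, and the REPORT filehandle are hypothetical names.

```perl
use strict;
use warnings;

# Assumption for illustration: collect the output in memory.
my $report = '';
open(REPORT, '>', \$report) or die "Can't open in-memory file: $!\n";

my $original = 'Perl formats wrap long text across several report lines.';
my $buffer;    # scratch copy; this is what the caret field consumes

# A 20-character fill-in field, repeated until $buffer is used up.
format REPORT =
^<<<<<<<<<<<<<<<<<<< ~~
$buffer
.

$buffer = $original;    # copy first, so $original is left untouched
write REPORT;
close(REPORT);

print $report;
print "buffer is empty after write\n" if $buffer eq '';
print "original is unchanged: $original\n";
```

The copy costs one assignment per write, which is usually a fair price when the same text is needed again later in the script.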

So what good are these fill in fields? They allow you to easily "flow" data over multiple lines in a single format definition. For example:

my $name="John Smith";
my $salary=78293.20;
my $job_desc="John's job is to dominate the world with his wits and a toothbrush.";
format = 
Name: @<<<<<<<<<<<<<<<<<<<<<<<        Salary: @###########.##
      $name,                                  $salary
Job Description: ^<<<<<<<<<<<<<<<<<<<<<<<<<<
                 $job_desc
                 ^<<<<<<<<<<<<<<<<<<<<<<<<<<
                 $job_desc
                 ^<<<<<<<<<<<<<<<<<<<<<<<<<<
                 $job_desc
                 ^<<<<<<<<<<<<<<<<<<<<<<<<<<
                 $job_desc
.
write;
# result:
#
# Name: John Smith                      Salary:        78293.20
# Job Description: John's job is to dominate  
#                  the world with his wits and 
#                  a toothbrush.
#

Notice in the above example that an additional blank line is printed at the end of the job description block. This is because the data we provided didn't completely fill our defined format; i.e., we defined 4 lines to hold the data, but then only provided 3 lines of actual data to use. You can use a single tilde (~) on a line to suppress any line that would be completely blank due to a lack of data. Compare the above results with this:

my $name="John Smith";
my $salary=78293.20;
my $job_desc="John's job is to dominate the world with his wits and a toothbrush.";
format = 
Name: @<<<<<<<<<<<<<<<<<<<<<<<        Salary: @###########.##
      $name,                                  $salary
Job Description: ^<<<<<<<<<<<<<<<<<<<<<<<<<<
                 $job_desc
                 ^<<<<<<<<<<<<<<<<<<<<<<<<<< ~
                 $job_desc
                 ^<<<<<<<<<<<<<<<<<<<<<<<<<< ~
                 $job_desc
                 ^<<<<<<<<<<<<<<<<<<<<<<<<<< ~
                 $job_desc
.
write;
# result:
#
# Name: John Smith                      Salary:        78293.20
# Job Description: John's job is to dominate  
#                  the world with his wits and 
#                  a toothbrush.

Note the extra blank line is now gone. On lines that are displayed, the tilde itself will be replaced with a single blank space.

Placing two tildes consecutively on a line tells the perl interpreter to repeat that line, continuously replacing the fields with the supplied expressions, until the data is exhausted. Thus, the above example could even be shortened to this:

my $name="John Smith";
my $salary=78293.20;
my $job_desc="John's job is to dominate the world with his wits and a toothbrush.";
format = 
Name: @<<<<<<<<<<<<<<<<<<<<<<<        Salary: @###########.##
      $name,                                  $salary
Job Description: ^<<<<<<<<<<<<<<<<<<<<<<<<<<
                 $job_desc
                 ^<<<<<<<<<<<<<<<<<<<<<<<<<< ~~
                 $job_desc
.
write;
# result:
#
# Name: John Smith                      Salary:        78293.20
# Job Description: John's job is to dominate  
#                  the world with his wits and 
#                  a toothbrush.

You can use the double tilde construct with @ type fields, too; just make sure that the expression you use will eventually run out of data. For example, the following will not produce the effect you want, but will instead produce a Runaway format error:

my @peanuts=("Charlie","Lucy","Linus","Snoopy","Woodstock");
# this is WRONG! 
format =
Peanuts characters:
       @<<<<<<<<<<<<<<<<<<<<<< ~~
       @peanuts
.
write;

What you probably wanted was this:

my @peanuts=("Charlie","Lucy","Linus","Snoopy","Woodstock");
format =
Peanuts characters:
       @<<<<<<<<<<<<<<<<<<<<<< ~~
       shift(@peanuts)
.
write;
# result: 
# Peanuts characters:
#        Charlie
#        Lucy
#        Linus
#        Snoopy
#        Woodstock

Finally, let's take note of one last formatting construct you may find helpful. Declaring a field with @* is the same as saying "Fill in all the available data on this line, regardless of the data length." The length of the resulting line won't be known until run time, since the line will be as long as the data provided. You can use an asterisk with caret fields, too, but there's not much reason to (unless for some reason you want the contents of the variable you supply to be cleared as part of the write process).
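
A brief sketch of an @* field follows. The REPORT filehandle, the $note variable, and its two-line sample text are assumptions made for illustration.

```perl
use strict;
use warnings;

# Assumption for illustration: collect the output in memory.
my $report = '';
open(REPORT, '>', \$report) or die "Can't open in-memory file: $!\n";

# Hypothetical two-line value of unpredictable width.
my $note = "first line\nsecond, much longer line that no fixed-width field would hold";

# @* prints the whole value as-is, however long it turns out to be.
format REPORT =
Note:
@*
$note
(end of note)
.

write REPORT;
close(REPORT);

print $report;
```

Both lines of $note appear in full between the two literal lines, with no padding or truncation.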

Certain internal Perl variables are available to you to utilize directly within your template formats and to control the application of the formats to the output files. On the next page we examine those variables and then conclude our tutorial with a look at the write function.



Created: December 1, 2005
Revised: December 9, 2005

URL: http://webreference.com/programming/perl/format/index.html