Nested Data Structures--A Perl Primer (2/2) | WebReference

Dereferencing

Armed with the knowledge in the above section, you might be tempted to write something like this to access the first entry in the array that $bar refers to:

use strict;
my $bar = ["Apples","Oranges","Bananas"];
print $bar[0]."\n";   #  ERROR!

As indicated in the comment, this simple access method fails. When the interpreter sees the statement print $bar[0], it looks for an element of a previously declared @bar array, which doesn't exist; the scalar $bar and the array @bar are entirely separate variables. Since we're using strict, the compiler simply quits when it can't find the necessary declared variable.

Curious, you may then decide to try this instead:

use strict;
my $bar = ["Apples","Oranges","Bananas"];
print $bar."\n";   #  ARRAY(0x80fbb0c)

This will at least compile, but still won't provide the actual values you want. What is printed is the somewhat cryptic ARRAY(0x80fbb0c) (the actual value printed will be different on your machine). The reason for this is the fact that references are scalar values; when you attempt to print them Perl prints out not the data values the references refer to, but the actual reference (i.e., the actual address) that is stored in $bar.
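A small sketch of that behavior may help. A reference is an ordinary scalar, so it turns into text of the form TYPE(0xADDRESS) when printed. The pattern match and the $alias variable below are illustrative additions; the address itself differs from run to run.

```perl
use strict;
use warnings;

my $bar = ["Apples", "Oranges", "Bananas"];

# A reference stringifies as its type followed by a hex address.
my $as_text = "$bar";
print "looks like an array reference\n"
    if $as_text =~ /^ARRAY\(0x[0-9a-f]+\)$/;

# Copying the reference copies the address, not the data:
# both variables now refer to the same anonymous array.
my $alias = $bar;
print "same array\n" if $alias == $bar;
```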

To access the data the reference variable is actually referring to (as opposed to the address contained directly within the reference variable) you must explicitly dereference the variable; that is, specifically indicate to Perl that you want to retrieve the data value that is being referenced. You can do this in one of three ways:

use strict;
my $bar = ["Apples","Oranges","Bananas"];
# each of the following prints "Apples"
print $$bar[0]."\n";
print ${$bar}[0]."\n";
print $bar->[0]."\n";

In the first example, $$bar[0], we prefix the reference variable with an additional data type indicator (i.e., $, @, %, & etc.) to indicate to Perl that we wish to retrieve that type of variable from the referenced value. Since we indeed want a single entry of the array (and array entries must be scalar), the additional dollar sign is appropriate here. Note that if we wanted to refer to the array as a whole, we would instead use the @ sign:

foreach (@$bar) {
   print $_."\n";
}
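The same @ prefix works anywhere Perl expects a whole array. The following is a minimal sketch of a few such uses; the variable names $count, $last and @copy are made up for illustration.

```perl
use strict;
use warnings;

my $bar = ["Apples", "Oranges", "Bananas"];

my $count = scalar(@$bar);     # number of elements: 3
my $last  = $#{$bar};          # index of the last element: 2
push @$bar, "Pears";           # modifies the referenced array in place
my @copy  = @$bar;             # a separate named array holding a copy

print "count=$count last=$last now=", scalar(@$bar), "\n";
print join(", ", @copy), "\n";
```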

The reference variable itself is always written with the dollar sign, since it is in fact always a scalar value; thus the first $ is placed to the immediate left of the variable name and the type indicator (i.e., $, @, % etc.) is placed to the immediate left of that dollar sign. For a hash, the concept is the same:

my $bar = {"fruit" => "Apples", "vegetables" => "Spinach"};
foreach (keys(%$bar)) {
   print $_." = ".$$bar{$_}."\n";
}

Note again the use of the double $$ when referencing the specific scalar value a reference refers to; i.e., $$bar{$_}.

In the second example above, we wrap the reference variable in a block: ${$bar}[0]. The basic idea is the same as in the first example: the block (indicated by the curly brackets, or braces) forces Perl to evaluate the reference variable first; it then uses the resulting value (the address) to reach the actual data being stored.
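The braces earn their keep when the reference comes from something more involved than a plain variable. Here is a short sketch, assuming a hypothetical %stock hash whose "fruit" entry holds an array reference:

```perl
use strict;
use warnings;

# Hypothetical data: a hash entry that holds an array reference.
my %stock = (fruit => ["Apples", "Oranges", "Bananas"]);

# The braces mark exactly which expression yields the reference
# that is to be dereferenced.
my @fruit = @{ $stock{fruit} };      # the whole array
my $first = ${ $stock{fruit} }[0];   # a single element

print scalar(@fruit), " items, first is $first\n";
```

Without the braces, @$stock{fruit} would be parsed as a dereference of a scalar named $stock, which is not what is wanted here.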

Finally, the third method is perhaps the most common way to dereference variables: using the "arrow" or "infix" operator (->). It is popular with Perl coders because it gives a visual picture of the operation Perl performs behind the scenes. When encountering a statement like this:

print $bar->[0]."\n";

it's easy to imagine that you are actually referring to the first element in the array that is being pointed to by $bar. When Perl sees the arrow operator in use, it knows that dereferencing of a scalar value is going on; therefore, in this case, you needn't (and mustn't) use the extra type indicator (like our $$bar[0] example above). Perl will automatically perform the dereference of the scalar value for you based on the existence of the arrow operator.

As in the earlier example, the arrow operator works with hashes, too:

my $bar = {"fruit" => "Apples", "vegetables" => "Spinach"};
foreach (keys(%$bar)) {
   print $_."=".$bar->{$_}."\n";
}
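The arrow can be used wherever a hash element is expected, and it also calls subroutine references. A brief sketch follows; the "grains" key and the $shout helper are hypothetical additions.

```perl
use strict;
use warnings;

my $bar = {"fruit" => "Apples", "vegetables" => "Spinach"};

# The arrow form works for assignment, exists and delete as well.
$bar->{"grains"} = "Rice";                 # add a new key
print "has fruit\n" if exists $bar->{"fruit"};
delete $bar->{"vegetables"};               # remove a key

print join(",", sort keys %$bar), "\n";    # fruit,grains

# A subroutine reference is called through the arrow too.
my $shout = sub { return uc($_[0]) };      # hypothetical helper
print $shout->($bar->{"fruit"}), "\n";     # APPLES
```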

Putting It All Together

With our rudimentary knowledge of hard references, we're now ready to create and access the data structure I described in the opening of this article. Recall our original problem: We can't create an array of arrays in Perl, because each array entry must itself contain a scalar value. But references are, in fact, scalar values and are therefore legal array entries! Thus, while we can't create an array of arrays, we can create an array of references, and each of those references can refer to an array (or a hash, or a subroutine, etc.). A crude (but illustrative) method of creating our data structure might look like this:

use strict;
my $day1={"day" => "1",
         "quantity" => 400,
         "value" => 36.23};
my $day2={"day" => "3",
         "quantity" => 800,
         "value" => -34.65};
my $days_in_month=[$day1,$day2];
my @months=($days_in_month);

In the above code we've created two anonymous hashes, and assigned references to them in the $day1 and $day2 variables. We then created an anonymous array containing both of these references and assigned a reference to this anonymous array in $days_in_month. Finally, we create a regular named array (@months) and assign to it the single $days_in_month reference. (Presumably we would add other months to the main array later.) Using our knowledge of dereferencing, we would then access the second day's value as follows:

print $months[0]->[1]->{"value"}."\n";

Note also this shortcut: the arrow is optional between adjacent subscripts. When square or curly brackets (braces) directly follow one another, Perl assumes that the second and subsequent sets of brackets/braces require dereferencing. Thus, in the above statement, the arrow operators aren't even necessary, meaning that this statement is also valid:

print $months[0][1]{"value"}."\n";

I recommend, however, leaving the arrows in, since they are such a strong visual indicator of the exact operation being performed.
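Once the structure exists, walking it is a matter of dereferencing one level per loop. The sketch below builds the same data as a single nested literal, which is one possible alternative to the step-by-step construction, and then totals the quantities. The $total variable is an illustrative addition.

```perl
use strict;
use warnings;

my @months = (
    [   # first month: an anonymous array of day records
        { "day" => "1", "quantity" => 400, "value" => 36.23 },
        { "day" => "3", "quantity" => 800, "value" => -34.65 },
    ],
);

my $total = 0;
foreach my $month (@months) {          # each $month is an array reference
    foreach my $day (@$month) {        # each $day is a hash reference
        print "day $day->{day}: $day->{value}\n";
        $total += $day->{"quantity"};
    }
}
print "total quantity: $total\n";      # total quantity: 1200
```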

One more point before we wrap up. On the previous page, I noted that Perl will silently create a reference in an array/hash entry if needed; a handy point that will allow you to build your nested data structures without having to worry about predeclaring each of your individual references within the array/hash. So in the above example, we could pre-populate our structure with something as simple as this:

use strict;
my @months=();
for (my $i=0; $i

In this statement: $months[$i]->[$day_number++]->{"day"}, Perl "sees" that $months[$i] must contain a reference to an array (because of the arrow operator that is pointing to the array subscript, [$day_number++]) and it creates one and sticks it in $months[$i] if it doesn't already exist. It then "sees" that the [$day_number++] entry of this array must contain a reference to a hash (because of the arrow operator that is pointing to the hash subscript, {"day"}) and it creates one and sticks it in $months[$i]->[$day_number++] (again, assuming that hash reference does not already exist). Finally, it sticks the actual value in the resulting $months[$i]->[$day_number++]->{"day"} entry. You needn't explicitly test and define empty array/hash references as you build the array (unless you want to); Perl creates those for you.
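A minimal sketch of that kind of pre-population is shown here. The bounds (twelve months, three placeholder days each) and the assigned value are illustrative assumptions, not taken from the original listing, and the $index variable is introduced only so the increment happens once per pass.

```perl
use strict;
use warnings;

my @months = ();
for (my $i = 0; $i < 12; $i++) {        # 12 months: an assumed bound
    my $day_number = 0;
    while ($day_number < 3) {           # 3 days per month: also assumed
        # Perl creates the array reference in $months[$i] and the
        # hash reference inside it the first time each is needed.
        my $index = $day_number++;
        $months[$i]->[$index]->{"day"} = $index + 1;
    }
}

print scalar(@months), " months\n";             # 12 months
print ref($months[0]), "\n";                    # ARRAY
print ref($months[0]->[0]), "\n";               # HASH
print $months[11]->[2]->{"day"}, "\n";          # 3
```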

For Further Study

It's my hope that this brief tutorial has helped you to understand the basic mechanics involved in the definition and access of nested data structures in Perl. Many more advanced topics involved with the creation of references and nested data structures were omitted here. Some of the more advanced topics that you may wish to follow up on your own are listed below:

  1. Objects are References
    In Perl, objects are references, and developers create special subroutines called constructors to return these references to your program. You then use these special types of references to refer not only to the core data of the object, but also to the methods that are available to it (using the same arrow operator syntax, such as $circle->draw(4.12);). For more on these special types of references, you may wish to peruse an OOP Perl tutorial and/or look up the bless function in your Perl reference.

  2. Filehandle References
    In addition to the other variable types described above, filehandles can also be referenced (by way of typeglobs). This allows you to, for example, easily pass specific filehandles into subroutines.

  3. ref
    You can use the ref operator to return the type of data value being referenced by a variable (e.g., print ref($bar); # ARRAY).
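As a starting point for items 1 and 3, here is a rough sketch of ref applied to several kinds of references, followed by a tiny object. The Circle class, its new constructor and its area method are hypothetical examples written for this illustration and do not come from any library.

```perl
use strict;
use warnings;

# ref returns the type of thing a reference points to.
print ref([1, 2, 3]), "\n";        # ARRAY
print ref({ a => 1 }), "\n";       # HASH
print ref(sub { 1 }), "\n";        # CODE
print ref(\"text"), "\n";          # SCALAR

# A hypothetical Circle class: the constructor blesses a hash reference.
package Circle;
sub new {
    my ($class, %args) = @_;
    my $self = { radius => $args{radius} };
    return bless $self, $class;
}
sub area {
    my ($self) = @_;
    return 3.14159265 * $self->{radius} ** 2;
}

package main;
my $circle = Circle->new(radius => 2);
print ref($circle), "\n";                      # Circle
printf "%.2f\n", $circle->area();              # 12.57
```

For a blessed reference, ref returns the name of the class rather than the underlying data type.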

Conclusion

Nested data structures in Perl are simple, once you get a handle on the basic tasks of creating and using hard references. With these basic tips in mind you should be well on your way to creating arrays of arrays, arrays of hashes, hashes of arrays, or hashes of hashes depending on the needs of your application.


Created: September 8, 2005
Revised: September 8, 2005

URL: http://webreference.com/programming/perl/nested/2.html