Minimal Perl for Unix and Linux People: Part 2/Page 6 | WebReference

Perl as a (Better) Find Command: Part 2

6.5.2 Dealing with multi-word filenames

As discussed earlier, the find | xargs approach to handling filenames has the advantage of using fewer processes than the find -exec alternative. However, there's a limitation of the xargs approach that's important to understand. Specifically, filenames containing whitespace characters are split into separate pieces at those positions, preventing them from being handled properly.

Let's say we need to count the number of characters (via wc -c) in each of the regular files within or below the current directory. The find -exec approach isn't bothered by filenames containing whitespace characters, because each pathname reaches wc as a single argument.

With find | xargs, by contrast, each part of a multi-word name is presented as a separate argument to the wc command, which then looks for files that don't exist under those partial names.
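The comparison can be reproduced with a minimal sketch along the following lines. The scratch directory and the two filenames are made up for the demonstration, and the exact spacing and error wording of wc's output vary between systems, so no output is shown here.

```shell
#!/bin/sh
# Scratch directory holding one ordinary and one multi-word filename
# (both names are invented for this demonstration).
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'hello\n' > plain
printf 'hello world\n' > 'multi word name'

# find -exec hands each pathname to wc as a single argument,
# so the multi-word name is counted correctly.
exec_report=$(find . -type f -exec wc -c {} \;)
printf '%s\n' "$exec_report"

# xargs splits its input on whitespace, so wc is asked about
# "./multi", "word", and "name", none of which exist.
# xargs exits non-zero when wc fails, hence the "|| true".
xargs_report=$(find . -type f | xargs wc -c 2>&1) || true
printf '%s\n' "$xargs_report"
```

The first report should contain a count for each file, while the second should contain a count for plain only, followed by complaints about the three fragments of the multi-word name.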

This problem is easily rectified by using a Perl command in place of xargs: Perl can also report file sizes, but it doesn't automatically split input lines into words.

The -s operator provides the byte-count for the file named in the current input line (see table 6.2), and $_ provides the filename itself, so printing these elements—with a space before the filename—produces a report that resembles wc's output.

The result is a solution that handles whitespace embedded in filenames properly, like find's -exec option, but that's even more economical with processes than xargs: the Perl command uses only one, versus one process for xargs plus anywhere from one to an astronomical number for the required wc commands.

We discussed the benefits of pre-processing arguments for other commands with Perl in section 6.3. But turnabout is fair play, so next we'll discuss the use of other commands, such as find, as argument pre-processors for Perl.


Back in chapter 4, we covered simple Perl commands that offered improvements on sed, including examples that automatically edited large numbers of files with commands like these:

The first example edits every file in the current directory, whereas the second one edits all the HTML files in the directories called site1 and site2.

That format works nicely for processing all files within directories, but what if you want to select particular files on the basis of their attributes—including files that reside in subdirectories? That problem can be solved by using find to feed filenames to a Perl command—which filters out the inappropriate ones and passes the others on to xargs—which in turn feeds arguments to another Perl command.

Here's an example:

This can be simplified a bit more by using the textfiles script from this chapter along with the change_file script from chapter 4, yielding the following:

As discussed previously, the use of xargs ensures that every pathname emitted by find | textfiles is eventually presented as an argument to change_file, even if the OS won't allow a single instance of the script to handle them all.

Of course, the invocation could be simplified even more by enclosing this pipeline in a script. One user interface might look like this:

Another interface might dispense with the switch options and assign meanings to arguments by position instead:

You'll learn techniques for processing positional parameters, such as the three arguments of that last command, in section 8.1.3.

Next, you'll see how Perl lets you enjoy the benefits of the Unix find command on Windows.
