Simple Comments Release Notes: v.920 (2/2)
Simple Comments: v.920
Comment Keys and Digest Hashes
As part of the development cycle for version .920 of Simple Comments, it became clear that we needed to store a "comment key" with each comment: a unique key that could be used to identify that particular comment within the system. Previously, we didn't store this key directly, but derived it dynamically as we read the comment (i.e., the comment key itself was based on the logical concatenation of several pieces of information from the comment, such as the article key, the user's name, the submitter's IP address, etc.). This design was flawed, however. What happens, for example, if the administrator chooses to edit the user name appearing on a comment? The comment key for that comment would then be different the next time it was read.
In previous versions of Simple Comments, this wasn't an issue, as the only time these comment keys were actually used was when comments were checked for duplicates as they were added to the live files (so the worst that could happen was that a duplicate comment could be inadvertently posted). With the new version, we need to reference a different comment from within each comment, and that reference needs to remain valid if the original comment changes. We need this as part of our new reply-to capabilities; i.e., when a user submits a comment in reply to another comment, the submitted comment needs to store the comment key of the original comment so both comments can be properly sorted when they're displayed on the site.
Thus, we set out to add permanent comment keys to comments; i.e., the comment key is now added to the comment immediately when it is submitted, and it remains as originally generated throughout the life of the comment. Our original comment-key generation algorithm was adequate (using the article key in combination with multiple individual fields of the comment itself, including the submit date), but it produced lengthy comment keys that were difficult to work with. Additionally, and perhaps more importantly, we now needed to be able to pass these keys into the comment display templates (since they would be needed in order to properly assign the reply-to information to submitted comments), and the comment key itself would therefore be visible within every displayed page. Since this comment key included some information that shouldn't be visible within the pages themselves (the submitting user's IP, for example), we set out to find a way to obscure it.
A simple means to create a unique and obscured representation of data is to generate an MD5 digest of the data, and in Perl this is greatly simplified by using the Digest::MD5 module. A digest is similar to crypt, in that it generates somewhat of a one-way-encrypted representation of the data submitted to it. Unlike crypt, however, the MD5 algorithm utilizes the entire data string submitted to it in its processing, instead of only the first 8 characters.
Using Digest::MD5 in a Perl script is simple:
use strict;
use Digest::MD5 qw(md5_hex);
print "Digest of 'Dan is a lazy slob' is: ", md5_hex('Dan is a lazy slob'), "\n";
# above prints:
# Digest of 'Dan is a lazy slob' is: b442aa6caec8af56f801b673075c2084
In the above example, I've used the procedural interface to Digest::MD5 (an object-oriented interface is also available), and I've specifically used the md5_hex function, which outputs the digest as a 32-character hex string. You can also make use of an md5 function that outputs the digest as a straight 128-bit binary string, and md5_base64, which outputs the digest as (you guessed it) a Base64-encoded string.
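As a quick illustration of the three output forms and of the object-oriented interface, here is a minimal sketch. The input string is just a placeholder and is not taken from Simple Comments itself.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5 md5_hex md5_base64);

my $data = 'some comment data';    # placeholder input

# Procedural interface: three encodings of the same 128-bit digest.
my $raw    = md5($data);           # 16 raw bytes
my $hex    = md5_hex($data);       # 32 hex characters
my $base64 = md5_base64($data);    # 22 characters (no '=' padding)

printf "raw: %d bytes, hex: %d chars, base64: %d chars\n",
    length($raw), length($hex), length($base64);
# prints: raw: 16 bytes, hex: 32 chars, base64: 22 chars

# Object-oriented interface: feed the data in pieces, then ask for the digest.
my $ctx = Digest::MD5->new;
$ctx->add('some comment ');
$ctx->add('data');
my $oo_hex = $ctx->hexdigest;      # same value as $hex
```

The object-oriented form is mostly useful when the input arrives in chunks (a file handle, for instance); for short strings like a comment key, the procedural functions are simpler.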
The md5_hex routine proved to be a simple answer both to our lengthy comment keys and to the obfuscation problem. We can apply md5_hex to the same data that we were formerly piecing together for our comment key, and it will always be reduced to a 32-character string. Additionally, none of the fields used to build the key can be ascertained from it.
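To make the idea concrete, here is a rough sketch of reducing several comment fields to a fixed-length key. The field names, the separator, and the make_comment_key helper are all hypothetical; the actual fields and their order in Simple Comments may differ.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: the field names below are assumptions for
# illustration, not the real Simple Comments data structure.
sub make_comment_key {
    my (%comment) = @_;

    # Join with a separator that is unlikely to appear in the fields, so
    # that ("ab", "c") and ("a", "bc") don't produce the same input string.
    my $raw = join "\x1f",
        map { defined $comment{$_} ? $comment{$_} : '' }
        qw(article_key user_name ip_address submit_date);

    return md5_hex($raw);
}

my $key = make_comment_key(
    article_key => 'article-42',
    user_name   => 'dan',
    ip_address  => '192.0.2.10',
    submit_date => '2006-12-26 10:15:00',
);
print "$key\n";    # always 32 hex characters, whatever the field lengths
```

Because the key is generated once and then stored with the comment, later edits to any of these fields no longer change it.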
To implement the new keys, we first needed to write a routine that would add permanent keys to each of our existing comments in the system. (If you're upgrading from a previous version of Simple Comments, you'll need to run this routine once as soon as you deploy v .920. It's in the administration script, and instructions are in the README.txt of the distribution.) This routine ran smoothly on almost all of our test servers, with one important exception.
On our Perl 5.6.1 test server, an interesting thing happened when we assigned comment keys to our existing comments: all of the comments ended up with the same key! Specifically, the digest created for every comment in the system was:
d41d8cd98f00b204e9800998ecf8427e
I recognized this as the digest that's created when you supply an empty string to the MD5 algorithm; i.e.:
# perl -MDigest::MD5 -e "print Digest::MD5::md5_hex(q()), qq(\n)"
d41d8cd98f00b204e9800998ecf8427e
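A sanity check along the following lines could flag this symptom right after keys are generated. It is only a sketch: check_keys is a hypothetical helper, not part of the Simple Comments administration script.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

my $EMPTY_DIGEST = md5_hex('');    # d41d8cd98f00b204e9800998ecf8427e

# Hypothetical helper: returns a list of problems found in a set of
# generated comment keys.
sub check_keys {
    my (@keys) = @_;
    my (%seen, @problems);
    for my $key (@keys) {
        push @problems, "empty-input digest: $key" if $key eq $EMPTY_DIGEST;
        push @problems, "duplicate key: $key"      if $seen{$key}++;
    }
    return @problems;
}

# Two ordinary inputs plus an empty one, to trigger the warning.
my @keys = map { md5_hex($_) } ('first comment', 'second comment', '');
print "$_\n" for check_keys(@keys);
# prints: empty-input digest: d41d8cd98f00b204e9800998ecf8427e
```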
Some quick testing assured me that I was in fact presenting valid (and unique)
strings to the
md5_hex routine, so why was I getting this digest value?
It turns out that older versions of Digest::MD5 (prior to 2.20, if I'm not mistaken) don't handle strings with utf8 data properly. And though I wasn't intentionally using utf8 data in my digest input, recall that XML::Parser retrieves data from external XML files flagged as utf8 by default (see the release notes for v .910 for further utf8-related tidbits). Therefore, the comment keys for all my existing comments (which were read in via XML::Parser) were incorrectly produced as displayed above.
In some contexts, this could be considered a security risk, and the earlier versions of Digest::MD5 are listed as potential security problems in some online security notice repositories. The core of the problem lies in the fact that the MD5 algorithm was intended for use with strings of bytes, not strings containing characters with ordinal values above 255. Later versions of Digest::MD5 correct the problem by generating an error if any true "wide characters" are encountered in the input string:
Wide character in subroutine entry
But earlier versions produce inconsistent digests as described above.
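With a later Digest::MD5, the failure can be observed (and trapped) with eval. This is a small sketch; the sample string is arbitrary.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# A string containing a character with an ordinal value above 255.
my $wide = "smile \x{263A}";

# Later versions of Digest::MD5 die on such input rather than
# returning a misleading digest, so trap the exception.
my $digest = eval { md5_hex($wide) };
my $error  = $@;

if (!defined $digest) {
    print "md5_hex refused the input: $error";
}
```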
To correct the problem for the new version of Simple Comments, we do two things. First, any wide characters in the comment data itself are converted to a non-wide character format (using entity versions of the wide characters) with our existing to_entities function in the Comments.pm module. This prevents the routine from failing with later versions of Digest::MD5 in the event that true wide character data is found in the input. Second, we force the removal of the utf8 flag on the resulting data input, in a manner similar to untainting known, good data:
($comment_key =~ /^(.*)$/) && ($comment_key = $1);
(If you are unfamiliar with taint mode in Perl, take a look at our earlier primer on the subject. The above method is recommended only for data known to be safe by some other means; i.e., it was tested separately.) This prevents Digest::MD5 implementations from presenting an inaccurate digest as the result of the string being marked as utf8.
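The two steps can be sketched together as follows. Note the assumptions: wide_to_entities is a hypothetical stand-in for the to_entities function in Comments.pm (whose actual behavior isn't shown here), and the flag is cleared with utf8::downgrade, which is a different technique from the regex capture above and requires Perl 5.8 or later.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical stand-in for to_entities: replace every character above
# 255 with a numeric entity so that only byte-sized characters remain.
sub wide_to_entities {
    my ($text) = @_;
    $text =~ s/([^\x00-\xFF])/'&#' . ord($1) . ';'/ge;
    return $text;
}

# Assumed wrapper name; prepares the input before digesting it.
sub safe_md5_hex {
    my ($text) = @_;
    my $bytes = wide_to_entities($text);

    # Clear the utf8 flag. This cannot fail here, because no character
    # above 255 is left in the string. (Swapped in for the regex trick.)
    utf8::downgrade($bytes);

    return md5_hex($bytes);
}

my $input = "caf\x{e9} \x{263A}";    # one Latin-1 character, one wide character
print safe_md5_hex($input), "\n";
```

The digest is then computed over the entity-encoded bytes, so the same comment data should produce the same key regardless of how the string happened to be flagged internally.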
These two fixes corrected the new comment key methodology in the v .920 release of Simple Comments. Unfortunately, the new comment keys were not the only place that Digest::MD5 was used.
Passwords and Digests
In the previous version of Simple Comments (v .910) we were also using Digest::MD5 for the encoding of user passwords, for those implementations that were not using a Web server-based means of authenticating administrators for the administration script. Due to the problems described above, a potential security vulnerability was present when v .910 was deployed on Perl 5.6 servers using an older (pre v 2.20) version of Digest::MD5. Specifically, if the administrator innocently added the "empty" digest key into the password file for a user, that user (or anyone else who knew, or could guess, that user's ID) could conceivably access the administration script using any password that contained a utf8 character. The chances of that vulnerability being exploited seem slim, but nonetheless we've taken steps in v .920 to remove this risk. Specifically, we treat the input password in the same manner as the comment keys above, so it will always be handled properly in both the older and newer versions of Digest::MD5. Further, we don't allow any passwords that match the "empty" digest string presented above. That is, you can still put those digests within your password file, but Simple Comments will refuse to authenticate against them (any user who attempts to provide the empty digest as their encoded password will be denied access, even if that is the digest stored in the password file).
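A check in this spirit might look like the sketch below. The password_ok helper and its arguments are hypothetical, and the input normalization uses utf8::downgrade (Perl 5.8 or later) as an assumed substitute for the script's own handling.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

my $EMPTY_DIGEST = 'd41d8cd98f00b204e9800998ecf8427e';    # md5_hex('')

# Hypothetical check: $stored_digest is assumed to be the hex digest
# read from the password file for the user being authenticated.
sub password_ok {
    my ($password, $stored_digest) = @_;
    return 0 unless defined $password && defined $stored_digest;

    # Never authenticate against the empty-string digest.
    return 0 if lc($stored_digest) eq $EMPTY_DIGEST;

    # Normalize the input the same way as the comment key data:
    # wide characters become entities, then the utf8 flag is cleared.
    my $bytes = $password;
    $bytes =~ s/([^\x00-\xFF])/'&#' . ord($1) . ';'/ge;
    utf8::downgrade($bytes);

    my $digest = md5_hex($bytes);
    return 0 if $digest eq $EMPTY_DIGEST;
    return $digest eq lc($stored_digest) ? 1 : 0;
}

print password_ok('secret', md5_hex('secret')) ? "ok\n" : "denied\n";
# prints: ok
print password_ok("\x{263A}", $EMPTY_DIGEST) ? "ok\n" : "denied\n";
# prints: denied
```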
It's my hope that you will continue to find the Simple Comments script itself useful, and/or the developmental notes helpful for your own Perl projects. Please feel free to contact me if you have any clarifications, suggestions or requests for improvements!
Created: December 26, 2006
Revised: December 26, 2006