The Compress::Zlib module provides a Perl interface to the zlib
compression library (see AUTHOR for details about where to get
zlib). Most of the functionality provided by zlib is available
in Compress::Zlib.
The module can be split into two general areas of functionality, namely
in-memory compression/decompression and read/write access to gzip
files. Each of these areas will be discussed separately below.
The interface Compress::Zlib provides to the in-memory deflate
(and inflate) functions has been modified to fit into a Perl model.
The main difference is that for both inflation and deflation, the Perl
interface will always consume the complete input buffer before
returning. Also the output buffer returned will be automatically grown
to fit the amount of output available.
The deflateInit function initialises a deflation stream. It combines
the features of the zlib functions deflateInit, deflateInit2 and
deflateSetDictionary.
If successful, it will return the initialised deflation stream, $d
and $status of Z_OK in a list context. In scalar context it
returns the deflation stream, $d, only.
If not successful, the returned deflation stream ($d) will be
undef and $status will hold the exact zlib error code.
The function optionally takes a number of named options specified as
-Name=>value pairs. This allows individual options to be
tailored without having to specify them all in the parameter list.
For backward compatibility, it is also possible to pass the parameters
as a reference to a hash containing the name=>value pairs.
In either form, the options allow the deflation interface to be
tailored.
When a dictionary is specified, Compress::Zlib will automatically
call deflateSetDictionary directly after calling deflateInit. The
Adler32 value for the dictionary can be obtained by calling the method
$d->dict_adler().
Sets the initial size for the deflation buffer. If the buffer has to be
reallocated to increase the size, it will grow in increments of
Bufsize.
The default is 4096.
Here is an example of using the deflateInit optional parameter list
to override the default buffer size and compression level. All other
options will take their default values.
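A minimal sketch of such a call, using the -Bufsize and -Level options. The values 300 and Z_BEST_SPEED are arbitrary choices made for illustration only:

```perl
use strict ;
use warnings ;
use Compress::Zlib ;

# Override only the buffer size and the compression level;
# every other option keeps its default value.
my ($d, $status) = deflateInit( -Bufsize => 300,
                                -Level   => Z_BEST_SPEED ) ;

die "deflateInit failed with status $status\n"
    unless defined $d and $status == Z_OK ;
```

Calling deflateInit in list context, as here, makes the zlib status code available if the stream could not be created.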
Deflates the contents of $buffer. The buffer can either be a scalar
or a scalar reference. When finished, $buffer will be
completely processed (assuming there were no errors). If the deflation
was successful it returns the deflated output, $out, and a status
value, $status, of Z_OK.
On error, $out will be undef and $status will contain the
zlib error code.
In a scalar context deflate will return $out only.
As with the deflate function in zlib, it is not necessarily the
case that any output will be produced by this method. So don't rely on
the fact that $out is empty for an error test.
Typically used to finish the deflation. Any pending output will be
returned via $out.
$status will have a value Z_OK if successful.
In a scalar context flush will return $out only.
Note that flushing can seriously degrade the compression ratio, so it
should only be used to terminate a compression (using Z_FINISH) or
when you want to create a full flush point (using Z_FULL_FLUSH).
By default the flush_type used is Z_FINISH. Other valid values
for flush_type are Z_NO_FLUSH, Z_PARTIAL_FLUSH, Z_SYNC_FLUSH
and Z_FULL_FLUSH. It is strongly recommended that you only set the
flush_type parameter if you fully understand the implications of
what it does. See the zlib documentation for details.
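The deflate and flush methods are typically used together. The sketch below feeds a few chunks to deflate, tests the status code rather than the contents of $out, and then calls flush with its default flush type to collect any pending output. The sample input in @chunks is made up for illustration:

```perl
use strict ;
use warnings ;
use Compress::Zlib ;

my @chunks = ("hello ", "hello ", "hello\n") ;   # made-up sample input

my $d = deflateInit()
    or die "Cannot create a deflation stream\n" ;

my $compressed = '' ;
foreach my $chunk (@chunks) {
    my ($out, $status) = $d->deflate($chunk) ;

    # Test the status code, not whether $out is empty: zlib may
    # simply be holding the data back for later output.
    die "deflation failed\n" unless $status == Z_OK ;
    $compressed .= $out ;
}

# Z_FINISH is the default flush type; this returns whatever is pending.
my ($out, $status) = $d->flush() ;
die "flush failed\n" unless $status == Z_OK ;
$compressed .= $out ;
```

Without the final flush, $compressed would usually be an incomplete stream.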
In a list context it returns the inflation stream, $i, and the
zlib status code ($status). In a scalar context it returns the
inflation stream only.
If successful, $i will hold the inflation stream and $status will
be Z_OK.
If not successful, $i will be undef and $status will hold the
zlib error code.
The function optionally takes a number of named options specified as
-Name=>value pairs. This allows individual options to be
tailored without having to specify them all in the parameter list.
For backward compatibility, it is also possible to pass the parameters
as a reference to a hash containing the name=>value pairs.
In either form, the options allow the inflation interface to be
tailored.
Here is a list of the valid options:
-WindowBits
    For a definition of the meaning and valid values for WindowBits
    refer to the zlib documentation for inflateInit2.
    Defaults to -WindowBits => MAX_WBITS.
-Bufsize
    Sets the initial size for the inflation buffer. If the buffer has to be
    reallocated to increase the size, it will grow in increments of
    Bufsize.
    Default is 4096.
-Dictionary
    The default is no dictionary.
Here is an example of using the inflateInit optional parameter to
override the default buffer size.
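A minimal sketch of such a call. The value 300 and the sample text are arbitrary, and compress is used only to produce some input for the stream:

```perl
use strict ;
use warnings ;
use Compress::Zlib ;

# Only the buffer size is overridden; 300 is an arbitrary value.
my ($i, $status) = inflateInit( -Bufsize => 300 ) ;
die "inflateInit failed with status $status\n"
    unless defined $i and $status == Z_OK ;

# The stream is used like any other; the output buffer still grows
# as needed, in increments of Bufsize.
my $input = compress("some sample text " x 100) ;
my ($out, $istatus) = $i->inflate($input) ;
die "inflation failed\n" unless $istatus == Z_STREAM_END ;
```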
Inflates the complete contents of $buffer. The buffer can either be
a scalar or a scalar reference.
Returns Z_OK if successful and Z_STREAM_END if the end of the
compressed data has been successfully reached.
If not successful, $out will be undef and $status will hold
the zlib error code.
The $buffer parameter is modified by inflate. On completion it
will contain what remains of the input buffer after inflation. This
means that $buffer will be an empty string when the return status is
Z_OK. When the return status is Z_STREAM_END the $buffer
parameter will contain what (if anything) was stored in the input
buffer after the deflated data stream.
This feature is useful when processing a file format that encapsulates
a compressed data stream (e.g. gzip, zip).
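The following sketch illustrates how inflate leaves unconsumed input in $buffer. The payload and the "TRAILER" bytes are invented for the example, and compress is used simply to obtain a compressed data stream:

```perl
use strict ;
use warnings ;
use Compress::Zlib ;

my $text    = "payload text\n" x 10 ;   # made-up payload
my $trailer = "TRAILER" ;               # made-up bytes following the stream

# A compressed data stream followed directly by unrelated data.
my $buffer = compress($text) . $trailer ;

my $i = inflateInit()
    or die "Cannot create an inflation stream\n" ;

my ($out, $status) = $i->inflate($buffer) ;
die "inflation failed\n" unless $status == Z_STREAM_END ;

# $out now holds the uncompressed payload, and $buffer holds only
# what followed the compressed data stream.
print "left over: $buffer\n" ;
```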
Scans $buffer until it reaches either a full flush point or the
end of the buffer.
If a full flush point is found, Z_OK is returned and $buffer
will have all data up to the flush point removed. This can then be
passed to the inflate method.
Any other return code means that a flush point was not found. If more
data is available, inflateSync can be called repeatedly with more
compressed data until the flush point is found.
    use strict ;
    use warnings ;
    use Compress::Zlib ;
    my $x = inflateInit()
        or die "Cannot create an inflation stream\n" ;
    my $input = '' ;
    binmode STDIN;
    binmode STDOUT;
    my ($output, $status) ;
    # Feed the compressed input to the stream in 4096 byte chunks
    while (read(STDIN, $input, 4096))
    {
        ($output, $status) = $x->inflate(\$input) ;
        print $output
            if $status == Z_OK or $status == Z_STREAM_END ;
        last if $status != Z_OK ;
    }
    # $status is still undefined if STDIN was empty
    die "inflation failed\n"
        unless defined $status and $status == Z_STREAM_END ;
Two high-level functions are provided by zlib to perform in-memory
compression/uncompression of RFC 1950 data streams. They are called
compress and uncompress.
The two Perl subs defined below provide the equivalent
functionality.
Compresses $source. If successful it returns the
compressed data. Otherwise it returns undef.
The source buffer can either be a scalar or a scalar reference.
The $level parameter defines the compression level. Valid values are
0 through 9, Z_NO_COMPRESSION, Z_BEST_SPEED,
Z_BEST_COMPRESSION, and Z_DEFAULT_COMPRESSION.
If $level is not specified Z_DEFAULT_COMPRESSION will be used.
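As a brief sketch, the two subs can be used for a simple in-memory round trip. The sample data is made up, and Z_BEST_COMPRESSION is passed only to show the optional $level parameter:

```perl
use strict ;
use warnings ;
use Compress::Zlib ;

my $source = "abc" x 1000 ;   # made-up, highly repetitive sample data

my $packed = compress($source, Z_BEST_COMPRESSION) ;
defined $packed
    or die "compress failed\n" ;

my $unpacked = uncompress($packed) ;
defined $unpacked
    or die "uncompress failed\n" ;

printf "%d bytes compressed to %d bytes\n",
    length($source), length($packed) ;
```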
A number of functions are supplied in zlib for reading and writing
gzip files. This module provides an interface to most of them. In
general the interface provided by this module operates identically to
the functions provided by zlib. Any differences are explained
below.
The gzopen function operates identically to the zlib equivalent except
that it returns an object which is used to access the other gzip
methods.
As with the zlib equivalent, the mode parameter is used to
specify both whether the file is opened for reading or writing and to
optionally specify a compression level. Refer to the zlib
documentation for the exact format of the mode parameter.
If a reference to an open filehandle is passed in place of the
filename, gzdopen will be called behind the scenes. The third example
at the end of this section, gzstream, uses this feature.
Reads $size bytes from the compressed file into $buffer. If
$size is not specified, it will default to 4096. If the scalar
$buffer is not large enough, it will be extended automatically.
Returns the number of bytes actually read. On EOF it returns 0 and in
the case of an error, -1.
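The sketch below writes a gzip file with gzwrite and reads it back with gzread, 1024 bytes at a time, using the return value to tell data, EOF and error apart. The temporary file (created with File::Temp) and the sample data are assumptions made for the example:

```perl
use strict ;
use warnings ;
use Compress::Zlib ;
use File::Temp qw(tempfile) ;

my $data = "line one\nline two\n" x 500 ;   # made-up sample data

# A scratch file to hold the gzip data; removed at program exit.
my ($fh, $filename) = tempfile(SUFFIX => '.gz', UNLINK => 1) ;
close $fh ;

my $gz = gzopen($filename, "wb")
    or die "Cannot open $filename: $gzerrno\n" ;
$gz->gzwrite($data)
    or die "error writing: $gzerrno\n" ;
$gz->gzclose() ;

$gz = gzopen($filename, "rb")
    or die "Cannot open $filename: $gzerrno\n" ;

my ($chunk, $copy, $bytes) = ('', '', 0) ;
# gzread returns the byte count, 0 on EOF and -1 on error.
while (($bytes = $gz->gzread($chunk, 1024)) > 0) {
    $copy .= $chunk ;
}
die "Error reading from $filename: $gzerrno\n"
    if $bytes < 0 ;
$gz->gzclose() ;
```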
Reads the next line from the compressed file into $line.
Returns the number of bytes actually read. On EOF it returns 0 and in
the case of an error, -1.
It is legal to intermix calls to gzread and gzreadline.
At this time gzreadline ignores the variable $/
($INPUT_RECORD_SEPARATOR or $RS when English is in use). The
end of a line is denoted by the C character '\n'.
Flushes all pending output to the compressed file.
Works identically to the zlib function it interfaces to. Note that
the use of gzflush can degrade compression.
Returns Z_OK if $flush is Z_FINISH and all output could be
flushed. Otherwise the zlib error code is returned.
Refer to the zlib documentation for the valid values of $flush.
Returns the zlib error message or number for the last operation
associated with $gz. The return value will be the zlib error
number when used in a numeric context and the zlib error message
when used in a string context. The zlib error number constants,
shown below, are available for use.
The $gzerrno scalar holds the error code associated with the most
recent gzip routine. Note that unlike gzerror(), the error is
not associated with a particular file.
As with gzerror() it returns an error number in numeric context and
an error message in string context. Unlike gzerror() though, the
error message will correspond to the zlib message when the error is
associated with zlib itself, or the UNIX error message when it is
not (i.e. zlib returned Z_ERRNO).
As there is an overlap between the error numbers used by zlib and
UNIX, $gzerrno should only be used to check for the presence of
an error in numeric context. Use gzerror() to check for specific
zlib errors. The gzcat example below shows how the variable can
be used safely.
Here is an example script which uses the interface. It implements a
gzcat function.
    use strict ;
    use warnings ;
    use Compress::Zlib ;
    die "Usage: gzcat file...\n"
        unless @ARGV ;
    my $file ;
    foreach $file (@ARGV) {
        my $buffer ;
        my $gz = gzopen($file, "rb")
            or die "Cannot open $file: $gzerrno\n" ;
        print $buffer while $gz->gzread($buffer) > 0 ;
        # Report both the error message and the error number
        die "Error reading from $file: $gzerrno (" . ($gzerrno+0) . ")\n"
            if $gzerrno != Z_STREAM_END ;
        $gz->gzclose() ;
    }
Below is a script which makes use of gzreadline. It implements a
very simple grep like script.
    use strict ;
    use warnings ;
    use Compress::Zlib ;
    die "Usage: gzgrep pattern file...\n"
        unless @ARGV >= 2;
    my $pattern = shift ;
    my $file ;
    foreach $file (@ARGV) {
        my $gz = gzopen($file, "rb")
            or die "Cannot open $file: $gzerrno\n" ;
        while ($gz->gzreadline($_) > 0) {
            print if /$pattern/ ;
        }
        die "Error reading from $file: $gzerrno\n"
            if $gzerrno != Z_STREAM_END ;
        $gz->gzclose() ;
    }
This script, gzstream, does the opposite of the gzcat script
above. It reads from standard input and writes a gzip file to standard
output.
    use strict ;
    use warnings ;
    use Compress::Zlib ;
    binmode STDOUT; # gzopen only sets it on the fd
    my $gz = gzopen(\*STDOUT, "wb")
        or die "Cannot open stdout: $gzerrno\n" ;
    while (<>) {
        $gz->gzwrite($_)
            or die "error writing: $gzerrno\n" ;
    }
    # Close the stream so that all pending output is written
    $gz->gzclose() ;
Two functions are provided by zlib to calculate a checksum. For the
Perl interface, the order of the two parameters in both functions has
been reversed. This allows both running checksums and one-off
calculations to be done.
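The sketch below assumes the two functions are the adler32 and crc32 subs exported by the module, each taking the buffer first and an optional previous checksum second. It computes both checksums in one-off and in running form; the sample chunks are made up:

```perl
use strict ;
use warnings ;
use Compress::Zlib ;

my @chunks = ("first part, ", "second part, ", "third part") ;
my $whole  = join '', @chunks ;

# One-off calculation: only the buffer is passed.
my $crc_once   = crc32($whole) ;
my $adler_once = adler32($whole) ;

# Running calculation: pass the previous value as the second
# parameter. An undefined value starts a fresh checksum.
my ($crc_running, $adler_running) ;
foreach my $chunk (@chunks) {
    $crc_running   = crc32($chunk, $crc_running) ;
    $adler_running = adler32($chunk, $adler_running) ;
}

printf "crc32 %08x, adler32 %08x\n", $crc_running, $adler_running ;
```

With the buffer as the first parameter, the one-off form needs no placeholder for the initial checksum value.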
Although Compress::Zlib has a pair of functions called compress
and uncompress, they are not the same as the Unix programs of the
same name. The Compress::Zlib library is not compatible with Unix
compress.
If you have the uncompress program available, you can use this to
read compressed files:
    open F, "uncompress -c $filename |"
        or die "Cannot run uncompress: $!\n" ;
    while (<F>)
    {
        ...
    }
If you have the gunzip program available, you can use this to read
compressed files:
    open F, "gunzip -c $filename |"
        or die "Cannot run gunzip: $!\n" ;
    while (<F>)
    {
        ...
    }
and this to write compressed files if you have the compress program
available:
    # compress -c writes to standard output, so redirect it to the file
    open F, "| compress -c > $filename"
        or die "Cannot run compress: $!\n" ;
    print F "data";
    ...
    close F ;
The Archive::Tar module can optionally use Compress::Zlib (via
the IO::Zlib module) to access tar files that have been compressed
with gzip. Unfortunately tar files compressed with the Unix compress
utility cannot be read by Compress::Zlib and so cannot be directly
accessed by Archive::Tar.
If the uncompress or gunzip programs are available, you can use
one of these workarounds to read .tar.Z files from Archive::Tar
If you do want to use this module to access zip files, there
are a number of undocumented features in the zlib library you need to
be aware of.
When calling inflateInit or deflateInit the WindowBits parameter
must be set to -MAX_WBITS. This disables the creation of the zlib
header.
The zlib function inflate, and so the inflate method supplied in
this module, assume that there is at least one trailing byte after the
compressed data stream. Normally this isn't a problem because both
the gzip and zip file formats will guarantee that there is data directly
after the compressed data stream.
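A minimal sketch of the first point: passing -MAX_WBITS as WindowBits to both deflateInit and inflateInit produces and consumes a data stream with no zlib header. The sample text is made up, and this sketch does not read or write an actual zip file:

```perl
use strict ;
use warnings ;
use Compress::Zlib ;

my $text = "data destined for a zip member\n" x 20 ;   # made-up sample

# A negative WindowBits value suppresses the zlib header.
my $d = deflateInit( -WindowBits => -MAX_WBITS() )
    or die "Cannot create a deflation stream\n" ;

my ($raw, $dstatus) = $d->deflate($text) ;
die "deflation failed\n" unless $dstatus == Z_OK ;

my ($tail, $fstatus) = $d->flush() ;
die "flush failed\n" unless $fstatus == Z_OK ;
$raw .= $tail ;

# The inflation stream must be told not to expect a header either.
my $i = inflateInit( -WindowBits => -MAX_WBITS() )
    or die "Cannot create an inflation stream\n" ;

my $input = $raw ;   # inflate modifies its input, so work on a copy
my ($out, $istatus) = $i->inflate($input) ;
die "inflation failed\n"
    unless $istatus == Z_OK or $istatus == Z_STREAM_END ;
```

The constant is written as -MAX_WBITS() with parentheses so that Perl treats it as a negated sub call rather than as the string "-MAX_WBITS".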
The Compress::Zlib module was written by Paul Marquess,
pmqs@cpan.org. The latest copy of the module can be
found on CPAN in modules/by-module/Compress/Compress-Zlib-x.x.tar.gz.