Precomp utility

Current version: Precomp v0.4.3, lprepaq v1.3, paq8o8pre v2 (01.09.2012)

What is Precomp?

Precomp is a command line precompressor. You can use it to achieve better compression on some filetypes (works on files that are compressed with zLib or the Deflate compression method, and on GIF files). Precomp tries to decompress the streams in those files, and if they can be decompressed and "re"-compressed so that they are bit-to-bit-identical to the original stream, the decompressed stream can be used instead of the compressed one.
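The round-trip check described above can be sketched with Python's standard zlib module. This is only an illustration of the idea, not Precomp's actual code: the function names are made up for this example, and the search is limited to compression levels 1-9 and memory levels 1-9 with a fixed window size.

```python
import zlib

def recompress(raw: bytes, level: int, mem_level: int) -> bytes:
    """Compress raw data as a zlib stream with the given settings."""
    c = zlib.compressobj(level, zlib.DEFLATED, 15, mem_level)
    return c.compress(raw) + c.flush()

def find_zlib_params(stream: bytes):
    """Return all (level, mem_level) pairs that reproduce `stream` exactly."""
    try:
        raw = zlib.decompress(stream)
    except zlib.error:
        return []  # not a zlib stream at all
    matches = []
    for level in range(1, 10):
        for mem_level in range(1, 10):
            # Only a bit-identical result allows the stream to be replaced
            # by its decompressed form and restored later.
            if recompress(raw, level, mem_level) == stream:
                matches.append((level, mem_level))
    return matches

# Stand-in for a compressed stream found inside some file.
original = recompress(b"hello precomp " * 200, 6, 8)
print(find_zlib_params(original))
```

If the list is empty, the stream cannot be reproduced with the settings tried and has to be left compressed.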

The result is a .pcf file (PCF = PreCompressedFile) that contains more decompressed data than the original file. Note that by default, the result file is compressed using bZip2, but you can also turn off compression to get a file larger than the original file and compress it with a method stronger than bZip2. This will lead to even better compression results.

Since version 0.4.2, Precomp is available for Linux, too. The Linux and Windows versions are fully compatible: PCF files can be exchanged between Windows and Linux systems.

What is Precomp Comfort?

Precomp Comfort is a Windows-only variation of Precomp. It supports drag and drop of single files and uses an INI file for the parameters.
It is included in the ZIP file. Precomp.exe is the original version, Precomf.exe is the Comfort version.

What is lprepaq?

lprepaq combines lpaq6 by Matt Mahoney and Precomp. It first precompresses the input file, then compresses it using the powerful PAQ compression method. lprepaq is a complete compressor/decompressor, so use this if you just want to compress your files.

What is prepaq?

prepaq v2 (aka paq8o8pre v2) by Jan Ondrus combines paq8o8 by Matt Mahoney and Precomp. It first precompresses the input file, then compresses it using the powerful PAQ compression method. Just like lprepaq, prepaq is a complete compressor/decompressor, but it is much slower than lprepaq and compresses better.

Filetypes

Here is a list of filetypes that can probably achieve better compression with Precomp, along with notes on how you can check if improvement is possible.
Note that this list is not complete. Other filetypes can contain Deflate or zLib streams, too, but you should use the intense mode parameter (-intense) for them.

Download

Precomp (Windows and Linux) v0.4.3: precomp.zip (876 KB)

lprepaq v1.3 (including source): lprepaq.zip (259 KB)

Note: Perhaps you'll be asked for MSVCR80.DLL. Download it here.

prepaq v2 (aka paq8o8pre v2, including source): paq8o8pre.zip (311 KB)

Old versions

Precomp is not backwards compatible. If you want to recompress some PCF file made with a different version of Precomp, you'll have to download it here:

Precomp v0.4 Precomp v0.41 Precomp v0.42

For older versions (0.3 - 0.38) please drop me a mail and I'll send it to you.

How to use it

Easiest way (lprepaq):
"lprepaq 5 input_filename output_filename" to compress a file.
"lprepaq d input_filename output_filename" to decompress a file.
5 selects 99 MB memory. Options range from 0 (6 MB) to 9 (1539 MB).
In general, option N uses 3 + 3*2^N MB.
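The formula 3 + 3*2^N can be checked against the values quoted above (6 MB for option 0, 99 MB for option 5, 1539 MB for option 9). A small sketch, with a function name chosen only for this example:

```python
def lprepaq_memory_mb(option: int) -> int:
    """Memory in MB for lprepaq option N, using 3 + 3 * 2^N."""
    if not 0 <= option <= 9:
        raise ValueError("option must be between 0 and 9")
    return 3 + 3 * 2 ** option

# Print the memory requirement for every option.
for n in range(10):
    print(n, lprepaq_memory_mb(n), "MB")
```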

Easy way (Precomp Comfort, on Windows):
Drag and drop a file on precomf.exe to precompress the file into a .pcf file with the same name.
To get back the original file, do the same with the .pcf file.

Using the command line: (Precomp)
"precomp input_filename" to precompress a file into a .pcf file with the same name
"precomp -r pcf_filename" to restore the original file

Errorlevels
For batch jobs (Windows) or shell scripts (Linux), the following error levels returned by Precomp are useful:
Error level Description
0 No error
1 Various errors (e.g. file access errors)
2 No streams could be decompressed
3 Disk full
4 Temporary file disappeared
5 Parameter error: Ignore position too big
6 Parameter error: Identical byte size too big
7 Parameter error: Recursion depth too big
8 Parameter error: Recursion depth set more than once
9 Parameter error: Minimal identical byte size set more than once
10 Parameter error: Don't use a space after -o
11 Parameter error: More than one output file
12 Parameter error: More than one input file
13 Ctrl-C detected (user break)
14 Parameter error: Intense mode recursion limit too big
15 Parameter error: Brute mode recursion limit too big
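A script can turn these error levels into readable messages. The following is a minimal sketch: the table is copied from the list above, while the helper names and the assumption that the executable is reachable as "precomp" are specific to this example.

```python
import subprocess

# Error levels as listed above.
PRECOMP_ERRORLEVELS = {
    0: "No error",
    1: "Various errors (e.g. file access errors)",
    2: "No streams could be decompressed",
    3: "Disk full",
    4: "Temporary file disappeared",
    5: "Parameter error: Ignore position too big",
    6: "Parameter error: Identical byte size too big",
    7: "Parameter error: Recursion depth too big",
    8: "Parameter error: Recursion depth set more than once",
    9: "Parameter error: Minimal identical byte size set more than once",
    10: "Parameter error: Don't use a space after -o",
    11: "Parameter error: More than one output file",
    12: "Parameter error: More than one input file",
    13: "Ctrl-C detected (user break)",
    14: "Parameter error: Intense mode recursion limit too big",
    15: "Parameter error: Brute mode recursion limit too big",
}

def describe_errorlevel(code: int) -> str:
    """Map an error level to its description."""
    return PRECOMP_ERRORLEVELS.get(code, f"Unknown error level {code}")

def run_precomp(args, exe="precomp"):
    """Run Precomp (assumed to be on the PATH) and explain its error level."""
    result = subprocess.run([exe, *args])
    return result.returncode, describe_errorlevel(result.returncode)
```

For example, run_precomp(["input_filename"]) returns a pair such as (2, "No streams could be decompressed") when nothing in the file could be precompressed.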


Additional switches: (Precomp / Precomp Comfort)

-longhelp:

Only common switches are shown by default. This switch will display a long and detailed help.

-o[filename]:

Specifies the output file name. For precompression, the default is the original file name with the extension .pcf; for restoring, it is the original file name. If the output file already exists, you will be asked whether you want to overwrite it. With this option, you can specify a different output file name instead.

-c[bn]: (Comfort: Compression_Method)

The first step that Precomp does is to decompress all the streams in the input file. The output is either directly compressed using bZip2 ("-cb", default setting) or left as it is ("-cn"), e.g. if an external compressor is to be used.

-n[bn]:

This switch is for converting a PCF file from no compression to bZip2 compression and vice versa without running Precomp on the original file again.

-zl: (Comfort: zLib_Levels)

After precompressing a file, Precomp tells you how to use this parameter to speed up precompression the next time you process the same file. The parameter takes one or more two-digit numbers, each describing a setting that is tried on the file: the first digit is the compression level, the second digit is the memory level. However, using these values on a different file could lead to Precomp missing some compressed parts of it.
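The two-digit numbers can be read as pairs of compression level and memory setting. The sketch below only illustrates that reading; the function name is made up, and because the way the numbers are separated on the command line is not described here, it takes them as an already split list.

```python
def parse_zlib_levels(numbers):
    """Split two-digit numbers into (compression_level, memory_level) pairs."""
    pairs = []
    for number in numbers:
        if len(number) != 2 or any(ch not in "0123456789" for ch in number):
            raise ValueError(f"expected a two-digit number, got {number!r}")
        # First digit: compression level, second digit: memory setting.
        pairs.append((int(number[0]), int(number[1])))
    return pairs

print(parse_zlib_levels(["68", "97"]))
```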

-t: (Comfort: Compression_Types)

Enables or disables detecting of certain compression types. For command-line use, there are two variants:
-t+ enables the listed types and disables all others, while -t- disables the listed types and enables the rest.
For example, -t-j disables JPEG recompression and leaves all other types enabled, while -t+pf enables only PDF and GIF precompression and disables everything else.
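The two variants behave like set operations on the list of supported types. A minimal sketch, assuming a reduced type list that contains only the three letters mentioned above (p = PDF, f = GIF, j = JPEG); the real list is longer, and the function name is hypothetical:

```python
# Illustrative subset only: the letters used in the examples above.
ALL_TYPES = {"p", "f", "j"}

def apply_type_switch(switch: str, all_types=frozenset(ALL_TYPES)) -> set:
    """Return the set of enabled types for a switch like '-t+pf' or '-t-j'."""
    if not switch.startswith(("-t+", "-t-")) or len(switch) <= 3:
        raise ValueError(f"not a valid -t switch: {switch!r}")
    letters = set(switch[3:])
    unknown = letters - all_types
    if unknown:
        raise ValueError(f"unknown type letters: {sorted(unknown)}")
    if switch[2] == "+":
        return letters                 # enable only the listed types
    return set(all_types) - letters    # disable the listed types, keep the rest

print(sorted(apply_type_switch("-t+pf")))
print(sorted(apply_type_switch("-t-j")))
```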

-d: (Comfort: Maximal_Recursion_Depth)

Sets the maximal recursion depth. Several stream types can contain additional streams inside, for example ZIP or MIME Base64 streams. This switch specifies the maximum depth up to which Precomp will look for streams. Setting this to 0 disables recursion. The default is 10, which should be enough for most filetypes.

-f: (Comfort: Fast_Mode)

Fast mode to speed up Precomp. This switch will treat any stream like the first validated one and not test any other compression methods. This will work fine on files that use only a few compression methods, but will result in weaker compression for files with many compression methods used. Good candidates are PDF and ZIP/JAR/GZ files. Bad candidates are archives containing many different files.
With fast mode turned off, Precomp will display a message after precompression in case only one level combination was applied to the input file. This means that fast mode will do absolutely the same on this file, but faster.

-intense: (Comfort: Intense_Mode)

Intense mode will slow down Precomp considerably. It looks for raw zLib headers and recognizes more file formats like SIS and SWF, or special formats used only by one single program. However, the zLib header consists of only 2 bytes, so many byte sequences can be falsely detected as zLib streams and treated as such, which results in slower and less stable behaviour.
Intense mode can be combined with fast mode, but a falsely detected stream could be the first stream and prevent real streams from being detected afterwards, so combine them with caution. Use this mode if you have files that use zLib compression but are not supported in normal mode (SIS, SWF, ISO files...).
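To see why a 2-byte header leads to false detections, here is a sketch of a header check based on the zlib format specification (RFC 1950). It is not necessarily the exact test Precomp performs; the function name is chosen for this example.

```python
import zlib

def looks_like_zlib_header(two_bytes: bytes) -> bool:
    """Check two bytes against the zlib header rules of RFC 1950."""
    if len(two_bytes) != 2:
        return False
    cmf, flg = two_bytes
    return (
        (cmf & 0x0F) == 8                  # compression method 8 = Deflate
        and (cmf >> 4) <= 7                # window size of at most 32 KB
        and (cmf * 256 + flg) % 31 == 0    # header check value
    )

# A real zlib stream passes the check ...
print(looks_like_zlib_header(zlib.compress(b"abc")[:2]))

# ... but so do many arbitrary byte pairs that are not streams at all.
candidates = sum(
    looks_like_zlib_header(bytes([a, b]))
    for a in range(256)
    for b in range(256)
)
print(candidates, "of 65536 byte pairs look like a zLib header")
```

Every position in a file where such a pair occurs has to be tried as a possible stream, which is where the slowdown comes from.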

-brute: (Comfort: Brute_Mode)

Brute mode will slow down Precomp extremely. It assumes that there could be zLib streams without headers everywhere. This recognizes even the most exotic file formats that don't include zLib headers, but it takes a very long time (more than a minute even for file sizes around 10 KB). If you have data that has to be processed with this mode, it is better to try adding zLib headers on your own.
Brute mode can be combined with fast mode, but disables intense mode.

-pdfbmp[+-]: (Comfort: PDF_BMP_Mode)

This precedes PDF images with a BMP header to improve compression and speed, especially for PAQ.

-progonly[+-]: (Comfort: JPG_progressive_only)

Recompresses progressive JPGs only. Again, this is especially useful for PAQ which usually has a better JPG compression than packJPG, but lacks progressive JPG support.

-mjpeg[+-]: (Comfort: MJPEG_recompression)

Enables MJPEG recompression by inserting Huffman tables into the JPG data.

-v: (Comfort: Verbose)

Verbose (debug) mode to gain additional information about detected streams and recompression success/failure. If you want this information in a file, redirect the output to it, like this: "precomp -v input_filename > verbose.txt".

-i: (Comfort: Ignore_Positions)

In verbose mode, you can see the position of streams in the file. With this parameter, you can ignore certain streams.

-s: (Comfort: Minimal_Size)

With this parameter, you can choose the minimal size of a stream that will be processed. The default is 4 bytes. Setting it to higher values (around 50-200 bytes) sometimes improves recompression, especially in intense or brute mode.

Results

Some results to demonstrate the capabilities of Precomp can be found at the Results page.

Future work

FAQ

I tried to compress a file with Precomp and it didn't get smaller. Why?

Precomp couldn't find any compressed streams in the file and bZip2 compression didn't help either.

Is the source code for Precomp available?

Not yet, because it is very messy at the moment, but it will be in the future.

Are there any known bugs?

There are some bugs that lead to crashes on very special corrupt files, but these are very unusual. Nevertheless, Precomp is far from being complete, so if you find a bug, send me a bug report.

I have found a bug. How to report it?

Send a mail to "schnaader AT gmx.de", preferably with "[Precomp]" in the subject (you can also use this link), with a description of the bug and, if you want (and if it is smaller than 10 MB), the file you wanted to precompress/restore.

What is the difference between using Precomp or Multivalent for PDF files?

The main difference is that PDF files compressed with Multivalent can't be restored bit-to-bit-identical because Multivalent is a lossy compression method (although it doesn't lose the PDF content). So if you just want to compress PDF files and have fast access to them later on, use Multivalent. If you want to get them smaller than Multivalent (even in compact mode) does, or want to be sure the file is bit-to-bit-identical with the original PDF, use Precomp. You can also use Precomp on PDF files compressed with Multivalent.

The precompression for PNG, GIF and ZIP files is bad, although verbose mode says they can be decompressed completely.

The decompression of those files is well-defined, but there are many ways to recompress them. In particular, zLib can be tuned with deflateTune(), which is not supported by Precomp because there are simply too many variations to try. I'm working on this.

Contact

Use this link to send comments, criticism, bug reports, etc.

Credits

Thanks for support, help and comments: