PreComp utility
Current version: Precomp v0.4, lprepaq v1.3, paq8o8pre v2 (21.03.2009)

What is Precomp?

Precomp is a command-line precompressor. You can use it to achieve better compression on some filetypes (it works on files compressed with zLib or the Deflate compression method, and on GIF files). Precomp tries to decompress the streams in those files. If a stream can be decompressed and "re"-compressed so that the result is bit-for-bit identical to the original stream, the decompressed stream can be stored instead of the compressed one.
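
The core check can be sketched with Python's standard zlib module. This is not Precomp's code (its source isn't published). It assumes that the "81 combinations" mentioned under the -f switch are the 9 zLib compression levels times the 9 memory levels, and it treats the whole input as one zLib stream, whereas Precomp searches for streams inside a file.

```python
import zlib
from typing import Optional, Tuple


def find_identical_params(stream: bytes) -> Optional[Tuple[bytes, int, int]]:
    """Return (decompressed data, level, mem_level) if recompressing the
    decompressed data reproduces `stream` bit for bit, else None."""
    try:
        raw = zlib.decompress(stream)
    except zlib.error:
        return None  # not a valid zLib stream
    # Assumed search space: 9 compression levels x 9 memory levels = 81
    for level in range(1, 10):
        for mem_level in range(1, 10):
            c = zlib.compressobj(level, zlib.DEFLATED, zlib.MAX_WBITS, mem_level)
            if c.compress(raw) + c.flush() == stream:
                # Only now is it safe to store `raw` instead of `stream`
                return raw, level, mem_level
    return None


sample = zlib.compress(b"Precomp " * 64, 6)
found = find_identical_params(sample)
print(found[1:] if found else "not reproducible")
```

If no combination matches (for example, because the stream was written by a different Deflate implementation), the stream has to stay compressed, which is why some files gain little from precompression.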

The result is a .pcf file (PCF = PreCompressedFile) that contains decompressed data in place of the compressed streams that could be restored. Note that this file is larger than the original file, but if you compress it with a compression method stronger than Deflate, the result is smaller than before (or use lprepaq to precompress and compress it in one step).

What is Precomp Comfort?

Precomp Comfort is a variation of Precomp. It supports drag and drop of single files and uses an INI file for the parameters.
It is included in the ZIP file. Precomp.exe is the original version, Precomf.exe is the Comfort version.

What is lprepaq?

lprepaq combines lpaq6 by Matt Mahoney and Precomp. It first precompresses the input file, then compresses it using the powerful PAQ compression method. lprepaq is a complete compressor/decompressor, so use this if you just want to compress your files.

What is prepaq?

prepaq v2 (aka paq8o8pre v2) by Jan Ondrus combines paq8o8 by Matt Mahoney and Precomp. It first precompresses the input file, then compresses it using the powerful PAQ compression method. Just like lprepaq, prepaq is a complete compressor/decompressor, but it is much slower than lprepaq and its compression is better.

Filetypes

Here is a list of filetypes that may achieve better compression with Precomp, and how you can check whether they do.
Note that this list is not complete: other filetypes can contain Deflate or zLib streams too, but for those you should use the slow mode parameter (-slow).

Download

Precomp (and Precomp Comfort) v0.4: precomp.zip (392 KB)

lprepaq v1.3 (including source): lprepaq.zip (259 KB)

Note: You may be asked for MSVCR80.DLL. Download it here.

prepaq v2 (aka paq8o8pre v2, including source): paq8o8pre.zip (311 KB)

Old versions

Precomp is not backwards compatible. If you want to restore a PCF file made with an older version of Precomp, you'll have to download that version here:
Precomp v0.3 Precomp v0.31 Precomp v0.32 Precomp v0.33 Precomp v0.34
Precomp v0.35 Precomp v0.36 Precomp v0.37 Precomp v0.38

How to use it

Easiest way (lprepaq):
"lprepaq 5 input_filename output_filename" to compress a file.
"lprepaq d input_filename output_filename" to decompress a file.
5 selects 99 MB memory. Options range from 0 (6 MB) to 9 (1539 MB).
In general, option N uses 3 + 3*2^N MB.
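
The memory formula can be checked against the values quoted above (0 gives 6 MB, 5 gives 99 MB, 9 gives 1539 MB) with a few lines of Python. The function name is only illustrative.

```python
def lprepaq_memory_mb(option: int) -> int:
    """Memory in MB used by lprepaq option N, from 3 + 3*2^N MB."""
    if not 0 <= option <= 9:
        raise ValueError("lprepaq options range from 0 to 9")
    return 3 + 3 * 2 ** option


for n in range(10):
    print(f"option {n}: {lprepaq_memory_mb(n)} MB")
```

Each step up doubles the variable part of the memory, so option 9 needs roughly twice as much memory as option 8.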

Easy way (Precomp Comfort):
Drag and drop a file on precomf.exe to precompress the file into a .pcf file with the same name.
To get back the original file, do the same with the .pcf file.

Using the command line: (Precomp)
"precomp input_filename" to precompress a file into a .pcf file with the same name
"precomp -rpcf_filename" to restore the original file (-d is still valid, too)

Errorlevels
For batch jobs, Precomp returns the following errorlevels:
Error level Description
0 No error
1 Various errors (e.g. file access errors)
2 Nothing could be decompressed (the PCF output equals the input except for the PCF header)
3 Disk full
4 Temporary file disappeared
5 Parameter error: Ignore position too big
6 Parameter error: Identical byte size too big
7 Parameter error: Recursion level too big
8 Parameter error: Recursion level set more than once
9 Parameter error: Minimal identical byte size set more than once
10 Parameter error: Don't use a space after -o
11 Parameter error: More than one output file
12 Parameter error: More than one input file
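
A batch job can map these errorlevels back to their meaning. The following is a minimal Python sketch, assuming Precomp can be started as "precomp" (for example, because it is on the PATH); the helper names are hypothetical. Keep in mind that Precomp asks before overwriting an existing output file, which would pause an unattended run.

```python
import subprocess
from typing import Dict, List

# Errorlevels as listed in the table above.
PRECOMP_ERRORLEVELS: Dict[int, str] = {
    0: "No error",
    1: "Various errors (e.g. file access errors)",
    2: "Nothing could be decompressed",
    3: "Disk full",
    4: "Temporary file disappeared",
    5: "Parameter error: Ignore position too big",
    6: "Parameter error: Identical byte size too big",
    7: "Parameter error: Recursion level too big",
    8: "Parameter error: Recursion level set more than once",
    9: "Parameter error: Minimal identical byte size set more than once",
    10: "Parameter error: Don't use a space after -o",
    11: "Parameter error: More than one output file",
    12: "Parameter error: More than one input file",
}


def describe_errorlevel(code: int) -> str:
    """Translate a Precomp errorlevel into its description."""
    return PRECOMP_ERRORLEVELS.get(code, f"Unknown errorlevel {code}")


def precompress_batch(files: List[str], exe: str = "precomp") -> Dict[str, int]:
    """Run "precomp <file>" for each file and collect the errorlevels.
    Assumes the executable can be started as `exe`."""
    results: Dict[str, int] = {}
    for name in files:
        code = subprocess.run([exe, name]).returncode
        results[name] = code
        print(f"{name}: {describe_errorlevel(code)}")
    return results
```

Errorlevel 2 is not really a failure: the PCF file is valid, it just won't compress better than the original.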


Additional switches: (Precomp / Precomp Comfort)

-o[filename]:

Specifies the output file name. For precompression, the default is the original file name with the extension .pcf; for "decompression", it is the original file name. If the output file already exists, you will be asked whether you want to overwrite it. Use this option to specify a different output file name instead. Don't put a space between -o and the file name.

-c and -m: (Comfort: Compression_Levels, Memory_Levels)

After precompressing a file, Precomp tells you how to set these two parameters to speed up precompression the next time you process the same file. They are the compression levels and memory settings that are tried on this file. If you use them on a different file, Precomp may miss some of its compressed parts.
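
Why restricting the settings is faster but can miss streams can be sketched in Python. This assumes that -c and -m correspond to zLib's compression level and memLevel; the function name and the exact search order are hypothetical, and the real -c/-m syntax is not shown here.

```python
import zlib
from typing import Iterable, Optional, Tuple


def search_levels(stream: bytes, levels: Iterable[int],
                  mem_levels: Iterable[int]) -> Optional[Tuple[int, int]]:
    """Try only the given (level, mem_level) pairs; return the first pair
    that reproduces `stream` exactly, or None."""
    raw = zlib.decompress(stream)
    mem_levels = list(mem_levels)  # reused for every level
    for level in levels:
        for mem_level in mem_levels:
            c = zlib.compressobj(level, zlib.DEFLATED, zlib.MAX_WBITS, mem_level)
            if c.compress(raw) + c.flush() == stream:
                return level, mem_level
    return None


stream = zlib.compress(b"example data " * 40, 9)
print(search_levels(stream, range(1, 10), range(1, 10)))  # full search finds a match
print(search_levels(stream, [6], [8]))  # None: settings taken from another file
```

The restricted call tries 1 combination instead of up to 81, but it misses this stream because it was written with a different level.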

-t: (Comfort: Compression_Types)

Enables or disables detection of certain compression types. On the command line, there are two variants:
-t+ enables the listed types and disables all others, while -t- disables the listed types and enables all others.
For example, -t-j disables JPEG recompression and leaves all other types enabled, while -t+pf enables only PDF and GIF precompression and disables everything else.
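
The two variants can be sketched as set operations in Python. Only the letters j (JPEG), p (PDF) and f (GIF) are documented above; the full set of type letters is passed in by the caller, and the letter "x" in the example is a hypothetical placeholder.

```python
from typing import Set


def apply_type_switch(switch: str, all_types: Set[str]) -> Set[str]:
    """Return the enabled type letters after a -t+ or -t- switch."""
    if switch.startswith("-t+"):
        return set(switch[3:]) & all_types   # enable only the listed types
    if switch.startswith("-t-"):
        return all_types - set(switch[3:])   # disable only the listed types
    raise ValueError(f"not a -t switch: {switch!r}")


TYPES = {"p", "f", "j", "x"}  # "x" stands in for any other type letter
print(sorted(apply_type_switch("-t-j", TYPES)))   # ['f', 'p', 'x']
print(sorted(apply_type_switch("-t+pf", TYPES)))  # ['f', 'p']
```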

-l: (Comfort: Maximal_Recursion_Level)

Sets the maximal recursion level. Some streams, for example ZIP or MIME Base64 streams, can contain further streams. This switch specifies the maximal "depth" at which Precomp looks for streams. Setting it to 0 disables recursion; the default is 10, which should be enough for most filetypes.
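
The effect of the recursion limit can be sketched in Python for the simple case where each layer is a complete zLib stream. This is a simplification under stated assumptions: Precomp searches for streams inside the data at each level and verifies recompression, which this hypothetical helper skips.

```python
import zlib
from typing import Tuple


def unwrap_layers(data: bytes, max_recursion: int = 10) -> Tuple[bytes, int]:
    """Repeatedly decompress data that is itself a zLib stream.

    A recursion level of 0 still decompresses the outermost stream but does
    not look inside it. Returns the innermost data and the number of layers
    that were decompressed.
    """
    layers = 0
    while layers <= max_recursion:
        try:
            data = zlib.decompress(data)
        except zlib.error:
            break  # not a zLib stream, so there is nothing more to unwrap
        layers += 1
    return data, layers


nested = zlib.compress(zlib.compress(zlib.compress(b"payload")))
print(unwrap_layers(nested)[1])     # 3 layers with the default of 10
print(unwrap_layers(nested, 0)[1])  # 1 layer: recursion disabled
```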

-f: (Comfort: Fast_Mode)

Fast mode speeds up Precomp. It uses the first compression setting found for all streams instead of trying all 81 combinations when unsure. This works fine on files that use only a few compression methods, but results in worse compression for files that use many. Good candidates are PDF and ZIP/JAR/GZ files; bad candidates are archives containing many files.
In non-fast mode, Precomp shows a message when only one level combination was used. In that case, fast mode will give exactly the same result on this file, only faster.
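
The trade-off can be sketched in Python by counting trial recompressions. It assumes the 81 combinations are 9 zLib compression levels times 9 memory levels and that, in fast mode, a stream which doesn't match the first found combination is simply left compressed; the helper names are hypothetical.

```python
import zlib
from typing import List, Optional, Tuple

Combo = Tuple[int, int]  # (compression level, memory level)
ALL_COMBOS: List[Combo] = [(l, m) for l in range(1, 10) for m in range(1, 10)]


def reproduces(raw: bytes, stream: bytes, combo: Combo) -> bool:
    level, mem_level = combo
    c = zlib.compressobj(level, zlib.DEFLATED, zlib.MAX_WBITS, mem_level)
    return c.compress(raw) + c.flush() == stream


def process_streams(streams: List[bytes],
                    fast: bool) -> Tuple[List[Optional[Combo]], int]:
    """Return the matching combination per stream (None = left compressed)
    and the number of trial recompressions performed."""
    first_found: Optional[Combo] = None
    results: List[Optional[Combo]] = []
    trials = 0
    for stream in streams:
        raw = zlib.decompress(stream)
        # Fast mode: once something worked, only that combination is tried
        candidates = [first_found] if fast and first_found else ALL_COMBOS
        found: Optional[Combo] = None
        for combo in candidates:
            trials += 1
            if reproduces(raw, stream, combo):
                found = combo
                break
        if found and first_found is None:
            first_found = found
        results.append(found)
    return results, trials
```

With streams that all use the same setting, fast mode finds the same matches with far fewer trials; a stream written with a different level is missed, as in the archives mentioned above.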

-slow: (Comfort: Slow_Mode)

Slow mode slows Precomp down considerably. It looks for raw zLib headers and recognizes more file formats, like SIS and SWF, or special formats used by only one single program. However, a zLib header consists of only 2 bytes, so many streams are falsely detected: data that isn't a zLib stream is handled like one, which results in slower and less stable behaviour.
Slow mode can be combined with fast mode, but a falsely detected stream may be the first stream found and prevent further real streams from being detected, so combine them with caution. Use this mode if you have files that use zLib compression but are not supported (SIS, SWF, game ISO files...).
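
Why a 2-byte header produces false detections can be sketched in Python using the header rules from RFC 1950. Precomp's exact detection rules aren't documented here, so both helper functions are hypothetical: the first finds every byte pair that looks like a zLib header, the second keeps only those that decompress to a complete stream with a valid checksum.

```python
import zlib
from typing import List


def zlib_header_candidates(data: bytes) -> List[int]:
    """Offsets where two bytes form a valid zLib header (RFC 1950):
    CM = 8 (Deflate), CINFO <= 7, no preset dictionary, and
    CMF*256 + FLG divisible by 31."""
    hits = []
    for i in range(len(data) - 1):
        cmf, flg = data[i], data[i + 1]
        if ((cmf & 0x0F) == 8 and (cmf >> 4) <= 7
                and not (flg & 0x20) and (cmf * 256 + flg) % 31 == 0):
            hits.append(i)
    return hits


def confirmed_streams(data: bytes) -> List[int]:
    """Keep only candidates that decompress to a complete stream."""
    found = []
    for offset in zlib_header_candidates(data):
        d = zlib.decompressobj()
        try:
            d.decompress(data[offset:])
        except zlib.error:
            continue  # false detection: the header was followed by garbage
        if d.eof:  # end of stream reached and Adler-32 checksum verified
            found.append(offset)
    return found


sample = b"junk" + zlib.compress(b"hello world " * 10) + b"tail"
print(len(zlib_header_candidates(sample)), confirmed_streams(sample))
```

Every candidate costs a decompression attempt, which is where the slowdown comes from.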

-brute: (Comfort: Brute_Mode)

Brute mode slows Precomp down drastically. It assumes that zLib streams without headers could start anywhere. This recognizes even the most exotic file formats that don't include zLib headers, but takes a very long time (more than a minute even for files of around 10 KB). If you have data that has to be processed with this mode, it is better to add zLib headers yourself.
Brute mode can be combined with fast mode, but it disables slow mode.
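
A brute-force scan can be sketched in Python by trying raw (headerless) Deflate decompression at every offset. This is not Precomp's implementation; the function name is hypothetical, and the minimum output size is borrowed from the idea behind the -s switch. Without a header or checksum, random bytes often decode as tiny valid streams, so false hits are expected.

```python
import zlib
from typing import List, Tuple


def brute_force_scan(data: bytes, min_size: int = 4) -> List[Tuple[int, bytes]]:
    """Try raw Deflate decompression at every offset and return
    (offset, decompressed data) for each complete stream found."""
    hits = []
    for offset in range(len(data)):
        d = zlib.decompressobj(-zlib.MAX_WBITS)  # negative wbits: no header
        try:
            out = d.decompress(data[offset:])
        except zlib.error:
            continue
        if d.eof and len(out) >= min_size:  # ignore tiny accidental streams
            hits.append((offset, out))
    return hits


payload = b"The quick brown fox " * 20
c = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
raw_deflate = c.compress(payload) + c.flush()
hits = brute_force_scan(b"abc" + raw_deflate + b"xyz")
print([offset for offset, _ in hits])
```

One decompression attempt per byte explains why even small files take a long time in this mode.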

-pdfbmp[+-]: (Comfort: PDF_BMP_Mode)

This inserts a BMP header in front of PDF images to improve compression and speed, especially for PAQ.
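
A BMP header mainly tells the following compressor the image width, so it can model pixels from the row above. The Python sketch below builds the standard 54-byte header for uncompressed 24-bit data. Precomp's actual header layout isn't documented here, and a real conversion would also need to pad rows to 4 bytes and swap PDF's RGB order to BMP's BGR order, which this hypothetical helper does not do.

```python
import struct


def bmp_header(width: int, height: int) -> bytes:
    """Build a 54-byte header for uncompressed 24-bit BMP pixel data.
    A negative height in the header marks the rows as stored top-down."""
    row_size = (width * 3 + 3) // 4 * 4  # each row is padded to 4 bytes
    image_size = row_size * height
    offset = 14 + 40                     # file header + BITMAPINFOHEADER
    file_header = struct.pack("<2sIHHI", b"BM", offset + image_size, 0, 0, offset)
    info_header = struct.pack(
        "<IiiHHIIiiII",
        40,          # size of BITMAPINFOHEADER
        width,
        -height,     # top-down rows, like the sample order in a PDF image
        1,           # colour planes
        24,          # bits per pixel
        0,           # BI_RGB: no compression
        image_size,
        2835, 2835,  # about 72 DPI, in pixels per metre
        0, 0,        # no palette
    )
    return file_header + info_header


print(len(bmp_header(640, 480)))  # 54
```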

-progonly[+-]: (Comfort: JPG_progressive_only)

Recompresses progressive JPGs only. Again, this is especially useful for PAQ, which usually compresses JPGs better than packJPG but lacks support for progressive JPGs.
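
Whether a JPG is progressive can be read from its frame header: the JPEG standard uses the SOF markers 0xC2, 0xC6, 0xCA and 0xCE for progressive coding. A minimal Python sketch that walks the marker segments up to the first frame header follows; it is a hypothetical helper, not Precomp's or packJPG's code.

```python
# SOF markers: 0xC0-0xCF except DHT (0xC4), JPG (0xC8) and DAC (0xCC).
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}
PROGRESSIVE_SOF = {0xC2, 0xC6, 0xCA, 0xCE}


def is_progressive_jpeg(data: bytes) -> bool:
    """Walk the marker segments after SOI until the first SOF marker."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        if marker in SOF_MARKERS:
            return marker in PROGRESSIVE_SOF
        if marker == 0xDA:  # start of scan reached without a frame header
            break
        length = int.from_bytes(data[pos + 2:pos + 4], "big")
        pos += 2 + length   # the length field counts its own 2 bytes
    raise ValueError("no SOF marker found")


app0 = b"\xff\xe0\x00\x04ab"  # minimal dummy APP0 segment
print(is_progressive_jpeg(b"\xff\xd8" + app0 + b"\xff\xc2\x00\x03x"))  # True
```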

-mjpeg[+-]: (Comfort: MJPEG_recompression)

Enables MJPEG recompression by inserting Huffman tables into the JPG data.

-v: (Comfort: Verbose)

Verbose (debug) mode gives additional information about detected streams and recompression success or failure. To save this information to a file, redirect the output to it, like this: "precomp -v input_filename > verbose.txt".

-i: (Comfort: Ignore_Positions)

In verbose mode, you can see the position of streams in the file. With this parameter, you can tell Precomp to ignore the streams at certain positions.

-s: (Comfort: Minimal_Size)

With this parameter, you can choose the minimal size of a stream that will be processed. The default is 4 bytes. Setting it to higher values (around 50-200 bytes) sometimes improves recompression, especially in slow or brute mode.

Results

Some results to demonstrate the capabilities of Precomp can be found at the Results page.

Future work
FAQ

I tried to compress a file precompressed with Precomp and it didn't get smaller.

There are two possible reasons for this: either Precomp can't find any compressed streams in the file, or they are too small to make any difference. Or the compressor you used after precompression is weaker than Deflate (or you didn't use one at all).

Is the source code for Precomp available?

Not yet, because it is very messy at the moment, but it will be.

Are there any known bugs?

There are some bugs that lead to crashes on certain special corrupt files, but such files are very unusual. Nevertheless, Precomp is far from complete, so if you find a bug, please send me a bug report.

I have found a bug. How do I report it?

Send an e-mail to "schnaader AT gmx.de", preferably with "[Precomp]" in the subject (you can also use this link), with a description of the bug and, if you want (and if it is smaller than 10 MB), the file you tried to precompress or restore.

What is the difference between using Precomp or Multivalent for PDF files?

The main difference is that PDF files compressed with Multivalent can't be restored bit-for-bit identical, because Multivalent is a lossy compression method (although it doesn't lose any PDF content). So if you just want to compress PDF files and have fast access to them later on, use Multivalent. If you want them smaller than Multivalent makes them (even in compact mode), or want to be sure the file is bit-for-bit identical to the original PDF, use Precomp. You can also use Precomp on PDF files compressed with Multivalent.

The precompression for PNG, GIF and ZIP files is bad, although verbose mode says they can be decompressed completely.

The decompression of those files is well-defined, but there are many ways to recompress them. zLib in particular can be tuned with deflateTune(), which Precomp doesn't support because there are simply too many variations to try. I'm working on this, but at the moment I can't say whether I will succeed.
Because decompression is easy, future versions will have a parameter for lossy compression. This won't restore files bit-for-bit identical, but the file content will stay the same.

Contact

Use this link to send comments, criticism, bug reports, etc.

Credits

Thanks for support, help and comments: