Precomp utility
Current version: Precomp v0.4.7, lprepaq v1.3, paq8o8pre v2
Precomp on GitHub
Precomp and the source code are now available on GitHub.
What is Precomp?
Precomp is a command-line precompressor. It can improve compression on certain filetypes: files that are compressed with zLib or the Deflate compression method, and GIF files. Precomp tries to decompress the streams in those files; if a stream can be decompressed and recompressed so that the result is bit-for-bit identical to the original stream, the decompressed stream is stored instead of the compressed one.
The result is a .pcf file (PCF = PreCompressedFile) that contains more decompressed data than the original file. By default, the result file is compressed using bZip2, but you can also turn off compression to get a file larger than the original and compress it with a method stronger than bZip2, which leads to even better compression results.
Since version 0.4.2, Precomp has also been available for Linux. The Linux and Windows versions are fully compatible: PCF files can be exchanged between Windows and Linux systems.
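The round-trip test described above can be illustrated with a minimal Python sketch. It is not Precomp's actual code: it assumes the stream is a complete zLib stream and uses Python's zlib module, trying every combination of compression level and memory level until one reproduces the original bytes exactly. The function name find_recompression_params is made up for this illustration.

```python
import zlib

def find_recompression_params(stream: bytes):
    """Return (raw_data, level, mem_level) if the zLib stream can be
    reproduced bit-for-bit, otherwise None.

    Illustrative only: Precomp's real search is not documented here.
    """
    try:
        raw = zlib.decompress(stream)
    except zlib.error:
        return None  # not a valid zLib stream, nothing to precompress

    # Try all level / memory-level combinations until one matches exactly.
    for level in range(1, 10):
        for mem_level in range(1, 10):
            comp = zlib.compressobj(level, zlib.DEFLATED, zlib.MAX_WBITS, mem_level)
            candidate = comp.compress(raw) + comp.flush()
            if candidate == stream:
                return raw, level, mem_level
    return None  # decompressible, but not reproducible with these settings


if __name__ == "__main__":
    original = zlib.compress(b"hello precomp " * 200, 6)
    print(find_recompression_params(original) is not None)
```

Only when a matching combination is found can the decompressed data safely replace the compressed stream, because the original can then be rebuilt from the raw data plus the two settings.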
What is Precomp Comfort?
Precomp Comfort is a Windows-only variant of Precomp. It supports drag and drop of single files and uses an INI file for the parameters.
It is included in the ZIP file. Precomp.exe is the original version, Precomf.exe is the Comfort version.
What is lprepaq?
lprepaq combines lpaq6 by Matt Mahoney and Precomp. It first precompresses the input file, then compresses it using the powerful PAQ compression method. lprepaq is a complete compressor/decompressor, so use this if you just want to compress your files.
What is prepaq?
prepaq v2 (aka paq8o8pre v2) by Jan Ondrus combines paq8o8 by Matt Mahoney and Precomp. It first precompresses the input file, then compresses it using the powerful PAQ compression method. Just like lprepaq, prepaq is a complete compressor/decompressor, but it is much slower than lprepaq and compresses better.
Filetypes
Here is a list of filetypes that can probably achieve better compression with Precomp, along with notes on how to check whether improvement is possible.
Note that this list is not complete. Other filetypes can contain Deflate or zLib streams, too, but you should use the intense mode parameter (-intense) for them.
- JPG Precomp uses packJPG by Matthias Stirner to losslessly compress JPG images.
- MJPEG MJPEG is a video format that consists of JPG images without Huffman tables. Precomp inserts them so that packJPG is able to compress the images.
- ZIP/JAR Most ZIP files use Deflate for compression. JAR files basically are ZIP files with an additional manifest for use with Java.
- PNG PNG uses Deflate to compress its filtered image data.
- GIF The GIF format uses LZW to compress its image data.
- GZ GZip files use Deflate for compression.
- BZ2 bZip2 is a format often used in Linux environments.
- SWF Macromedia's Shockwave Flash files can use zLib compression since Version 6.
- MIME Base64 This encoding is used to attach binary files to e-mails.
- SVGZ These files contain SVG files compressed with GZip.
- ODT OpenOffice Document files consist of zipped XML data.
- SIS (intense mode only) These files contain information about software installation on Symbian OS for mobile phones. They use zLib compression.
- 3DM (intense mode only) This is a file format for 3D geometry used by Rhino3D that contains zLib streams.
- zeno (intense mode only) Zeno is a file format used, for example, by the German Wikipedia DVD.
Check (PDF): "FlateDecode" appears in the file.
Check (SWF): First three bytes of file are CWS (instead of FWS for uncompressed files).
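The two checks above are easy to automate. The following Python sketch is only an illustration of those checks and not part of Precomp; the function names are made up for this example.

```python
def mentions_flate_decode(data: bytes) -> bool:
    """True if the text "FlateDecode" appears anywhere in the file contents."""
    return b"FlateDecode" in data


def is_compressed_swf(data: bytes) -> bool:
    """True if the first three bytes are CWS (FWS marks an uncompressed file)."""
    return data[:3] == b"CWS"


def check_file(path: str) -> dict:
    """Read a file and report both checks (hypothetical helper)."""
    with open(path, "rb") as handle:
        data = handle.read()
    return {
        "FlateDecode found": mentions_flate_decode(data),
        "compressed SWF": is_compressed_swf(data),
    }
```

A positive result only means that improvement is possible, not that Precomp will be able to recompress every stream identically.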
Download
Precomp (Windows and Linux) v0.4.7: precomp.zip (2300 KB)
lprepaq v1.3 (including source): lprepaq.zip (259 KB)
Note: Perhaps you'll be asked for MSVCR80.DLL. Download it here.
prepaq v2 (aka paq8o8pre v2, including source): paq8o8pre.zip (311 KB)
Old versions
Precomp is not backwards compatible. If you want to restore a PCF file made with a different version of Precomp, you'll have to use the binaries of that older version:
Google Drive folder containing old Precomp versions
How to use it
Easiest way (lprepaq):
"lprepaq 5 input_filename output_filename" to compress a file.
"lprepaq d input_filename output_filename" to decompress a file.
5 selects 99 MB memory. Options range from 0 (6 MB) to 9 (1539 MB).
In general, option N uses 3 + 3*2^N MB.
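As a quick check of the memory formula, here is a small Python sketch that computes the memory use for each option. The function name is made up for this illustration.

```python
def lprepaq_memory_mb(option: int) -> int:
    """Memory in MB used by lprepaq option N, following 3 + 3*2^N."""
    if not 0 <= option <= 9:
        raise ValueError("option must be between 0 and 9")
    return 3 + 3 * 2 ** option


if __name__ == "__main__":
    for n in range(10):
        print(n, lprepaq_memory_mb(n), "MB")
```

This reproduces the values given above: 6 MB for option 0, 99 MB for option 5 and 1539 MB for option 9.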
Easy way (Precomp Comfort, on Windows):
Drag and drop a file on precomf.exe to precompress the file into a .pcf file with the same name.
To get back the original file, do the same with the .pcf file.
Using the command line: (Precomp)
"precomp input_filename" to precompress a file into a .pcf file with the same name
"precomp -r pcf_filename" to restore the original file
Errorlevels
For batch jobs (Windows) or shell scripts (Linux), the following errorlevels returned by Precomp are useful:
Error level | Description |
---|---|
0 | No error |
1 | Various errors (e.g. file access errors) |
2 | No streams could be decompressed |
3 | Disk full |
4 | Temporary file disappeared |
5 | Parameter error: Ignore position too big |
6 | Parameter error: Identical byte size too big |
7 | Parameter error: Recursion depth too big |
8 | Parameter error: Recursion depth set more than once |
9 | Parameter error: Minimal identical byte size set more than once |
10 | Parameter error: Don't use a space after -o |
11 | Parameter error: More than one output file |
12 | Parameter error: More than one input file |
13 | Ctrl-C detected (user break) |
14 | Parameter error: Intense mode recursion limit too big |
15 | Parameter error: Brute mode recursion limit too big |
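A script can translate these errorlevels into readable messages. The Python sketch below is one possible way to do this. It assumes that a precomp executable can be found on the PATH; the function names are made up for this example, and the descriptions are copied from the table above.

```python
import subprocess

# Errorlevels and descriptions as listed in the table above.
ERRORLEVELS = {
    0: "No error",
    1: "Various errors (e.g. file access errors)",
    2: "No streams could be decompressed",
    3: "Disk full",
    4: "Temporary file disappeared",
    5: "Parameter error: Ignore position too big",
    6: "Parameter error: Identical byte size too big",
    7: "Parameter error: Recursion depth too big",
    8: "Parameter error: Recursion depth set more than once",
    9: "Parameter error: Minimal identical byte size set more than once",
    10: "Parameter error: Don't use a space after -o",
    11: "Parameter error: More than one output file",
    12: "Parameter error: More than one input file",
    13: "Ctrl-C detected (user break)",
    14: "Parameter error: Intense mode recursion limit too big",
    15: "Parameter error: Brute mode recursion limit too big",
}


def describe_errorlevel(code: int) -> str:
    """Return the description for an errorlevel, or a fallback text."""
    return ERRORLEVELS.get(code, f"Unknown errorlevel {code}")


def precompress(input_filename: str) -> int:
    """Run "precomp input_filename" and return its errorlevel.

    Assumption: the precomp binary is on the PATH.
    """
    result = subprocess.run(["precomp", input_filename])
    print(describe_errorlevel(result.returncode))
    return result.returncode
```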
Additional switches: (Precomp / Precomp Comfort)
-longhelp:
Only common switches are shown by default. This switch will display a long and detailed help.
-o[filename]:
Specifies the output file name. For precompression, the default is the original file name with the extension .pcf; for restoring, it is the original file name. If the output file exists, you will be asked if you want to overwrite it. To avoid this, specify a different output file name with this option. Write the file name directly after -o, without a space (see errorlevel 10).
-c[bn]: (Comfort: Compression_Method)
The first step that Precomp does is to decompress all the streams in the input file. The output is either directly compressed using bZip2 ("-cb", default setting) or left as it is ("-cn"), for example if an external compressor is to be used.
-n[bn]:
This switch is for converting a PCF file from no compression to bZip2 compression and vice versa without running Precomp on the original file again.
-zl: (Comfort: zLib_Levels)
After precompressing a file, Precomp tells you how to use this parameter to speed up precompression the next time you process the same file. The parameter takes one or more two-digit numbers: the first digit is the compression level, the second digit is the memory setting to try on this file. However, using these values on a different file could lead to Precomp missing some of its compressed parts.
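To make the two-digit notation concrete, here is a small Python sketch that splits one such number into its two parts. It only illustrates the encoding described above. How several numbers are separated on the command line is not stated here, so the sketch handles a single value, and the function name is made up for this example.

```python
from typing import Tuple


def decode_zl_value(value: str) -> Tuple[int, int]:
    """Split a two-digit -zl number into (compression level, memory setting)."""
    if len(value) != 2 or not (value.isascii() and value.isdigit()):
        raise ValueError("expected exactly two digits, got %r" % value)
    # First digit: compression level. Second digit: memory setting.
    return int(value[0]), int(value[1])


if __name__ == "__main__":
    level, memory = decode_zl_value("68")
    print("compression level", level, "memory setting", memory)
```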
-t: (Comfort: Compression_Types)
Enables or disables detection of certain compression types. For command-line use, there are two variants:
t+ enables certain types and disables the others, while t- disables certain types and enables the rest.
For example, -t-j disables JPEG recompression and leaves all other types as before, while -t+pf enables only PDF and GIF precompression and disables everything else.
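The difference between the two variants can be sketched as simple set operations. The Python example below is only an illustration: it knows just the three type letters that appear in the examples above (p, f and j), whereas the real list of types is longer, and the function name is made up.

```python
# Partial list: only the letters used in the examples above (assumption:
# the complete list of type letters is longer and is shown by Precomp's help).
KNOWN_TYPES = {"p": "PDF", "f": "GIF", "j": "JPG"}


def enabled_types(switch: str, all_types=KNOWN_TYPES) -> set:
    """Return the set of type letters left enabled by a -t+... or -t-... switch."""
    if len(switch) < 4 or not switch.startswith("-t") or switch[2] not in "+-":
        raise ValueError("expected -t+<letters> or -t-<letters>")
    listed = set(switch[3:])
    unknown = listed - set(all_types)
    if unknown:
        raise ValueError("unknown type letters: %s" % sorted(unknown))
    if switch[2] == "+":
        return listed               # t+: only the listed types stay enabled
    return set(all_types) - listed  # t-: everything except the listed types


if __name__ == "__main__":
    print(sorted(enabled_types("-t-j")))
    print(sorted(enabled_types("-t+pf")))
```

With only these three letters known, both example switches leave the same types enabled, namely PDF and GIF.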
-d: (Comfort: Maximal_Recursion_Depth)
Sets the maximal recursion depth. Some streams can contain additional streams inside, for example ZIP or MIME Base64 streams. This switch specifies the maximum depth up to which Precomp will look for streams. Setting this to 0 disables recursion. The default is 10, which should be enough for most filetypes.
-f: (Comfort: Fast_Mode)
Fast mode to speed up Precomp. This switch will treat any stream like the first validated one and not test any other compression methods. This will work fine on files that use only a few compression methods, but will result in weaker compression for files with many compression methods used. Good candidates are PDF and ZIP/JAR/GZ files. Bad candidates are archives containing many different files.
With fast mode turned off, Precomp will display a message after precompression in case only one level combination was applied to the input file. This means that fast mode will produce exactly the same result on this file, but faster.
-intense: (Comfort: Intense_Mode)
Intense mode slows Precomp down considerably. It looks for raw zLib headers and recognizes more file formats like SIS and SWF, or special formats used by only a single program. However, the zLib header consists of only 2 bytes, so there can be many falsely detected streams that aren't zLib streams but are treated as such, which results in slower and less stable behaviour.
Intense mode can be combined with fast mode, but a falsely detected stream could be the first stream and prevent further real streams from being detected, so combine them with caution. Use this mode if you have files that use zLib compression but are not supported in normal mode (SIS, SWF, ISO files...).
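Why a 2-byte header leads to false detections can be shown with a short Python sketch. It applies the header rules from the zLib format specification (RFC 1950): compression method 8, a window size field of at most 7, and the two bytes read as a 16-bit number being a multiple of 31. Whether Precomp applies exactly these checks is an assumption, and the function names are made up for this example.

```python
def looks_like_zlib_header(cmf: int, flg: int) -> bool:
    """Check two bytes against the zLib header rules of RFC 1950."""
    compression_method = cmf & 0x0F   # must be 8 (Deflate)
    window_info = cmf >> 4            # at most 7 (32 KB window)
    checksum_ok = (cmf * 256 + flg) % 31 == 0
    return compression_method == 8 and window_info <= 7 and checksum_ok


def count_valid_headers() -> int:
    """Count how many of the 65536 possible byte pairs pass the check."""
    return sum(
        looks_like_zlib_header(cmf, flg)
        for cmf in range(256)
        for flg in range(256)
    )


if __name__ == "__main__":
    print(looks_like_zlib_header(0x78, 0x9C))
    print(count_valid_headers(), "of 65536 byte pairs look like a header")
```

Under these rules, 66 of the 65536 possible byte pairs pass, so in arbitrary data roughly one position in a thousand looks like the start of a zLib stream and has to be tested.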
-brute: (Comfort: Brute_Mode)
Brute mode slows Precomp down extremely. It assumes that there could be zLib streams without headers anywhere. This recognizes even most exotic file formats that don't include zLib headers, but takes a very long time (more than a minute even for file sizes around 10 KB). If you have data that has to be processed with this mode, it is better to add zLib headers yourself.
Brute mode can be combined with fast mode, but disables intense mode.
-pdfbmp[+-]: (Comfort: PDF_BMP_Mode)
This prefixes PDF images with a BMP header to improve compression and speed, especially for PAQ.
-progonly[+-]: (Comfort: JPG_progressive_only)
Recompresses progressive JPGs only. Again, this is especially useful for PAQ, which usually has better JPG compression than packJPG but lacks progressive JPG support.
-mjpeg[+-]: (Comfort: MJPEG_recompression)
Enables MJPEG recompression by inserting Huffman tables into the JPG data.
-v: (Comfort: Verbose)
Verbose (debug) mode to gain additional information about detected streams and recompression success/failure. If you want a file with this information, redirect the output to it, like this: "precomp -v input_filename > verbose.txt".
-i: (Comfort: Ignore_Positions)
In verbose mode, you can see the position of streams in the file. With this parameter, you can ignore certain streams.
-s: (Comfort: Minimal_Size)
With this parameter, you can choose the minimal size of a stream that will be processed. The default is 4 bytes. Setting it to higher values (around 50-200 bytes) sometimes improves recompression, especially in intense or brute mode.
Results
Some results to demonstrate the capabilities of Precomp can be found at the Results page.
FAQ
I tried to compress a file with Precomp and it didn't get smaller. Why?
Precomp couldn't find any compressed streams in the file and bZip2 compression didn't help either.
Is the source code for Precomp available?
Precomp and the source code are now available on GitHub.
What is the difference between using Precomp or Multivalent for PDF files?
The main difference is that PDF files compressed with Multivalent can't be restored bit-for-bit identical, because Multivalent is a lossy compression method (although it doesn't lose the PDF content). So if you just want to compress PDF files and have fast access to them later on, use Multivalent. If you want to get them smaller than Multivalent (even in compact mode) does, or want to be sure the file is bit-for-bit identical to the original PDF, use Precomp. You can also use Precomp on PDF files compressed with Multivalent.
The precompression for PNG, GIF and ZIP files is bad, although verbose mode says they can be decompressed completely.
The decompression of those files is well-defined, but there are many ways to recompress them. zLib in particular can be tuned with deflateTune(), which is not supported by Precomp because there are simply too many variations to try. I'm working on this.
Contact
Use this link to send
comments, criticism, bug reports, etc.