pdf-compress v1.9.1 [10 Nov 2023] - by Dominic
Description
GNU/Linux program to compress an overly large pdf file into a smaller monochrome one. A typical use case is a file generated by scanning at an unnecessarily high resolution and/or in color. The objective is a file that, although not of the same quality as the original, is still legible and much smaller (size reductions of 90% or more are common). This is achieved by conversion to 1-bit (black and white) pixel mapping, so it is typically most effective for dark text on a white background.
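The 1-bit conversion can be pictured as a rasterise-and-reassemble pipeline. The sketch below is an assumption, not the author's code. It uses the GhostScript pngmonod device (mentioned in the 0.7 changelog entry) to render each page as a 1-bit PNG, then ImageMagick convert to rebuild a PDF with Group 4 fax compression. The function name mono_pipeline is hypothetical, and the real script applies further GhostScript settings that are not shown here.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not pdf-compress.sh itself): rasterise every page to a
# 1-bit PNG with GhostScript's pngmonod device, then reassemble the pages into
# a PDF with ImageMagick using CCITT Group 4 (bilevel) compression.
mono_pipeline() {
  local src=$1 out=$2 res=${3:-300} work rc
  # Scratch directory for page images; these can be large at high resolution.
  work=$(mktemp -d "${TMPDIR:-/tmp}/pdfc.XXXXXX") || return 1
  gs -q -dSAFER -dNOPAUSE -dBATCH -sDEVICE=pngmonod -r"$res" \
     -sOutputFile="$work/page-%04d.png" "$src" &&
    convert "$work"/page-*.png -type bilevel -compress Group4 "$out"
  rc=$?
  rm -rf -- "$work"   # always clean up, even if gs or convert failed
  return "$rc"
}
```

The zero-padded %04d page numbers keep the shell glob in page order when the pages are reassembled.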
If no destination path/file is specified, pdf-compress.sh will create a file in the same location as the source, with -1 appended to the filename (before the .pdf extension). If the destination is a directory, the created file will have the same name as the source file. pdf-compress.sh also sets the destination file's owner, group, permissions and modtime to match those of the original file (but the owner can only be changed when running as superuser).
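The naming rule and the attribute copying described above might be sketched as follows, assuming GNU coreutils (chown/chmod --reference, touch -r, stat -c). The function names default_dest and copy_attrs are hypothetical, and only a lowercase .pdf extension is handled.

```bash
#!/usr/bin/env bash
# Hypothetical: work out the destination path from SRC and an optional DEST.
default_dest() {
  local src=$1 dest=${2:-}
  if [[ -z $dest ]]; then
    printf '%s\n' "${src%.pdf}-1.pdf"                       # scan.pdf -> scan-1.pdf
  elif [[ -d $dest ]]; then
    printf '%s/%s\n' "${dest%/}" "$(basename -- "$src")"    # keep source name
  else
    printf '%s\n' "$dest"                                   # explicit file name
  fi
}

# Hypothetical: copy owner/group, permissions and modtime from SRC to DEST.
copy_attrs() {
  local src=$1 dest=$2
  chown --reference="$src" -- "$dest" 2>/dev/null || true   # owner change needs root
  chmod --reference="$src" -- "$dest"
  touch -r "$src" -- "$dest"                                # copy timestamps last
}
```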
If a target maximum percentage size is set (-r option) and the initial compression attempt does not bring the file down to this size or below, pdf-compress.sh will retry, varying the GhostScript resize parameter (initial value 300) until it (hopefully) achieves something close to the specified value; absolute accuracy of resizing is not to be expected.
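A simplified version of the retry idea, as a sketch only. The helper compress_at is a hypothetical stand-in for one GhostScript pass, and the 20% step and the resolution floor of 50 are assumed values, not taken from pdf-compress.sh.

```bash
#!/usr/bin/env bash
# Hypothetical "aim for" loop. The caller must define
#   compress_at RES SRC OUT
# which performs one compression pass at resolution RES.
aim_for_target() {
  local src=$1 out=$2 target=$3 res=${4:-300} min_res=${5:-50}
  local orig new pct
  orig=$(stat -c %s -- "$src") || return 1
  (( orig > 0 )) || return 1
  while :; do
    compress_at "$res" "$src" "$out" || return 1
    new=$(stat -c %s -- "$out") || return 1
    pct=$(( new * 100 / orig ))        # output size as a % of the original
    # Stop once the target is met, or when the resolution hits the floor.
    (( pct <= target || res <= min_res )) && break
    res=$(( res * 8 / 10 ))            # assumed step: cut resolution by 20%
  done
  printf '%s %s\n' "$res" "$pct"       # final resolution and achieved %
}
```

This version only ever lowers the resolution, which guarantees that the loop ends. The real script may search differently, for instance by raising the value again after overshooting.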
If qpdf is present, it is used (after GhostScript) to make a further file size saving of c.10% with no effect on quality. With -r, this 10% is included in the specified target, so the resulting file will have slightly better quality, rather than a smaller size, than would have been achieved without qpdf. Downside: compressing with qpdf is relatively slow.
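The optional qpdf step could look like the sketch below. It assumes qpdf's --object-streams=generate and --compress-streams=y options. The settings that pdf-compress.sh actually uses are not documented here, and QPDF_BIN is a hypothetical override added only to make the fallback path testable.

```bash
#!/usr/bin/env bash
# Hypothetical optional lossless pass with qpdf; plain copy if qpdf is absent.
QPDF_BIN=${QPDF_BIN:-qpdf}
qpdf_squeeze() {
  local in=$1 out=$2 rc=0
  if ! command -v "$QPDF_BIN" >/dev/null 2>&1; then
    cp -- "$in" "$out"                 # qpdf not installed: keep file as-is
    return
  fi
  # Pack objects into compressed object streams and compress all streams.
  "$QPDF_BIN" --object-streams=generate --compress-streams=y "$in" "$out" || rc=$?
  # qpdf exits 3 when it only issued warnings; the output is still written.
  (( rc == 0 || rc == 3 ))
}
```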
If exiftool is present, it is used to update the 'Producer' metadata in the destination file to show that it was processed by pdf-compress.sh, and when; see also options -m and -n.
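A sketch of how the 'Producer' tag might be written and checked with exiftool (its -s3 option prints only the tag value). The helper names and the exact text written to the tag are assumptions. Only the rule "Producer contains pdf-compress.sh" comes from the -n description.

```bash
#!/usr/bin/env bash
# Pure helper, testable without exiftool: does a Producer string name the tool?
producer_mentions_tool() {
  [[ $1 == *pdf-compress.sh* ]]
}

# Hypothetical check behind -m/-n; exiftool -s3 prints just the tag value.
already_processed() {
  local producer
  producer=$(exiftool -s3 -Producer "$1" 2>/dev/null) || return 1
  producer_mentions_tool "$producer"
}

# Hypothetical tagging step: overwrite Producer in place with tool name and UTC time.
tag_producer() {
  exiftool -q -overwrite_original \
    -Producer="pdf-compress.sh $(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1"
}
```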
With option -s, if the final page appears empty, it is stripped (provided the file has more than one page).
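One way to detect an apparently empty last page is sketched below. GhostScript's inkcov device reports ink coverage per page, and qpdf's page range 1-r2 selects every page except the last. The coverage threshold of 0.0001 and the function names are assumptions, and pdf-compress.sh may use a different test.

```bash
#!/usr/bin/env bash
# inkcov prints one line per page, e.g.
#   " 0.00000  0.00000  0.00000  0.00000 CMYK OK"
# giving C/M/Y/K coverage fractions. Assumed rule: the page is blank if the
# total coverage is at or below threshold $1 (default 0.0001).
is_blank_inkcov() {
  awk -v t="${1:-0.0001}" '
    BEGIN { blank = 1 }
    ($1 + $2 + $3 + $4) > t { blank = 0 }
    END { exit (blank ? 0 : 1) }'
}

# Hypothetical -s step: drop the last page only if the file has 2+ pages.
strip_last_page_if_blank() {
  local in=$1 out=$2 n
  n=$(qpdf --show-npages "$in") || return 1
  if (( n > 1 )) &&
     gs -q -o - -sDEVICE=inkcov -dFirstPage="$n" -dLastPage="$n" "$in" |
       is_blank_inkcov; then
    qpdf "$in" --pages . 1-r2 -- "$out"   # 1-r2 = first to second-to-last page
  else
    cp -- "$in" "$out"
  fi
}
```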
[Deprecated: specify additional options to be passed to the ImageMagick convert operation within pdf-compress.sh by using -p; for instance, to hide a watermark try -p '-white-threshold 60%'.]
Usage
./pdf-compress.sh [options] source_path/filename [destination_path[/filename]]
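As a usage sketch, the exit code of -m (0 if the file has not yet been processed, 1 if it has) lets a shell loop skip files that were already compressed. Passing -n in a single call would achieve much the same thing. PDFC and compress_dir are hypothetical names, and -r 40 is just an example target.

```bash
#!/usr/bin/env bash
# Hypothetical batch wrapper; PDFC holds the path to the script.
PDFC=${PDFC:-./pdf-compress.sh}
compress_dir() {
  local dir=${1:-.} f
  for f in "$dir"/*.pdf; do
    [[ -e $f ]] || continue              # no matches: the glob stays literal
    # -m exits 0 if NOT yet processed, 1 if it already has been.
    if "$PDFC" -m -q "$f"; then
      "$PDFC" -q -f -r 40 "$f"           # replace original, aim for <=40% size
    fi
  done
}
```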
Options
-b num - set background white threshold (a lower value turns more non-white background white; default 0.7, advisable range 0.6 - 0.8)
-f - replace original file
-h - show this help and exit
-l - show changelog and exit
-m - test (without making any changes) whether the file has previously been processed by pdf-compress.sh (like -n below); exit code 1 if so, 0 if not (can be combined with -q for silent checking) - requires exiftool
-n - skip making any changes if file has previously been processed by pdf-compress.sh (i.e. if the 'Producer' metadata contains 'pdf-compress.sh') - requires exiftool
-p 'param1 param2' - additional parameters for ImageMagick 'convert' command [deprecated]
-q - be quieter (1 line output if successful)
-r num - target maximum percentage size vs original (so, 40 means at least 60% file size reduction)
-s - remove any apparently-empty final page
-t path - directory to use for temporary files, which can be large (default: /tmp)
-y - overwrite destination file if it already exists
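The option letters above map naturally onto bash's getopts builtin. This is a sketch with hypothetical variable names. It assumes -b takes a numeric value (as its 0.7 default implies) and checks that -r is a whole number, so 40 means the output should be at most 40% of the original size.

```bash
#!/usr/bin/env bash
# Hypothetical option parser; results are left in global variables.
parse_opts() {
  local OPTIND=1 opt
  threshold=0.7 replace=0 check_only=0 skip_done=0 extra='' quiet=0
  target='' strip=0 tmpdir=/tmp overwrite=0 action=compress
  while getopts ':b:fhlmnp:qr:st:y' opt; do
    case $opt in
      b) threshold=$OPTARG ;;
      f) replace=1 ;;
      h) action=help ;;
      l) action=changelog ;;
      m) check_only=1 ;;
      n) skip_done=1 ;;
      p) extra=$OPTARG ;;
      q) quiet=1 ;;
      r) target=$OPTARG ;;
      s) strip=1 ;;
      t) tmpdir=$OPTARG ;;
      y) overwrite=1 ;;
      :) printf 'option -%s needs an argument\n' "$OPTARG" >&2; return 2 ;;
      *) printf 'unknown option -%s\n' "$OPTARG" >&2; return 2 ;;
    esac
  done
  shift $(( OPTIND - 1 ))
  # -r must be a whole-number percentage of the original size.
  if [[ -n $target && ! $target =~ ^[0-9]+$ ]]; then
    printf 'bad -r value: %s\n' "$target" >&2
    return 2
  fi
  src=${1:-} dest=${2:-}
}
```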
Dependencies
bash(4+) convert(ImageMagick) exiftool* gs(GhostScript) qpdf*
* not required but used for additional functionality if available
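A dependency check along these lines would stop early when a required tool is missing and merely report optional tools that are absent. It is a sketch, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Print each named command that is not found in PATH, one per line.
missing_tools() {
  local t
  for t in "$@"; do
    command -v "$t" >/dev/null 2>&1 || printf '%s\n' "$t"
  done
}

# Hypothetical startup check matching the dependency list above.
check_deps() {
  local missing
  if (( BASH_VERSINFO[0] < 4 )); then
    echo 'bash 4 or later is required' >&2
    return 1
  fi
  missing=$(missing_tools convert gs)
  if [[ -n $missing ]]; then
    printf 'required tool not found: %s\n' $missing >&2   # one line per tool
    return 1
  fi
  missing=$(missing_tools exiftool qpdf)
  if [[ -n $missing ]]; then
    printf 'optional tool not found, feature disabled: %s\n' $missing >&2
  fi
  return 0
}
```

On Debian/Ubuntu these tools are normally provided by the packages ghostscript, imagemagick, qpdf and libimage-exiftool-perl. After downloading, the script needs to be made executable with chmod +x pdf-compress.sh.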
License
Copyright © 2024 Dominic Raferd. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Changelog
1.9 [09 Nov 2023] - add option -s to strip any blank final page (unless the file is a single blank page) [option change from 1.8]
1.8 [02 Nov 2023] - strip any blank final page (unless the file is a single blank page or option -s is selected)
1.7 [27 Oct 2023] - fix to pdf format identification
1.6 [19 Oct 2023] - minor fixes and shellcheck conformance
1.5 [15 Oct 2023] - add error logging unless running from a terminal (via echolog and quit functions)
1.4 [03 Jan 2023] - add -b (allow white background threshold to be varied)
1.3 [28 Apr 2022] - add -m (check if already processed without taking any other action)
1.2 [11 Apr 2022] - add -f (force overwrite) option
1.1 [08 Apr 2022] - improved output using GS 'Hybrid2' setting by Adam Lesser - kudos (see https://bugs.ghostscript.com/show_bug.cgi?id=694762)
1.0 [31 Mar 2022] - minor changes
0.9 [03 Mar 2022] - add -n option, set 'Producer' metadata (no effect if exiftool unavailable), remove -v option, make -q option non-silent
0.8 [13 Feb 2022] - modify -r option to be an 'aim for' percentage size reduction (by looping), remove -n (negate) option as irrelevant, add use of qpdf (if present) for additional c.10% size saving
0.7 [10 Feb 2022] - improved (and simplified) by using gs pngmonod device, align owner/group, permissions and modtime of created file with those of the original
0.6 [05 Aug 2020] - added more gs settings (kudos: Enno Nagel)
0.5 [16 Nov 2017] - negate by default (-n to non-negate), default resize 200 not 100, add -p option
0.4 [23 Apr 2017] - add -n (negative) option
0.3 [16 Oct 2016] - fix for temporary file deletion, make temporary filenames unique
0.2 [21 Sep 2016] - lots of tweaks
0.1 [20 Sep 2016] - initial version
Download pdf-compress.sh
Donation
I have provided this software free gratis and for nothing. If you would like to thank me with a contribution, please let me know and I will send you a link. Thank you!
My Programs
Here is a selection of some (other) programs I have written. Most of them run under GNU/Linux from the command line (CLI), are freely available and can be obtained by clicking on the links. Dependencies are shown, and while most were written and tested on an x86-based Linux server, they should run on a Raspberry Pi, and many can run under Windows using Windows Subsystem for Linux (WSL) or Cygwin. Email me if you have problems or questions, or if you think I could help with a programming requirement.
Backup Utilities
- TimeDicer - Onsite/offsite data backup for Windows (uses rdiff-backup) [ GNU/Linux & MS Windows©: 2008-20 ]
- rdiff-backup-regress - GNU/Linux script to regress an rdiff-backup archive. [ GNU/Linux: 2012-24 ]
Debian/Ubuntu kernel and LVM Utilities
- kernel-remove - GNU/Linux script that lists the installed kernels in a Debian-based distro (e.g. Ubuntu) and can be used to remove an unwanted kernel and related packages, updating grub appropriately. [ GNU/Linux-Debian/Ubuntu: 2010-24 ]
- lvm-usage - GNU/Linux script to show available disk space and how it is used; run it as a cron job to warn if usage is above a set percentage. Provides additional information if LVM is in use. [ GNU/Linux: 2012-24 ]
- lvm-delete-snapshot - GNU/Linux script to remove an LVM snapshot that has been left over by another process. [ GNU/Linux: 2012-21 ]
- netnames - GNU/Linux script that shows the current name, biosdevname and 'predictable name' of a network device; helps with migration between network device naming schemes. [ GNU/Linux-Debian/Ubuntu: 2020-20 ]
- lv-convert2cache - GNU/Linux script to convert an existing LV into a cache LV using a smaller faster device as a cache. [ GNU/Linux: 2022-24 ]
Miscellaneous Programs
- sleepwalker - Windows© program which can be run from a remote machine to 'wake up' a Windows© machine behind a router, wait for it to start and then initiate Remote Desktop session. [MS Windows©: 2008-22]
- numliststat - GNU/Linux program giving statistical value(s) for a piped-in list of numbers. [ GNU/Linux: 2022-24 ]
- relay-enforcer - GNU/Linux program enabling a postfix-based mail server relaying to Gmail to act on reports from Gmail about blocked emails. [ GNU/Linux: 2016-24 ]
- tiny-device-monitor - GNU/Linux program to test webpages (including password-protected) or machines to check they are live; use as a cron job for your own websites, for hardware presenting a webpage, or for any machines with a presence on your local LAN or on the internet. [ GNU/Linux: 2009-24 ]
- form-extractor - GNU/Linux program to extract form tags from a web page or downloaded file. [ GNU/Linux: 2012-24 ]
- mythic-dns-sync - GNU/Linux program to update a DNS record at mythic-beasts.com to match the local external IP address. [ GNU/Linux: 2016-23 ]
- saynoto0870 - For UK, a GNU/Linux script which performs automated lookup of the www.saynoto0870.com database, finding cheap or free geographic number replacements for expensive non-geographic (087* or 084*) numbers. [ GNU/Linux: 2012-12 ]
- bind9-resolved-switch - GNU/Linux program for switching permanently between using bind9 or systemd-resolved as the system DNS resolver. [ GNU/Linux: 2016-24 ]
- unlock - GNU/Linux program for easily entering the decryption passphrase on a remote machine whose root filesystem uses dm-crypt+LUKS. [ GNU/Linux: 2017-18 ]
- wifi-updown - GNU/Linux program to take down wifi interface if there is a working wired interface (or restore wifi if not). [ GNU/Linux: 2018-23 ]
- routefix - GNU/Linux program to restore a default IP traffic route if none exists (e.g. after running wifi-updown). [ GNU/Linux: 2018-23 ]
- dutree - GNU/Linux program to show a tree-style list of files and directories at the specified location which are greater than the specified size (default 1GB). [ GNU/Linux: 2012-24 ]
- Accounts - Multi-business multi-currency accounting software, uses Access [MS Windows©: 1996-2024]
- Rents Program - Residential lettings/landlord front office program, with many special features for UK market [MS Windows©: 1991-2024]
This section is closed. If you have a question, please submit it by email, thank you.
I run ./pdf-compress.sh myfile.pdf in the terminal (the file is in the same directory, and I'm sure there is no mistake in the path), but it only writes
pdf-compress:
and then waits without doing anything. The cursor just sits there; I can type characters on the keyboard, but I don't know why the script does not compress my pdf file as described in the Usage section. What is the problem?
Thank you for this nice program!
How do I actually install and then use it on Ubuntu 16.04 (I am a bit of a novice)?
Keep smiling
Spiv