28

Similarly to this question:

Convert a PDF to greyscale on the command line in FLOSS?

I have a PDF document and want to convert it to pure black and white, that is, I want to discard halftones. To convert to grayscale with Ghostscript I can use this command:

gs \
 -sOutputFile=output.PDF \
 -sDEVICE=pdfwrite \
 -sColorConversionStrategy=Gray \
 -dProcessColorModel=/DeviceGray \
 -dCompatibilityLevel=1.4 \
  input.PDF < /dev/null

What do I have to change to get monochrome, i.e. only the colors black and white and no halftones?

niklasfi

12 Answers

8

The last suggestion indeed only converts to grayscale, and even then only works if the underlying document uses setrgbcolor. This did not work for me, since I had a document that used setcolor.

I had success with redefining setcolor to always set the color to 0,0,0:

gs -o <output-file.pdf> -sDEVICE=pdfwrite \
-c "/osetcolor {/setcolor} bind def /setcolor {pop [0 0 0] osetcolor} def" \
-f <input-file.ps>

It has been 15+ years since I did any PostScript hacking, so the above may be lame, incorrect or even accidental - if you know how to do better, please suggest.

Surge
5

I could not find out which procedure for color selection is used in the PDFs I am dealing with. This is why I convert to grayscale PostScript first:

gs -o gray.ps -sDEVICE=ps2write -sColorConversionStrategy=Gray -dProcessColorModel=/DeviceGray -dCompatibilityLevel=1.4 -f colored.pdf

As the PDFs I struggle to print may contain confidential information which is cleverly "redacted" by having the color set to white, I need to employ some sort of thresholding. This is what I came up with:

gs -o thresholded.pdf -sDEVICE=pdfwrite -c "/osetgray {setgray} bind def /setgray {0.5 lt {0} {1} ifelse osetgray} def" -f gray.ps

For those (like me) unfamiliar with PostScript's stack programming style, this re-defines setgray as:

setgray(value) {
   original_setgray(value < 0.5 ? 0 : 1)
}
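
The 0.5 cutoff does not have to be fixed. Below is a minimal sketch, assuming the same setgray redefinition as above, that makes the cutoff a parameter. The function names threshold_prologue and threshold_pdf are hypothetical helpers, not part of Ghostscript.

```shell
#!/bin/sh
# Print the PostScript prologue that snaps every setgray value to 0 or 1.
# $1 is the cutoff between black and white (defaults to 0.5, as in the answer).
threshold_prologue() {
    printf '/osetgray {setgray} bind def /setgray {%s lt {0} {1} ifelse osetgray} def' "${1:-0.5}"
}

# Hypothetical wrapper around the second gs command of the answer.
# Usage: threshold_pdf gray.ps thresholded.pdf [cutoff]
threshold_pdf() {
    gs -o "$2" -sDEVICE=pdfwrite -c "$(threshold_prologue "${3:-0.5}")" -f "$1"
}
```

For example, threshold_pdf gray.ps thresholded.pdf 0.7 would turn every gray value below 0.7 black, which may help with faint text.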

To verify the feasibility of this approach, you can create a simple PostScript file using enscript:

 printf 'Dark \x00color{0.9 0.9 0.9} Light\n' | enscript --escapes --no-header -o colored.ps

Then run both commands given above, using colored.ps in place of colored.pdf as the input of the first one.

The gray.ps produced by Ghostscript's ps2write will contain the line /G/setgray load def. This provides /G as an alias to /setgray. Somewhere later the actual typesetting happens (irrelevant lines omitted):

(Dark )Tj
0.898 G
( Light)Tj

Only under these circumstances, that is, when the grayscale PostScript sets its colors through setgray (here via the G alias), will thresholded.pdf show "Dark" as fully black and "Light" as fully white.
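
To check whether a given gray.ps meets that condition, one can look for the alias definitions. This is a minimal sketch, assuming the aliases are written in the same "/G/setgray load def" form quoted above; list_color_aliases is a hypothetical helper name, and other Ghostscript versions may lay the file out differently.

```shell
#!/bin/sh
# List the colour-operator aliases defined in a PostScript file,
# e.g. "/G/setgray load def". Prints nothing if none are found.
# Usage: list_color_aliases gray.ps
list_color_aliases() {
    grep -E -o '/[A-Za-z]+ ?/set(gray|rgbcolor|cmykcolor|color) load def' "$1"
}
```

If the output mentions only setgray, redefining setgray should be enough. If other operators show up, they would need a similar redefinition.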

Hermann
4

I am not sure if the following suggestion will work, but it may be worth trying:

  1. convert the PDF to PostScript using the simple pdf2ps utility
  2. convert that PostScript back to PDF while using a re-defined /setrgbcolor PostScript operator

These are the commands:

First

  pdf2ps color.pdf color.ps

This gives you color.ps as output.

Second

gs \
-o bw-from-color.pdf \
-sDEVICE=pdfwrite \
-c "/setrgbcolor{0 mul 3 1 roll 0 mul 3 1 roll 0 mul 3 1 roll 0 mul add add setgray}def" \
-f color.ps
Kurt Pfeifle
3

It's not Ghostscript, but with ImageMagick this is quite simple:

 convert -monochrome input.pdf output.pdf
o-town
0

I had to modify the solution suggested by Surge (above) a little bit for my file:

Step 1: Convert the coloured.pdf to coloured.ps

gswin64c -dNOPAUSE -dBATCH -sDEVICE=ps2write -sOutputFile=coloured.ps coloured.pdf

Step 2: Convert the coloured.ps to blackandwhite.pdf

gswin64c ^
-dBATCH -dNOPAUSE -q ^
-sOutputFile=blackandwhite.pdf ^
-sDEVICE=pdfwrite ^
-c "/osetrgbcolor {setrgbcolor} bind def /setrgbcolor {pop pop pop 0 0 0 osetrgbcolor} def" ^
-f coloured.ps

I did not have any success with the setcolor operator as suggested by Surge, so I decided to play with other operators that can set colour in PostScript, such as setgray, setrgbcolor, setcmykcolor, etc.

As I understand it, the code in quotes following the -c switch is PostScript. It first binds the original definition of setrgbcolor to a new custom operator called osetrgbcolor. It then redefines setrgbcolor so that it pops the three operands the original setrgbcolor expects and replaces them with 0 0 0, i.e. red=0, green=0, blue=0. These values are then passed to the osetrgbcolor operator defined earlier, so every RGB colour request ends up as black.
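
Forcing every colour to black also turns white fills black. A variation that might help is to threshold on brightness instead, in the spirit of the setgray answer above. This is a minimal sketch that I have not tested on real documents: it assumes the file sets its colours through setrgbcolor, and it uses the 0.30/0.59/0.11 weights that PostScript itself applies when converting RGB to gray. The names rgb_threshold_prologue and rgb_to_bw are hypothetical.

```shell
#!/bin/sh
# Print a PostScript prologue that replaces setrgbcolor with a version that
# computes luminance (0.30 R + 0.59 G + 0.11 B) and snaps it to black (0)
# or white (1) before handing it to setgray.
rgb_threshold_prologue() {
    printf '%s' '/setrgbcolor {0.11 mul exch 0.59 mul add exch 0.30 mul add 0.5 lt {0} {1} ifelse setgray} bind def'
}

# Shell mirror of the same arithmetic, to preview which colours turn black.
# Usage: rgb_to_bw R G B   (components between 0 and 1; prints 0 or 1)
rgb_to_bw() {
    awk -v r="$1" -v g="$2" -v b="$3" \
        'BEGIN { print ((0.30*r + 0.59*g + 0.11*b < 0.5) ? 0 : 1) }'
}
```

The prologue would be passed to Ghostscript the same way as in Step 2, for example gs -o blackandwhite.pdf -sDEVICE=pdfwrite -c "$(rgb_threshold_prologue)" -f coloured.ps on a Unix shell.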

PS1: The above code was run in the Windows Command Prompt.

PS2: I was a total stranger to PostScript coding. I got a jumpstart from the YouTuber "John's Basement" and his video series Postscript Tutorial. I referred to Adobe's PostScript Language Reference to understand the operator setrgbcolor and the operands it accepts.

ednoy
0

Just wanted to chime in that this was a handy post. I have been using k2pdfopt to format PDFs for Kindle usage. For years I used gImageReader to edit the PDF's brightness and contrast and export it to an image file. The big problem was that I had to manually right-click each image of the PDF, which is tedious to say the least and takes a ton of tinkering. Anyway, I found with a little trial and error that the post above was helpful, but I would definitely add the options below: -colorspace Gray is very important and, used with -posterize, seems to clear up a lot of garbage. I will be using this handy command with pdfarranger if necessary and k2pdfopt!

convert -density 300 -colorspace Gray -posterize 2 -deskew 80% input.pdf output.pdf

The only other thing is that with ImageMagick I had to change the policy file to allow read and write access for PDFs. There is a ton of documentation elsewhere for that. Thank you, internet!
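
For reference, here is a minimal sketch of that policy change. It assumes the policy file contains the rule rights="none" pattern="PDF", as shipped by some Linux distributions, and that it lives somewhere like /etc/ImageMagick-6/policy.xml. Both are assumptions, so check your own installation first. allow_pdf is a hypothetical helper name.

```shell
#!/bin/sh
# Rewrite the PDF coder rule in an ImageMagick policy file from "none"
# to "read|write". A backup of the original is kept as <file>.bak.
# Usage: allow_pdf /etc/ImageMagick-6/policy.xml   (path is an assumption)
allow_pdf() {
    sed -i.bak 's/rights="none" pattern="PDF"/rights="read|write" pattern="PDF"/' "$1"
}
```

Editing the system-wide file usually needs root privileges.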

Phil
0

Current Ghostscript versions have switches that force vector text and linework to black (on white), demonstrated here on a PDF of this page:

gs -sDEVICE=pdfwrite -dBlackText -dBlackVector -dBlackThresholdL=96 -o BW.pdf -f ColourPDF.pdf


Now for the real issue: the OP asked for NO halftones, so images could generally only become black or white. Black images are not much use, as we see where grey text on a grey field turns into black on black. This is where a monochrome printer would use halftones to provide false contrast.


So the only way to ensure near-black objects on a white background is to either keep the images or remove them entirely, along with all their red, green and blue pixels.

The switch that removes them is -dFILTERIMAGE, which gets us nearest to black and white.


From

gs -sDEVICE=pdfwrite -dBlackText -dBlackVector -dBlackThresholdL=95 -dFILTERIMAGE -o BW.pdf -f ColourPDF.pdf

NOTE: there are other cross-platform command-line tools, such as cpupdf, that let you extract an image and replace it with a monochrome one of the same physical size. Or, for blacklines and blacktext in ANY single colour, not just black, there is Coherent PDF (cpdf).

K J
0

This looks like it would work:

1) Convert the file to monochrome with gs

gs -sDEVICE=psmono \
  -dNOPAUSE -dBATCH -dSAFER \
  -sOutputFile=combined.ps \
  first.pdf \
  second.ps \
  third.eps [...]

2) Convert the PostScript file back to a PDF with ps2pdf or gs

(credit to: http://www.linuxjournal.com/content/tech-tip-using-ghostscript-convert-and-combine-files)

Ed L
0

For a grayscale PDF, use Ghostscript.

In PHP code, use this script (-dNOPAUSE and -dBATCH keep gs from waiting for interactive input):

exec("'gs' '-dNOPAUSE' '-dBATCH' '-sOutputFile=outputfilename.pdf' '-sDEVICE=pdfwrite' '-sColorConversionStrategy=Gray' '-dProcessColorModel=/DeviceGray' '-dCompatibilityLevel=1.4' 'inputfilename.pdf'", $output);

Useful URL:
http://www.linuxjournal.com/content/tech-tip-using-ghostscript-convert-and-combine-files

-1

ImageMagick can do it.

convert -posterize 2 input.pdf output.pdf

It comes out nice and crisp, at about a third of the file size of the color original.

DarkDiamond
Jerryk
-1
gs \
   -sDEVICE=pdfwrite \
   -sProcessColorModel=DeviceGray \
   -sColorConversionStrategy=Gray \
   -dOverrideICC \
   -o out.pdf \
   -f in.pdf

This command converts file in.pdf to grayscale (GS 9.10).

source: this answer to StackOverflow question "Convert PDF to grayscale without rasterization?"

Geremia
-1

For a pure black and white PDF, you need to convert it to PostScript first and then convert the PostScript back to PDF:

exec(" gs -sDEVICE=psmono  -dNOPAUSE -dBATCH -dSAFER  -sOutputFile=combined.ps  $pdf");

PostScript to PDF -> black and white:

exec(" gs -sDEVICE=pdfwrite   -dNOPAUSE -dBATCH -dSAFER  -sOutputFile=file_pdf.pdf  filename.ps");