How can I change metadata fields, CreationDate and ModificationDate, when I create a pdf with Reportlab?
2 Answers
Take a look at where modification and creation dates are set:
D['ModDate'] = D["CreationDate"] = \
        Date(ts=document._timeStamp, dateFormatter=self._dateFormatter)
# ...
return PDFDictionary(D).format(document)
Basically, the metadata is a dictionary serialized near the end of the PDF byte string; the start of the string holds the file contents (the document).
Inside ReportLab, the workflow you are asking about could be:
- create a canvas
- draw something on it
- get the document from the canvas
- create a PDFDictionary with artificial modification and creation dates
- format the document with the PDFDictionary
- save to file
The question "Change metadata of pdf file with pypdf" attempts a similar goal.
The ReportLab (currently 3.5) Canvas provides public methods, like Canvas.setAuthor(), to set the /Author, /Title, and other metadata fields (called "Internal File Annotations" in the docs, section 4.5).
However, there is no method for overriding the /CreationDate or /ModDate.
If you only need to change the formatting of the dates, you can simply use the Canvas.setDateFormatter() method.
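As a rough sketch of what such a formatter might look like: judging from the ReportLab source, the formatter appears to be called with six integers (year, month, day, hour, minute, second) and is expected to return the date string. That signature is an assumption here, and the name short_date_formatter is made up for illustration. This one simply drops the timezone offset, which the PDF date format allows.

```python
def short_date_formatter(yyyy, mm, dd, hh, m, s):
    """Format a PDF date without the trailing timezone offset.

    Assumed signature: six integers, as ReportLab seems to pass them.
    """
    return "D:%04d%02d%02d%02d%02d%02d" % (yyyy, mm, dd, hh, m, s)


# Hypothetical usage, assuming `p` is a reportlab.pdfgen.canvas.Canvas:
#     p.setDateFormatter(short_date_formatter)

print(short_date_formatter(2019, 3, 15, 12, 34, 56))  # D:20190315123456
```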
The methods described above modify a PDFInfo object, as can be seen in the source, but this is part of a private PDFDocument (as in Canvas._doc.info).
If you really do need to override the dates, you could either hack into the private parts of the canvas, or just search the content of the resulting file object for /CreationDate (...) and /ModDate (...), and replace the value between the parentheses.
Here's a quick-and-dirty example that does just that:
import io
import re
from reportlab.pdfgen import canvas
# write a pdf in a file-like object
file_like_obj = io.BytesIO()
p = canvas.Canvas(file_like_obj)
# set some metadata
p.setAuthor('djvg')
# ... add some content here ...
p.save()
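# replace the /CreationDate value (similar for /ModDate)
# - the parentheses are escaped so they match literally
# - [^)]* matches the date itself, which contains ':', '+' and "'" (not matched by \w)
# - the new value has the same length as ReportLab's default D:YYYYMMDDHHMMSS+HH'MM',
#   so the byte offsets stored in the PDF cross-reference table are not shifted
pdf_bytes = file_like_obj.getvalue()
pdf_bytes = re.sub(rb'/CreationDate \([^)]*\)', b"/CreationDate (D:19700101010203+01'00')", pdf_bytes)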
# write to actual file
with open('test.pdf', 'wb') as pdf:
pdf.write(pdf_bytes)
The example above just illustrates the principle. Obviously one could use fancy regular expressions with lookaround etc.
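For what it's worth, here is a slightly more careful sketch of the same idea, using only the standard library. It handles both /CreationDate and /ModDate in one pass and refuses to change the length of the value, since a classic PDF cross-reference table stores byte offsets. The function name override_pdf_dates and the sample bytes are made up for illustration, and it assumes the dates are stored as plain, unencrypted literal strings.

```python
import re

# Capture the key with its opening parenthesis, the old date, and the closing parenthesis.
_DATE_ENTRY = re.compile(rb"(/(?:CreationDate|ModDate)\s*\()([^)]*)(\))")


def override_pdf_dates(pdf_bytes, new_date):
    """Replace the /CreationDate and /ModDate values in raw PDF bytes."""

    def swap(match):
        old_date = match.group(2)
        # Same length in, same length out: byte offsets elsewhere stay valid.
        if len(new_date) != len(old_date):
            raise ValueError("new date must be as long as the one it replaces")
        return match.group(1) + new_date + match.group(3)

    return _DATE_ENTRY.sub(swap, pdf_bytes)


# Synthetic stand-in for the info dictionary of a real PDF.
sample = (b"<< /Author (djvg) /CreationDate (D:20190315123456+01'00') "
          b"/ModDate (D:20190315123456+01'00') >>")
patched = override_pdf_dates(sample, b"D:19700101010203+01'00'")
print(patched)
```

Using a replacement function rather than a replacement string also sidesteps any backslash handling in the new value.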
From the pdf spec:
Date values used in a PDF shall conform to a standard date format, which closely follows that of the international standard ASN.1 (Abstract Syntax Notation One), defined in ISO/IEC 8824. A date shall be a text string of the form
(D:YYYYMMDDHHmmSSOHH'mm)
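If you want to build such a string from a Python datetime instead of typing it by hand, something along these lines should work. This is a sketch; pdf_date is a made-up helper name. It follows the form quoted above, and the optional trailing_apostrophe flag is there because some writers (ReportLab's default output included, as far as I can tell) close the minutes field with another apostrophe.

```python
from datetime import datetime, timedelta, timezone


def pdf_date(dt, trailing_apostrophe=False):
    """Format a timezone-aware datetime as D:YYYYMMDDHHmmSSOHH'mm."""
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("expected a timezone-aware datetime")
    total_minutes = int(offset.total_seconds()) // 60
    sign = "-" if total_minutes < 0 else "+"
    hours, minutes = divmod(abs(total_minutes), 60)
    suffix = "'" if trailing_apostrophe else ""
    return f"D:{dt:%Y%m%d%H%M%S}{sign}{hours:02d}'{minutes:02d}{suffix}"


when = datetime(1970, 1, 1, 1, 2, 3, tzinfo=timezone(timedelta(hours=1)))
print(pdf_date(when))                            # D:19700101010203+01'00
print(pdf_date(when, trailing_apostrophe=True))  # D:19700101010203+01'00'
```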