Migration Guide: 1.x to 2.x

PyPDF2<2.0.0 (docs) is very different from PyPDF2>=2.0.0 (docs).

Luckily, most changes are simple naming adjustments. This guide helps you to make the step from PyPDF2 1.x (or even the original PyPdf) to PyPDF2>=2.0.0.

You can execute your code with the updated version and show deprecation warnings by running python -W all your_code.py.

Imports and Modules

PyPDF2.utils no longer exists
PyPDF2.pdf no longer exists. You can import from PyPDF2 directly or from PyPDF2.generic

Naming Adjustments

Classes

The base classes were renamed as they also allow operating with BytesIO streams instead of files. Also, the strict parameter changed the default value from strict=True to strict=False.

PdfFileReader ➔ PdfReader
PdfFileWriter ➔ PdfWriter
PdfFileMerger ➔ PdfMerger

PdfFileReader and PdfFileMerger no longer have the overwriteWarnings parameter. The new behavior is overwriteWarnings=False.

Function, Method, and Property Names

In PyPDF2.xmp.XmpInformation:

rdfRoot ➔ rdf_root
xmp_createDate ➔ xmp_create_date
xmp_creatorTool ➔ xmp_creator_tool
xmp_metadataDate ➔ xmp_metadata_date
xmp_modifyDate ➔ xmp_modify_date
xmpMetadata ➔ xmp_metadata
xmpmm_documentId ➔ xmpmm_document_id
xmpmm_instanceId ➔ xmpmm_instance_id

In PyPDF2.generic:

readObject ➔ read_object
convertToInt ➔ convert_to_int
DocumentInformation.getText ➔ DocumentInformation._get_text : This method should typically not be used; please let me know if you need it.
readHexStringFromStream ➔ read_hex_string_from_stream
initializeFromDictionary ➔ initialize_from_dictionary
createStringObject ➔ create_string_object
TreeObject.hasChildren ➔ TreeObject.has_children
TreeObject.emptyTree ➔ TreeObject.empty_tree

In many places:

getObject ➔ get_object
writeToStream ➔ write_to_stream
readFromStream ➔ read_from_stream

PdfReader class:

reader.getPage(pageNumber) ➔ reader.pages[page_number]
reader.getNumPages() / reader.numPages ➔ len(reader.pages)
getDocumentInfo ➔ metadata
flattenedPages attribute ➔ flattened_pages
resolvedObjects attribute ➔ resolved_objects
xrefIndex attribute ➔ xref_index
getNamedDestinations / namedDestinations attribute ➔ named_destinations
getPageLayout / pageLayout ➔ page_layout attribute
getPageMode / pageMode ➔ page_mode attribute
getIsEncrypted / isEncrypted ➔ is_encrypted attribute
getOutlines ➔ get_outlines
readObjectHeader ➔ read_object_header
cacheGetIndirectObject ➔ cache_get_indirect_object
cacheIndirectObject ➔ cache_indirect_object
getDestinationPageNumber ➔ get_destination_page_number
readNextEndLine ➔ read_next_end_line
_zeroXref ➔ _zero_xref
_authenticateUserPassword ➔ _authenticate_user_password
_pageId2Num attribute ➔ _page_id2num
_buildDestination ➔ _build_destination
_buildOutline ➔ _build_outline
_getPageNumberByIndirect(indirectRef) ➔ _get_page_number_by_indirect(indirect_ref)
_getObjectFromStream ➔ _get_object_from_stream
_decryptObject ➔ _decrypt_object
_flatten(..., indirectRef) ➔ _flatten(..., indirect_ref)
_buildField ➔ _build_field
_checkKids ➔ _check_kids
_writeField ➔ _write_field
_write_field(..., fieldAttributes) ➔ _write_field(..., field_attributes)
_read_xref_subsections(..., getEntry, ...) ➔ _read_xref_subsections(..., get_entry, ...)

PdfWriter class:

writer.getPage(pageNumber) ➔ writer.pages[page_number]
writer.getNumPages() ➔ len(writer.pages)
addMetadata ➔ add_metadata
addPage ➔ add_page
addBlankPage ➔ add_blank_page
addAttachment(fname, fdata) ➔ add_attachment(filename, data)
insertPage ➔ insert_page
insertBlankPage ➔ insert_blank_page
appendPagesFromReader ➔ append_pages_from_reader
updatePageFormFieldValues ➔ update_page_form_field_values
cloneReaderDocumentRoot ➔ clone_reader_document_root
cloneDocumentFromReader ➔ clone_document_from_reader
getReference ➔ get_reference
getOutlineRoot ➔ get_outline_root
getNamedDestRoot ➔ get_named_dest_root
addBookmarkDestination ➔ add_bookmark_destination
addBookmarkDict ➔ add_bookmark_dict
addBookmark ➔ add_bookmark
addNamedDestinationObject ➔ add_named_destination_object
addNamedDestination ➔ add_named_destination
removeLinks ➔ remove_links
removeImages(ignoreByteStringObject) ➔ remove_images(ignore_byte_string_object)
removeText(ignoreByteStringObject) ➔ remove_text(ignore_byte_string_object)
addURI ➔ add_uri
addLink ➔ add_link
getPage(pageNumber) ➔ get_page(page_number)
getPageLayout / setPageLayout / pageLayout ➔ page_layout attribute
getPageMode / setPageMode / pageMode ➔ page_mode attribute
_addObject ➔ _add_object
_addPage ➔ _add_page
_sweepIndirectReferences ➔ _sweep_indirect_references

PdfMerger class

__init__ parameter: strict=True ➔ strict=False (the PdfFileMerger still has the old default)
addMetadata ➔ add_metadata
addNamedDestination ➔ add_named_destination
setPageLayout ➔ set_page_layout
setPageMode ➔ set_page_mode

Page class:

artBox / bleedBox / cropBox / mediaBox / trimBox ➔ artbox / bleedbox / cropbox / mediabox / trimbox
- getWidth, getHeight ➔ width / height
- getLowerLeft_x / getUpperLeft_x ➔ left
- getUpperRight_x / getLowerRight_x ➔ right
- getLowerLeft_y / getLowerRight_y ➔ bottom
- getUpperRight_y / getUpperLeft_y ➔ top
- getLowerLeft / setLowerLeft ➔ lower_left property
- upperRight ➔ upper_right
mergePage ➔ merge_page
rotateClockwise / rotateCounterClockwise ➔ rotate_clockwise
_mergeResources ➔ _merge_resources
_contentStreamRename ➔ _content_stream_rename
_pushPopGS ➔ _push_pop_gs
_addTransformationMatrix ➔ _add_transformation_matrix
_mergePage ➔ _merge_page

XmpInformation class:

getElement(..., aboutUri, ...) ➔ get_element(..., about_uri, ...)
getNodesInNamespace(..., aboutUri, ...) ➔ get_nodes_in_namespace(..., aboutUri, ...)
_getText ➔ _get_text

utils.py:

matrixMultiply ➔ `matrix_multiply
RC4_encrypt is moved to the security module

Parameter Names

PdfWriter.get_page: pageNumber ➔ page_number
PyPDF2.filters (all classes): decodeParms ➔ decode_parms
PyPDF2.filters (all classes): decodeStreamData ➔ decode_stream_data
pagenum ➔ page_number
PdfMerger.merge: position ➔ page_number
PdfWriter.add_outline_item_destination: dest ➔ page_destination
PdfWriter.add_named_destination_object: dest ➔ page_destination
PdfWriter.encrypt: user_pwd ➔ user_password
PdfWriter.encrypt: owner_pwd ➔ owner_password

Deprecations

A few classes / functions were deprecated without replacement:

PyPDF2.utils.ConvertFunctionsToVirtualList
PyPDF2.utils.formatWarning
PyPDF2.isInt(obj): Use instance(obj, int) instead
PyPDF2.u_(s): Use s directly
PyPDF2.chr_(c): Use chr(c) instead
PyPDF2.barray(b): Use bytearray(b) instead
PyPDF2.isBytes(b): Use instance(b, type(bytes())) instead
PyPDF2.xrange_fn: Use range instead
PyPDF2.string_type: Use str instead
PyPDF2.isString(s): Use instance(s, str) instead
PyPDF2._basestring: Use str instead
b_(...) was removed. You should typically be able to use the bytes object directly, otherwise you can copy this