Migration Guide: 1.x to 2.x
PyPDF2<2.0.0 (docs)
is very different from PyPDF2>=2.0.0 (docs).
Luckily, most changes are simple naming adjustments. This guide helps you
move from PyPDF2 1.x (or even the original pyPdf) to PyPDF2>=2.0.0.
You can execute your code with the updated version and show deprecation warnings
by running python -W all your_code.py.
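
The same effect can be approximated from inside Python, which is handy in a test suite. The sketch below uses only the standard library; `old_name` and `new_name` are hypothetical stand-ins for a deprecated call and its replacement, not PyPDF2 functions. The "always" filter is what the `all` action of `-W` maps to.

```python
import warnings


def new_name():
    """Hypothetical replacement function."""
    return "ok"


def old_name():
    """Hypothetical deprecated alias that warns and forwards to new_name."""
    warnings.warn(
        "old_name is deprecated; use new_name instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_name()


# Record every warning instead of letting the default filters hide it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_name()

# Collect the messages so they can be inspected or asserted on.
messages = [
    str(w.message) for w in caught if issubclass(w.category, DeprecationWarning)
]
for message in messages:
    print(message)
```

Running your own code inside such a block lists each deprecated call once per invocation, which gives a rough checklist of what still needs renaming.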
Imports and Modules
- PyPDF2.utils no longer exists
- PyPDF2.pdf no longer exists. You can import from PyPDF2 directly or from PyPDF2.generic
Naming Adjustments
Classes
The base classes were renamed because they also work with BytesIO streams,
not only with files. In addition, the default value of the strict parameter
changed from strict=True to strict=False.
- PdfFileReader ➔ PdfReader
- PdfFileWriter ➔ PdfWriter
- PdfFileMerger ➔ PdfMerger
PdfFileReader and PdfFileMerger no longer have the overwriteWarnings
parameter. The new behavior is overwriteWarnings=False.
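
As an illustration of the renamed classes working on an in-memory stream, here is a minimal sketch assuming PyPDF2>=2.0.0 is installed. The helper name `roundtrip_page_count` is made up for this example, and the import is guarded so the snippet returns None instead of failing when the library is missing.

```python
from io import BytesIO


def roundtrip_page_count():
    """Write a one-page PDF to memory and read it back with the 2.x classes."""
    try:
        from PyPDF2 import PdfReader, PdfWriter  # 1.x: PdfFileReader, PdfFileWriter
    except ImportError:
        return None  # PyPDF2 2.x not available in this environment

    writer = PdfWriter()
    writer.add_blank_page(width=200, height=200)  # 1.x: addBlankPage

    buffer = BytesIO()
    writer.write(buffer)
    buffer.seek(0)

    # strict=False is the 2.x default; it is spelled out here for clarity.
    reader = PdfReader(buffer, strict=False)
    return len(reader.pages)  # 1.x: reader.getNumPages()


page_count = roundtrip_page_count()
print(page_count)
```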
Function, Method, and Property Names
In PyPDF2.xmp.XmpInformation:
- rdfRoot ➔ rdf_root
- xmp_createDate ➔ xmp_create_date
- xmp_creatorTool ➔ xmp_creator_tool
- xmp_metadataDate ➔ xmp_metadata_date
- xmp_modifyDate ➔ xmp_modify_date
- xmpMetadata ➔ xmp_metadata
- xmpmm_documentId ➔ xmpmm_document_id
- xmpmm_instanceId ➔ xmpmm_instance_id
In PyPDF2.generic:
- readObject ➔ read_object
- convertToInt ➔ convert_to_int
- DocumentInformation.getText ➔ DocumentInformation._get_text: This method should typically not be used; please let me know if you need it.
- readHexStringFromStream ➔ read_hex_string_from_stream
- initializeFromDictionary ➔ initialize_from_dictionary
- createStringObject ➔ create_string_object
- TreeObject.hasChildren ➔ TreeObject.has_children
- TreeObject.emptyTree ➔ TreeObject.empty_tree
In many places:
- getObject ➔ get_object
- writeToStream ➔ write_to_stream
- readFromStream ➔ read_from_stream
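
Because these renames are purely mechanical, they can be applied to a code base with a small script. The following is a rough sketch, not an official tool: `RENAMES` holds only the three names listed under "In many places", and `rename_calls` is a hypothetical helper. It matches whole identifiers only, but it does not parse Python, so it will also rewrite matches inside strings and comments.

```python
import re

# Old name -> new name, taken from the "In many places" list.
RENAMES = {
    "getObject": "get_object",
    "writeToStream": "write_to_stream",
    "readFromStream": "read_from_stream",
}

# \b anchors keep longer identifiers such as getObjectHeader untouched.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def rename_calls(source: str) -> str:
    """Return source with every whole-word old name replaced by its new name."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], source)


old_code = "ref.getObject().writeToStream(stream, None)"
new_code = rename_calls(old_code)
print(new_code)
```

An explicit mapping is used instead of a generic camelCase-to-snake_case conversion because some renames in this guide are irregular, for example addURI ➔ add_uri.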
PdfReader class:
- reader.getPage(pageNumber) ➔ reader.pages[page_number]
- reader.getNumPages() / reader.numPages ➔ len(reader.pages)
- getDocumentInfo ➔ metadata
- flattenedPages attribute ➔ flattened_pages
- resolvedObjects attribute ➔ resolved_objects
- xrefIndex attribute ➔ xref_index
- getNamedDestinations / namedDestinations attribute ➔ named_destinations
- getPageLayout / pageLayout ➔ page_layout attribute
- getPageMode / pageMode ➔ page_mode attribute
- getIsEncrypted / isEncrypted ➔ is_encrypted attribute
- getOutlines ➔ get_outlines
- readObjectHeader ➔ read_object_header
- cacheGetIndirectObject ➔ cache_get_indirect_object
- cacheIndirectObject ➔ cache_indirect_object
- getDestinationPageNumber ➔ get_destination_page_number
- readNextEndLine ➔ read_next_end_line
- _zeroXref ➔ _zero_xref
- _authenticateUserPassword ➔ _authenticate_user_password
- _pageId2Num attribute ➔ _page_id2num
- _buildDestination ➔ _build_destination
- _buildOutline ➔ _build_outline
- _getPageNumberByIndirect(indirectRef) ➔ _get_page_number_by_indirect(indirect_ref)
- _getObjectFromStream ➔ _get_object_from_stream
- _decryptObject ➔ _decrypt_object
- _flatten(..., indirectRef) ➔ _flatten(..., indirect_ref)
- _buildField ➔ _build_field
- _checkKids ➔ _check_kids
- _writeField ➔ _write_field
- _write_field(..., fieldAttributes) ➔ _write_field(..., field_attributes)
- _read_xref_subsections(..., getEntry, ...) ➔ _read_xref_subsections(..., get_entry, ...)
PdfWriter class:
- writer.getPage(pageNumber) ➔ writer.pages[page_number]
- writer.getNumPages() ➔ len(writer.pages)
- addMetadata ➔ add_metadata
- addPage ➔ add_page
- addBlankPage ➔ add_blank_page
- addAttachment(fname, fdata) ➔ add_attachment(filename, data)
- insertPage ➔ insert_page
- insertBlankPage ➔ insert_blank_page
- appendPagesFromReader ➔ append_pages_from_reader
- updatePageFormFieldValues ➔ update_page_form_field_values
- cloneReaderDocumentRoot ➔ clone_reader_document_root
- cloneDocumentFromReader ➔ clone_document_from_reader
- getReference ➔ get_reference
- getOutlineRoot ➔ get_outline_root
- getNamedDestRoot ➔ get_named_dest_root
- addBookmarkDestination ➔ add_bookmark_destination
- addBookmarkDict ➔ add_bookmark_dict
- addBookmark ➔ add_bookmark
- addNamedDestinationObject ➔ add_named_destination_object
- addNamedDestination ➔ add_named_destination
- removeLinks ➔ remove_links
- removeImages(ignoreByteStringObject) ➔ remove_images(ignore_byte_string_object)
- removeText(ignoreByteStringObject) ➔ remove_text(ignore_byte_string_object)
- addURI ➔ add_uri
- addLink ➔ add_link
- getPage(pageNumber) ➔ get_page(page_number)
- getPageLayout / setPageLayout / pageLayout ➔ page_layout attribute
- getPageMode / setPageMode / pageMode ➔ page_mode attribute
- _addObject ➔ _add_object
- _addPage ➔ _add_page
- _sweepIndirectReferences ➔ _sweep_indirect_references
PdfMerger class:
- __init__ parameter: strict=True ➔ strict=False (the PdfFileMerger still has the old default)
- addMetadata ➔ add_metadata
- addNamedDestination ➔ add_named_destination
- setPageLayout ➔ set_page_layout
- setPageMode ➔ set_page_mode
Page class:
- artBox / bleedBox / cropBox / mediaBox / trimBox ➔ artbox / bleedbox / cropbox / mediabox / trimbox
- getWidth, getHeight ➔ width / height
- getLowerLeft_x / getUpperLeft_x ➔ left
- getUpperRight_x / getLowerRight_x ➔ right
- getLowerLeft_y / getLowerRight_y ➔ bottom
- getUpperRight_y / getUpperLeft_y ➔ top
- getLowerLeft / setLowerLeft ➔ lower_left property
- upperRight ➔ upper_right
- mergePage ➔ merge_page
- rotateClockwise / rotateCounterClockwise ➔ rotate_clockwise
- _mergeResources ➔ _merge_resources
- _contentStreamRename ➔ _content_stream_rename
- _pushPopGS ➔ _push_pop_gs
- _addTransformationMatrix ➔ _add_transformation_matrix
- _mergePage ➔ _merge_page
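
To show how the box renames read in practice, here is a small sketch assuming PyPDF2>=2.0.0 is installed. `blank_page_box` is a hypothetical helper; it returns None when the library cannot be imported. Values are converted with float() because the numeric type of the properties may differ between versions.

```python
def blank_page_box():
    """Create a blank page and read its media box with the 2.x names."""
    try:
        from PyPDF2 import PdfWriter
    except ImportError:
        return None  # PyPDF2 2.x not available in this environment

    writer = PdfWriter()
    page = writer.add_blank_page(width=300, height=200)

    box = page.mediabox  # 1.x: page.mediaBox
    return {
        "width": float(box.width),    # 1.x: box.getWidth()
        "height": float(box.height),  # 1.x: box.getHeight()
        "left": float(box.left),      # 1.x: box.getLowerLeft_x()
        "bottom": float(box.bottom),  # 1.x: box.getLowerLeft_y()
        "right": float(box.right),    # 1.x: box.getUpperRight_x()
        "top": float(box.top),        # 1.x: box.getUpperRight_y()
    }


box_values = blank_page_box()
print(box_values)
```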
XmpInformation class:
- getElement(..., aboutUri, ...) ➔ get_element(..., about_uri, ...)
- getNodesInNamespace(..., aboutUri, ...) ➔ get_nodes_in_namespace(..., about_uri, ...)
- _getText ➔ _get_text
utils.py:
- matrixMultiply ➔ matrix_multiply
- RC4_encrypt is moved to the security module
Parameter Names
- PdfWriter.get_page: pageNumber ➔ page_number
- PyPDF2.filters (all classes): decodeParms ➔ decode_parms
- PyPDF2.filters (all classes): decodeStreamData ➔ decode_stream_data
- pagenum ➔ page_number
- PdfMerger.merge: position ➔ page_number
- PdfWriter.add_outline_item_destination: dest ➔ page_destination
- PdfWriter.add_named_destination_object: dest ➔ page_destination
- PdfWriter.encrypt: user_pwd ➔ user_password
- PdfWriter.encrypt: owner_pwd ➔ owner_password
Deprecations
A few classes / functions were deprecated without replacement:
- PyPDF2.utils.ConvertFunctionsToVirtualList
- PyPDF2.utils.formatWarning
- PyPDF2.isInt(obj): Use isinstance(obj, int) instead
- PyPDF2.u_(s): Use s directly
- PyPDF2.chr_(c): Use chr(c) instead
- PyPDF2.barray(b): Use bytearray(b) instead
- PyPDF2.isBytes(b): Use isinstance(b, type(bytes())) instead
- PyPDF2.xrange_fn: Use range instead
- PyPDF2.string_type: Use str instead
- PyPDF2.isString(s): Use isinstance(s, str) instead
- PyPDF2._basestring: Use str instead
- b_(...) was removed. You should typically be able to use the bytes object directly, otherwise you can copy this
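
The built-in replacements suggested above need no PyPDF2 import at all. A short sketch with made-up sample values shows each one; the old helper is named in the comment next to its replacement. Note that type(bytes()) is simply bytes, so the shorter isinstance(b, bytes) is equivalent.

```python
value = 42
data = b"abc"
text = "abc"

is_int = isinstance(value, int)      # was: isInt(value)
is_bytes = isinstance(data, bytes)   # was: isBytes(data)
is_string = isinstance(text, str)    # was: isString(text)
letter = chr(65)                     # was: chr_(65)
mutable = bytearray(data)            # was: barray(data)
numbers = list(range(3))             # was: xrange_fn(3)
same_text = text                     # was: u_(text)

print(is_int, is_bytes, is_string, letter, mutable, numbers, same_text)
```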