The PdfDocCommon Class
PdfDocCommon is an abstract class which is inherited by PdfReader and PdfWriter.
Wherever the API refers to PdfDocCommon, you can use either of the derived classes.
- class pypdf._doc_common.PdfDocCommon
Bases: object
Common functions from PdfWriter and PdfReader objects.
This root class is strongly abstracted.
- flattened_pages: list[PageObject] | None = None
- abstract property root_object: DictionaryObject
- property metadata: DocumentInformation | None
Retrieve the PDF file’s document information dictionary, if it exists.
Note that some PDF files use metadata streams instead of document information dictionaries, and these metadata streams will not be accessed by this function.
- property xmp_metadata: XmpInformation | None
- property viewer_preferences: ViewerPreferences | None
Returns the existing ViewerPreferences as an overloaded dictionary.
- get_num_pages() → int
Calculate the number of pages in this PDF file.
- Returns:
The number of pages of the parsed PDF file.
- Raises:
PdfReadError – If restrictions prevent this action.
- get_page(page_number: int) → PageObject
Retrieve a page by number from this PDF file. Most of the time, .pages[page_number] is preferred.
- Parameters:
page_number – The page number to retrieve (pages begin at zero)
- Returns:
A PageObject instance.
- property named_destinations: dict[str, Destination]
A read-only dictionary which maps names to destinations.
- get_named_dest_root() → ArrayObject
- get_fields(tree: TreeObject | None = None, retval: dict[Any, Any] | None = None, fileobj: Any | None = None, stack: list[PdfObject] | None = None) → dict[str, Any] | None
Extract field data if this PDF contains interactive form fields.
The tree, retval, and stack parameters are for recursive use.
- Parameters:
tree – Current object to parse.
retval – In-progress list of fields.
fileobj – A file object (usually a text file) to write a report to on all interactive form fields found.
stack – List of already parsed objects.
- Returns:
A dictionary where each key is a field name, and each value is a Field object. By default, the mapping name is used for keys. None if form data could not be located.
- get_form_text_fields(full_qualified_name: bool = False) → dict[str, Any]
Retrieve form fields from the document with textual data.
- Parameters:
full_qualified_name – If True, use the fully qualified field name as the key.
- Returns:
A dictionary. The key is the name of the form field, the value is the content of the field.
If the document contains multiple form fields with the same name, the second and following will get the suffix .2, .3, …
- get_pages_showing_field(field: Field | PdfObject | IndirectObject) → list[PageObject]
Provide the list of pages on which the field is shown.
- Parameters:
field – Field Object, PdfObject or IndirectObject referencing a Field
- Returns:
List of pages –
- Empty list:
The field has no widgets attached (either hidden field or ancestor field).
- Single page list:
Page where the widget is present (most common).
- Multi-page list:
Field with multiple kid widgets (for example, radio buttons or a field repeated on multiple pages).
- property open_destination: None | Destination | TextStringObject | ByteStringObject
Property to access the opening destination (/OpenAction entry in the PDF catalog). It returns None if the entry does not exist or is not set.
- Raises:
Exception – If a destination is invalid.
- property outline: list[Destination | list[Destination | list[Destination]]]
Read-only property for the outline present in the document (i.e., a collection of ‘outline items’ which are also known as ‘bookmarks’).
- property threads: ArrayObject | None
Read-only property for the list of threads.
See §12.4.3 from the PDF 1.7 or 2.0 specification.
It is an array of dictionaries with “/F” (the first bead in the thread) and “/I” (a thread information dictionary containing information about the thread, such as its title, author, and creation date) properties or None if there are no articles.
Since PDF 2.0 it can also contain an indirect reference to a metadata stream containing information about the thread, such as its title, author, and creation date.
- get_page_number(page: PageObject) → int | None
Retrieve page number of a given PageObject.
- Parameters:
page – The page whose number is wanted. Should be an instance of PageObject.
- Returns:
The page number or None if page is not found
- get_destination_page_number(destination: Destination) → int | None
Retrieve page number of a given Destination object.
- Parameters:
destination – The destination whose page number is wanted.
- Returns:
The page number or None if page is not found
- property pages: list[PageObject]
Property that emulates a list of PageObject. This property allows getting a page or a range of pages.
Note
For PdfWriter only: Provides the capability to remove a page or a range of pages from the list (using the del operator). Remember: Only the page entry is removed, as the objects beneath can be used elsewhere. A solution to completely remove them, if they are not used anywhere, is to write to a buffer or temporary file and then load it into a new PdfWriter.
- property page_labels: list[str]
A list of labels for the pages in this document.
This property is read-only. The labels are in the order that the pages appear in the document.
- property page_layout: str | None
Get the page layout currently being used.
Valid layout values:
/NoLayout – Layout explicitly not specified
/SinglePage – Show one page at a time
/OneColumn – Show one column at a time
/TwoColumnLeft – Show pages in two columns, odd-numbered pages on the left
/TwoColumnRight – Show pages in two columns, odd-numbered pages on the right
/TwoPageLeft – Show two pages at a time, odd-numbered pages on the left
/TwoPageRight – Show two pages at a time, odd-numbered pages on the right
- property page_mode: Literal['/UseNone', '/UseOutlines', '/UseThumbs', '/FullScreen', '/UseOC', '/UseAttachments'] | None
Get the page mode currently being used.
Valid mode values:
/UseNone – Do not show outline or thumbnails panels
/UseOutlines – Show outline (aka bookmarks) panel
/UseThumbs – Show page thumbnails panel
/FullScreen – Fullscreen view
/UseOC – Show Optional Content Group (OCG) panel
/UseAttachments – Show attachments panel
- remove_page(page: int | PageObject | IndirectObject, clean: bool = False) → None
Remove page from pages list.
- Parameters:
page –
int: Page number to be removed.
PageObject: Page to be removed. If the page appears many times, only the first one will be removed.
IndirectObject: Reference to the page to be removed.
clean – Replace PageObject with NullObject to prevent annotations or destinations from referencing a detached page.
- decode_permissions(permissions_code: int) → dict[str, bool]
Take the permissions as an integer, return the allowed access.
- property user_access_permissions: UserAccessPermissions | None
Get the user access permissions for encrypted documents. Returns None if not encrypted.
- abstract property is_encrypted: bool
Read-only boolean property showing whether this PDF file is encrypted.
Note that this property, if true, will remain true even after the decrypt() method is called.
- property attachment_list: Generator[EmbeddedFile, None, None]
Iterable of attachment objects.