Generic PDF objects
Implementation of generic PDF objects (dictionary, number, string, …).
- class pypdf.generic.ArrayObject(iterable=(), /)[source]
- replicate(pdf_dest: PdfWriterProtocol) ArrayObject[source]
- clone(pdf_dest: PdfWriterProtocol, force_duplicate: bool = False, ignore_fields: Sequence[str | int] | None = ()) ArrayObject[source]
Clone object into pdf_dest.
- class pypdf.generic.BooleanObject(value: Any)[source]
Bases:
PdfObject
- clone(pdf_dest: PdfWriterProtocol, force_duplicate: bool = False, ignore_fields: Sequence[str | int] | None = ()) BooleanObject[source]
Clone object into pdf_dest.
- static read_from_stream(stream: IO[Any]) BooleanObject[source]
- class pypdf.generic.ByteStringObject(*args, **kwargs)[source]
Represents a string object where the text encoding could not be determined.
This occurs quite often, as the PDF spec doesn’t provide an alternate way to represent strings – for example, the encryption data stored in files (like /O) is clearly not text, but is still stored in a “String” object.
- class pypdf.generic.ContentStream(stream: Any, pdf: Any, forced_encoding: None | str | list[str] | dict[int, str] = None)[source]
Bases:
DecodedStreamObject
In order to be fast, this data structure can contain either:
raw data in ._data
parsed stream operations in ._operations.
At any time, a ContentStream object can have either both of those fields defined, or one field defined and the other set to None.
These fields are “rebuilt” lazily, when accessed:
when .get_data() is called, if ._data is None, it is rebuilt from ._operations.
when .operations is accessed, if ._operations is None, it is rebuilt from ._data.
Conversely, these fields can be invalidated:
when .set_data() is called, ._operations is set to None.
when .operations is set, ._data is set to None.
- replicate(pdf_dest: PdfWriterProtocol) ContentStream[source]
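The lazy rebuild and invalidation rules described for ContentStream can be sketched in plain Python. The class below is a hypothetical stand-in, not pypdf's implementation: the name LazyStream and the one-operation-per-line serialization are assumptions made only for illustration.

```python
from typing import List, Optional


class LazyStream:
    """Hypothetical stand-in: raw data and parsed operations kept in sync lazily."""

    def __init__(self, data: bytes) -> None:
        self._data: Optional[bytes] = data
        self._operations: Optional[List[bytes]] = None

    def get_data(self) -> bytes:
        # Rebuild the raw bytes from the operations only when they are missing.
        if self._data is None:
            assert self._operations is not None
            self._data = b"\n".join(self._operations)
        return self._data

    def set_data(self, data: bytes) -> None:
        # New raw data invalidates any previously parsed operations.
        self._data = data
        self._operations = None

    @property
    def operations(self) -> List[bytes]:
        # Parse the raw bytes only when the operations are missing.
        if self._operations is None:
            assert self._data is not None
            self._operations = self._data.split(b"\n")
        return self._operations

    @operations.setter
    def operations(self, ops: List[bytes]) -> None:
        # New operations invalidate the cached raw bytes.
        self._operations = ops
        self._data = None


stream = LazyStream(b"q\nQ")
parsed = stream.operations          # parsed on first access
stream.operations = [b"BT", b"ET"]  # raw data is now stale and dropped
rebuilt = stream.get_data()         # rebuilt from the operations
```

At most one of the two fields is ever None, so either representation can always be recovered from the other.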
- class pypdf.generic.DecodedStreamObject[source]
Bases:
StreamObject
- class pypdf.generic.DictionaryObject[source]
Bases:
dict[Any, Any], PdfObject
- replicate(pdf_dest: PdfWriterProtocol) DictionaryObject[source]
- clone(pdf_dest: PdfWriterProtocol, force_duplicate: bool = False, ignore_fields: Sequence[str | int] | None = ()) DictionaryObject[source]
Clone object into pdf_dest.
- get_inherited(key: str, default: Any = None) Any[source]
Returns the value of key from this object or, if it is not found here, from the parent. If it is not found there either, returns default.
- Parameters:
key – string identifying the field to return
default – default value to return
- Returns:
Current key or inherited one, otherwise default value.
- property xmp_metadata: XmpInformationProtocol | None
Retrieve XMP (Extensible Metadata Platform) data relevant to this object, if available.
See Table 347 — Additional entries in a metadata stream dictionary.
- Returns:
An XmpInformation instance that can be used to access XMP metadata from the document, or None if no metadata was found on the document root.
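The parent lookup described for get_inherited can be sketched with plain dictionaries. This is an illustration under stated assumptions, not pypdf's code: the parent is assumed to be reachable through a "/Parent" entry, as in a PDF page tree, and ordinary dicts stand in for DictionaryObject.

```python
from typing import Any, Optional


def get_inherited(node: dict, key: str, default: Any = None) -> Any:
    """Return node[key], or the first ancestor's value, or default."""
    current: Optional[dict] = node
    while current is not None:
        if key in current:
            return current[key]
        # Assumption: the parent dictionary is stored under "/Parent".
        current = current.get("/Parent")
    return default


# A page that does not define /MediaBox itself inherits it from its parent.
pages = {"/Type": "/Pages", "/MediaBox": [0, 0, 612, 792]}
page = {"/Type": "/Page", "/Parent": pages}

media_box = get_inherited(page, "/MediaBox")
rotation = get_inherited(page, "/Rotate", 0)
```

Here media_box comes from the parent dictionary, while rotation falls back to the supplied default because no dictionary in the chain defines "/Rotate".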
- class pypdf.generic.DirectReferenceLink(reference: ArrayObject)[source]
Bases:
object
Direct reference link being preserved until we can resolve it correctly.
- find_referenced_page() IndirectObject[source]
- patch_reference(target_pdf: PdfWriter, new_page: IndirectObject) None[source]
target_pdf: PdfWriter which the new link went into
- class pypdf.generic.EmbeddedFile(name: str, pdf_object: DictionaryObject, parent: ArrayObject | None = None)[source]
Bases:
object
Container holding the information on an embedded file.
Attributes are evaluated lazily if possible.
Further information on embedded files can be found in section 7.11 of the PDF 2.0 specification.
- class pypdf.generic.EncodedStreamObject[source]
Bases:
StreamObject
- class pypdf.generic.FloatObject(value: Any = '0.0', context: Any | None = None)[source]
- class pypdf.generic.IndirectObject(idnum: int, generation: int, pdf: Any)[source]
Bases:
PdfObject
- replicate(pdf_dest: PdfWriterProtocol) PdfObject[source]
- clone(pdf_dest: PdfWriterProtocol, force_duplicate: bool = False, ignore_fields: Sequence[str | int] | None = ()) IndirectObject[source]
Clone object into pdf_dest.
- property indirect_reference: IndirectObject
- class pypdf.generic.NameObject[source]
- delimiter_pattern = re.compile(b'\\s+|[\\(\\)<>\\[\\]{}/%]')
- prefix = b'/'
- renumber_table: ClassVar[dict[str, bytes]] = {'\x00': b'#00', '\x01': b'#01', '\x02': b'#02', '\x03': b'#03', '\x04': b'#04', '\x05': b'#05', '\x06': b'#06', '\x07': b'#07', '\x08': b'#08', '\t': b'#09', '\n': b'#0A', '\x0b': b'#0B', '\x0c': b'#0C', '\r': b'#0D', '\x0e': b'#0E', '\x0f': b'#0F', '\x10': b'#10', '\x11': b'#11', '\x12': b'#12', '\x13': b'#13', '\x14': b'#14', '\x15': b'#15', '\x16': b'#16', '\x17': b'#17', '\x18': b'#18', '\x19': b'#19', '\x1a': b'#1A', '\x1b': b'#1B', '\x1c': b'#1C', '\x1d': b'#1D', '\x1e': b'#1E', '\x1f': b'#1F', ' ': b'#20', '#': b'#23', '%': b'#25', '(': b'#28', ')': b'#29', '/': b'#2F', '<': b'#3C', '>': b'#3E', '[': b'#5B', ']': b'#5D', '{': b'#7B', '}': b'#7D'}
- clone(pdf_dest: Any, force_duplicate: bool = False, ignore_fields: Sequence[str | int] | None = ()) NameObject[source]
Clone object into pdf_dest.
- surfix
Decorator that converts a method with a single cls argument into a property that can be accessed directly from the class.
- CHARSETS = ('utf-8', 'gbk', 'latin1')
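The renumber_table listed for NameObject maps characters that cannot appear literally in a PDF name to #XX hexadecimal escapes. A minimal sketch of how such a table could be applied follows. The names ESCAPES and escape_name are hypothetical, the table is rebuilt locally rather than imported from pypdf, and writing characters above "~" as escaped UTF-8 bytes is an assumption.

```python
# Rebuild an equivalent table: control characters, space, and PDF delimiters.
ESCAPES = {chr(i): f"#{i:02X}".encode("ascii") for i in range(0x21)}
ESCAPES.update({c: f"#{ord(c):02X}".encode("ascii") for c in "#%()/<>[]{}"})


def escape_name(name: str) -> bytes:
    """Serialize a name such as "/A B", escaping everything after the prefix."""
    out = name[:1].encode("ascii")  # the leading "/" prefix is kept as-is
    for ch in name[1:]:
        if ch > "~":
            # Assumption: non-ASCII characters are written as escaped UTF-8 bytes.
            out += b"".join(f"#{b:02X}".encode("ascii") for b in ch.encode("utf-8"))
        else:
            # Escape listed characters, pass every other ASCII character through.
            out += ESCAPES.get(ch, ch.encode("ascii"))
    return out


with_space = escape_name("/A B")
with_parens = escape_name("/Paired()")
```

Only the characters after the leading prefix are escaped, so the "/" that introduces the name is never rewritten as #2F.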
- class pypdf.generic.NamedReferenceLink(reference: TextStringObject, source_pdf: PdfReader)[source]
Bases:
object
Named reference link being preserved until we can resolve it correctly.
- find_referenced_page() IndirectObject | None[source]
- patch_reference(target_pdf: PdfWriter, new_page: IndirectObject) None[source]
target_pdf: PdfWriter which the new link went into
- class pypdf.generic.NullObject(*args, **kwargs)[source]
Bases:
PdfObject
- clone(pdf_dest: PdfWriterProtocol, force_duplicate: bool = False, ignore_fields: Sequence[str | int] | None = ()) NullObject[source]
Clone object into pdf_dest.
- static read_from_stream(stream: IO[Any]) NullObject[source]
- class pypdf.generic.NumberObject(value: Any)[source]
- NumberPattern = re.compile(b'[^+-.0-9]')
- clone(pdf_dest: Any, force_duplicate: bool = False, ignore_fields: Sequence[str | int] | None = ()) NumberObject[source]
Clone object into pdf_dest.
- static read_from_stream(stream: IO[Any]) NumberObject | FloatObject[source]
- class pypdf.generic.OutlineFontFlag(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
IntFlag
A class used as an enumerable flag for formatting an outline font.
- italic = 1
- bold = 2
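Because the flag is an IntFlag, its members can be combined with bitwise operators. The sketch below uses a local stand-in class, FontFlag (a hypothetical name), that mirrors the two documented values rather than importing pypdf.

```python
from enum import IntFlag


class FontFlag(IntFlag):
    """Local stand-in mirroring the documented italic and bold values."""

    italic = 1
    bold = 2


# Combine both styles into a single value.
both = FontFlag.italic | FontFlag.bold

# Membership tests check whether a given bit is set.
is_bold = FontFlag.bold in both
is_italic_only = both == FontFlag.italic
```

The combined value behaves like the integer 3, so it can be stored wherever a plain integer flag field is expected.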
- class pypdf.generic.OutlineItem(title: str | bytes, page: NumberObject | IndirectObject | NullObject | DictionaryObject, fit: Fit)[source]
Bases:
Destination
- class pypdf.generic.PdfObject(*args, **kwargs)[source]
Bases:
PdfObjectProtocol
- hash_func(*, usedforsecurity=True)
Returns a sha1 hash object; optionally initialized with a string
- indirect_reference: IndirectObject | None
- replicate(pdf_dest: PdfWriterProtocol) PdfObject[source]
Clone object into pdf_dest (PdfWriterProtocol which is an interface for PdfWriter) without ensuring links. This is used in clone_document_from_root with incremental = True.
- Parameters:
pdf_dest – Target to clone to.
- Returns:
The cloned PdfObject
- clone(pdf_dest: PdfWriterProtocol, force_duplicate: bool = False, ignore_fields: Sequence[str | int] | None = ()) PdfObject[source]
Clone object into pdf_dest (PdfWriterProtocol which is an interface for PdfWriter).
By default, this method will call _reference_clone (see _reference).
- Parameters:
pdf_dest – Target to clone to.
force_duplicate – By default, if the object has already been cloned and referenced, the copy will be returned; when True, a new copy will be created. (Default value = False)
ignore_fields – List/tuple of field names (for dictionaries) that will be ignored during cloning (applies to children duplication as well). If fields are to be considered for a limited number of levels, you have to add it as integer, for example [1,"/B","/TOTO"] means that "/B" will be ignored at the first level only but "/TOTO" on all levels.
- Returns:
The cloned PdfObject
- class pypdf.generic.StreamObject[source]
Bases:
DictionaryObject
- replicate(pdf_dest: PdfWriterProtocol) StreamObject[source]
- static initialize_from_dictionary(data: dict[str, Any]) EncodedStreamObject | DecodedStreamObject[source]
- flate_encode(level: int = -1) EncodedStreamObject[source]
- class pypdf.generic.TextStringObject(value: Any)[source]
A string object that has been decoded into a real unicode string.
If read from a PDF document, this string appeared to match the PDFDocEncoding, or contained a UTF-16BE byte order mark (BOM), which causes UTF-16 decoding to occur.
- clone(pdf_dest: Any, force_duplicate: bool = False, ignore_fields: Sequence[str | int] | None = ()) TextStringObject[source]
Clone object into pdf_dest.
- class pypdf.generic.TreeObject(dct: DictionaryObject | None = None)[source]
Bases:
DictionaryObject
- add_child(child: Any, pdf: PdfWriterProtocol) None[source]
- inc_parent_counter_default(parent: None | IndirectObject | TreeObject, n: int) None[source]
- inc_parent_counter_outline(parent: None | IndirectObject | TreeObject, n: int) None[source]
- insert_child(child: Any, before: Any, pdf: PdfWriterProtocol, inc_parent_counter: Callable[[...], Any] | None = None) IndirectObject[source]
- class pypdf.generic.ViewerPreferences(value: Any = None)[source]
Bases:
DictionaryObject
- property PRINT_SCALING: NameObject
- pypdf.generic.create_string_object(string: str | bytes, forced_encoding: None | str | list[str] | dict[int, str] = None) TextStringObject | ByteStringObject[source]
Create a ByteStringObject or a TextStringObject from a string to represent the string.
- Parameters:
string – The data being used
forced_encoding – Typically None, or an encoding string
- Returns:
A ByteStringObject or a TextStringObject
- Raises:
TypeError – If string is not of type str or bytes.
- pypdf.generic.extract_links(new_page: PageObject, old_page: PageObject) list[tuple[NamedReferenceLink | DirectReferenceLink, NamedReferenceLink | DirectReferenceLink]][source]
Extracts links from two pages on the assumption that the two pages are the same. Produces one list of (new link, old link) tuples.
- pypdf.generic.is_null_or_none(x: Any) TypeGuard[None | NullObject | IndirectObject][source]
- Returns:
True if x is None or NullObject.
- pypdf.generic.read_hex_string_from_stream(stream: IO[Any], forced_encoding: None | str | list[str] | dict[int, str] = None) TextStringObject | ByteStringObject[source]
- pypdf.generic.read_object(stream: IO[Any], pdf: PdfReaderProtocol | None, forced_encoding: None | str | list[str] | dict[int, str] = None) PdfObject | int | str | ContentStream[source]
- pypdf.generic.read_string_from_stream(stream: IO[Any], forced_encoding: None | str | list[str] | dict[int, str] = None) TextStringObject | ByteStringObject[source]
- class pypdf._protocols.PdfObjectProtocol(*args, **kwargs)[source]
Bases:
Protocol
- clone(pdf_dest: Any, force_duplicate: bool = False, ignore_fields: tuple[str, ...] | list[str] | None = ()) Any[source]
- get_object() PdfObjectProtocol | None[source]
- class pypdf._protocols.XmpInformationProtocol(*args, **kwargs)[source]
Bases:
PdfObjectProtocol
- class pypdf._protocols.PdfCommonDocProtocol(*args, **kwargs)[source]
Bases:
Protocol
- property root_object: PdfObjectProtocol
- get_object(indirect_reference: Any) PdfObjectProtocol | None[source]
- class pypdf._protocols.PdfReaderProtocol(*args, **kwargs)[source]
Bases:
PdfCommonDocProtocol, Protocol