# Cropping and Transforming PDFs

```{note}
Just because content is no longer visible, it is not gone.
Cropping works by adjusting the viewbox. That means content that was cropped
away can still be restored.
```

```{testsetup}
pypdf_test_setup("user/cropping-and-transforming", {
    "example.pdf": "../resources/example.pdf",
    "Seige_of_Vicksburg_Sample_OCR.pdf": "../resources/Seige_of_Vicksburg_Sample_OCR.pdf",
    "labeled-edges-center-image.pdf": "../resources/labeled-edges-center-image.pdf",
    "side-by-side-subfig.pdf": "../resources/side-by-side-subfig.pdf",
    "nup-source.pdf": "../resources/box.pdf",
    "box.pdf": "../resources/box.pdf",
})
```

```{testcode}
from pypdf import PdfReader, PdfWriter

reader = PdfReader("Seige_of_Vicksburg_Sample_OCR.pdf")
writer = PdfWriter()

# Add page 1 from reader to output document, unchanged.
writer.add_page(reader.pages[0])

# Add page 2 from reader, but rotated clockwise 90 degrees.
writer.add_page(reader.pages[1].rotate(90))

# Add page 3 from reader, but crop it to half size.
page3 = reader.pages[2]
page3.mediabox.upper_right = (
    page3.mediabox.right / 2,
    page3.mediabox.top / 2,
)
writer.add_page(page3)

writer.write("out-all-in-one.pdf")
```

## Page rotation

The most typical rotation is a clockwise rotation of the page by multiples of
90 degrees. That is done when the orientation of the page is wrong. You can do
that with the {func}`~pypdf._page.PageObject.rotate` method:

```{testcode}
from pypdf import PdfReader, PdfWriter

reader = PdfReader("example.pdf")
writer = PdfWriter()

writer.add_page(reader.pages[0])
writer.pages[0].rotate(90)

writer.write("out-page-rotation.pdf")
```

The rotate method is typically preferred over the
`page.add_transformation(Transformation().rotate())` method, because `rotate`
will ensure that the page is still in the mediabox/cropbox. The transformation
object operates on the coordinates of the page contents and does not change the
mediabox or cropbox.

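To see why a bare coordinate rotation can move content out of the page boxes,
here is a small sketch that does not use pypdf at all. It rotates the corners
of a hypothetical 595 x 842 point page about the origin with the standard
counterclockwise 2D rotation matrix and checks which corners still fall inside
the unchanged page rectangle. The page size and the helper names
(`rotate_point`, `inside_page`) are assumptions made for illustration, and the
sketch makes no claim about the sign convention pypdf uses internally.

```python
import math


def rotate_point(x, y, degrees):
    """Rotate (x, y) counterclockwise about the origin."""
    rad = math.radians(degrees)
    cos_a, sin_a = math.cos(rad), math.sin(rad)
    return (x * cos_a - y * sin_a, x * sin_a + y * cos_a)


# Hypothetical page with its lower-left corner at the origin (assumption).
WIDTH, HEIGHT = 595, 842
corners = [(0, 0), (WIDTH, 0), (WIDTH, HEIGHT), (0, HEIGHT)]

# Rotate every corner; rounding removes floating-point noise.
rotated = [
    tuple(round(c, 6) for c in rotate_point(x, y, 90))
    for x, y in corners
]


def inside_page(point):
    """Check a point against the original, unrotated page rectangle."""
    x, y = point
    return 0 <= x <= WIDTH and 0 <= y <= HEIGHT


# Corners that left the page rectangle because the boxes were not adjusted.
outside = [p for p in rotated if not inside_page(p)]
```

Two of the four rotated corners end up at negative x coordinates, so content
near them would no longer be visible unless the page boxes were adjusted or an
additional translation were applied.
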
## Plain Merge

![](plain-merge.png)

is the result of

```{testcode}
from pypdf import PdfReader, PdfWriter, Transformation

# Get the data
reader_base = PdfReader("labeled-edges-center-image.pdf")
page_base = reader_base.pages[0]

reader = PdfReader("box.pdf")
page_box = reader.pages[0]

page_base.merge_page(page_box)

# Write the result back
writer = PdfWriter()
writer.add_page(page_base)
writer.write("out-plain-merge.pdf")
```

## Merge with Rotation

![](merge-45-deg-rot.png)

```{testcode}
from pypdf import PdfReader, PdfWriter, Transformation

# Get the data
reader_base = PdfReader("labeled-edges-center-image.pdf")
page_base = reader_base.pages[0]

reader = PdfReader("box.pdf")
page_box = reader.pages[0]

# Apply the transformation
transformation = Transformation().rotate(45)
page_box.add_transformation(transformation)
page_base.merge_page(page_box)

# Write the result back
writer = PdfWriter()
writer.add_page(page_base)
writer.write("out-merge-with-rotation.pdf")
```

If you add the `expand` parameter:

```{testcode}
transformation = Transformation().rotate(45)
page_box.add_transformation(transformation)
page_base.merge_page(page_box, expand=True)
```

you get:

![](merge-rotate-expand.png)

Alternatively, you can move the merged image a bit to the right by using

```{testcode}
op = Transformation().rotate(45).translate(tx=50)
```

![](merge-translated.png)

## Scaling

In pypdf, the content and the page can either be scaled together or separately.
Content scaling scales the contents on a page, and page scaling scales just the
page size (the canvas). Typically, you want to combine both.

![](scaling.png)

### Scaling both the Page and contents together

```{testcode}
from pypdf import PdfReader, PdfWriter

# Read the input
reader = PdfReader("side-by-side-subfig.pdf")
page = reader.pages[0]

# Scale
page.scale_by(0.5)

# Write the result to a file
writer = PdfWriter()
writer.add_page(page)
writer.write("out-scale-all.pdf")
```

### Scaling the content only

The content is scaled around the origin of the coordinate system. Typically,
that is the lower-left corner.

```{testcode}
from pypdf import PdfReader, PdfWriter, Transformation

# Read the input
reader = PdfReader("side-by-side-subfig.pdf")
page = reader.pages[0]

# Scale
op = Transformation().scale(sx=0.7, sy=0.7)
page.add_transformation(op)

# Write the result to a file
writer = PdfWriter()
writer.add_page(page)
writer.write("out-scale-content.pdf")
```

### Scaling the page only

To scale the page by `sx` in the X direction and `sy` in the Y direction:

```{testcode}
page.mediabox = page.mediabox.scale(sx=0.7, sy=0.7)
```

If you wish to have more control, you can adjust the various page boxes
directly:

```{testcode}
from pypdf.generic import RectangleObject

mb = page.mediabox

page.mediabox = RectangleObject((mb.left, mb.bottom, mb.right, mb.top))
page.cropbox = RectangleObject((mb.left, mb.bottom, mb.right, mb.top))
page.trimbox = RectangleObject((mb.left, mb.bottom, mb.right, mb.top))
page.bleedbox = RectangleObject((mb.left, mb.bottom, mb.right, mb.top))
page.artbox = RectangleObject((mb.left, mb.bottom, mb.right, mb.top))
```

### pypdf._page.MERGE_CROP_BOX

`pypdf<=3.4.0` used to merge the other page with `trimbox`.

`pypdf>3.4.0` changes this behavior to `cropbox`.

If you have good reasons to use or expect `trimbox`, you can add the following
code to get the old behavior:

```{testcode}
import pypdf

pypdf._page.MERGE_CROP_BOX = "trimbox"
```

## Transforming several copies of the same page

We have designed the following business card (A8 format) to advertise our new
startup.

![](nup-source.png)

We would like to copy this card sixteen times on an A4 page, to print it, cut
it, and give it to all our friends. Having learned about the
{func}`~pypdf._page.PageObject.merge_page` method and the
{class}`~pypdf.Transformation` class, we run the following code. Notice that we
had to tweak the media box of the source page to extend it, which is already a
dirty hack (in this case).

```{testcode}
from pypdf import PaperSize, PdfReader, PdfWriter, Transformation

# Read source file
reader = PdfReader("nup-source.pdf")
sourcepage = reader.pages[0]

# Create a destination file, and add a blank page to it
writer = PdfWriter()
destpage = writer.add_blank_page(width=PaperSize.A4.height, height=PaperSize.A4.width)

# Extend source page mediabox
sourcepage.mediabox = destpage.mediabox

# Copy source page to destination page, several times
for x in range(4):
    for y in range(4):
        # Translate page
        sourcepage.add_transformation(
            Transformation().translate(
                x * PaperSize.A8.height,
                y * PaperSize.A8.width,
            )
        )
        # Merge translated page
        destpage.merge_page(sourcepage)

# Write file
writer.write("out-nup-dest1.pdf")
```

And the result is… unexpected.

![](nup-dest1.png)

The problem is that, having run ``add_transformation()`` several times on the
*same* source page, those transformations add up: for instance, the sixteen
transformations are applied to the last copy of the source page, so most of the
business cards are *outside* the destination page.

We need a way to merge a transformed page, *without* modifying the source page.
Here comes {func}`~pypdf._page.PageObject.merge_transformed_page`. With this
method:

- we no longer need the media box hack of our first try;
- transformations are only applied *once*.

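The accumulation described above can be checked with plain arithmetic. The
following sketch does not use pypdf; it only tracks the total translation each
copy would receive, relying on the fact that composing translations adds their
offsets. The card size (210 x 147 points, roughly a landscape A8 card) and the
function names are assumptions chosen for illustration.

```python
# Hypothetical card size in points (assumption, roughly landscape A8).
CARD_W, CARD_H = 210, 147


def cumulative_offsets(cols, rows):
    """Offsets when each translation is added to the same, mutated page."""
    offsets = []
    total_x = total_y = 0
    for x in range(cols):
        for y in range(rows):
            # The new translation stacks on top of all previous ones.
            total_x += x * CARD_W
            total_y += y * CARD_H
            offsets.append((total_x, total_y))
    return offsets


def independent_offsets(cols, rows):
    """Offsets when every copy gets its own, fresh translation."""
    return [(x * CARD_W, y * CARD_H) for x in range(cols) for y in range(rows)]


stacked = cumulative_offsets(4, 4)
fresh = independent_offsets(4, 4)
```

With stacked translations, the last copy is moved by (5040, 3528) points
instead of the intended (630, 441), far outside a landscape A4 page.
`merge_transformed_page` corresponds to the second function: the source page is
left untouched, so each copy receives only its own transformation.
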
```{testcode}
from pypdf import PaperSize, PdfReader, PdfWriter, Transformation

# Read source file
reader = PdfReader("nup-source.pdf")
sourcepage = reader.pages[0]

# Create a destination file, and add a blank page to it
writer = PdfWriter()
destpage = writer.add_blank_page(width=PaperSize.A4.height, height=PaperSize.A4.width)

# Copy source page to destination page, several times
for x in range(4):
    for y in range(4):
        destpage.merge_transformed_page(
            sourcepage,
            Transformation().translate(
                x * sourcepage.mediabox.width,
                y * sourcepage.mediabox.height,
            ),
        )

# Write file
writer.write("out-nup-dest2.pdf")
```

We get the expected result.

![](nup-dest2.png)

There is still some work to do, for instance, to insert margins between and
around cards, but this is left as an exercise for the reader…
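
As one possible starting point for that exercise, the sketch below computes
translation offsets that spread the leftover space evenly between and around
the cards. It is plain Python without pypdf; the helper name `grid_offsets`
and the sizes in points (a landscape A4 page of 842 x 595 and a card of
210 x 147) are assumptions. Sixteen full-size cards leave almost no room for
margins on such a page, so the example uses a 3 x 3 grid.

```python
def grid_offsets(page_w, page_h, card_w, card_h, cols, rows):
    """Return lower-left offsets for a cols x rows grid with equal gaps.

    The space not covered by cards is split into equal gaps between the
    cards and between the cards and the page edges.
    """
    gap_x = (page_w - cols * card_w) / (cols + 1)
    gap_y = (page_h - rows * card_h) / (rows + 1)
    if gap_x < 0 or gap_y < 0:
        raise ValueError("cards do not fit on the page")
    return [
        (gap_x + x * (card_w + gap_x), gap_y + y * (card_h + gap_y))
        for x in range(cols)
        for y in range(rows)
    ]


# Hypothetical sizes in points: landscape A4 page, landscape A8 card.
offsets = grid_offsets(842, 595, 210, 147, cols=3, rows=3)
```

Each `(tx, ty)` pair could then be passed to `Transformation().translate(tx, ty)`
in the merge loop in place of the multiples of the card size.
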