Concatenating PDF files using Python

Sometimes we have to deal with multiple PDF files, and I've often seen people ask whether PDFs can be merged or concatenated into one single file.

We can do that in Python thanks to the PyPDF2 library, a very handy tool for working with PDFs.

Let's see how to do it.

Merging PDFs

First of all, install the PyPDF2 package.

pip install pypdf2

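Now we have to instantiate a PdfFileMerger object and use its append() method to add files to it. Note that PdfFileMerger is the class name in PyPDF2 releases before 3.0; from 3.0 onward the class is called PdfMerger.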

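import PyPDF2

# pdf_list is a list of PDF paths; file_name is the output name without extension
merger = PyPDF2.PdfFileMerger()

for pdf in pdf_list:
    merger.append(pdf)
merger.write(file_name + '.pdf')
merger.close()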

Using this, we can write a script that merges all the PDFs in a folder.

Importing Libs

import PyPDF2
import sys
import os

Getting all PDFs in the folder

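def get_all_pdfs(path='.'):
    """
    Return the absolute paths of all PDFs in the folder, or None if there are none
    """
    root = os.path.abspath(path)
    files = []
    # os.listdir() returns names in arbitrary order, so sort for a predictable merge order
    for file in sorted(os.listdir(path)):
        # Match the extension case-insensitively (.pdf, .PDF, .Pdf, ...)
        if file.lower().endswith('.pdf'):
            files.append(os.path.join(root, file))
    return files if files else None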
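
The pages of the merged file follow the order of this list, so file names that contain numbers may need some care: a plain alphabetical sort puts part10.pdf before part2.pdf. Here is a minimal sketch of a "natural" sort key that could be applied to the base name of each path. natural_key is a hypothetical helper of my own, not something PyPDF2 provides, and the file names are made up for illustration.

```python
import re


def natural_key(name):
    """Split a file name into text and integer chunks so that 2 sorts before 10."""
    # re.split with a capturing group keeps the digit runs in the result,
    # so the list alternates: text, digits, text, digits, ...
    parts = re.split(r'(\d+)', name.lower())
    # Odd positions are always the digit runs; convert those to integers
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]


names = ['part10.pdf', 'part2.pdf', 'part1.pdf']
print(sorted(names))                   # plain sort: part1, part10, part2
print(sorted(names, key=natural_key))  # natural sort: part1, part2, part10
```

With full paths, something like sorted(files, key=lambda p: natural_key(os.path.basename(p))) should give the same ordering.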

Merging

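def merge_pdfs(file_name='output'):
    """
    Create a single PDF file from all the PDFs in the current folder
    """

    pdf_list = get_all_pdfs()
    output = os.path.abspath(file_name + '.pdf')

    # Skip the output of a previous run so it is not merged into itself
    if pdf_list is not None:
        pdf_list = [pdf for pdf in pdf_list if pdf != output]

    if pdf_list:
        merger = PyPDF2.PdfFileMerger()

        for pdf in pdf_list:
            merger.append(pdf)
        merger.write(output)
        merger.close()
    else:
        print("No PDF Found!")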

if __name__ == "__main__":
    merge_pdfs()
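
The script above always scans the current folder and always writes output.pdf. As a rough sketch, the folder and the output name could instead be read from the command line with argparse. The argument names used here (folder and -o/--output) are my own choices for illustration, and wiring them into merge_pdfs() would mean extending it to accept a folder, which the script above does not do yet.

```python
import argparse


def parse_args(argv=None):
    """Parse the folder to scan and the output name (without extension)."""
    parser = argparse.ArgumentParser(description='Merge all PDFs in a folder')
    # Optional positional argument: falls back to the current folder
    parser.add_argument('folder', nargs='?', default='.',
                        help='folder containing the PDFs (default: current folder)')
    parser.add_argument('-o', '--output', default='output',
                        help='output file name without the .pdf extension')
    # argv=None makes argparse read sys.argv[1:]; a list can be passed for testing
    return parser.parse_args(argv)


# Example with an explicit argument list instead of the real command line
args = parse_args(['reports', '-o', 'merged'])
print(args.folder, args.output)  # reports merged
```

Calling parse_args() with no arguments inside the if __name__ == "__main__": block would pick up whatever was typed on the command line.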

This was just one use of PyPDF2; it can also be used for reading, splitting and, to some extent, editing PDFs.

Do share your suggestions and feedback in the comments.

Thanks for reading :)