Python: How to extract images from PDF files

python @ Freshers.in

This article shows how to extract images from a PDF file and save them to your local disk using the PyPDF2 library.

PyPDF2 is a free, open-source, pure-Python PDF library that can split, merge, crop, and otherwise transform the pages of PDF files.

Install PyPDF2

Run the following in a Jupyter notebook cell. In a terminal, drop the leading "!".

!pip install PyPDF2

Sample code to extract images from a PDF

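from PyPDF2 import PdfReader

# Open the PDF and take its first page (pages are zero-indexed).
pdfreader = PdfReader("freshers_ny.pdf")
first_page = pdfreader.pages[0]

# Save each embedded image; the index prefix keeps file names unique.
for count, image_file in enumerate(first_page.images):
    with open(str(count) + image_file.name, "wb") as fp:
        fp.write(image_file.data)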
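
The sample above reads only the first page. The following is a minimal sketch of how the same idea might be extended to every page of the document. The function names extract_all_images and image_output_path, the default output folder "extracted_images", and the file-naming scheme are illustrative assumptions, not part of PyPDF2. Depending on how the images are encoded in the PDF, PyPDF2 may also need the Pillow package to decode them.

```python
from pathlib import Path


def image_output_path(out_dir, page_number, image_index, image_name):
    """Build a unique target path, e.g. out/page2_img0_Im0.png (hypothetical naming scheme)."""
    safe_name = Path(image_name).name  # drop any directory part from the image name
    return Path(out_dir) / f"page{page_number}_img{image_index}_{safe_name}"


def extract_all_images(pdf_path, out_dir="extracted_images"):
    """Save every image found on every page of pdf_path and return the saved paths."""
    # Imported here so the helper above can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    saved = []
    for page_number, page in enumerate(reader.pages, start=1):
        for image_index, image_file in enumerate(page.images):
            target = image_output_path(out, page_number, image_index, image_file.name)
            target.write_bytes(image_file.data)
            saved.append(target)
    return saved


# Example call (assumes freshers_ny.pdf exists in the working directory):
# extract_all_images("freshers_ny.pdf")
```

Including the page number and image index in the file name avoids overwriting files when two pages contain images with the same internal name.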

PyPDF2 Official page 

Author: user
