Passport Data Extractor

The Passport Data Extractor is a machine learning application that uses OCR to extract text from scanned passport images. It extracts the printed text and the MRZ (machine-readable zone) code, returns the data as key-value pairs, and uses the MRZ code to verify the extracted values.

Passport Data Extractor combines AWS Textract with Tesseract. AWS Textract extracts the data as key-value pairs, while a Tesseract OCR model trained specifically on MRZ text extracts the MRZ code. The extracted information is then passed to the MRZ hashing algorithm, which computes the MRZ check digits and verifies them against the corresponding extracted values. Once the information is verified, the application saves the results in JSON format.
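The MRZ verification step can be illustrated with a minimal sketch. It assumes the verification refers to the standard ICAO Doc 9303 check-digit scheme (repeating weights 7, 3, 1, sum taken modulo 10); the application's own code is not shown here, and the function names `mrz_check_digit` and `verify_field` are hypothetical.

```python
def mrz_char_value(ch):
    """Map one MRZ character to its numeric value (ICAO Doc 9303)."""
    if "0" <= ch <= "9":
        return ord(ch) - ord("0")
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10  # A=10, B=11, ..., Z=35
    if ch == "<":
        return 0  # filler character counts as zero
    raise ValueError("invalid MRZ character: %r" % ch)


def mrz_check_digit(field):
    """Compute the check digit for an MRZ field.

    Each character value is multiplied by a weight from the repeating
    sequence 7, 3, 1; the check digit is the sum modulo 10.
    """
    weights = (7, 3, 1)
    total = sum(mrz_char_value(ch) * weights[i % 3]
                for i, ch in enumerate(field))
    return total % 10


def verify_field(value, check_char):
    """Return True if check_char matches the check digit computed for value."""
    return check_char == str(mrz_check_digit(value))
```

For the ICAO specimen passport, `mrz_check_digit("L898902C3")` returns 6, which matches the check digit printed after the document number in the specimen MRZ. A mismatch usually means the OCR misread a character in either the field or its check digit.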

The data fields extracted as key-value pairs are as follows:

  • Type
  • Country Code
  • Surname
  • Given Names
  • Nationality
  • Date of Birth
  • Passport Number
  • Date of Expiry
  • Personal Number (If present)
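As a rough illustration of how the fields listed above relate to the MRZ, the sketch below splits a passport MRZ into the same keys. It assumes the TD3 layout from ICAO Doc 9303 (two lines of 44 characters each); `parse_td3_mrz` is a hypothetical helper, not the application's actual parser, and dates are left in the raw YYMMDD form used by the MRZ.

```python
import json


def _clean(text):
    """Turn MRZ filler characters into spaces and trim the result."""
    return text.replace("<", " ").strip()


def parse_td3_mrz(line1, line2):
    """Split a two-line TD3 (passport) MRZ into key-value pairs."""
    if len(line1) != 44 or len(line2) != 44:
        raise ValueError("a TD3 MRZ has two lines of 44 characters")

    # Surname and given names are separated by a double filler "<<".
    surname, _, given_names = line1[5:44].partition("<<")

    fields = {
        "Type": _clean(line1[0:2]),
        "Country Code": _clean(line1[2:5]),
        "Surname": _clean(surname),
        "Given Names": _clean(given_names),
        "Nationality": _clean(line2[10:13]),
        "Date of Birth": line2[13:19],    # YYMMDD
        "Passport Number": _clean(line2[0:9]),
        "Date of Expiry": line2[21:27],   # YYMMDD
    }

    # The personal number is optional; include it only when present.
    personal_number = _clean(line2[28:42])
    if personal_number:
        fields["Personal Number"] = personal_number
    return fields


def to_json(fields):
    """Serialize the extracted fields as a JSON string."""
    return json.dumps(fields, indent=2)
```

Values parsed this way could then be compared against the key-value pairs returned by the OCR step, with any disagreement flagged for manual review.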

Technology Stack:

  • Python 3.7
  • AWS Textract
  • Tesseract OCR
  • Python Flask Framework (for web interface)

Benefits:

  • Faster than Manual Data Entry
  • Verified Data extraction using MRZ Code
  • Usable in Process Automation