1. Outline your research aims, objectives and research questions.
• Is it possible to convert a paper bill or paper ledger into digital form?
• Computer Vision using OpenCV
• Coding with Google Colab
• Text Detection using YOLO
• OCR using Tesseract
• Can we convert the data from paper to digital form?
• Will deep learning help reduce the manual labour of converting physical bills to digital form?
• Is it possible to automate the conversion process?
2. What is the academic/scientific rationale for proposing to conduct this research study? Provide references to relevant empirical, conceptual, and theoretical literature
• People have relied on paper invoices for a very long time, but these days nearly everything has become digital, invoices included. Reconciling digital invoices is laborious: employees spend hours browsing through invoices and noting the details down in a ledger.
• But what if this could be automated, saving a business those human hours? It is possible thanks to data science tools such as YOLO and Tesseract, which can be used to build an OCR pipeline in Python. OCR stands for optical character recognition, and this project explains how to build a custom OCR system in Python.
• Digitized images are often represented as a two-dimensional (2D) array of pixel values. Each pixel value, which contributes to the color scheme of the image, is influenced by factors such as light intensity. A visual scene is projected onto a surface, where receptors (natural or artificial) produce values that depend on the intensity of the incident light.
• These concepts are, however, hard to implement. Forming an image collapses a three-dimensional (3D) scene into a two-dimensional image, which loses detail. Other factors also make image recognition and image processing hard, such as noise in the image (pixel values that differ sharply from those of the surrounding pixels) and the mapping from scene to image.
• At the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2015, computers outperformed humans on the image classification task. In 2016, a faster object detector, YOLO, was proposed to perform object detection in real time. Our motivation is to apply YOLO to the task of detecting URL links within an image scene. We will also compare its speed and accuracy with those of OCR software.
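The pixel-array view of an image and the notion of noise described above can be made concrete with a small sketch. It assumes a grayscale image stored as a nested Python list (a real pipeline would more likely use OpenCV or NumPy arrays), and the `threshold` value and function names are illustrative choices, not part of the project.

```python
from statistics import median

# A tiny 4x4 grayscale "image": each value is a light intensity in 0..255.
image = [
    [10, 12, 11, 10],
    [11, 250, 12, 11],  # 250 differs sharply from its neighbours
    [10, 11, 12, 10],
    [12, 10, 11, 11],
]


def neighbours(img, row, col):
    """Return the values of the (up to 8) pixels surrounding (row, col)."""
    values = []
    for r in range(max(0, row - 1), min(len(img), row + 2)):
        for c in range(max(0, col - 1), min(len(img[0]), col + 2)):
            if (r, c) != (row, col):
                values.append(img[r][c])
    return values


def find_noisy_pixels(img, threshold=50):
    """Flag pixels that differ from the median of their neighbours by more than threshold."""
    noisy = []
    for r in range(len(img)):
        for c in range(len(img[0])):
            if abs(img[r][c] - median(neighbours(img, r, c))) > threshold:
                noisy.append((r, c))
    return noisy


def median_filter(img, threshold=50):
    """Return a copy of img with each flagged pixel replaced by its neighbourhood median."""
    cleaned = [row[:] for row in img]
    for r, c in find_noisy_pixels(img, threshold):
        cleaned[r][c] = int(median(neighbours(img, r, c)))
    return cleaned
```

Replacing only the flagged pixels keeps the rest of the image untouched, which is a simplification of the median filters found in image processing libraries.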
This machine learning project trains the YOLO object detection model on a dataset of digital invoices. The model is trained to identify three essential classes from the invoices: invoice number, billing date, and total amount. Tesseract is then used to perform OCR in Python.
Tech Stack
Language: Python
Object detection: YOLOv4
Text Recognition: Tesseract OCR
Environment: Google Colab
Any business that currently goes through all its bills manually to jot them down in a ledger can use this project.
The data science tools and techniques used to implement this project are described in detail below.
OpenCV is one of Python’s most popular computer vision and image processing libraries. Before serving any image to the object detection model YOLO, it must be processed, and for that purpose, you will use OpenCV. Additionally, for visualizing the testing results of the YOLO model, one relies on various functions of the OpenCV library.
Google Colab is a cloud-hosted notebook service from Google that lets users write and run Python code in the browser. In this YOLO character recognition project, you will learn to use Colab notebooks to implement the complete solution.
You will learn how to set up the darknet framework for training the YOLOv4 model, execute terminal commands in Colab notebooks, and do many more exciting tasks.
Colab provides access to Graphics Processing Units (GPUs), which run this kind of training much faster than a CPU. You will also learn how to change the runtime in Colab and set it to GPU for faster execution.
YOLO v4 is an object detection model developed by Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao where YOLO stands for ‘You only look once’. Its quirky name comes from the algorithm identifying all the objects in an image by looking at it only once.
That single pass is one of the primary reasons the algorithm can detect objects faster than the R-CNN family of algorithms.
This project will use the YOLO algorithm to build a custom OCR with Python. The reason for building a custom model is that the pre-trained YOLO model only knows how to identify the 80 predefined classes of the COCO dataset.
Thus, this project will guide you through transfer learning to create a YOLO-text-recognition model using the invoices dataset.
As specified already, this custom OCR system will identify three objects in the invoice images (invoice number, billing date, and total amount) and draw a bounding box around each one once it has been identified.
With YOLO, the system will locate the vital text classes on the invoices, but to decode the information in the text, one must use Optical Character Recognition (OCR).
Tesseract OCR is a tool that quickly scans text and converts it into digital data. In this project, you will learn how to use Tesseract OCR for creating a custom OCR in Python.
Here is a step-by-step guide to building the custom OCR in Python:
1. Setup and installation to run YOLOv4
After downloading AlexeyAB's darknet repository, adjust the Makefile to enable OPENCV and GPU, and then build darknet.
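The Makefile adjustment can be sketched in Python as a small text rewrite. This is a minimal sketch that assumes the Makefile exposes its build switches as lines of the form `GPU=0` and `OPENCV=0`; the function name and the sample text are illustrative.

```python
import re


def enable_makefile_flags(makefile_text, flags=("GPU", "OPENCV")):
    """Switch each `FLAG=0` line to `FLAG=1`, leaving every other line untouched."""
    for flag in flags:
        # Anchor on the whole line so that GPU=0 never matches inside another name.
        pattern = rf"^{re.escape(flag)}=0[ \t]*$"
        makefile_text = re.sub(pattern, f"{flag}=1", makefile_text, flags=re.MULTILINE)
    return makefile_text


# Illustrative Makefile header (assumed layout, not copied from the repository).
sample = "GPU=0\nCUDNN=0\nCUDNN_HALF=0\nOPENCV=0\nAVX=0\n"
print(enable_makefile_flags(sample))
```

In a Colab notebook, the rewritten text would be saved back to the Makefile before running `make` in the darknet directory.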
2. Downloading pre-trained YOLOv4 weights
YOLOv4 has already been trained on the COCO dataset, which has 80 classes that it can predict. We will run these pre-trained weights on some test images to see what results they produce.
3. Creating display functions to display the predicted class.
Here, you will learn how to use OpenCV for visualizing object detection results of the YOLO model.
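A display helper along these lines might look as follows. It is a sketch under stated assumptions: detections are taken to be already available as pixel-coordinate tuples, the class label spellings are made up for illustration, and the image is a BGR NumPy array as returned by `cv2.imread`.

```python
# Assumed label spellings for the three invoice classes (illustrative only).
CLASS_NAMES = ["invoice_number", "billing_date", "total_amount"]


def format_label(class_id, confidence, class_names=CLASS_NAMES):
    """Build the caption drawn above a box, e.g. 'total_amount 0.91'."""
    return f"{class_names[class_id]} {confidence:.2f}"


def draw_detections(image, detections):
    """Draw each detection on the image in place and return the image.

    `detections` is a list of (class_id, confidence, x1, y1, x2, y2) tuples
    with integer pixel coordinates.
    """
    import cv2  # imported here so format_label works without OpenCV installed

    green = (0, 255, 0)  # BGR colour
    for class_id, confidence, x1, y1, x2, y2 in detections:
        cv2.rectangle(image, (x1, y1), (x2, y2), green, 2)
        cv2.putText(image, format_label(class_id, confidence),
                    (x1, max(y1 - 5, 0)), cv2.FONT_HERSHEY_SIMPLEX,
                    0.5, green, 1)
    return image
```

The annotated image can then be written out with `cv2.imwrite` or shown inline in a notebook.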
4. Data collection and Labeling with LabelImg
Because this YOLO OCR project aims to train YOLO to learn three new classes, you will create a new dataset for training and validation. You will build this dataset with the LabelImg tool, annotating each image with the three classes; YOLO then uses these annotations during training.
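For reference, a YOLO-format annotation stores one line per box: a class index followed by the box centre, width, and height, each normalised by the image size. The sketch below converts a pixel-coordinate box into such a line; the function name and the sample numbers are illustrative.

```python
def to_yolo_line(class_id, box, image_width, image_height):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A 200x100 pixel box on a 1000x500 image, labelled as class 0.
print(to_yolo_line(0, (100, 50, 300, 150), 1000, 500))
```

Each image gets a text file of these lines, one per annotated box, saved alongside the image under the same base name.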
5. Configuring Files for Training
This step involves configuring custom .cfg, obj.data, obj.names, train.txt and test.txt files.
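Configuring the needed variables in the .cfg file based on the number of classes
Creating the obj.names and obj.data files
• obj.names: the classes to be detected
• obj.data: the number of classes and the paths to train.txt, test.txt, obj.names, and the backup folder
Configuring train.txt and test.txt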
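The configuration step can be sketched as below. The class-dependent values follow the rules of thumb commonly given for darknet (filters = (classes + 5) x 3 in the layers before each YOLO layer, max_batches = classes x 2000 with a floor of 6000, steps at 80% and 90% of max_batches); treat them as assumptions to check against the repository's own instructions. The class label spellings, the `data` directory, and the 80/20 split are also illustrative.

```python
import random
from pathlib import Path

# Assumed label spellings for the three invoice classes.
CLASSES = ["invoice_number", "billing_date", "total_amount"]


def cfg_values(num_classes):
    """Return the class-dependent settings to edit in the custom .cfg file."""
    max_batches = max(num_classes * 2000, 6000)
    return {
        "classes": num_classes,
        "filters": (num_classes + 5) * 3,
        "max_batches": max_batches,
        "steps": (max_batches * 8 // 10, max_batches * 9 // 10),
    }


def build_training_files(image_paths, data_dir="data", classes=CLASSES,
                         test_fraction=0.2, seed=0):
    """Return the text of obj.names, obj.data, train.txt and test.txt."""
    paths = sorted(image_paths)
    random.Random(seed).shuffle(paths)  # reproducible split
    n_test = max(1, int(len(paths) * test_fraction))
    test, train = paths[:n_test], paths[n_test:]
    return {
        "obj.names": "\n".join(classes) + "\n",
        "train.txt": "\n".join(train) + "\n",
        "test.txt": "\n".join(test) + "\n",
        "obj.data": (
            f"classes = {len(classes)}\n"
            f"train = {data_dir}/train.txt\n"
            f"valid = {data_dir}/test.txt\n"
            f"names = {data_dir}/obj.names\n"
            "backup = backup/\n"
        ),
    }


def write_files(files, out_dir):
    """Write each generated file into out_dir, creating it if needed."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, text in files.items():
        (out / name).write_text(text)
```

Keeping file generation separate from writing makes it easy to inspect the contents in a notebook cell before anything touches the disk.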
6. Download pre-trained weights for the convolutional layers
YOLO's object detection model has already been trained on the COCO dataset for 80 different classes. One can download these weights and fine-tune them on a custom dataset. The advantage is that, even with relatively few data points, the model can learn and adapt to the new classes by training a few layers on top of the existing ones.
7. Training Custom Object Detector
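Training is started from the command line with darknet's `detector train` mode. The sketch below only assembles that command; the file names `data/obj.data` and `cfg/yolov4-obj.cfg` are hypothetical placeholders for the files configured in step 5, and `yolov4.conv.137` is assumed to be the name of the downloaded convolutional weights.

```python
import shlex


def train_command(data_file="data/obj.data", cfg_file="cfg/yolov4-obj.cfg",
                  weights="yolov4.conv.137", darknet="./darknet"):
    """Build the argument list for a darknet training run.

    -dont_show suppresses the chart window (there is no display in Colab),
    and -map asks darknet to report mAP on the validation set while training.
    """
    return [darknet, "detector", "train", data_file, cfg_file, weights,
            "-dont_show", "-map"]


# Print the command as it would be typed in a terminal or a notebook cell.
print(shlex.join(train_command()))
```

The same list can be passed to `subprocess.run(..., check=True)` from inside the darknet directory.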
8. Evaluating the model using mean Average Precision (mAP)
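To show what the metric measures, here is a simplified sketch of its two building blocks: intersection over union (IoU), which decides whether a predicted box counts as a true positive, and the average precision for one class, computed as the area under the precision-recall curve. darknet reports mAP itself, and its interpolation details may differ from this sketch.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0


def average_precision(detections, num_ground_truth):
    """Area under the precision-recall curve for one class.

    `detections` is a list of (confidence, is_true_positive) pairs, where a
    detection is a true positive if its IoU with an unmatched ground-truth
    box exceeds the chosen threshold (0.5 is a common choice).
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    precisions, recalls = [], []
    tp = fp = 0
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)
    # Make precision non-increasing from right to left (the usual envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def mean_average_precision(ap_per_class):
    """mAP is the mean of the per-class average precisions."""
    return sum(ap_per_class) / len(ap_per_class)
```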
9. Predict image classes and save the coordinates separately
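One way to handle the saved coordinates is sketched below. It assumes each prediction is available as a YOLO-style line (class index, then normalised centre, width, and height); if the detector output is in another layout, the parsing would need to change. The function names and the CSV column names are illustrative.

```python
import csv
import io


def yolo_to_pixel_box(line, image_width, image_height):
    """Convert 'class xc yc w h' (normalised) to (class_id, (x1, y1, x2, y2)) in pixels."""
    class_id, xc, yc, w, h = line.split()
    xc, w = float(xc) * image_width, float(w) * image_width
    yc, h = float(yc) * image_height, float(h) * image_height
    # Clamp to the image so the box can be used directly for cropping.
    x1 = max(0, round(xc - w / 2))
    y1 = max(0, round(yc - h / 2))
    x2 = min(image_width, round(xc + w / 2))
    y2 = min(image_height, round(yc + h / 2))
    return int(class_id), (x1, y1, x2, y2)


def boxes_to_csv(rows):
    """Serialise (image_name, class_id, box) rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["image", "class_id", "x1", "y1", "x2", "y2"])
    for image_name, class_id, box in rows:
        writer.writerow([image_name, class_id, *box])
    return buffer.getvalue()
```

Storing pixel boxes per image keeps the detection step decoupled from the OCR step, which only needs the crop rectangle for each class.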
10. Detecting text from the predicted class
Importing pytesseract and setting the environment variable for the English trained data (Windows only; on Unix it is already set)
Getting the list of predicted files from the directory
Using the Tesseract pre-trained LSTM model to extract the text
Fine-tuning the LSTM model (only required if the extracted text does not match the text shown in the image)
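The text extraction sub-steps can be sketched as follows. This assumes the `pytesseract` and Pillow packages and the Tesseract binary are installed, and that each detection is available as a pixel crop box; the Windows path in the comment is a hypothetical example, and the page segmentation mode is a guess suited to single-line fields.

```python
import re
from pathlib import Path


def tesseract_config(oem=1, psm=7):
    """--oem 1 selects the LSTM engine; --psm 7 treats the crop as a single text line."""
    return f"--oem {oem} --psm {psm}"


def clean_text(raw):
    """Collapse whitespace, including the trailing form feed Tesseract may emit."""
    return re.sub(r"\s+", " ", raw).strip()


def list_prediction_files(directory, pattern="*.jpg"):
    """Return the predicted image files in a directory, sorted by name."""
    return sorted(str(path) for path in Path(directory).glob(pattern))


def read_crop(image_path, box, lang="eng"):
    """Run OCR on the (x1, y1, x2, y2) region of an image and return cleaned text."""
    import pytesseract  # imported here so the helpers above work without it
    from PIL import Image

    # Windows only (hypothetical install path):
    # pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
    with Image.open(image_path) as image:
        crop = image.crop(box)
        raw = pytesseract.image_to_string(crop, lang=lang, config=tesseract_config())
    return clean_text(raw)
```

Cropping to the detected box before OCR means Tesseract only sees the field of interest, which tends to matter more for accuracy than fine-tuning does.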