The quality of the OCR is crucial both to accurately extracting the information of interest and to building text-based classifiers. The figure below shows the stages used to achieve those goals and our attempt to address the challenges stated above. A receipt is captured with a camera, and the image is passed to the Logo Recognizer of the Retailer Recognizer, to the Information of Interest Extractor, and to the Text Line Localizer, whose outputs feed the later stages.
The output of an OCR is a string of characters. One of the predicted results from the Logo Recognizer and the Text-based Retailer Recognizer is selected; both support multiple languages. There are other OCRs available, but most are licensed. The Text Line Localizer locates lines of text in a natural image. It is a deep learning approach based on both a recurrent neural network and a convolutional network. The hope is that breaking an image up into smaller regions before passing them into an OCR will help boost OCR performance.
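The selection between the two recognizers can be as simple as keeping the more confident of the two predictions. Below is a minimal sketch of that idea; the `(label, probability)` tuple format is an assumption for illustration, not the actual API shape.

```python
# Sketch: pick whichever recognizer produced the more confident prediction.
# Each prediction is assumed to be a (label, probability) tuple.
def select_prediction(logo_pred, text_pred):
    """Return the more confident of the two (label, probability) predictions."""
    return logo_pred if logo_pred[1] >= text_pred[1] else text_pred

print(select_prediction(("walmart", 0.92), ("asda", 0.35)))  # ('walmart', 0.92)
```

In a real pipeline the two probabilities may not be directly comparable, so a calibrated weighting could replace the raw comparison.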
On a small set of samples, we compared feeding a whole image versus a sub-image into an OCR and observed a difference in performance.

Text Recognition (OCR) using Cognitive Service, Microsoft Flow, and Power Apps
The authors claimed that the algorithm works reliably on multi-scale and multi-language text without further post-processing and is computationally efficient.

Logo Recognizer

This model recognises a retailer based on its logo. In this example, we use Custom Vision to build a custom model that recognises a retailer based on how the receipt looks. Custom Vision allows you to easily customize your own computer vision models to fit your unique use case.
It requires a couple of dozen labeled images for each class. In this example, we train a model using whole receipt images. It is also possible to provide only a specific region during training, e.g. the top region of the receipt. When it comes to prediction, either only the top region or the whole receipt can be fed into the predictor.
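Feeding only the top region into the predictor amounts to cropping the photo before the prediction call. A minimal sketch, assuming the logo sits in roughly the top quarter of the receipt (the 25% fraction is an illustrative choice):

```python
# Sketch: compute the crop box for the top region of a receipt photo, where
# the retailer logo usually sits.
def top_region_box(width, height, fraction=0.25):
    """Return a (left, top, right, bottom) box covering the top of the image,
    suitable for passing to PIL's Image.crop before calling the predictor."""
    return (0, 0, width, int(height * fraction))

print(top_region_box(400, 1200))  # (0, 0, 400, 300)
```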
The figures below show the overall performance and some experimentation results with a small number of classes, namely rail (33), bandq (14), pizzaexpress (18), walmart (34), and asda (26), indicating the respective number of samples uploaded to Custom Vision. The table below shows some exemplar results.
This model classifies receipts from known retailers well most of the time, and is able to distinguish a receipt from a non-receipt (see row 6). Note the confident probability scores shown. In rows 2 and 4, there is a test image with multiple bandq receipts and a test image with multiple walmart receipts.
This is just to show how the model behaves in corner cases like these. In practice, restrictions, such as allowing only a single receipt at a time, can be put in place. Row 7 shows receipts from retailers that the model has no knowledge of. Unfortunately, the classifier is rather confused when a receipt does not belong to any of the known classes. To address this issue, try adding a class called others, which is a collection of receipts that have no logos, or any receipts that are not among the intended 5 classes.
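An alternative (or complement) to an explicit others class is to reject low-confidence predictions at inference time. A minimal sketch; the 0.5 cut-off is an illustrative assumption to be tuned against a validation set:

```python
# Sketch: fall back to an "others" label when no known retailer scores above
# a confidence threshold.
def classify_with_rejection(probabilities, threshold=0.5):
    """Map a dict of class probabilities to a label, or 'others' when the
    best score is below the threshold."""
    label, score = max(probabilities.items(), key=lambda kv: kv[1])
    return label if score >= threshold else "others"

print(classify_with_rejection({"walmart": 0.91, "asda": 0.04}))  # walmart
print(classify_with_rejection({"walmart": 0.31, "asda": 0.28}))  # others
```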
How to decide between these two options will depend on the requirements of the specific application. The figures below show the performance of a different model which has the others class incorporated.
In this example, 76 samples were uploaded to Custom Vision for the model building.

A couple of weeks ago I was given the opportunity of working with a partner to build a solution that would hopefully help them automate their expense receipt processing.
Whilst the scenario sounded simple, there was a need for a reliable infrastructure to enable the processing. This is referenced in another blog post by one of my colleagues on the team, which can be found here. So why Azure Functions? Next I am going to walk through some of the key pieces of the solution, and then finally provide instructions on where to learn how to set up continuous deployment to Azure using Visual Studio Team Services.
All the code I describe in this blog post can be found on GitHub. There are 2 folders: a simple image uploader console application and the Azure Functions to process the image. Please feel free to download the solutions and try out the code yourself. For instructions on how to deploy and run, please refer to the prerequisites and setup instructions outlined in the readme documents in each folder. For details on where to download, please refer here. The function itself is triggered by a message being added to the Azure Storage Queue, receiptQueueItem.
As well as receiptQueueItem, there are several other important parameters of this function. Step 0 in the case statement is responsible for the primary activity of this function and is the one that provides the SmartOCRService with the necessary information so that it can process the image.
The message has 4 properties. An added advantage of using a queue rather than a direct request to the SmartOCRService is that the role of the ExpenseProcessor is now temporarily complete until the OCRCallback function triggers the state change or continuation of the workflow.
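The post does not spell out the four property names; apart from ItemId, which the callback uses to identify the image, the names below are illustrative stand-ins. A sketch of building such a queue message (shown in Python for brevity, though the functions themselves are C#):

```python
import json

# Sketch: serialize a queue message for the SmartOCRService. ItemId is the
# property mentioned in the post; ImageUrl, CallbackQueue, and Status are
# hypothetical stand-ins for the remaining properties.
def build_ocr_message(item_id, image_url, callback_queue, status="Pending"):
    return json.dumps({
        "ItemId": item_id,
        "ImageUrl": image_url,
        "CallbackQueue": callback_queue,
        "Status": status,
    })

msg = build_ocr_message("receipt-42", "https://example.blob/receipt42.jpg", "ocr-callback")
```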
Step 1 of the process provides a placeholder to communicate when the processing of the image is complete (success or failure). In this simplified case, the code simply updates the receiptsTable to reflect the final status of the process. To identify which image has been processed, the ItemId is included as a property of the message payload.
As you will see as part of the SmartOCRService, there are several non-catastrophic scenarios which we may want to handle by retrying the process. This step simply restarts the process by following the steps executed as part of Step 0. This new workflow state will either be Complete, Error or Retry. It's important to note that this separation of concerns between the ExpenseProcessor and the SmartOCRService was a condition desired by the partner; it is imagined that over time more processors will be put in place, e.g. an InvoiceProcessor, OrderProcessor, etc. As per the previous ExpenseProcessor function, this function is triggered when a message is added to the Azure Storage Queue (QueueTrigger), described by the parameter ocrQueue. A key thing to note within this function: as it is dependent on the OCR Service being available, it needs to handle the exception raised when the service is unavailable.
If the service is unavailable, the requestor may want to inform the user that they need to try again later, or, in our case, automate that process by having the requestor retry the whole process automagically. By default, if the function fails there will be a maximum of 5 retry attempts. If the last retry attempt fails, the original queue message is added to a poison-message queue.
Adding the message to a poison-message queue means the message will not be acted upon again, but it gives the user some notification that the message has failed. In our case we wanted to override this behaviour by preventing the message from being added to the poison-message queue. Note: to monitor the number of retries the function has attempted, add the parameter dequeueCount, which is of type int, to the signature of the Run(…) method.
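The retry-then-fail behaviour described above can be sketched as follows (in Python rather than the C# of the actual functions; the state names come from the post, the function shape is an assumption):

```python
MAX_RETRIES = 5  # Azure Functions' default before a message goes to the poison queue

# Sketch: swallow failures and request a retry until the final attempt, then
# record an Error state instead of letting the message reach the poison queue.
def handle_queue_item(dequeue_count, process):
    try:
        process()
        return "Complete"
    except Exception:
        return "Retry" if dequeue_count < MAX_RETRIES else "Error"
```

Returning a state rather than re-raising is what keeps the message out of the poison queue while still surfacing the failure to the workflow.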
It was a requirement of the partner that they needed this continuous deployment infrastructure in place. The original plan was to investigate how this could be done by trying out the setup ourselves within Visual Studio Team Services, but after some research on the internet we found a great blog which set out the steps perfectly.
The final requirement the partner had was being able to monitor their Azure Functions. After several iterations it was decided that the best way to get insights into how the application was performing was to integrate Microsoft Application Insights.
If you are interested in using Application Insights inside your Azure Functions, then I would suggest you read the following blog post found here. If you find Application Insights is overkill for your projects, then I would suggest having a look at the following documentation.

Accelerate your business processes by automating information extraction.
With just a few samples, Form Recognizer tailors its understanding to your documents, both on-premises and in the cloud. Turn forms into usable data at a fraction of the time and cost, so you can focus more time acting on the information rather than compiling it. Easily pull data and organize information with prebuilt and custom features—no manual labeling required. Get output tailored to your layouts with automatic custom extraction, and improve it with human feedback. Ingest data from the cloud to the edge and apply to search indexes, business automation workflows, and more.
Form Recognizer uses pre-built and unsupervised learning components to understand the layout and relationships between fields and entries in your documents, and to pull information out in an organized manner.

[Sample invoice extraction: fields such as Bill To (Contoso, Ltd.), Phone, Fax, Email, Invoice For, Invoice Subtotal, Tax Rate, Sales Tax, Other, Terms (total due in 90 days), Total, and the line-item Qty, Unit Price, and Price columns.]

The sales receipt information was extracted using the pre-built receipt model. All other examples were trained using a custom model with five PDF files of each form type. The custom extraction capabilities in Form Recognizer help you overcome this challenge by training on your own data based on just five documents.
Not only is the first output more reliable and tailored to your needs, but also you can provide human inputs to create a highly accurate model customized to your forms. Recognize forms on the edge, on-premises, and in the cloud with container support in Azure Cognitive Services. Use the REST interface of the Form Recognizer API to then integrate into cognitive search indexes, automate business processes, and create custom workflows for your business.
Automating Receipt Processing
Make data-driven decisions by extracting data from tables and forms and putting it into your data visualization service for analysis. Easily find specific information in your documents and forms, such as total accounts payable, by integrating Form Recognizer with Azure Cognitive Search.
Extract text, key-value pairs, and tables from forms and receipts, and pipe them into your back-end systems to perform tasks such as claim, invoice, and receipt processing.
Form Recognizer offers free and standard pricing options to extract valuable information from documents at a fraction of the price of manual extraction. Tutorials, API references, and other resources show you how to automate form processing for a broad range of scenarios.

One of the newest members of the Azure AI portfolio, Form Recognizer applies advanced machine learning to accurately extract text, key-value pairs, and tables from documents.
With just a few samples, it tailors its understanding to supplied documents, both on-premises and in the cloud. Form Recognizer focuses on making it simpler for companies to utilize the information hiding latent in business documents such as forms. Business expense reporting can be cumbersome for everyone involved in the process.
Manually filling out and approving expense reports is a significant time sink for both employees and managers. Aside from productivity lost to expense reporting, there are also pain points around auditing expense reports.
A solution to automatically extract merchant and transaction information from receipts can significantly reduce the manual effort of reporting and auditing expenses. Given the proliferation of mobile cameras, modern expense reports often contain images of receipts that are faded, crumpled up, or taken in suboptimal lighting conditions.
Existing receipt solutions often target high quality scanned images and are not robust enough to handle such real-world conditions. Form Recognizer eases common pain points in expense reporting, delivering real value back to business. By using the receipt API to extract merchant and transaction information from receipts, developers can unlock new experiences in the workforce.
And since the pre-built model for receipts works off the shelf without training, it reduces the time to deployment. For employees, expense applications leveraging Form Recognizer can pre-populate expense reports with key information extracted from receipts. This saves employees time managing expenses and travel, so they can focus on their core roles.
For central teams like finance within a company, it also helps expense auditing by using the key data extracted from receipts for verification. Using the data extracted, receipts are sorted into low, medium, or high risk of potential anomalies.
This enables the auditing team to focus on high risk receipts and reduce the number of potential anomalies that go unchecked. MSExpense also plans to leverage receipt data extraction and risk scoring to modernize the expense reporting process.
Instead of identifying risky expenses during auditing, such automated processing can flag potential issues earlier in the process during the reporting or approval of the expenses.
This reduces the turnaround time for processing the expense and any reimbursement. The service was simple to integrate, and we quickly started seeing value. To learn more about Form Recognizer and the rest of the Azure AI ecosystem, please visit our website and read the documentation. Get started by contacting us.

This post explores how we can leverage machine learning techniques to help partially automate the processes of accounting and expenditure reimbursement.
Often, such methods require manual input of information from an invoice or receipt, such as the total amount spent, tax amount, type of expenditure, transaction date, etc. This code story will demonstrate how multiclass classification algorithms and Optical Character Recognition (OCR) can be leveraged to automatically predict the type of expense from an imaged receipt.
By the end of this post, readers will be able to build a Xamarin-based expense recognition app for imaged receipts, with a model built using Azure ML Studio and deployed as a web service. Before we can predict or recognize the type of expense from a receipt, we must first convert a database of imaged receipts into structured data via OCR to extract the information into text format.
This information is then used to train a predictive model. The figure below shows the overall structure of the solution in Azure Machine Learning (ML) Studio, with the following assumptions. This example will load training images from blob storage and extract text using OCR. The data is then used to train a predictive model using a multiclass neural network with default settings, which is finally published as a web service.
The figure below shows the distribution of these six classes.
This code should reside within the Execute Python Script module. It sets up parameters for the OCR API, processes requests, and returns a new data frame containing the text extracted from each receipt and its associated label (that is, its expensing category).
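The core of that routine is flattening the OCR API's response into plain text. A simplified sketch, assuming the regions → lines → words shape of the Computer Vision OCR response:

```python
# Sketch: flatten a Computer Vision OCR response into a single string,
# one line of the receipt per line of text.
def extract_text(ocr_response):
    lines = []
    for region in ocr_response.get("regions", []):
        for line in region.get("lines", []):
            lines.append(" ".join(word["text"] for word in line.get("words", [])))
    return "\n".join(lines)

sample = {"regions": [{"lines": [{"words": [{"text": "TOTAL"}, {"text": "12.99"}]}]}]}
print(extract_text(sample))  # TOTAL 12.99
```

The resulting strings, paired with their expense-category labels, form the rows of the data frame used for training.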
While that information is not utilized in this example, it could be useful if the location of the text is of interest.
For more information on this routine, see the example on GitHub. The multiclass decision jungle and multiclass neural network modules have been tested, and the results are shown below. A mobile app that consumes the published expense predictor can be built using the Xamarin-based mobile phone app under the MobileApp folder in our example.
Predicting Expense Type from Receipts with Microsoft Cognitive Services
The app will take a picture of a receipt, send it to the web service, and a predicted type of expense will be returned. This section provides detailed information about the experiment settings. Readers are welcome to experiment with different settings and see how they affect model performance. Please see the Channel 9 video for the story behind this project. Receipt-recognition is the related GitHub repository. Olga Liakhovich.
With Azure Search we try to help you build really great search applications over your data. Through capabilities like the Azure Search Indexer, we have tried to make it convenient to ingest data from common data sources to enable this full text search support.
One file type we have not yet added support for, but which is a common ask, is images. The idea is that if you have a file such as a JPG, TIFF, or PDF with embedded images, you might want to extract the text from those images to enhance your search index. Imagine you have medical imagery, faxes, or scanned documents and want to search over them. You can see how this was accomplished in the following GitHub repository. Here is the general flow of what is done in the sample:
The main technologies I used to accomplish this are covered below. A full outline of how to do this can be found in the following GitHub repository. This sample is just a starting point; you will more than likely want to extend it further. Perhaps you will want to add the title of the file, or metadata relating to the file (file size, last updated, etc.).
You might also want to add a URL reference to the actual image file so you can allow users to open it directly from your application. If you have any questions or feedback on this, please let me know in the comments below. I would also like to give a special thanks to Jerome Viveiros, who wrote a great sample on how to use iTextSharp on his blog, which formed the basis of much of the code in my sample that extracts the images from the PDF file. I found the code really simple to use, and the extracted text was of very high quality.
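Once extracted, the text can be wrapped in the JSON batch format that Azure Search's indexing API expects. A minimal sketch; the field names (id, content, fileName) are illustrative and must match your index definition:

```python
import json

# Sketch: build an Azure Search indexing batch that uploads the OCR'd text.
def to_search_batch(doc_id, text, filename):
    return json.dumps({
        "value": [{
            "@search.action": "upload",  # upsert the document into the index
            "id": doc_id,
            "content": text,
            "fileName": filename,
        }]
    })

batch = to_search_batch("doc-1", "extracted receipt text ...", "scan001.pdf")
```

The batch would then be POSTed to the index's docs/index endpoint with your service's api-key header.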
The text, if formatted into a JSON document to be sent to Azure Search, then becomes full text searchable from your application.

TLS 1.2 is enforced for all HTTP requests to this service. For more information, see Azure Cognitive Services security. Form Recognizer ingests text from forms and outputs structured data that includes the relationships in the original file. You quickly get accurate results that are tailored to your specific content without heavy manual intervention or extensive data science expertise.
Form Recognizer comprises custom models, the prebuilt receipt model, and the Layout API. Form Recognizer custom models train on your own data, and you only need five sample input forms to start.
A trained model can output structured data that includes the relationships in the original form document. After you train the model, you can test and retrain it and eventually use it to reliably extract data from more forms according to your needs. You have the following options when you train custom models: training with labeled data and without labeled data.
By default, Form Recognizer uses unsupervised learning to understand the layout and relationships between fields and entries in your forms. When you submit your input forms, the algorithm clusters the forms by type, discovers what keys and tables are present, and associates values to keys and entries to tables. This doesn't require manual data labeling or intensive coding and maintenance, and we recommend you try this method first.
When you train with labeled data, the model does supervised learning to extract values of interest, using the labeled forms you provide. This results in better-performing models and can produce models that work with complex forms or forms containing values without keys. Form Recognizer uses the Layout API to learn the expected sizes and positions of printed and handwritten text elements. We recommend that you use five manually labeled forms of the same type to get started when training a new model and add more labeled data as needed to improve the model accuracy.
Form Recognizer also includes a model for reading English sales receipts from the United States—the type used by restaurants, gas stations, retail, and so on (see the sample receipt). This model extracts key information such as the time and date of the transaction, merchant information, tax amounts, totals, and more. In addition, the prebuilt receipt model is trained to recognize and return all of the text on a receipt. Form Recognizer can also extract text and table structure (the row and column numbers associated with the text) using high-definition optical character recognition (OCR).
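Consuming the prebuilt receipt model amounts to walking the fields of its JSON analysis result. A sketch of pulling out a few key fields; the response structure below is abbreviated for illustration and the exact shape depends on the API version:

```python
# Sketch: summarize a (simplified) prebuilt receipt model response.
def receipt_summary(response):
    fields = response["analyzeResult"]["documentResults"][0]["fields"]
    return {
        "merchant": fields["MerchantName"]["text"],
        "date": fields["TransactionDate"]["text"],
        "total": fields["Total"]["valueNumber"],
    }

sample = {"analyzeResult": {"documentResults": [{"fields": {
    "MerchantName": {"text": "Contoso"},
    "TransactionDate": {"text": "2019-06-10"},
    "Total": {"valueNumber": 14.52},
}}]}}
print(receipt_summary(sample))  # {'merchant': 'Contoso', 'date': '2019-06-10', 'total': 14.52}
```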
Follow a quickstart to get started extracting data from your forms. We recommend that you use the free service while you're learning the technology. Remember that the number of free pages is limited each month.