How to Extract Items from Invoices Using the Form Recognizer API

2022-01-12 AI-OCR

In this article, we will talk about how to use the Azure Form Recognizer API to extract items from an invoice.

Download sample code from Form Recognizer Studio

After using Azure Form Recognizer Studio to recognize a sample invoice (see the article below), we can download the sample code.

How to Extract Items From Invoices Using Azure Form Recognizer
https://thats-it-code.com/azure/how-to-use-form-recognizer-to-extract-items-from-receipt/

Download sample code

After downloading the sample code (sample_analyze_invoices.py), move it to a new project folder for extracting invoice items with the Form Recognizer API.

Move the sample code to the project folder

Open the project folder in Visual Studio Code

Click [Open Folder] in VS Code and select the project folder.

You can read the article below to learn how to create a local development environment.

Let’s Create a Programming Environment
https://thats-it-code.com/programming/lets-create-a-programming-environment/

Set up the endpoint and key for calling the API

Open the sample code and you will see that the endpoint and key variables are not set yet.

Let’s go to the Azure portal to get these two values.
First, enter “cognitive services” in the search bar at the top of the Azure portal.
Then click “Cognitive Services multi-service account” in the result list.

Click the service name.

Click [Keys and Endpoint] in the left-side menu.
Click the Copy button of [KEY 1] and [Endpoint] respectively, and paste the values into the key and endpoint variables of the sample code.
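Pasting the key directly into the source file is fine for a quick test, but it is easy to commit it by accident. A minimal sketch of an alternative, assuming you export the values as environment variables named FORM_RECOGNIZER_ENDPOINT and FORM_RECOGNIZER_KEY (hypothetical names chosen for this example), reads them at startup and then builds a DocumentAnalysisClient with an AzureKeyCredential, which is presumably what the sample's document_analysis_client is:

```python
import os

# Hypothetical environment variable names for this sketch; pick any names you like.
ENDPOINT_VAR = "FORM_RECOGNIZER_ENDPOINT"
KEY_VAR = "FORM_RECOGNIZER_KEY"


def load_credentials(env=None):
    """Return (endpoint, key) read from environment variables.

    `env` defaults to os.environ; passing a plain dict makes the function easy to test.
    """
    env = os.environ if env is None else env
    endpoint = env.get(ENDPOINT_VAR, "").strip()
    key = env.get(KEY_VAR, "").strip()
    if not endpoint or not key:
        raise RuntimeError(f"Set {ENDPOINT_VAR} and {KEY_VAR} before running the sample")
    if not endpoint.startswith("https://"):
        raise ValueError(f"Endpoint should be an https:// URL, got {endpoint!r}")
    return endpoint, key


def make_client(endpoint, key):
    """Build the Form Recognizer client (requires azure-ai-formrecognizer)."""
    # Imported here so load_credentials() still works without the SDK installed.
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential

    return DocumentAnalysisClient(endpoint=endpoint, credential=AzureKeyCredential(key))
```

In the sample, the hard-coded values could then be replaced with `endpoint, key = load_credentials()` followed by `document_analysis_client = make_client(endpoint, key)`. In Git Bash, the variables can be set with `export FORM_RECOGNIZER_KEY=...` before running the script.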

Install necessary library

To avoid polluting the global Python environment, we can use a virtual environment.
I use the pipenv library to create and manage Python virtual environments.
First, let’s install pipenv using the pip command.

pip install pipenv

In VS Code, press Ctrl+@ (Ctrl+` on a US keyboard layout) to open the terminal panel at the bottom of the editor.
Then select the Git Bash shell.

Execute the following command to create a new virtual environment based on Python 3.

pipenv --python 3

Then use the command below to enter the virtual environment.

pipenv shell
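To confirm that the terminal is now running inside the virtual environment, you can compare sys.prefix with sys.base_prefix: for an environment created by venv or by virtualenv 20+ (which pipenv relies on) the two differ, while for the global interpreter they are the same. Older virtualenv releases set sys.real_prefix instead, so the sketch checks that too and prints the interpreter path:

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Legacy virtualenv (< 20) sets sys.real_prefix; venv and virtualenv >= 20
    # make sys.prefix differ from sys.base_prefix instead.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

Run it with `python` inside `pipenv shell`; the interpreter path should point into the directory reported by `pipenv --venv`, which is also the interpreter to select in VS Code.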

Now let’s check which libraries the sample code uses.
As you can see, the Azure Form Recognizer library is imported, and VS Code flags the import with a warning because the library is not installed yet.

Let’s install the Azure Form Recognizer library.

pip install azure-ai-formrecognizer==3.2.0b2
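Because the command pins a beta release (3.2.0b2) and beta APIs can change between releases, it can be worth confirming which version actually ended up in the virtual environment. A small sketch using the standard library's importlib.metadata (Python 3.8+):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string of `distribution`, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version("azure-ai-formrecognizer")
if found is None:
    print("azure-ai-formrecognizer is not installed for this interpreter")
else:
    print("azure-ai-formrecognizer", found)
```

Run inside `pipenv shell`, it should report the pinned version; `pip show azure-ai-formrecognizer` gives the same information from the command line.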

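However, even after the installation finishes, the import warnings are still shown, because VS Code is still using the global Python interpreter rather than the one in the virtual environment.
We have to change the Python interpreter to the one created by pipenv (for example, with the Python: Select Interpreter command in the Command Palette).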

This time, the import warnings are gone.

Execute sample code

Next, let’s execute the sample code.
Execute the command below in the terminal panel at the bottom of VS Code.

python sample_analyze_invoices.py

The extracted invoice fields, including the line items, are printed in the terminal.
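If you want to reuse the line items elsewhere (for example, to save them to a file), it can help to flatten them into plain dictionaries. The sketch below assumes the result shape the sample works with: `result.documents`, each document's `fields` dict containing an "Items" field whose `value` is a list of fields, and each item's `value` being a dict keyed by sub-field names such as Description, Quantity, UnitPrice and Amount. The function only relies on those attribute names, so it is exercised here with stand-in objects rather than a real API response; check the names against the SDK version you install.

```python
from types import SimpleNamespace

# Sub-fields of each invoice line item to keep (names from the prebuilt invoice model).
ITEM_FIELDS = ("Description", "Quantity", "UnitPrice", "Amount")


def extract_items(result, field_names=ITEM_FIELDS):
    """Flatten the "Items" field of every analyzed invoice into a list of dicts."""
    rows = []
    for invoice_index, invoice in enumerate(result.documents):
        items_field = invoice.fields.get("Items")
        if items_field is None or not items_field.value:
            continue  # this invoice has no recognized line items
        for item in items_field.value:
            row = {"invoice": invoice_index}
            for name in field_names:
                sub_field = item.value.get(name)
                # Missing sub-fields become None; otherwise keep the recognized value.
                row[name] = None if sub_field is None else sub_field.value
            rows.append(row)
    return rows


# Offline check with stand-in objects that mimic the result shape (no API call).
def _field(value):
    return SimpleNamespace(value=value, confidence=0.99)


fake_item = SimpleNamespace(value={"Description": _field("Consulting"), "Amount": _field(60.0)})
fake_result = SimpleNamespace(
    documents=[SimpleNamespace(fields={"Items": _field([fake_item])})]
)
print(extract_items(fake_result))
```

With a real response, you would call `extract_items(invoices)` right after `invoices = poller.result()`.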

Extract items from a local sample invoice

We can also extract invoice items from local files by modifying a few lines.
First, let’s prepare a local sample invoice file.
Place a sample invoice image in the data folder of the project (the code below reads it as data/sample-invoice.png).
The image used in this article was taken from the Internet.

Let’s comment out formUrl and replace it with code that opens the local sample invoice file and reads its contents.

# formUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/invoice_sample.jpg"
with open("data/sample-invoice.png", "rb") as f:
    formData = f.read()

Then replace the begin_analyze_document_from_url method with the begin_analyze_document method, which takes the file contents instead of a URL.

# poller = document_analysis_client.begin_analyze_document_from_url("prebuilt-invoice", formUrl)
poller = document_analysis_client.begin_analyze_document("prebuilt-invoice", formData)
invoices = poller.result()
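As an alternative to reading the whole file into memory first, begin_analyze_document is documented as accepting either bytes or a binary stream for the document. A minimal sketch that checks the path and passes the open file object; the analyze_local_invoice helper is hypothetical, and the client is passed in so the function can also be tried with a stand-in object:

```python
from pathlib import Path


def analyze_local_invoice(client, path, model="prebuilt-invoice"):
    """Analyze a local invoice file and return the analysis result.

    `client` is expected to be a DocumentAnalysisClient, or anything exposing
    the same begin_analyze_document method.
    """
    invoice_path = Path(path)
    if not invoice_path.is_file():
        raise FileNotFoundError(f"Invoice file not found: {invoice_path}")
    with invoice_path.open("rb") as f:
        # Model ID passed positionally, as in the sample code, so the call does
        # not depend on the parameter's keyword name.
        poller = client.begin_analyze_document(model, f)
        # Wait for the result while the file is still open.
        return poller.result()
```

In the sample, this would be used as `invoices = analyze_local_invoice(document_analysis_client, "data/sample-invoice.png")`.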

Let’s execute our code again. The invoice items are extracted successfully from the local file as well.

Conclusion

In this article, we used the Form Recognizer API to extract items from invoice files. We modified the sample code downloaded from the Form Recognizer Studio page and successfully extracted items from a local invoice file.
