Receipt Reader

I had the idea to write an app that reads and parses the contents of a photo of a receipt to allow for easy bill splitting. This idea came to me when splitting a large happy hour check with 15 friends. Personally, I don't care about the money, and after a few beers, I would happily pay more than my share, but some of the other people in our party were pretty anal -- breaking out the calculators, computing the tip down to the exact cent, etc. "There's got to be a better way!" I thought.

Not surprisingly, somebody already had this idea, but I thought this would be a good exercise, and that I might be able to write a more robust algorithm compared with the apps already on the market.

Backend

I wound up writing the backend for a receipt reader app. It is a Flask app hosted on Heroku. It accepts POST requests with a receipt image attached (the curl command below uploads the file as a multipart form), and returns the parsed contents of the receipt as JSON. You can test it out by downloading this image and sending it to the Heroku server with the command curl -F "file=@receipt.jpg" https://receipt-reader-bk.herokuapp.com/ and, after a 15-second delay, the server should respond with

{
  "items": [
    {
      "name": "Hac Cheese", 
      "price": 7.0, 
      "quantity": 1
    }, 
    {
      "name": "Madras Chicken Sando", 
      "price": 14.0, 
      "quantity": 1
    }, 
    {
      "name": "Mussels", 
      "price": 5.33, 
      "quantity": 1
    }, 
    {
      "name": "1/3 Charcuterie", 
      "price": 4.0, 
      "quantity": 1
    }, 
    {
      "name": "Brussel Sprouts", 
      "price": 2.0, 
      "quantity": 1
    }, 
    {
      "name": "Fries", 
      "price": 4.0, 
      "quantity": 1
    }, 
    {
      "name": "2 Half HA Tina", 
      "price": 10.0, 
      "quantity": 1
    }, 
    {
      "name": "Half SOB Goat", 
      "price": 5.0, 
      "quantity": 1
    }, 
    {
      "name": "Halt Beatbock", 
      "price": 4.0, 
      "quantity": 1
    }, 
    {
      "name": "Fun Lost Abbey", 
      "price": 6.0, 
      "quantity": 1
    }, 
    {
      "name": "Fun Wreckage", 
      "price": 7.0, 
      "quantity": 1
    }, 
    {
      "name": "Half A eman", 
      "price": 4.0, 
      "quantity": 1
    }, 
    {
      "name": "Full Whiner", 
      "price": 8.0, 
      "quantity": 1
    }
  ], 
  "subtotal": 80.33, 
  "total": 88.57
}

Not quite perfect, but pretty damn good. I think it got all the prices right. Another example image is here.
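One quick way to sanity-check the prices is to add them up and compare against the reported subtotal. The sketch below does that for the response above, and also shows how a person's share could be scaled up to cover the gap between subtotal and total. The function names (check_receipt, share) are made up for illustration, and treating the subtotal-to-total gap as tax and fees to be spread proportionally is my assumption, not something the backend does.

```python
# Item names and prices copied from the JSON response above.
PRICES = [
    ("Hac Cheese", 7.0), ("Madras Chicken Sando", 14.0), ("Mussels", 5.33),
    ("1/3 Charcuterie", 4.0), ("Brussel Sprouts", 2.0), ("Fries", 4.0),
    ("2 Half HA Tina", 10.0), ("Half SOB Goat", 5.0), ("Halt Beatbock", 4.0),
    ("Fun Lost Abbey", 6.0), ("Fun Wreckage", 7.0), ("Half A eman", 4.0),
    ("Full Whiner", 8.0),
]
response = {
    "items": [{"name": n, "price": p, "quantity": 1} for n, p in PRICES],
    "subtotal": 80.33,
    "total": 88.57,
}


def check_receipt(parsed, tolerance=0.01):
    """Return (sum of item prices, whether it matches the subtotal, surcharge rate)."""
    items_sum = round(sum(i["price"] * i["quantity"] for i in parsed["items"]), 2)
    matches = abs(items_sum - parsed["subtotal"]) <= tolerance
    # Fraction added on top of the subtotal (assumed here to be tax and fees).
    surcharge_rate = parsed["total"] / parsed["subtotal"] - 1
    return items_sum, matches, surcharge_rate


def share(parsed, item_names):
    """Amount owed for the named items, with the surcharge spread proportionally."""
    factor = parsed["total"] / parsed["subtotal"]
    owed = sum(
        i["price"] * i["quantity"] for i in parsed["items"] if i["name"] in item_names
    )
    return round(owed * factor, 2)


items_sum, matches, rate = check_receipt(response)
print(items_sum, matches, f"{rate:.1%}")
print(share(response, {"Fries", "Mussels"}))
```

For this receipt the item prices sum to 80.33, which matches the subtotal, so the OCR mistakes are confined to the item names.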

Algorithm

Preprocessing

The preprocessing script takes a possibly-shitty input image and prepares it for OCR. Specifically, it finds the corners of the receipt by detecting its edges and then computing their intersections.

I initially experimented with directly detecting the corners using templates, rather than detecting the edges and then computing their intersection. I've found the method described above to be more robust to shadows cast across the receipt and to work better when the receipt is not lying flat on the surface.
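To make the "edges, then intersections" idea concrete, here is a minimal sketch of the intersection step only. It assumes each detected edge is a straight line in the (rho, theta) normal form that a Hough transform produces, which is my assumption about the representation; the actual script may store edges differently. The names intersect and receipt_corners are hypothetical.

```python
import math


def intersect(line_a, line_b, eps=1e-9):
    """Intersect two lines given in normal form (rho, theta).

    Each line is the set of points with x*cos(theta) + y*sin(theta) = rho.
    Returns (x, y), or None if the lines are (nearly) parallel.
    """
    rho1, theta1 = line_a
    rho2, theta2 = line_b
    # Determinant of the 2x2 system; it equals sin(theta2 - theta1).
    det = math.cos(theta1) * math.sin(theta2) - math.sin(theta1) * math.cos(theta2)
    if abs(det) < eps:
        return None
    # Cramer's rule.
    x = (rho1 * math.sin(theta2) - rho2 * math.sin(theta1)) / det
    y = (rho2 * math.cos(theta1) - rho1 * math.cos(theta2)) / det
    return x, y


def receipt_corners(left, right, top, bottom):
    """Corners in the order top-left, top-right, bottom-right, bottom-left."""
    return [
        intersect(top, left),
        intersect(top, right),
        intersect(bottom, right),
        intersect(bottom, left),
    ]


# Toy example: an axis-aligned receipt spanning x in [10, 110], y in [20, 320].
corners = receipt_corners(
    left=(10, 0.0), right=(110, 0.0),
    top=(20, math.pi / 2), bottom=(320, math.pi / 2),
)
print(corners)
```

Because each corner comes from two whole edges rather than a small local patch, a shadow or a curled corner that hides the corner itself need not move the estimate much, which fits the robustness observation above.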

The notebook used to develop the preprocessing script can be viewed here or downloaded here.

OCR

I used Azure's computer vision API to do the OCR. Google and Amazon also offer such services, but Azure's API was the only one that returned the bounding box of each word. This is very important for parsing the contents. In addition to the word bounding boxes, the Azure API also tries to group the words into sections. I didn't find this to be useful at all.
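To illustrate why per-word bounding boxes matter, the sketch below rebuilds the lines of a receipt from word positions alone. The Word tuple is a simplified, hypothetical stand-in (text plus left, top, width, height in pixels), not the actual shape of Azure's response, and the grouping rule is an assumption of mine rather than the method used in the notebook.

```python
from typing import NamedTuple


class Word(NamedTuple):
    # Hypothetical simplified OCR word: text plus its bounding box in pixels.
    text: str
    left: float
    top: float
    width: float
    height: float


def centre_y(word):
    """Vertical centre of a word's bounding box."""
    return word.top + word.height / 2


def group_into_rows(words, tolerance=0.5):
    """Group words into rows when their vertical centres are close.

    A word joins the current row if its centre is within `tolerance` word
    heights of that row's mean centre; otherwise it starts a new row.
    """
    rows = []
    for word in sorted(words, key=centre_y):
        if rows:
            last = rows[-1]
            row_centre = sum(centre_y(w) for w in last) / len(last)
            if abs(centre_y(word) - row_centre) <= tolerance * word.height:
                last.append(word)
                continue
        rows.append([word])
    # Within each row, read left to right.
    return [sorted(row, key=lambda w: w.left) for row in rows]


words = [
    Word("5.33", 200, 129, 40, 20),
    Word("Fries", 10, 100, 50, 20),
    Word("Mussels", 10, 130, 70, 20),
    Word("4.00", 200, 102, 40, 20),
]
rows = group_into_rows(words)
print([[w.text for w in row] for row in rows])
```

Without the boxes, an item name on the left and its price on the far right are just two unrelated strings; with them, they fall into the same row.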

Parsing

All of the receipts that I've looked at for this project have the prices right-aligned with a small margin. The algorithm detects the prices first, and then reads the rest of each line.
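A rough sketch of the "prices first, then the rest of the line" idea follows. It assumes the words have already been grouped into rows and that each word is a (text, left, right) tuple in pixels; the price pattern, the 15-pixel margin, and the name parse_rows are all illustrative choices of mine, not values taken from the notebook.

```python
import re

# A price-like token: optional dollar sign, digits, and exactly two decimals.
PRICE_RE = re.compile(r"^\$?(\d+\.\d{2})$")


def parse_rows(rows, margin=15):
    """Extract {"name", "price"} items from rows of (text, left, right) words.

    Each row must be sorted left to right. A row counts as an item only if
    its last word looks like a price and sits in the right-aligned column.
    """
    # Step 1: locate the price column from the right edges of price-like words.
    price_rights = [
        right for row in rows for text, _, right in row if PRICE_RE.match(text)
    ]
    if not price_rights:
        return []
    column_edge = max(price_rights)

    # Step 2: for each row ending in an aligned price, the rest is the name.
    items = []
    for row in rows:
        if len(row) < 2:
            continue
        text, _, right = row[-1]
        match = PRICE_RE.match(text)
        if match and column_edge - right <= margin:
            name = " ".join(t for t, _, _ in row[:-1])
            items.append({"name": name, "price": float(match.group(1))})
    return items


rows = [
    [("Fries", 10, 60), ("4.00", 200, 240)],
    [("Mussels", 10, 80), ("5.33", 200, 240)],
    [("Table", 10, 50), ("12", 60, 80)],                     # no price
    [("Order", 10, 60), ("#", 65, 70), ("3.50", 75, 110)],   # not right-aligned
]
print(parse_rows(rows))
```

In this sketch the subtotal and total lines would come out as ordinary items, so a real parser would still need to recognise them by name and pull them out separately.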

Finally, the results are packed into a JSON object and served by Flask.

The notebook used to develop the OCR and content parsing can be viewed here or downloaded here.

Front end

Although I've written simple Android apps for using my phone's sensors, I'm hardly an expert on Android development (I lean pretty hard on the IDE to help me out), so there's no functional front end for this app. If you have Android development experience and are interested in building this out, send me an email!