TDD with RSpec & Claude - Part 6 - Textract Expense Document that returns data.

Author
Steve Creedon
Engineering Lead
spec-tdd-claude - This article is part of a series.
Part 6: This Article

I need to extract fields (summary_fields) and tables (line_item_groups), with each table containing an array of line_items.

Summary Fields to Fields

Summary fields are labelled blocks of text, such as the vendor name or address. A typical summary field looks like this:

{
      "page_number": 1,
      "type": {
        "confidence": 99.87956237792969,
        "text": "VENDOR_ADDRESS"
      },
      "value_detection": {
        "confidence": 99.7436294555664,
        "text": "Manchester Terminal 2\nManchester Airport\nManchester\nM90 1QX"
      }
    }

and is one element of the summary_fields array.
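
Because the parsed document is a plain Ruby hash, Hash#dig is enough to pull the three values of interest out of a summary field. A minimal sketch, assuming the hash uses symbol keys (as the specs in this article do); the variable names are illustrative only:

```ruby
# One element of summary_fields, copied from the example above.
summary_field = {
  page_number: 1,
  type: { confidence: 99.87956237792969, text: 'VENDOR_ADDRESS' },
  value_detection: {
    confidence: 99.7436294555664,
    text: "Manchester Terminal 2\nManchester Airport\nManchester\nM90 1QX"
  }
}

# dig walks nested keys and returns nil instead of raising when a key is missing.
label      = summary_field.dig(:type, :text)
value      = summary_field.dig(:value_detection, :text)
confidence = summary_field.dig(:value_detection, :confidence)

puts label
puts value.lines.first
puts confidence
```

These are the same three lookups that end up as the context, value and confidence of an ExpenseValue.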

Prompt to extract summary_fields as the fields method

Create tests and a method ExpenseDocument#fields that returns an array of ExpenseValue objects. The ExpenseValue objects are created from summary_fields, with the value being value_detection.text, the context being type.text and the confidence being value_detection.confidence.

  describe '#fields' do
    subject(:expense_document) do
      described_class.new(document: document_hash)
    end

    it 'returns an array' do
      expect(expense_document.fields).to be_an(Array)
    end

    ##################################################################
    # GOOD: Never seen the all(be_an(ExpenseValue)) before - I like it
    ##################################################################
    it 'returns an array of ExpenseValue objects' do
      expect(expense_document.fields).to all(be_an(ExpenseValue))
    end

    it 'creates ExpenseValue objects from summary_fields' do
      fields = expense_document.fields
      expect(fields.size).to eq(document_hash[:summary_fields].size)
    end

    it 'sets the value from value_detection.text' do
      first_field = expense_document.fields.first
      expected_value = document_hash[:summary_fields].first.dig(:value_detection, :text)
      expect(first_field.value).to eq(expected_value)
    end

    it 'sets the context from type.text' do
      first_field = expense_document.fields.first
      expected_context = document_hash[:summary_fields].first.dig(:type, :text)
      expect(first_field.context).to eq(expected_context)
    end

    it 'sets the confidence from value_detection.confidence' do
      first_field = expense_document.fields.first
      expected_confidence = document_hash[:summary_fields].first.dig(:value_detection, :confidence)
      expect(first_field.confidence).to eq(expected_confidence)
    end

    it 'creates correct ExpenseValue objects for all summary_fields' do
      fields = expense_document.fields

      fields.each_with_index do |field, index|
        summary_field = document_hash[:summary_fields][index]
        expect(field.value).to eq(summary_field.dig(:value_detection, :text))
        expect(field.context).to eq(summary_field.dig(:type, :text))
        expect(field.confidence).to eq(summary_field.dig(:value_detection, :confidence))
      end
    end
  end

###################################################################
# GOOD: nice bit of defensive coding:
#     return [] unless @document[:summary_fields]
####################################################################
def fields
  return [] unless @document[:summary_fields]

  @document[:summary_fields].map do |summary_field|
    ExpenseValue.new(
      value: summary_field.dig(:value_detection, :text),
      confidence: summary_field.dig(:value_detection, :confidence),
      context: summary_field.dig(:type, :text)
    )
  end
end
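
To try #fields outside the spec suite, here is a self-contained sketch. It assumes ExpenseValue takes three keyword arguments (value, confidence, context), which matches the calls above; the Struct below is only a stand-in for the real class from this series, and the sample data is made up:

```ruby
# Stand-in for the ExpenseValue class from this series (assumed shape:
# three keyword attributes, matching the calls in #fields above).
ExpenseValue = Struct.new(:value, :confidence, :context, keyword_init: true)

# Cut-down ExpenseDocument containing only the #fields method shown above.
class ExpenseDocument
  def initialize(document:)
    @document = document
  end

  def fields
    return [] unless @document[:summary_fields]

    @document[:summary_fields].map do |summary_field|
      ExpenseValue.new(
        value: summary_field.dig(:value_detection, :text),
        confidence: summary_field.dig(:value_detection, :confidence),
        context: summary_field.dig(:type, :text)
      )
    end
  end
end

# Made-up sample data in the same shape as the summary field shown earlier.
document_hash = {
  summary_fields: [
    { type: { text: 'VENDOR_NAME' },
      value_detection: { text: 'Example Cafe', confidence: 98.5 } }
  ]
}

field = ExpenseDocument.new(document: document_hash).fields.first
puts "#{field.context}: #{field.value} (#{field.confidence})"

# The guard clause means a document without summary_fields yields an empty array.
p ExpenseDocument.new(document: {}).fields
```

The last line exercises the guard clause: without it, calling map on nil would raise a NoMethodError.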

Prompt to extract tables

Create tests and a method ExpenseDocument#tables that returns an array of line_item_groups. Each line item group returns an array of ExpenseValue objects. The ExpenseValue objects are created from line_items with the type.text “EXPENSE_ROW”, with the value being value_detection.text, the context being type.text and the confidence being value_detection.confidence.
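
The specs and implementation for this prompt rely on a particular nesting: each line item group holds line_items, and each line item holds line_item_expense_fields, one of which has the type EXPENSE_ROW. A minimal sketch of that shape, with made-up values rather than a real Textract response (the ITEM and PRICE entries are only illustrative neighbours of the EXPENSE_ROW field):

```ruby
# Hypothetical data: one group, one line item, three expense fields.
line_item_groups = [
  {
    line_items: [
      {
        line_item_expense_fields: [
          { type: { text: 'ITEM' },
            value_detection: { text: 'Flat White', confidence: 99.1 } },
          { type: { text: 'PRICE' },
            value_detection: { text: '3.50', confidence: 98.7 } },
          { type: { text: 'EXPENSE_ROW' },
            value_detection: { text: 'Flat White 3.50', confidence: 97.9 } }
        ]
      }
    ]
  }
]

# Find the EXPENSE_ROW field of the first line item in the first group.
first_line_item = line_item_groups.first[:line_items].first
expense_row = first_line_item[:line_item_expense_fields].find do |field|
  field.dig(:type, :text) == 'EXPENSE_ROW'
end

puts expense_row.dig(:value_detection, :text)
```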

  describe '#tables' do
    subject(:expense_document) do
      described_class.new(document: document_hash)
    end

    it 'returns an array' do
      expect(expense_document.tables).to be_an(Array)
    end

    it 'returns an array of arrays' do
      expect(expense_document.tables).to all(be_an(Array))
    end

    it 'returns one array per line_item_group' do
      expect(expense_document.tables.size).to eq(document_hash[:line_item_groups].size)
    end

    it 'each array contains ExpenseValue objects' do
      expense_document.tables.each do |table|
        expect(table).to all(be_an(ExpenseValue))
      end
    end

    it 'creates ExpenseValue objects only from EXPENSE_ROW items' do
      first_table = expense_document.tables.first
      first_group = document_hash[:line_item_groups].first

      expense_row_count = first_group[:line_items].count do |item|
        item[:line_item_expense_fields].any? { |field| field.dig(:type, :text) == 'EXPENSE_ROW' }
      end

      expect(first_table.size).to eq(expense_row_count)
    end

    it 'sets the value from EXPENSE_ROW value_detection.text' do
      first_table = expense_document.tables.first
      first_expense_value = first_table.first

      first_group = document_hash[:line_item_groups].first
      first_line_item = first_group[:line_items].first
      expense_row = first_line_item[:line_item_expense_fields].find do |field|
        field.dig(:type, :text) == 'EXPENSE_ROW'
      end

      expect(first_expense_value.value).to eq(expense_row.dig(:value_detection, :text))
    end

    it 'sets the context from EXPENSE_ROW type.text' do
      first_table = expense_document.tables.first
      first_expense_value = first_table.first

      expect(first_expense_value.context).to eq('EXPENSE_ROW')
    end

    it 'sets the confidence from EXPENSE_ROW value_detection.confidence' do
      first_table = expense_document.tables.first
      first_expense_value = first_table.first

      first_group = document_hash[:line_item_groups].first
      first_line_item = first_group[:line_items].first
      expense_row = first_line_item[:line_item_expense_fields].find do |field|
        field.dig(:type, :text) == 'EXPENSE_ROW'
      end

      expect(first_expense_value.confidence).to eq(expense_row.dig(:value_detection, :confidence))
    end

    it 'creates correct ExpenseValue objects for all line items' do
      expense_document.tables.each_with_index do |table, group_index|
        line_item_group = document_hash[:line_item_groups][group_index]

        table.each_with_index do |expense_value, item_index|
          line_item = line_item_group[:line_items][item_index]
          expense_row = line_item[:line_item_expense_fields].find do |field|
            field.dig(:type, :text) == 'EXPENSE_ROW'
          end

          expect(expense_value.value).to eq(expense_row.dig(:value_detection, :text))
          expect(expense_value.context).to eq('EXPENSE_ROW')
          expect(expense_value.confidence).to eq(expense_row.dig(:value_detection, :confidence))
        end
      end
    end
  end

  def tables
    return [] unless @document[:line_item_groups]

    ############################################################################
    # GOOD ENOUGH: this is a little trickier than #fields, as the code has to loop
    # through line_item_groups and then line_items, creating ExpenseValue objects
    # only where the line_item has a field of type 'EXPENSE_ROW'.
    # If it were me I probably would have split this into two methods, but
    # this works, so I'm happy to keep it.
    ############################################################################
    @document[:line_item_groups].map do |line_item_group|
      line_item_group[:line_items].map do |line_item|
        expense_row = line_item[:line_item_expense_fields].find do |field|
          field.dig(:type, :text) == 'EXPENSE_ROW'
        end

        next unless expense_row

        ExpenseValue.new(
          value: expense_row.dig(:value_detection, :text),
          confidence: expense_row.dig(:value_detection, :confidence),
          context: expense_row.dig(:type, :text)
        )
      end.compact
    end
  end
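
For comparison, here is one way the two-method split mentioned in the comment above might look. This is a sketch and not the code Claude generated: expense_row_value is a hypothetical helper name, ExpenseValue is a stand-in Struct, the sample data is made up, and Enumerable#filter_map needs Ruby 2.7 or later.

```ruby
# Stand-in for the ExpenseValue class from this series (assumed shape).
ExpenseValue = Struct.new(:value, :confidence, :context, keyword_init: true)

class ExpenseDocument
  def initialize(document:)
    @document = document
  end

  def tables
    return [] unless @document[:line_item_groups]

    @document[:line_item_groups].map do |line_item_group|
      # filter_map drops the nils returned for line items without an EXPENSE_ROW,
      # replacing the map ... next ... compact combination.
      line_item_group[:line_items].filter_map { |line_item| expense_row_value(line_item) }
    end
  end

  private

  # Hypothetical helper: builds an ExpenseValue from a line item's
  # EXPENSE_ROW field, or returns nil when the line item has none.
  def expense_row_value(line_item)
    expense_row = line_item[:line_item_expense_fields].find do |field|
      field.dig(:type, :text) == 'EXPENSE_ROW'
    end
    return unless expense_row

    ExpenseValue.new(
      value: expense_row.dig(:value_detection, :text),
      confidence: expense_row.dig(:value_detection, :confidence),
      context: expense_row.dig(:type, :text)
    )
  end
end

# Made-up data: the second line item has no EXPENSE_ROW and is skipped.
document_hash = {
  line_item_groups: [
    {
      line_items: [
        { line_item_expense_fields: [
          { type: { text: 'ITEM' },
            value_detection: { text: 'Flat White', confidence: 99.1 } },
          { type: { text: 'EXPENSE_ROW' },
            value_detection: { text: 'Flat White 3.50', confidence: 97.9 } }
        ] },
        { line_item_expense_fields: [
          { type: { text: 'ITEM' },
            value_detection: { text: 'Orphan', confidence: 90.0 } }
        ] }
      ]
    }
  ]
}

tables = ExpenseDocument.new(document: document_hash).tables
p tables.first.map(&:value)
```

The behaviour is intended to match the single-method version; the only difference is that the "find the EXPENSE_ROW and wrap it" step has a name of its own.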

In a nutshell

  • GOOD: I learned some neat new syntax for ensuring all objects in a collection are of a specific type
  • GOOD: Claude put in some nice defensive coding in case a specific array in the document didn’t exist.
  • GOOD ENOUGH: Claude created some code that was a little verbose - I had to read it a few times before understanding it but it was acceptable.
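
Outside a spec, the "every element is of this type" check from the first point can also be written in plain Ruby. A small sketch: Enumerable#all? accepts a pattern and compares each element with ===, so passing a class tests every element against that class. The ExpenseValue Struct here is a stand-in and the values are made up:

```ruby
# Stand-in for the ExpenseValue class from this series (assumed shape).
ExpenseValue = Struct.new(:value, :confidence, :context, keyword_init: true)

values = [
  ExpenseValue.new(value: 'Example Cafe', confidence: 98.5, context: 'VENDOR_NAME'),
  ExpenseValue.new(value: '3.50', confidence: 97.2, context: 'TOTAL')
]

# Every element is an ExpenseValue.
all_expense_values = values.all?(ExpenseValue)

# A stray String makes the check fail.
mixed = values + ['oops']
mixed_ok = mixed.all?(ExpenseValue)

puts all_expense_values
puts mixed_ok
```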

That’s a Wrap!

I hope you found this interesting. Most of all, if you’ve been hesitant about getting started with AI coding, I hope this gives you a framework and enough confidence to begin.

If there’s something you think I’ve missed in the articles, let me know and I will do my best to add it.
