Skip to main content

TDD with Rspec & Claude - Part 5 - Textract Expense Document that removes geometry.

·1213 words·6 mins
spec-tdd-claude - This article is part of a series.
Part 5: This Article

Previously
#

In the previous article we started our TDD adventure with a simple utility class for encapsulating the value and context of a string extracted by AWS Textract Expense so that we don’t just get the text value but understand what that value means eg. Name, Address.

Create the Expense Document class
#

Now lets build the Expense Document class. In a nutshell this class wraps 8000+ lines of json code and will enable us to access that data in a clean, DSL kind of way.

More details on the why can be found in part 3 of this series.

Prompt to create the Expense Document spec
#

Create an rspec test file for a class called ExpenseDocument. The spec file should have a subject that takes an initializer with the data from spec/fixtures/document.json parsed into a hash as the argumant. The argument should be called document and be exposed as a read only attribute

Unsurprisingly it jumped the gun and created an expense document class too. I’m OK with that.

require 'rails_helper'

RSpec.describe ExpenseDocument, type: :model do
  let(:fixture_path) { Rails.root.join('spec/fixtures/document.json') }
  let(:document_hash) { JSON.parse(File.read(fixture_path), symbolize_names: true) }

  describe '#initialize' do
    subject(:expense_document) do
      described_class.new(document: document_hash)
    end

    ##################################################################
    # NOT SO GOOD: this test is a little unneccessary, it's really
    # testing the test - i.e. creation of the subject.
    ##################################################################
    it 'initializes with a document argument' do
      expect(expense_document).to be_an_instance_of(ExpenseDocument)
    end

    it 'exposes document as a read-only attribute' do
      expect(expense_document.document).to eq(document_hash)
    end

    it 'has the document attribute' do
      expect(expense_document).to respond_to(:document)
    end

    it 'does not allow setting the document attribute' do
      expect(expense_document).not_to respond_to(:document=)
    end
  end
end
class ExpenseDocument
  attr_reader :document

  def initialize(document:)
    @document = document
  end
end

Prompt to create the remove_geometry method!
#

This is where we get into actual code and logic not just simple getters/setters. A large part of any TextractDocument is geometry data that describes the location of the block on the receipt that was removed. I don’t need this and it’s bloating the size of the files I store and use so I want to remove it.

Create a test and method ExpenseDocument#remove_geometry! The method must remove every instance of geometry from ExpenseDocument#document created by the fixture spec/fixtures/document.json

describe '#remove_geometry!' do
    subject(:expense_document) do
      described_class.new(document: document_hash)
    end

    ################################################################
    # GREAT: this isn't any easy thing to test for and I think it does
    # a great job. It's created a geometry_count_before that iterates
    # through the document looking for 'geometry' keys. It checks these
    # keys exist before the method is called and none after.
    ################################################################
    it 'removes all geometry keys from the document' do
      # Verify geometry exists before removal
      geometry_count_before = count_geometry_keys(expense_document.document)
      expect(geometry_count_before).to be > 0

      expense_document.remove_geometry!

      # Verify all geometry keys are removed
      geometry_count_after = count_geometry_keys(expense_document.document)
      expect(geometry_count_after).to eq(0)
    end

    ################################################################
    # GOOD: this is a test to ensure that the method returns
    # self for method chaining. I didn't ask it to do this though 
    # a similar class in my existing app does the same thing...
    ################################################################s
    it 'returns self for method chaining' do
      result = expense_document.remove_geometry!
      expect(result).to eq(expense_document)
    end
  
    ###################################################################
    # NOT SO GOOD: these next two test are overkill. Effectively they're testing 
    # the same thing as the 'removes all geometry keys from the document' above
    # but in a more explcit way.
    # Normally I would favour a more explicit test but the 
    # 'removes all geometry keys from the document' test comprehensively
    # covers the entire document and there is always a chance that geometry
    # data exists outside blocks, summary_fields and line_items
    ###################################################################
    it 'removes geometry from blocks' do
      expense_document.remove_geometry!

      expense_document.document[:blocks]&.each do |block|
        expect(block[:geometry]).to be_nil
      end
    end

    it 'removes geometry from nested structures' do
      expense_document.remove_geometry!

      # Check summary fields
      expense_document.document[:summary_fields]&.each do |field|
        expect(field[:geometry]).to be_nil
        expect(field.dig(:value_detection, :geometry)).to be_nil
        expect(field.dig(:label_detection, :geometry)).to be_nil
      end

      # Check line item groups
      expense_document.document[:line_item_groups]&.each do |group|
        group[:line_items]&.each do |item|
          item[:line_item_expense_fields]&.each do |field|
            expect(field[:geometry]).to be_nil
            expect(field.dig(:value_detection, :geometry)).to be_nil
            expect(field.dig(:label_detection, :geometry)).to be_nil
          end
        end
      end
    end

    ##################################################################
    # GREAT: it's created a method to iterate through every key in the
    # hash counting geometry keys
    ##################################################################
    def count_geometry_keys(obj, count = 0)
      case obj
      when Hash
        count += 1 if obj.key?(:geometry)
        obj.each_value { |v| count = count_geometry_keys(v, count) }
      when Array
        obj.each { |item| count = count_geometry_keys(item, count) }
      end
      count
    end
  end

But wait, I spotted a problem…
#

The count_geometry_keys method is looking for a key :geometry and I’m sure my Textract hash has string keys.

def count_geometry_keys(obj, count = 0)
  case obj
  when Hash
    count += 1 if obj.key?(:geometry)
    obj.each_value { |v| count = count_geometry_keys(v, count) }
  when Array
    obj.each { |item| count = count_geometry_keys(item, count) }
  end
  count
end

The problem starts here where the test chooses to symbolize_names: true not sure why it did that. It makes the code very brittle as every time a document object gets passed to ExpenseDocument the caller has to remember to symbolize the keys.

RSpec.describe ExpenseDocument, type: :model do
  let(:fixture_path) { Rails.root.join('spec/fixtures/document.json') }
  let(:document_hash) { JSON.parse(File.read(fixture_path), symbolize_names: true) }

I could:

  1. Change the code so that it doesn’t symbolize_names
  2. Set the document as a Rails HashWithIndifferentAccess which doesn’t care whether the key is a string of a symbol.

Option 2 is the more robust so I will tell Claude to make the change - TBH normally I would do this by hand by I’m testing how much Claude can do.

Prompt to fix by telling Claude to use the HashWithIndifferentAccess
#

Change the assignment of the document argument in the ExpenseDocument initializer so that the document is a HashWithIndifferentAccess

################################################################################
# GOOD: I'm glad I asked Claude to do this rather than make the change by hand
# I was unaware that Hash in Rails has the #with_indifferent_access method. 
# I would have created a less elegant
#   
# @document = HashWithIndifferentAccess.new(document)
################################################################################

def initialize(document:)
  @document = document.with_indifferent_access
end

It also upated the specs. TBH the second test is a bit irrelevant as it’s basically testing the HashWithIndifferentAccess class which isn’t part of our code. I’ll leave it in but it could easily be deleted.

it 'exposes document as a read-only attribute' do
  expect(expense_document.document).to be_a(ActiveSupport::HashWithIndifferentAccess)
end

it 'allows accessing document keys with both symbols and strings' do
  expect(expense_document.document[:blocks]).to eq(expense_document.document['blocks'])
end

In a nutshell
#

  • NOT SO GOOD: Claude created a test that was effectively testing its own test code.
  • GREAT: Claude impressed me with a test that ensured there were geometry objects present in the hash, ran the remove_geometry! method then ensured there were none using the same code.
  • GREAT: It created a utility method that iterated through all the keys in the hash counting the number of geometry objects and re-used this - see above.
  • GOOD: it created a separate test to ensure the return value from the method was the subject. By making this separate Claude ensured a single test only tested for one thing.
  • NOT SO GOOD: it created a couple of tests that, logically, were testing the same thing as a previous test.
  • GOOD: I spotted a flaw where Claude expected the test hash to have symbolic keys when the reality is they are likely to be strings. When prompted to fix this using a HasWithIndifferentAccess it used a neat solution that I didn’t know existed.

Up Next
#

I used Claude & TDD to create tests and methods to implement the #tables and #fields methods.

spec-tdd-claude - This article is part of a series.
Part 5: This Article