
TDD with RSpec & Claude - Part 4 - Textract Expense Value

spec-tdd-claude - This article is part of a series.
Part 4: This Article

Previously
#

In the previous article in this series I described both the Problem Space and the Solution Space, along with the requirements for a utility Expense Value class and an Expense Document class that makes use of that utility class.

Before Starting - Prepare a Test Fixture
#

When I first started with Rails way back (Rails 2.1, I seem to remember) tests relied on fixtures to run. Fixtures are pre-built data that live in a separate file from the test being run. A simple example would be:

users.yml

- parent:
    id:         1
    parent_id:  NULL
    title:      Parent
- child:
    id:         2
    parent_id:  1
    title:      Child

I never liked this because you have to read and understand the data in the fixtures in order to understand what the test is doing.

Typically I use FactoryBot to build mock data inside the test file itself, where it is much easier to see which data matters for the test.

I bring this up only because, for this Textract code, I am going back to testing against a fixture. The Textract data is so highly structured that it would be all too easy to make a mistake when creating mock test data, and then have production code pass for the wrong reason.

When Textract reads an image it returns a documents collection, which I store on AWS S3 in one file as a collection of connected documents, though in reality it’s almost always one document. I downloaded one of these files to use as a fixture, but it contains an array of documents and all I wanted was a fixture that contains just the first one.

Copying and pasting such a large number of rows is prone to error, so I asked Claude to do the heavy lifting for me:

use the fixture spec/fixtures/documents.json to create a file called spec/fixtures/document.json that is the first document in the documents array

Claude complained about the file size and tokens but still went on to extract 8009 lines of document json straight into my document.json file exactly as I wanted.
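For anyone who would rather script this step than prompt for it, a minimal sketch might look like the following. It assumes the top level of documents.json is a JSON array (I describe it as "an array of documents" above, but the real file may wrap that array in an outer object), and the helper name extract_first_document is made up for illustration.

```ruby
require 'json'

# Hypothetical helper: writes the first element of a JSON array file
# to its own file and returns that element.
# Assumption: the source file's top level is an array of documents.
def extract_first_document(source_path, target_path)
  documents = JSON.parse(File.read(source_path))

  unless documents.is_a?(Array) && !documents.empty?
    raise ArgumentError, "expected a non-empty JSON array in #{source_path}"
  end

  first_document = documents.first
  File.write(target_path, JSON.pretty_generate(first_document))
  first_document
end

# Usage with the paths from the prompt above:
#   extract_first_document('spec/fixtures/documents.json',
#                          'spec/fixtures/document.json')
```

Parsing and re-serialising the JSON avoids the copy-and-paste risk, although the line count of the output depends on how JSON.pretty_generate formats it and need not match what Claude produced.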

Create the Expense Value class
#

We want to create a simple class with three attributes and one method:

  • value
  • confidence
  • context
  • sensitive?

I explained why in the previous article in this series, so for now I’ll go straight into the TDD.

Prompt to create the test for the Expense Value class (which does not exist yet)
#

Create an rspec test file for a class called ExpenseValue. The spec file should have a subject that takes an initializer with three named argumants, value, confidence and context.

Claude suggested the following, which went beyond my expectations:

require 'rails_helper'

RSpec.describe ExpenseValue, type: :model do
  
  #####################################################################
  # Good: it did what I asked and created an instance of the class being
  # tested as subject with appropriate named arguments.
  #####################################################################
  
  describe '#initialize' do
    subject(:expense_value) do
      described_class.new(
        value: value,
        confidence: confidence,
        context: context
      )
    end
  
    #####################################################################
    # Good: Good that the initializer arguments are set as let{}
    #
    # Not So Good:
    # my only criticism is admittedly a bit pedantic in that the test values 
    # are hard coded. For the test to be valid my class doesn't have to return
    # the string 'Sample Value' or the confidence 0.95, it just has to return 
    # whatever string or number is passed to the initializer.
    #
    # Instead of:
    #     let(:value) { 'Sample Value' }
    #     let(:confidence) { 0.95 }
    #
    # I would prefer a random value using the Faker gem.
    #     let(:value) { Faker::Lorem.sentence }
    #     let(:confidence) { Faker::Number.within(range: 0.0..1.0) }
    #####################################################################

    let(:value) { 'Sample Value' }
    let(:confidence) { 0.95 }
    let(:context) { 'Sample Context' }
  
    #####################################################################
    # Good: I didn't ask it to create tests for the attributes that
    # return values for named arguments but it's perfectly reasonable
    # to expect these tests to be created and they're fine except maybe
    # the first one which feels a little unnecessary.
    #####################################################################

    it 'initializes with value, confidence, and context' do
      expect(expense_value).to be_an_instance_of(ExpenseValue)
    end

    it 'sets the value attribute' do
      expect(expense_value.value).to eq(value)
    end

    it 'sets the confidence attribute' do
      expect(expense_value.confidence).to eq(confidence)
    end

    it 'sets the context attribute' do
      expect(expense_value.context).to eq(context)
    end
  end
end

Don’t forget, I don’t consider the prompt to be my specification. The specification is the sum total of the test descriptions and the test is the code within those descriptions.

My griping about hard-coded test values is trivial. The test is still easy to read, so I’ll accept what Claude has created.

Prompt to create the Expense Value class
#

Having accepted the description and test code we may as well get the tests passing by writing the Production code.

“Create a class that will pass the all of the tests in spec/models/event_value_spec.rb”

Claude created the class and ran the tests. Everything works.



###########################################################################################
# Great: it has used the standard Ruby idiom attr_reader for simple getters
# and it ran the tests to make sure they all pass.
###########################################################################################

class ExpenseValue
  attr_reader :value, :confidence, :context

  def initialize(value:, confidence:, context:)
    @value = value
    @confidence = confidence
    @context = context
  end
end
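With the class in place, my point about hard-coded test values can be shown outside RSpec as a rough sketch. The class is repeated only so the snippet runs on its own, SecureRandom and rand stand in for Faker to avoid a gem dependency, and random_expense_attributes is a hypothetical helper rather than anything Claude generated.

```ruby
require 'securerandom'

# Repeated from above so this snippet is self-contained.
class ExpenseValue
  attr_reader :value, :confidence, :context

  def initialize(value:, confidence:, context:)
    @value = value
    @confidence = confidence
    @context = context
  end
end

# Hypothetical helper: builds random initializer arguments.
# SecureRandom and rand are stand-ins for Faker (an assumption made
# here to keep the sketch dependency-free).
def random_expense_attributes
  {
    value: SecureRandom.hex(8),             # random 16-character string
    confidence: rand,                       # Float in 0.0...1.0
    context: SecureRandom.hex(4).upcase     # random 8-character label
  }
end

attributes = random_expense_attributes
expense_value = ExpenseValue.new(**attributes)

# The readers only have to hand back whatever was passed in.
raise 'value mismatch' unless expense_value.value == attributes[:value]
raise 'confidence mismatch' unless expense_value.confidence == attributes[:confidence]
raise 'context mismatch' unless expense_value.context == attributes[:context]
```

The trade-off is that a failure on random data is harder to reproduce than a failure on 'Sample Value', which is one reason the hard-coded version is still acceptable.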

Prompt to create the Expense Value class sensitive? Method
#

The initializer and attribute readers are about as simple as it gets but my Solution Space also requires a sensitive? method that tells me if that expense value is considered to be something we don’t want to share with an LLM.

Claude can’t possibly know what I consider to be sensitive in our use case so I need to tell it.

Write tests in the ExpenseValue spec that returns true if the ExpenseValue#context is in any of the following values “NAME”, “RECEIVER_ADDRESS”, “RECEIVER_NAME”

###################################################################################
# GREAT: in my docs above I referred to the method as sensitive? but forgot to
# specify that in my prompt. Claude seems to have understood the fields are 
# PII (Personally Identifiable Information) and named the method accordingly.
# I'll take that.
###################################################################################

###################################################################################
# GOOD: It has created a let(:context){} in each context that gets used in the
# subject initializer e.g:
#     described_class.new(
#       value: 'Test Value',
#       confidence: 0.95,
#       context: context
#     )
# with a variable for each context e.g:
#    let(:context) { 'NAME' }
###################################################################################

describe '#pii?' do
    subject(:expense_value) do
      described_class.new(
        value: 'Test Value',
        confidence: 0.95,
        context: context
      )
    end

    context 'when context is NAME' do
      let(:context) { 'NAME' }

      it 'returns true' do
        expect(expense_value.pii?).to be true
      end
    end

    context 'when context is RECEIVER_ADDRESS' do
      let(:context) { 'RECEIVER_ADDRESS' }

      it 'returns true' do
        expect(expense_value.pii?).to be true
      end
    end

    context 'when context is RECEIVER_NAME' do
      let(:context) { 'RECEIVER_NAME' }

      it 'returns true' do
        expect(expense_value.pii?).to be true
      end
    end
  
    ###################################################################################
    # GOOD: Rather than generate random types that are not
    # NAME or RECEIVER_ADDRESS or RECEIVER_NAME to prove the false case
    # Claude has picked two other AWS types from the same document.json.
    ###################################################################################
  
    context 'when context is not a PII field' do
      let(:context) { 'TOTAL' }

      it 'returns false' do
        expect(expense_value.pii?).to be false
      end
    end

    context 'when context is another non-PII field' do
      let(:context) { 'VENDOR_NAME' }

      it 'returns false' do
        expect(expense_value.pii?).to be false
      end
    end
  end

Claude jumped the gun again by adding the method to ExpenseValue instead of only writing the test, but I’m not complaining: this is still simple code and Claude has done a good job.

class ExpenseValue
  PII_FIELDS = %w[NAME RECEIVER_ADDRESS RECEIVER_NAME].freeze

  attr_reader :value, :confidence, :context

  def initialize(value:, confidence:, context:)
    @value = value
    @confidence = confidence
    @context = context
  end

  def pii?
    PII_FIELDS.include?(context)
  end
end
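To show where pii? earns its keep, here is a small sketch of filtering out sensitive values before anything is shared with an LLM. The class is repeated so the snippet runs standalone. The shareable_values helper and the sample data are invented for illustration and are not part of the application.

```ruby
# Repeated from above so this snippet is self-contained.
class ExpenseValue
  PII_FIELDS = %w[NAME RECEIVER_ADDRESS RECEIVER_NAME].freeze

  attr_reader :value, :confidence, :context

  def initialize(value:, confidence:, context:)
    @value = value
    @confidence = confidence
    @context = context
  end

  def pii?
    PII_FIELDS.include?(context)
  end
end

# Hypothetical helper: keeps only the values that are safe to share.
def shareable_values(expense_values)
  expense_values.reject(&:pii?)
end

# Made-up sample data, not taken from the real fixture.
values = [
  ExpenseValue.new(value: 'Jane Doe', confidence: 0.99, context: 'NAME'),
  ExpenseValue.new(value: '42.50',    confidence: 0.97, context: 'TOTAL'),
  ExpenseValue.new(value: 'Acme Ltd', confidence: 0.95, context: 'VENDOR_NAME')
]

puts shareable_values(values).map(&:context).inspect
# prints ["TOTAL", "VENDOR_NAME"]
```

Note that PII_FIELDS.include? is an exact string match, so a context of 'name' in lower case would not be treated as PII.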

In a nutshell
#

  • Good: Claude learned from my previous instruction to use ‘subject’ as the variable name for the class being tested when it wrote the spec.
  • Good: It used let{} to create the initializer variables rather than hard-coding them directly into the subject creation.
  • Not so Good: Claude used hard-coded test values such as let(:value) { 'Sample Value' } when random values from the Faker gem in the application would have been more appropriate. For the test to be valid, the instance being tested doesn’t have to return ‘Sample Value’; it just has to return whatever string is passed to it in the constructor.
  • Good: When it created the Expense Value class, Claude used the standard Ruby idiom of attr_reader to expose these arguments as attributes. I had wondered if it might try to write its own getter methods.
  • Great: I asked Claude to write a test to see if the context attribute matched one of a number of fields like NAME or RECEIVER_ADDRESS, but forgot to give the method a name. Claude automatically used the name pii?, having correctly assumed this was to identify sensitive personal information.
  • Good: When testing whether the pii? method was working, it used a let variable for context, e.g. let(:context) { 'NAME' }, which kept the tests nicely DRY (though my choice of ‘context’ as the attribute name does make things a little harder to read, because it clashes with RSpec’s own context blocks).
describe '#pii?' do
    subject(:expense_value) do
      described_class.new(
        value: 'Test Value',
        confidence: 0.95,
        context: context ##### shared subject code uses context variable
      )
    end

    context 'when context is NAME' do
      let(:context) { 'NAME' } ##### each test sets that context variable

      it 'returns true' do
        expect(expense_value.pii?).to be true
      end
    end

    context 'when context is RECEIVER_ADDRESS' do
      let(:context) { 'RECEIVER_ADDRESS' } ##### each test sets that context variable

      it 'returns true' do
        expect(expense_value.pii?).to be true
      end
    end
  • Good: When testing the pii? method against various context values, it used real options from the fixture, such as NAME for a positive result and TOTAL for a negative result, rather than abstract strings. This makes the test easier to follow.

Up Next
#

Create the Textract Wrapper class Expense Document and write the test & code for its remove_geometry! method.
