Database trigger recalculates totals, for data integrity

2022-02-27

When you make a database-backed app, you write code to ensure data integrity in related fields.

If an item is sold, you update the inventory. If someone withdraws money, you update their balance. You recalcuate the total and update the related field in the database whenever somebody makes a change.

Usually this is done in your main code: your JavaScript, Python, Ruby, Java, or whatever.

But what if some future code doesn’t use your currently-crucial function? A new back-end interface, API hook, or simple shell script might not use your current code. New code will access the database directly, not using your old code.

I encountered this exact situation at my last company, and felt the pain from inventory and even finances becoming wrong, all because some new code was accessing the database directly.

So that’s why I’m an evangelist now for how important it is to put your crucial code in the database itself. This is data logic (not “business logic”) and should be bound to the data. Database functions can be triggered, ensuring integrity, no matter what outside code is accessing it.

Here’s an example in PostgreSQL, using a shopping cart:

First, make three simple tables:

Items with prices.
Lineitems with quantities.
Invoices with the total price.

Create two example items, a $5 and a $9 item. And create invoice #1 for testing.

create table items (
  id serial primary key,
  price int not null check (price > 0)
);

create table invoices (
  id serial primary key,
  total int
);

create table lineitems (
  invoice_id int not null references invoices(id),
  item_id int not null references items(id),
  quantity int not null check (quantity > 0),
  primary key (invoice_id, item_id)
);

-- example data:
insert into items (price) values (5);
insert into items (price) values (9);
insert into invoices (total) values (0);

download code

Now, you want it to re-calculate the Invoice total whenever Lineitems are changed. That’s a special function called a trigger. First you create the function, then create a trigger that executes the function.

-- re-calculate the total of a lineitem's invoice
create function recalc() returns trigger as $$
begin
  -- update invoice using lineitems's invoice_id
  update invoices set total = (
    select sum(quantity * price)
    from lineitems
    join items on lineitems.item_id = items.id
    where invoice_id = new.invoice_id)
  where id = new.invoice_id;
  return new;
end;
$$ language plpgsql;
-- run this function after any change to lineitems
create trigger recalc
  after insert or update or delete on lineitems
  for each row execute procedure recalc();

download code

Test it by adding Lineitems, and updating their quantities. After each change, look at the Invoice. It works!

insert into lineitems (invoice_id, item_id, quantity) values (1, 1, 1);
insert into lineitems (invoice_id, item_id, quantity) values (1, 2, 1);
select * from lineitems; select * from invoices;

select 'Notice new total when you update quantity:' look;
update lineitems set quantity = 5 where invoice_id = 1 and item_id = 2;
select * from lineitems; select * from invoices;

select 'But when you delete a line? Oh no. Total is still $50' look;
delete from lineitems where invoice_id = 1 and item_id = 2;
select * from lineitems; select * from invoices;

-- Let's try again in recalc-4.sql
drop function recalc() cascade;

download code

But notice if you delete a Lineitem, it doesn’t update the Invoice! What the hell? Why not?

Ah, that’s because the function refers to a record variable called “new”, passed by the trigger on Lineitems, and uses it to get the invoice_id. But when when you delete a Lineitem, the record variable is called “old”, since it refers to a Lineitem that’s already been deleted. Damn.

You could make two separate triggers, one that uses “new”, called only on update and insert operations, and one that uses “old”, called only on deletes. But then you’d be duplicating the calculation of the total. So instead, just add one ugly if/then to assign either “new” or “old” to a variable called “r”, and use “r” instead. Here’s the updated function:

-- re-calculate the total of a lineitem's invoice
create function recalc() returns trigger as $$
declare
  r record;
begin
  -- use "new" lineitems record for insert/update, or "old" if delete
  if (tg_op = 'DELETE') then
    r = old;
  else
    r = new;
  end if;
  -- update invoice using lineitems(now "r")'s invoice_id
  update invoices set total = (
    select sum(quantity * price)
    from lineitems
    join items on lineitems.item_id = items.id
    where invoice_id = r.invoice_id)
  where id = r.invoice_id;
  -- must return incoming "new" or "old" record when done
  return r;
end;
$$ language plpgsql;
-- run this function after any change to lineitems
create trigger recalc
  after insert or update or delete on lineitems
  for each row execute procedure recalc();

download code

Now test adding, updating, and deleting, and notice they all update the Invoice total:

insert into lineitems (invoice_id, item_id, quantity) values (1, 2, 1);
select * from lineitems; select * from invoices;

select 'Notice new total when you update quantity:' look;
update lineitems set quantity = 5 where invoice_id = 1 and item_id = 2;
select * from lineitems; select * from invoices;

select 'Notice new total when you delete:' look;
delete from lineitems where invoice_id = 1 and item_id = 2;
select * from lineitems; select * from invoices;

download code

Cool? You get much more security and peace-of-mind, knowing that no matter what code updates your database, the related fields will be updated automatically.

Download the final example file here: /code/recalc.sql.

After you get this example working, let’s try another example of putting data logic in the database, by making functions for updating the cart.

Also look at using triggers to clean incoming data.