Simplify: move code into database functions
2015-05-04If you are a web or API developer, programming code that uses an SQL database, this is for you.
I’ve found a very different and useful way to structure code. It’s made such a big difference for me that I had to share it here.
How things are
Most web development — whether custom or using frameworks like Rails, Django, Laravel, Sinatra, Flask, and Symfony — tends to work the same way:
- At the core is a database, which is just the storage of data.
- All intelligence is in Ruby/Python/PHP/JavaScript classes.
Why that’s bad
These norms have some dangerous implications:
- Everything must go through these Ruby/Python/PHP/JavaScript classes — including shell scripts and other things not part of this application.
- Nothing else may access the database directly, since doing so may break the rules defined by these surrounding classes.
- The database is treated as dumb storage, even though the database is smart enough to have most of this logic built-in.
- But if you add data logic into the database itself, it’s now duplicated, requiring changing in multiple places if the rules change.
- These two systems — the database and its surrounding code — are coupled and dependent on each other.
- If it’s ever advantageous to switch applications (say from a web app to mobile app, or Python to JavaScript), you’re going to have to re-write all of that data logic.
Simple vs complex
Please go watch this amazing 35-minute talk as soon as possible: Simplicity Matters by Rich Hickey.
Here are his important points for this article:
- “Complex” is objective. It means many things tied together.
- “Simple” is objective. It means one ingredient — the opposite of complex.
- These are unrelated to “easy”. It is easy to install and bind yourself to something very complex (like ORM), and can be hard to build something simple.
- Classes, models, and methods (OOP) are an unnecessary complication.
- Information is simple, so don’t hide it behind a micro-language.
- Work with values directly: hash/map of strings.
- Since a JSON API — a hash/map of strings — is often the eventual interface, it’s even more reason to skip the abstractions and work with values directly.
Why this hit home for me
Databases outlive the applications that access them.
I’ve been using the same SQL database since 1997: same data, values, and SQL tables. But the code around it has changed so many times.
In 1997, I started in Perl. In 1998, I switched to PHP. In 2004, a rewrite in Rails. In 2007, back to PHP. In 2009, minimalist Ruby. In 2012, client-side JavaScript.
Each time I’d have to re-write all of the logic around the database: how to add a new person into the database, how to verify an invoice is correct, how to mark an order as paid, etc.
But that whole time, my trusty PostgreSQL database stayed the same.
Since most of this is data logic, not business logic, it should be in the database.
So I’m putting this data logic directly into PostgreSQL, since I plan to stay with it for many more years, but plan to keep experimenting with programming languages. (Nim, Elixir, Racket, Lua, whatever.)
How things could be
Web developers have been treating the database as dumb storage, but it’s actually quite smart.
It’s simple to have all of this intelligence in the database itself.
It’s complex to have it tied to surrounding outside code.
Once you put all of the intelligence directly into the database, then the outside code disappears!
Then the database is self-contained, and not tied to anything.
Your outside interface can switch to JavaScript, Haskell, Elixir or anything else with ease, because your core intelligence is all inside the database.
How to do it
Table constraints
The easiest place to start is constraints:
create table people (
id serial primary key,
name text not null constraint no_name check (length(name) > 0),
email text unique constraint valid_email check (email ~ '\A\S+@\S+\.\S+\Z')
);
create table tags (
person_id integer not null references people(id) on delete cascade,
tag varchar(16) constraint tag_format check (tag ~ '\A[a-z0-9._-]+\Z')
);
Define what is considered valid/invalid data here.
In my people example above, it says name can’t be empty, email must match that pattern with “@” and “.” and no whitespace. Then it says tags.person_id has to exist in the people table, but if the person is deleted then delete the tags, too. And the tag has to fit that regexp pattern of lowercase letters, numbers, dot, underscore, dash.
It helps to name your constraints for later use in error catching.
Triggers
For things that happen before or after you alter data, use triggers:
create function clean() returns trigger as $$
begin
new.name = btrim(regexp_replace(new.name, '\s+', ' ', 'g'));
new.email = lower(regexp_replace(new.email, '\s', '', 'g'));
end;
$$ language plpgsql;
create trigger clean before insert or update of name, email on people
for each row execute procedure clean();
This example cleans the input before it’s put into the database, in case someone accidentally put a space in their email address, or a line-break in their name.
Functions
Make little re-usable functions for things you’ll use often inside your code.
create function get_person(a_name text, a_email text) returns setof people as $$
begin
if not exists (select 1 from people where email = a_email) then
return query insert into people (name, email)
values (a_name, a_email) returning people.*;
else
return query select * from people where email = a_email;
end if;
end;
$$ language plpgsql;
That’s one I use often: Given someone’s name and email, if they’re not already in my database, add them. Then, either way, return the database info for this person.
Views for JSON
Instead of requiring outside code to convert your data into JSON, you can have the database create JSON directly.
For this, use views as JSON structure templates. Inside the view, use json_agg for nested values.
create view person_view as
select *, (select json_agg(t) as tags from
(select tag from tags where person_id=people.id) t)
from people;
This will be used in the API functions, below:
API functions
These are the only functions your outside code will access.
They return only JSON.
create function update_password(p_id integer, nu_pass text, out js json) as $$
begin
update people set password=crypt(nu_pass, gen_salt('bf', 8)) where id = p_id;
js := row_to_json(r) from (select * from person_view where id = p_id) r;
end;
$$ language plpgsql;
create function people_with_tag(a_tag text, out js json) as $$
begin
js := json_agg(r) from
(select * from person_view where id in
(select person_id from tags where tag = a_tag)) r;
end;
$$ language plpgsql;
No matter what you need to do with your database, the procedural languages built-in to PostgreSQL can do it.
PL/pgSQL is not the most beautiful language, but the simplicity of having everything in the database is worth it.
If you like JavaScript, check out the promising plv8.
Now, if you need a REST API:
require 'pg'
require 'sinatra'
DB = PG::Connection.new(dbconfig)
def qry(sql, params=[])
@res = DB.exec_params('select js from ' + sql, params)
end
after do
content_type 'application/json'
body @res[0]['js']
end
get '/people' do
qry('get_people()')
end
get %r{/people/([0-9]+)} do |id|
qry('get_person($1)', [id])
end
put %r{/people/([0-9]+)} do |id|
qry('update_password($1, $2)', [id, params[:password]])
end
get '/people/tagged' do
qry('people_with_tag($1)', [params[:tag]])
end
Or if you need a client library:
require 'pg'
require 'json'
DB = PG::Connection.new(dbconfig)
def js(func, params=[])
res = DB.exec_params('select js from ' + func, params)
JSON.parse(res[0]['js'])
end
def people
js('get_people()')
end
def person(id)
js('get_person($1)', [id])
end
def update_password(id, newpass)
js('update_password($1, $2)', [id, newpass])
end
def people_tagged(tag)
js('people_with_tag($1)', [tag])
end
That’s it!
Now whether a REST API or client library, all it really has to do is pass the arguments into the database functions, and return the JSON.
I’m not trying to convince everyone to do things this way. But I hope you find it useful or at least interesting to consider.