Saturday, June 30, 2012

Belt and Braces Error Checking

SQLite does not enforce column types.  You can have a numeric column but store the string "abc" in one of its rows.  SQLite is a dynamically typed database.

I can’t find an authoritative explanation of why this is so.  It could be for maximum compatibility with other DB engine CREATE TABLE statements.  Or it could be because the database is the wrong place to put your validation.

The argument that the database is the wrong layer to do type checking is compelling at first pass:

I’m not convinced that the limited and highly complex form of column typing provided by most SQL DBs is beneficial. Yes, it can prevent stupid mistakes like putting a string where an integer should be, but “integer” does not sufficiently catch the majority of business rules. It may need to be constrained within a range, for instance, which the database can’t help you with. In this case you end up having to implement a second safety layer in code, where you can be explicit and precise about the kinds of values that are allowed, and even their relationships (those that are not adequately mapped onto SQL constraints).  [grayvedigga on reddit]

… Take a web application: validation in the browser in Javascript, conversion from strings on form submission, column type enforcement at the database layer. There is a tremendous duplication of effort here, which makes code hard to maintain and concise expression of business rules impossible

Aha of course!  I said it sounds compelling.

But I think its wrong.

The persistence layer should enforce type by default.

The routine web app example is an excellent illustration:

  1. You have to validate in the client-side using Javascript, to give good meaningful feedback to the user and be user friendly
  2. You have to validate in the business logic in the server, because the client is untrusted and may not even support Javascript (says a habitual NoScript user)
  3. And you have to validate in the persistence layer to catch the bugs in the server code!

When you do manage to jam nonsense into SQLite3, you get nonsense out too:

sqlite> CREATE TABLE t (i INTEGER);
sqlite> INSERT INTO t VALUES ("abc");
sqlite> INSERT INTO t VALUES (2.5);
sqlite> SELECT * FROM t;
abc
2.5
sqlite> SELECT SUM(i) FROM t;
2.5
sqlite> SELECT MAX(i) FROM t;
abc

SQLite does do some checking, of course.  It checks that integer primary keys are integer.  It checks that NOT NULL columns are not null.  It enforces referential integrity.

Its even tracking what a column is declared as, which it terms ‘affinity’.  It just doesn’t enforce it.  It’d be a single if-statement after its converted a value to see if it matches the affinity type.  But it doesn’t check.

We are just a configuration flag (when creating a database) away from it enforcing column types.  That’d be the solution: SQLite to let you say when you create the database that you want enforcement of column types.

It has only happened to me once or twice that my rigidly typed databases have complained about data types; but if they hadn’t caught my errors, it would have been catastrophic!

At least SQLite isn’t dynamic schema too,  That’d open up whole new opportunities for trivial spelling mistake bugs!

I heard of a trendy project storing all its data in some fancy (as in hipster, lightweight to the point of omission and unstable) key-value store.  All the records were little JSON envelopes.  And they realised (it being a common realisation apparently) that something like 50% of their storage was actually the keys and not values.  So they decided to change commentID to CID or something like that.  But they missed one script somewhere and that went around adding commentID fields to records that already had a CID assigned.  So suddenly they had corrupt records in that they had both commentID and CID keys and the ambiguity that entailed and it all went pear-shaped very quickly and took quite a lot of hassle to clean up.

This woeful tale is analogous to the problem I see with a schema-based persistence layer not doing rigid type checking.

The persistence layer should validate input because I trust that layer to police its simplified laws (that the column is integer, or a valid date or such) much more than I trust my code.  I am a mere mortal.  I can so easily miss a check somewhere and not realise.

I shudder at the thought of databases with the string "NULL" in a date column and "Y" in a boolean.

Writing the business logic in a statically typed language or using policing at an ORM level do mitigate this and reduce the chances of bad data even getting submitted to the persistence layer, but even those can be buggy or leave gaps.

At a higher abstraction, I’m unhappy with Postel’s Law

be conservative in what you do, be liberal in what you accept from others

It just means that each extension to a component has more and more legacy - technical debt, of a kind - to juggle and will end up complected.

My Law: frameworks should always be strict in what they accept.

If the database rejects a write, programmers view that as their problem and not a limitation in the persistence layer.  A promiscuous database is just letting programmer bugs through.  Not cool.

You need belt and braces error checks.  Every layer should be strict in what it receives.  Assert always.

Notes

  1. c0d3 reblogged this from williamedwardscoder
  2. williamedwardscoder posted this

 ↓ click the "share" button below!