Monday, April 4, 2016

CHECK constraint for MySQL - NOT NULL on generated columns

Update: Starting with MySQL 8.0.16 we do have CHECK constraints implemented in SQL! See here.

During our recent TechTour event the idea came up to implement JSON document validation not necessarily via foreign keys (as I have shown here) but to define the generated column as NOT NULL. The generation expression must be defined in a way that it returns NULL for invalid data.
DISCLAIMER: This has already been explored by yoku0825 in his blogpost. He deserves all credit!

Let's do a short test:

mysql> CREATE TABLE checker ( 
    i int, 
    i_must_be_between_7_and_12 BOOLEAN 
         AS (IF(i BETWEEN 7 AND 12, true, NULL))  
         VIRTUAL NOT NULL);
Query OK, 0 rows affected (0.04 sec)

mysql> INSERT INTO checker (i) VALUES (11);
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO checker (i) VALUES (12);
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO checker (i) VALUES (13);
ERROR 1048 (23000): Column 'i_must_be_between_7_and_12' cannot be null




As you can see I used the column name to create a meaningful error message when inserting invalid data. It is perfectly possible to add a generated validation column for each data column so that you run several check constraints.
Or you can even check a combination of columns:

mysql> CREATE TABLE squares (
     dx DOUBLE, 
     dy DOUBLE, 
     area_must_be_larger_than_10 BOOLEAN 
           AS (IF(dx*dy>10.0,true,NULL)) NOT NULL);
Query OK, 0 rows affected (0.05 sec)

mysql> INSERT INTO squares (dx,dy) VALUES (7,4);

Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO squares (dx,dy) VALUES (2,4);

ERROR 1048 (23000): Column 'area_must_be_larger_than_10' cannot be null

As generated columns are virtual by default this costs no extra storage. Data volume is the same. The expression is evaluated when inserting or updating data.
If you add a validation column to an already existing table and want to verify all existing rows, you could define the validation column as STORED (instead of the default VIRTUAL). This will fail if there are any invalid rows in your existing data set. However in normal operation a virtual column seems more appropriate for performance reasons. So I recommend to always use VIRTUAL validation columns and check pre-existing data separately with a small procedure.

1 comment:

  1. Good 'hack' here for working around the lack of CHECK constraints in MySQL. And works - as the final case shows - even with expressions on multiple columns (sort of EXTENDED CHECK). I don't think it works with virtual columns in MariaDB unfortunately (https://mariadb.com/kb/en/mariadb/virtual-computed-columns/ does not list NOT NULL as valid syntax)

    ReplyDelete