Logging Tables, Without the Mess

Logging tables are usually at the periphery of a design effort, sometimes added as an after-thought. Usually, a logging table has a large number of rows and many optional columns, most containing NULLs. In addition, because ALTER TABLE statements become expensive in a production system, these tables often include some kind of a serialized array (JSON, CSV) for all the optional fields. In otherwise well engineered systems, these logging tables stick out as an embarrassing mess.

A logging table may look something like this one, a few mandatory and many optional (1,2,3…X) columns.

One way to tackle the problem is to design away storage of nulls by vertically partitioning the table  quite efficient if the DB supports join elimination, also known as table elimination.

Join elimination is a feature of the query optimizer, which excludes tables from the execution plan  provided they are not needed to obtain the result set.

To illustrate join elimination, suppose we have a User table with an optional UserHeight column, expected to be NULL for most of the records.

Storage of nulls can be designed away; in this example, only records with known UserHeight have a row in the User_2 table.

So, if we have a query,

select
  FirstName
, LastName
, UserHeight
     from User_1 as a
left join User_2 as b on b.UserID = a.UserID ;

then the query plan looks like:

If a query does not need any columns from the second table, like:

select FirstName
     from User_1 as a
left join User_2 as b on b.UserID = a.UserID ;

then the plan does not include the second table.

It is important to note that the first table is always included, for example:

select UserHeight
     from User_1 as a
left join User_2 as b on b.UserID = a.UserID ;

The above uses only columns from the second table; however, the plan is the same as in the first case.

It may not be obvious that this feature properly resolves views, for example:

create view vUser as
select
    a.UserID as UserID
  , FirstName
  , LastName
  , UserHeight
     from User_1 as a
left join User_2 as b on b.UserID = a.UserID ;

So, for:

select
  FirstName
, LastName
, UserHeight
from vUser ;

the result is as expected:

And, if there are no columns from the second table in the select list:

select FirstName
from vUser ;

then only the first table is included in the execution plan.

The principle may now be applied to the logging table problem. Instead of the first logging example, we could use something like:

and create a view:

create view vLog as
select
    a.LogID as LogID
  , LogTime
  , LogCategory
  , Optional_1
  , Optional_2
  , Optional_3
  , Optional_X
from Log as a
left join Log_1 as a1 on a1.LogID = a.LogID
left join Log_2 as a2 on a2.LogID = a.LogID
left join Log_3 as a3 on a3.LogID = a.LogID
left join Log_X as aX on aX.LogID = a.LogID ;

For example, if a query requires only Log and Log_3 tables then:

select
    LogID
  , LogTime
  , Optional_3
from vLog ;

The plan includes only these two tables.

To add a new column (property), the following has to be done:

  1. Create a new table with LogID and the new column (not very expensive to create an empty table).
  2. Modify the view to include the new table (fairly simple).
  3. Modify the logging code (has to be done in any case).

For this technique to work, the DB optimizer must support table elimination, and the number of optional tables should be manageable. What is manageable? My guess would be something like 10, 20, 50  simply try it out. All newer versions of MS SQL Server, Oracle, DB2, PostgeSQL support the feature, but MySQL 5.x still does not; MariaDB does.

About Damir Sudarevic

Data-warehouse Guy
digs data, likes cats.