Transaction Linearization¶
Experimental feature
Please be advised that this is a very new, experimental feature. The soundness of the approach or the implementation has not been fully vetted yet.
For most common cases, transaction isolation levels like serializable are sufficient to avoid problems with concurrent transactions.
There are, however, cases where this may not be enough: Postgres won't track complex dependencies in your operations provided they don't form so-called dangerous structures.
When transaction T1 writes to tables Rw that logically depend on reads from tables Rr, Postgres won't be able to detect a conflict with another transaction T2 that wrote to Rr after T1 read from it, provided T2 commits its changes before T1 does. This may result in a logically inconsistent state if operations on these entities are deemed unrelated.
For this, we introduce limited, experimental tooling for linearizing transactions.
To quote Jepsen:
Linearizability is one of the strongest single-object consistency models, and implies that every operation appears to take place atomically, in some order, consistent with the real-time ordering of those operations: e.g., if operation A completes before operation B begins, then B should logically take effect after A.
To facilitate such a mode, we introduce a few rules (SLT: serializable, linearized transaction):
- Any write in SLT1 that happens after another SLT0 has read from the same relation must result in a linearization failure in either transaction.
- Any commit of SLT1 that wrote into a relation after another SLT0 has read from the same relation must result in a linearization failure in either transaction.
- Any commit of SLT1 that wrote into a relation after any SLTn has obtained a snapshot must result in a linearization failure in either transaction, provided SLTn will read from that relation.
In all cases, false positives are possible, and a transaction retry strategy is advisable.
Motivating example¶
Let's imagine an oversimplified financial account system.
erDiagram
    accounts {
        id serial
        closed_on timestamptz
    }
    accounts ||--o| account_transfers : has
    account_transfers {
        time timestamptz
        debit_account_id int
        credit_account_id int
        amount bigint
    }
Want to see the schema?
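Here is a minimal DDL sketch matching the diagram above. The keys, references, and default are assumptions for illustration; the actual schema may differ:
create table accounts
(
    id        serial primary key,
    closed_on timestamptz
);

create table account_transfers
(
    --- the constraints and default below are assumptions, not part of the diagram
    time              timestamptz not null default statement_timestamp(),
    debit_account_id  int         not null references accounts (id),
    credit_account_id int         not null references accounts (id),
    amount            bigint      not null
);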
Importantly, accounts may be closed at some point. After the closure, new transfers must not occur.
Let's create some accounts:
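For example, a minimal version that creates the two accounts used below (the serial column assigns them ids 1 and 2):
insert into accounts default values;
insert into accounts default values;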
It is only natural to employ serializable transactions for modelling a financial domain. Let's start moving the money.
-- [[T0: first transaction]]
begin transaction isolation level serializable;
--- Transfer some more money, check that the account is not closed
with allowed_accounts as (select *
from accounts a
where closed_on is null)
insert
into account_transfers (debit_account_id, credit_account_id, amount)
select id, 2, 100
from allowed_accounts
where id = 1;
--- Now there's a transfer in `account_transfers`
Without committing this transaction, we now execute another transaction that closes the account:
--- [[T1: another transaction]]
begin transaction isolation level serializable;
update accounts
set closed_on = statement_timestamp()
where id = 1;
commit;
And only then do we commit the original transaction:
--- first [[T0]]
--- Wait for some time – doing something important
select pg_sleep(5);
--- Done!
commit;
What happened here?
Both transactions committed successfully, but if we look closely, we'll see that, for all practical purposes, we have a transfer recorded after the account was closed. The administrator closed the account and saw the response, but after that, a new transfer appeared. Now we have a closed account with a non-zero balance.
Is something wrong with serializable transactions? Well, no. They simply guarantee a serial order of execution. Not necessarily the one that was observed.
If we redo the same with linearized transactions:
-- [[T0: first transaction]]
begin transaction isolation level serializable;
--- Linearize
select omni_txn.linearize();
--- Transfer some more money, check that the account is not closed
with allowed_accounts as (select *
from accounts a
where closed_on is null)
insert
into account_transfers (debit_account_id, credit_account_id, amount)
select id, 2, 100
from allowed_accounts
where id = 1;
--- Now there's a transfer in `account_transfers`
And then the other transaction:
--- [[T1: another transaction]]
begin transaction isolation level serializable;
--- Linearize
select omni_txn.linearize();
--- Close the account
update accounts
set closed_on = statement_timestamp()
where id = 1;
commit;
--- ERROR: linearization failure
--- DETAIL: transaction 747 has a predicate lock on `accounts`
Now we observed a linearization failure! In this particular case, the system observed that another transaction had already read accounts (to see if the account was closed) and, out of an abundance of caution, we should retry or fail.
And we can now commit the first one!
--- first [[T0]]
--- Wait for some time – doing something important
select pg_sleep(5);
--- Done!
commit;
The above example is not representative of a realistic system; it is only meant to illustrate what kind of conflicts one can guard against. In this particular case, it shows how linearization prevented the invalid state of a closed account with a non-zero balance.
Quick start¶
To linearize, you must be in a serializable transaction first. After that, all you need to do is invoke the following function:
select omni_txn.linearize();
-- to check if we're in a linearized transaction
select omni_txn.linearized();
--#> t
It will make the current transaction linearized and start intercepting all mutating statements: INSERT, UPDATE, DELETE, and MERGE.
Should a linearization failure occur, it will raise a serialization error exception with particular details of the failure.
This is compatible with the omni_txn.retry primitive, allowing you to build effective mechanisms for handling such failures.
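As an illustration, here is a sketch of the transfer from the motivating example wrapped in a retry loop. It assumes omni_txn.retry accepts the statements to run as text and manages the transaction and retries itself; consult the omni_txn.retry documentation for the exact invocation and parameters (such as attempt limits and backoff behavior):
--- sketch: retry the linearized transfer on (potentially false-positive) failures
call omni_txn.retry($$
    select omni_txn.linearize();
    with allowed_accounts as (select *
                              from accounts a
                              where closed_on is null)
    insert
    into account_transfers (debit_account_id, credit_account_id, amount)
    select id, 2, 100
    from allowed_accounts
    where id = 1
$$);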