петак, 26. август 2011.

Why select max(id) is wrong with PostgreSQL


Today I got complaints from ex-team leader about some stored procedure and "bad coding practice".

I'm going to elaborate problem a bit (it's classic in DB development). And it's real error, not bad coding practice ...

The stored procedure is part of API for a primitive queuing mechanism written in plpgsql for my ex company.
The queuing mechanism was written 5.5 years ago.


Problematic procedure contains code like this:

....
insert into table a (....)
select max(id) from a

....

So I did basic mistake: first insert into table, followed by immediate selection of max id.

Default isolation level for PostgreSQL is READ COMMITTED, so (theoreticaly) the same procedure invoked from the other transaction could commit between insert and select of the previous transaction moving MAX(ID) to a higher value.


Let's "construct" an example.

First, let's create two tables: table a will be main table which will store IDs and the second one, b, will store inserted ID (i.e. expected ID) and MAX(ID) (i.e. the ID we got).

create table a (id serial);
create table b (a_id integer, a_max_id integer);


Here is the store procedure:

create or replace function insert_into_a (_sleep_time numeric(12,4) )

returns integer as
$body$
declare
_id integer;
begin
-- I did small trick here: I prefetched nextval from a_id_seq (as I had to do 6 years ago! shame on me!)
-- to show the difference between what we excpected to get and what we got.
select nextval('a_id_seq') into _id;

-- ok, let's insert nextval
insert into a values (_id);

-- let's sleep for given number of seconds
perform pg_sleep(_sleep_time);

-- insert generated id and max id
insert into b select _id, max(id) from a;

return _id;

end;

$body$
language plpgsql volatile
cost 100;



The procedure invokes pg_sleep: handy procedure which can help you create race condition tests and emulate real execution.
pg_sleep accepts one numeric parameter which represents number of seconds. The parameter is double precision, so fractional-second delays can be specified.

OK, now we have tables and stored procedure: let's test the system.

First single transaction which sleeps 0 seconds:

milos=# select * from b;

a_id | a_max_id
------+----------
(0 rows)

milos=# begin;
BEGIN
milos=# select insert_into_a(0);
insert_into_a
---------------
1
(1 row)

milos=# commit;
COMMIT
milos=# select * from b;
a_id | a_max_id
------+----------
1 | 1
(1 row)


OK, expected result.


Now real action.

Transaction T1

milos=# begin;

BEGIN
milos=# select insert_into_a(100);
....

100 seconds: plenty of time to create second transaction which will screw our final result.


Transaction T2

milos=# begin;

BEGIN
milos=# select insert_into_a(0);
insert_into_a
---------------
3
(1 row)

milos=# select * from b;
a_id | a_max_id
------+----------
1 | 1
3 | 3
(2 rows)

milos=# commit;
COMMIT


So far, so good.


Back to committed transaction T1:

 insert_into_a 

---------------
2
(1 row)

milos=# select * from b;
a_id | a_max_id
------+----------
1 | 1
3 | 3
2 | 3
(3 rows)


HA!
Last pair (2,3) clearly shows "bad coding practice" :)

So, the proper way is to prefetch the ID using nextval('a_id_seq') and than to re-use it using currval('a_id_seq').

OK, I agree that this example is fake: we achieved with pg_sleep something that cannot be done with real procedure, but still we have race condition, and sometimes it can produce error which can be hard to trace and find (maybe it didn't fail in 5.5 years but ....).