уторак, 6. децембар 2011.

Recreate primary keys as CLUSTERED (SQL Server 2008)

Let's say that you want to re-create all nonclustered PKs and make them clustered.
Here is stored procedure which will generate necessary DDLs.

Table that will store DDLs:


create table log_table
(
id integer identity,
t text
);



Stored procedure:


IF OBJECT_ID ( 'sp_fk_pk', 'P' ) IS NOT NULL
DROP PROCEDURE sp_fk_pk;
GO

CREATE PROCEDURE sp_fk_pk
AS

-- Declare variables to store data returned by cursor
DECLARE
@childColumnList varchar(2000),
@parentColumnList varchar(2000),
@childSchema varchar(100),
@childTable varchar(100),
@fkName varchar(100),
@parentSchema varchar(100),
@parentTable varchar(100);

-- clean log table
TRUNCATE TABLE log_table;


-- generate DDL to drop all foreign keys
INSERT INTO LOG_TABLE (t)
SELECT
'alter table ['+ SCHEMA_NAME(TB.schema_id)+'].['+OBJECT_NAME(F.PARENT_OBJECT_ID)+'] drop constraint ' + F.NAME +';'
FROM SYS.FOREIGN_KEYS AS F
JOIN sys.tables AS TB ON F.parent_object_id = TB.object_id
JOIN sys.tables AS FT ON F.REFERENCED_OBJECT_ID = FT.object_id;



/**
PRIMARY KEY
**/



-- declare the cursor to get all NONCLUSTERED indexes on primary keys
DECLARE dm_cursor_pk Cursor
FOR
select TABLE_SCHEMA, TABLE_NAME, name
from sys.indexes i, INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc
where name=tc.CONSTRAINT_NAME
and type_desc='NONCLUSTERED';

-- open cursor
OPEN dm_cursor_pk

-- iterate over recordset
FETCH NEXT from dm_cursor_pk
INTO
@childSchema,
@childTable,
@fkName;

WHILE (@@FETCH_STATUS <> -1)
BEGIN
IF ( @@FETCH_STATUS <> -2 )

SET @childColumnList = NULL;
-- get list of columns that make PK
select @childColumnList = coalesce(@childColumnList + ', ', '') + column_name
from INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE
where constraint_name=@fkName;

-- insert DDL for index recreation into log table
INSERT INTO LOG_TABLE (T) VALUES ('create unique clustered index ' + @fkName + ' on [' + @childSchema + '].[' + @childTable + '](' + @childColumnList + ') with drop_existing;');

-- iterate over recordset
FETCH NEXT from dm_cursor_pk
INTO
@childSchema,
@childTable,
@fkName;
END

-- close the cursor
CLOSE dm_cursor_pk

-- deallocate the space
DEALLOCATE dm_cursor_pk



/*
FOREIGN KEYS CREATION
*/


-- declare the cursor to get all foreign keys
DECLARE dm_cursor_fk Cursor
FOR
SELECT
distinct(F.NAME)
FROM SYS.FOREIGN_KEYS AS F
JOIN SYS.FOREIGN_KEY_COLUMNS AS FC ON F.OBJECT_ID = FC.CONSTRAINT_OBJECT_ID
JOIN sys.tables AS TB ON F.parent_object_id = TB.object_id
JOIN sys.tables AS FT ON F.REFERENCED_OBJECT_ID = FT.object_id;


-- open cursor
OPEN dm_cursor_fk

-- iterate over recordset
FETCH NEXT from dm_cursor_fk
INTO
@fkName;

WHILE (@@FETCH_STATUS <> -1)
BEGIN
IF ( @@FETCH_STATUS <> -2 )
SET @childColumnList = NULL;
SET @parentColumnList = NULL;

-- get list of columns that make FK
SELECT
@childSchema = SCHEMA_NAME(TB.schema_id),
@childTable = OBJECT_NAME(F.PARENT_OBJECT_ID),
@childColumnList = coalesce(@childColumnList + ', ', '') + COL_NAME(FC.PARENT_OBJECT_ID,FC.PARENT_COLUMN_ID),
@parentSchema = SCHEMA_NAME(FT.schema_id),
@parentTable = OBJECT_NAME (F.REFERENCED_OBJECT_ID),
@parentColumnList = coalesce(@parentColumnList + ', ', '') + COL_NAME(FC.REFERENCED_OBJECT_ID,FC.REFERENCED_COLUMN_ID)
FROM (SELECT * FROM SYS.FOREIGN_KEYS WHERE NAME=@fkName) AS F
JOIN SYS.FOREIGN_KEY_COLUMNS AS FC ON F.OBJECT_ID = FC.CONSTRAINT_OBJECT_ID
JOIN sys.tables AS TB ON F.parent_object_id = TB.object_id
JOIN sys.tables AS FT ON F.REFERENCED_OBJECT_ID = FT.object_id
ORDER BY SCHEMA_NAME(TB.schema_id), OBJECT_NAME(F.PARENT_OBJECT_ID);

-- generate DDL to recreate FK
INSERT INTO LOG_TABLE (T) VALUES ('alter table [' + @childSchema + '].[' + @childTable + '] '
+ ' add constraint ' + @fkName + ' foreign key (' + @childColumnList + ') references ' + '[' + @parentSchema + '].[' + @parentTable + '] '
+ '(' + @parentColumnList + '); ');
-- iterate over recordset
FETCH NEXT from dm_cursor_fk
INTO
@fkName;

END

-- close the cursor
CLOSE dm_cursor_fk

-- deallocate the space
DEALLOCATE dm_cursor_fk


GO




Invocation of the procedure:

exec sp_fk_pk



Check the DDLs:

select t from log_table order by id


Note: use generated DDLs at your own risk

Sample schema:

create table a (
id integer not null,
t varchar(10)
);
alter table a add constraint pk_a primary key nonclustered(id);

create table b (
id integer not null,
a_id integer
);
alter table b add constraint pk_b primary key nonclustered(id);
alter table b add constraint fk_b_a foreign key (a_id) references a (id);

create table c (
id integer not null,
t varchar(10)
);
alter table c add constraint pk_c primary key nonclustered(id);

create table d (
id integer not null,
id1 integer not null,
t varchar(10)
);
alter table d add constraint pk_d primary key nonclustered(id,id1);


create table e (
id integer not null,
d_id integer,
d_id1 integer
);
alter table e add constraint pk_e primary key nonclustered(id);
alter table e add constraint fk_e_d foreign key (d_id,d_id1) references d (id,id1);

петак, 26. август 2011.

Why select max(id) is wrong with PostgreSQL


Today I got complaints from ex-team leader about some stored procedure and "bad coding practice".

I'm going to elaborate problem a bit (it's classic in DB development). And it's real error, not bad coding practice ...

The stored procedure is part of API for a primitive queuing mechanism written in plpgsql for my ex company.
The queuing mechanism was written 5.5 years ago.


Problematic procedure contains code like this:

....
insert into table a (....)
select max(id) from a

....

So I did basic mistake: first insert into table, followed by immediate selection of max id.

Default isolation level for PostgreSQL is READ COMMITTED, so (theoreticaly) the same procedure invoked from the other transaction could commit between insert and select of the previous transaction moving MAX(ID) to a higher value.


Let's "construct" an example.

First, let's create two tables: table a will be main table which will store IDs and the second one, b, will store inserted ID (i.e. expected ID) and MAX(ID) (i.e. the ID we got).

create table a (id serial);
create table b (a_id integer, a_max_id integer);


Here is the store procedure:

create or replace function insert_into_a (_sleep_time numeric(12,4) )

returns integer as
$body$
declare
_id integer;
begin
-- I did small trick here: I prefetched nextval from a_id_seq (as I had to do 6 years ago! shame on me!)
-- to show the difference between what we excpected to get and what we got.
select nextval('a_id_seq') into _id;

-- ok, let's insert nextval
insert into a values (_id);

-- let's sleep for given number of seconds
perform pg_sleep(_sleep_time);

-- insert generated id and max id
insert into b select _id, max(id) from a;

return _id;

end;

$body$
language plpgsql volatile
cost 100;



The procedure invokes pg_sleep: handy procedure which can help you create race condition tests and emulate real execution.
pg_sleep accepts one numeric parameter which represents number of seconds. The parameter is double precision, so fractional-second delays can be specified.

OK, now we have tables and stored procedure: let's test the system.

First single transaction which sleeps 0 seconds:

milos=# select * from b;

a_id | a_max_id
------+----------
(0 rows)

milos=# begin;
BEGIN
milos=# select insert_into_a(0);
insert_into_a
---------------
1
(1 row)

milos=# commit;
COMMIT
milos=# select * from b;
a_id | a_max_id
------+----------
1 | 1
(1 row)


OK, expected result.


Now real action.

Transaction T1

milos=# begin;

BEGIN
milos=# select insert_into_a(100);
....

100 seconds: plenty of time to create second transaction which will screw our final result.


Transaction T2

milos=# begin;

BEGIN
milos=# select insert_into_a(0);
insert_into_a
---------------
3
(1 row)

milos=# select * from b;
a_id | a_max_id
------+----------
1 | 1
3 | 3
(2 rows)

milos=# commit;
COMMIT


So far, so good.


Back to committed transaction T1:

 insert_into_a 

---------------
2
(1 row)

milos=# select * from b;
a_id | a_max_id
------+----------
1 | 1
3 | 3
2 | 3
(3 rows)


HA!
Last pair (2,3) clearly shows "bad coding practice" :)

So, the proper way is to prefetch the ID using nextval('a_id_seq') and than to re-use it using currval('a_id_seq').

OK, I agree that this example is fake: we achieved with pg_sleep something that cannot be done with real procedure, but still we have race condition, and sometimes it can produce error which can be hard to trace and find (maybe it didn't fail in 5.5 years but ....).