About Me

Singapore
I am currently working as a DB Consultant with Ashnik. We deal in open source and cloud-based enterprise-class solutions. Prior to this I was working as a DBA at Misys India (Banking Services), working on IBM DB2, Oracle and Microsoft SQL Server. I was selected as an IBM Champion for Data Management 2011. I like a bit of photography, cycling, poetry and I just love gadgets!!


An Encounter with a chain of lock waits!!!


I have been away from blogging for quite a long time; the office hours have been really hectic and then there were some personal engagements. The weather is getting awesome with the monsoon approaching.

So here I am, back with all my energy, to share a piece of knowledge which I acquired during this hectic office schedule. So let me start with "a story".

There was a part of the application which was facing lock timeout issues. People blamed the application code and tried to locate the piece of code which might be causing it, but no one could succeed. Later the issue was observed in other places as well. Now it became more important to find out the set of queries which might be causing it.

To find out the details of lock waits you can query the view SYSIBMADM.MON_LOCKWAITS, which gives details of the locks that may be causing them...
These details though are only level one, and will show only the immediately blocked query; if the query which is actually suffering is in turn waiting on this query, you will have to query the same view again. There is a very good chance that by the time you get to the second level, the query would have either committed, been forcefully rolled back (in case of a deadlock victim) or timed out.
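Just for reference, a first-level check against this view looks something like the query below (the column list here is only illustrative; the exact column names can differ between DB2 versions, so verify them against the documentation of your release):

-- Illustrative first-level lock-wait check (column names are assumptions)
SELECT REQ_APPLICATION_HANDLE,   -- the application that is waiting
       HLD_APPLICATION_HANDLE,   -- the application holding the lock
       LOCK_OBJECT_TYPE,
       LOCK_MODE,
       TABSCHEMA,
       TABNAME
FROM   SYSIBMADM.MON_LOCKWAITS;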

So I decided to write a piece of SQL which can give me details of the lock and the whole chain of lock waits.
So I wrote a view (please download the definition from here).
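In case you just want the idea behind it, here is a rough sketch of how such a view can be put together with a recursive common table expression over SYSIBMADM.MON_LOCKWAITS. This is not the exact definition I used; the column names and the level cap below are assumptions, so treat it only as a starting point:

-- Sketch only: walk the lock-wait chain level by level
CREATE OR REPLACE VIEW LOCKCHAIN_INFO AS
WITH CHAIN (WAITING_APPL, HOLDING_APPL, BLOCKING_LOCK, CHAIN_LEVEL) AS
(
   -- level 1: each waiter and the holder blocking it right now
   SELECT REQ_APPLICATION_HANDLE, HLD_APPLICATION_HANDLE, LOCK_NAME, 1
   FROM   SYSIBMADM.MON_LOCKWAITS
   UNION ALL
   -- next level: the holder itself may in turn be waiting on someone else
   SELECT C.WAITING_APPL, W.HLD_APPLICATION_HANDLE, W.LOCK_NAME, C.CHAIN_LEVEL + 1
   FROM   CHAIN C
   JOIN   SYSIBMADM.MON_LOCKWAITS W
          ON W.REQ_APPLICATION_HANDLE = C.HOLDING_APPL
   WHERE  C.CHAIN_LEVEL < 10        -- guard against runaway recursion
)
SELECT * FROM CHAIN;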


So I set up a test for a lock-wait chain (and eventually a deadlock). I created the tables below:








Tables : TEST and TEST2


create table test2 (col1 int, col2 int);
create unique index ind_test2 on test2 (col1);
commit;

create table test (col1 int, col2 int);
create unique index ind_test on test (col1);
commit;


Once the tables were created, I fired some queries from three different sessions so that each one would block the other:


Session1:

insert into test2 (col1 , col2) values (1,2);

Session2:

insert into test (col1 , col2) values (1,2);
insert into test2 (col1 , col2) values (1,2);

Session3:

insert into test (col1 , col2) values (1,2);


Then from a fourth session I fired "select * from lockChain_INFO",
which gave an output similar to the one below:





Then from session1 I fired one more query:
insert into test (col1 , col2) values (1,2);

After this, when I queried the same view again, I got the result below:





These are insert statements, and static ones at that, so they can easily give you the cause of the locking (i.e. clashing index keys). Imagine instead that you are trying to update the same row from two sessions (and that in an application which uses dynamic queries, so you are not sure which row it is).

To help you, there is a function available which will give you the lock info: MON_FORMAT_LOCK_NAME


SELECT SUBSTR(NAME,1,20) AS NAME, SUBSTR(VALUE,1,50) AS VALUE FROM TABLE(MON_FORMAT_LOCK_NAME('LOCK_ID')) as LOCK_DTL;


where LOCK_ID is the BLOCKING_LOCK value received from the view. This query gave descriptive info about the lock, such as which row ID and which table, etc.:




This granular info helped me figure out that the issue was not actually in the application or the database; it was a configuration issue created by the client, which forced the same row to be used/updated for multiple purposes.

Once this issue was identified, we suggested configuration changes which smoothed things out.

Hope this will be equally useful for you guys! :-)

I have worked on a couple more similar snapshot views, which I will be sharing in blogs to come!


Download View:

lockChain_INFO


DB2 10 is here: What's new? The performance aspect!!!

Hi guys... So DB2 10 is here and is available for download...

I was involved in the DB2 10 beta EAP, and you can listen to my podcast with industry experts (which was quite an honor)...

Well, let's see what's new in v10 which can help in improving performance...

It has an improved index scan which is termed a jump scan; for people from the Oracle world this is similar to a skip scan... I tried this out during the EAP and saw significant improvement in CPU timings, and if you have wisely chosen your tablespace sizing, e.g. extent size and page size, I/O will also decrease... This feature is specifically helpful if you have an index on two columns and your queries filter on the second key of the index... It wisely chooses the index keys which shall be evaluated for the match!
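To make it concrete, here is a hypothetical example (the table, index and column names are all made up):

-- A composite index; the query below filters only on the second key
create index ind_orders on orders (region_id, customer_id);

-- No predicate on region_id (the leading column): earlier this would typically
-- mean a full index or table scan, while a jump scan can still use ind_orders
select order_id, order_date
from   orders
where  customer_id = 1042;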

The second improvement is in terms of I/O: smart prefetching... This is particularly helpful for data that is poorly clustered on disk or for index pages with low density... In such cases DB2 can automatically switch from sequential prefetching to readahead prefetching (a new type of prefetching)... Readahead is more efficient, and the switch happens automatically when DB2 sees that sequential prefetch is wasting I/O...


There are a lot of new SQL optimization enhancements; these are automatically taken care of by the optimizer when needed... For example, hash joins can now be used by the optimizer even if the data types do not match, plus early aggregation of data, early removal of distincts, and improvements in the estimation of performance cost... There are various new indicators introduced in the EXPLAIN tables and in the EXPLAIN info generated by db2expln, which indicate whether any of these features have been used or not... More on that later...


Another good improvement is that RUNSTATS now supports index sampling, which avoids the need to scan the whole index for collecting statistics... Also, the new parameter auto_sampling will enable sampling for all background statistics gathering... Another improvement from the statistics-gathering perspective is automatic statistics collection for statistical views when you set auto_stats_views... Talking of statistical views reminds me that there are a few other improvements there too, e.g. actual statistics from expression columns will now be respected for generation of a more apt plan...
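If I remember the new syntax correctly, it looks something like the statements below; take this as a sketch and verify the exact clauses against the DB2 10.1 documentation:

-- Collect index statistics from a sample instead of scanning the whole index
RUNSTATS ON TABLE MYSCHEMA.BIGTABLE AND INDEXES ALL INDEXSAMPLE SYSTEM(10);

-- Let background statistics collection use sampling, and auto-collect
-- statistics for statistical views (both are database configuration parameters)
UPDATE DB CFG FOR MYDB USING AUTO_SAMPLING ON;
UPDATE DB CFG FOR MYDB USING AUTO_STATS_VIEWS ON;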


There are some improvements in intra-query parallelism as well... The catchiest one is the ability to do parallel scans on range-partitioned tables and indexes... Also, intra-partition parallelism can now be controlled from the application by calling ADMIN_SET_INTRA_PARALLEL... And the degree of parallelism can be controlled for a particular workload by using the MAXIMUM DEGREE option of ALTER WORKLOAD...
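For example, something along these lines (the workload name and degree are just illustrative; check the exact syntax in the docs):

-- Enable intra-partition parallelism for the current connection
CALL SYSPROC.ADMIN_SET_INTRA_PARALLEL('YES');

-- Cap the degree of parallelism for a particular workload
ALTER WORKLOAD SYSDEFAULTUSERWORKLOAD MAXIMUM DEGREE 4;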



There is an important improvement in storage and I/O, i.e. the introduction of STORAGE GROUPS... Earlier, till v9.7, if I chose automatic-storage-managed (ASM) tablespaces, I had to lose out on distribution of I/O... i.e. all the automatic storage tablespaces were created in the paths designated for automatic storage (which are provided at the time of DB creation, or later by ALTER DATABASE)... But now one can create storage groups and attach one or more storage paths to each group... At the time of tablespace creation one attaches the tablespace to a storage group... So now I can create my automatic storage tablespaces on different disks... If you have already read my previous posts on tablespace design (1, 2, and 3) then you will know how good this feature is...
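A quick sketch of how this looks (the paths and names are made up):

-- Create storage groups over different sets of disks
CREATE STOGROUP sg_fast ON '/db2/ssd01', '/db2/ssd02';
CREATE STOGROUP sg_slow ON '/db2/sata01';

-- Attach each automatic storage tablespace to the storage group you want
CREATE TABLESPACE ts_hot USING STOGROUP sg_fast;
CREATE TABLESPACE ts_archive USING STOGROUP sg_slow;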

I will wind up this post now, and maybe I will try to put up all these features in a little more detail later...

Till then keep optimizing...

So all those people who are running on v9 or prior should definitely plan to migrate to this new version...

Oh yeah, compatibility has also been improved and is as good as 98%... Well, don't trust me??? Read for yourself this post by Serge Rielau...


Other posts/blogs regarding the new release:

Serge Rielau has done a series of posts on the new features... Follow him on Twitter to read more...


The DB2Night Show hosted by Scott Hayes has done an episode on the same, and there is a sequel to follow...


A few tips for faster Bulk Refresh of a Table


So I am back with some tips to improve insert performance... I am wondering why I did not choose this for my 2nd chance @ the DB2Night Show's DB2 Has Talent...
Well, a lot has happened in the DB2 world meanwhile... There was a new LUW release and I shall soon be writing some stuff about that...
So there are various scenarios where you have to do a bulk insert, and I too had to deal with one of those... The story and the various solutions which came to me may bore you, but stick with it so that you know what you should not do, and I will share some piece of code at the end... :)

There are obviously other tips for improving bulk inserts, or any type of insert, but those are more to do with your DB CFG / DB physical design... Here I will share more on how you can handle it programmatically...


So there was this very complex query which read data from 6-7 tables, all of which had around 4-5 million records... With all the filters and everything, the query returned around 6-7 million records... Obviously you cannot process this chunk of data at once on your server, and hence you paginate the data and read it only in bits... But reading page by page means the view query is fired again and again, and then sorting also happens for every new page read...
So a better way is to use either an MQT or just a table which is loaded with this data plus a row sequence, so that you can read in bits based on the row sequence (a rough sketch follows below)... We had used the table approach in our scenario...
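A rough sketch of the approach (all the object names are made up):

-- Materialize the expensive query once, together with a row sequence number
insert into stage_result (rowseq, col1, col2)
select row_number() over (order by v.some_key), v.col1, v.col2
from   some_complex_view v;

-- The batch job then reads it chunk by chunk on the (indexed) rowseq column
select col1, col2
from   stage_result
where  rowseq between 1 and 10000;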

I believe usage of an MQT is good when you do not have to access the MQT directly and your queries written over the underlying tables automatically use the MQT...
But the problem was that the insert of data into the temporary table was taking a huge amount of time... Also we had indexes on a few columns which were accessed during processing... This was not a one-off case; there were a few more such processes doing bulk deletes and bulk inserts which were slow... Moreover, these temporary tables were to be marked/updated so that one knows how failure recovery should happen in case the process is aborted/terminated abnormally... Another problem which complicated it: whatever solution I gave had to be generic and work on Oracle and SQL Server as well (well, that comes with my job)...

Someone suggested that one can create an MQT for the query which populates data in the temporary table (and its counterparts, a Materialized View and an Indexed View, in Oracle and SQL Server respectively)...
The problem is that updating the MQT counterparts in Oracle and SQL Server is either difficult, not possible, or requires a special license...

So the suggestion was extended to use this MQT (and the MV and IV in Oracle and SQL Server) to load data into the temporary table...
But I discarded it...
You first load 7 million records into one table (an MQT is essentially a table) and then load another table from this table... As a DBA, you are doing your best to screw yourself down the line...


So I thought: if an MQT is also maintained as a table, and DB2 internally populates the data when doing a full refresh, let's try to think how DB2 would be optimizing this process...

So I thought:
1) Maybe it truncates the table instead of deleting the records...
So I created a stored proc to truncate the table but retain the storage (I couldn't just use the TRUNCATE statement because I had to support Oracle as well, where the syntax is different; BTW, does anyone have an idea why DB2 mandates the use of the IMMEDIATE keyword?)

CREATE OR REPLACE PROCEDURE TESTPROC
( TABLENAME VARCHAR(30), SCHEMANAME VARCHAR(15))
LANGUAGE SQL
BEGIN
     DECLARE STATEMENT1 VARCHAR(1024);--

     -- switch to the schema passed in
     SET STATEMENT1='SET CURRENT_SCHEMA=' || SCHEMANAME; --
     EXECUTE IMMEDIATE STATEMENT1;--

     -- TRUNCATE must be the first statement of its unit of work,
     -- hence the COMMIT just before it
     SET STATEMENT1='TRUNCATE TABLE ' ||
                      TRIM(SCHEMANAME) ||
                       '.' ||
                       TRIM(TABLENAME) ||
                       ' REUSE STORAGE IMMEDIATE';--
     commit;--
     EXECUTE IMMEDIATE STATEMENT1;--
     commit;--
END;



But this only resolved the issue of deletion... Inserts were still taking a lot of time... So the second thought which came to my mind was...
2) Maybe it drops the indexes on the MQT before populating data (a disable-index option is a requirement the DB2 team should start looking at)

So I modified the above stored proc...
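Two notes before the code: the DECLARE below uses an array type (WASADMIN.VARCHAR_ARR) which must already exist in the database; a minimal definition could look like the statement below. Also, as written, the rebuild loop recreates every index as a plain non-unique index, so keep that in mind if your table has unique indexes.

-- Assumption: an array type along these lines has to be created beforehand
CREATE TYPE WASADMIN.VARCHAR_ARR AS VARCHAR(2048) ARRAY[100];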
CREATE OR REPLACE PROCEDURE TESTPROC
( TABLENAME VARCHAR(30),  SCHEMANAME VARCHAR(15), COLUMNLIST VARCHAR(1024), INSERTVAL VARCHAR(1024))
SPECIFIC TESTPROC
LANGUAGE SQL
BEGIN
  DECLARE STATEMENT1  VARCHAR(1024);--
  DECLARE STATEMENT2 VARCHAR(2048);--
  DECLARE INDEX_STMT WASADMIN.VARCHAR_ARR;--
  DECLARE CNT INTEGER;--
  
  SET STATEMENT1='SET CURRENT_SCHEMA=' || SCHEMANAME; --
  EXECUTE IMMEDIATE STATEMENT1;--
  
-- Truncate the table before loading/refreshing with new data
  set CNT=1;--
  SET STATEMENT1='TRUNCATE TABLE ' || TRIM(SCHEMANAME) ||  '.' || TRIM(TABLENAME) || ' REUSE STORAGE IMMEDIATE';--
  commit;--
  EXECUTE IMMEDIATE STATEMENT1;--

-- Get a copy of index creation statements...

  FOR CR1 AS a CURSOR
  WITH HOLD
  FOR
     (SELECT INDNAME FROM SYSCAT.INDEXES WHERE TABNAME=TABLENAME)
  DO
    SET INDEX_STMT[CNT]='CREATE INDEX ' ||
                          CR1.INDNAME ||
                            ' ON ' ||
                              TABLENAME ||
                                  '( ';--
    
    FOR CR2 AS a CURSOR WITH HOLD FOR
    (
        SELECT I.INDNAME,
        IC.COLNAME ||
            CASE WHEN  IC.COLORDER='A' THEN ' ASC ,' ELSE ' DESC ,' END AS COL
        , IC.COLSEQ  
        FROM SYSCAT.INDEXES I JOIN SYSCAT.INDEXCOLUSE IC ON I.INDNAME=IC.INDNAME
        AND I.INDSCHEMA=IC.INDSCHEMA
        AND I.INDNAME=CR1.INDNAME
        AND I.TABNAME=TABLENAME
        ORDER BY IC.COLSEQ
    )
    DO
      SET INDEX_STMT[CNT]=INDEX_STMT[CNT] || CR2.COL ;--
    END FOR;--

    set INDEX_STMT[CNT]=SUBSTR(INDEX_STMT[CNT],1,length(INDEX_STMT[CNT])-1);--

    set INDEX_STMT[CNT]=INDEX_STMT[CNT] || ')';--

    SET CNT=CNT+1;--

-- Drop the index once its definition is stored...

    set STATEMENT1='drop index ' || CR1.INDNAME;--
    EXECUTE IMMEDIATE STATEMENT1;--
    commit;--
  END FOR;--
  

-- Once definition is captured for all indexes (and they are dropped) insert the data in the table using the
             -- Column list provided in the arguement
             -- and values
              --(which can be VALUES (ValueFor_Column1, Val_ForColumn2, .... Val_forColumnN)
              --or can be a select stmt

  set STATEMENT2='INSERT INTO '
                      || trim(SCHEMANAME) ||
                           '.' || trim(TABLENAME) ||
                               ' ( ' || COLUMNLIST ||
                                      ' ) ' ||
                                        INSERTVAL;--
  EXECUTE IMMEDIATE STATEMENT2;--
  commit;--
  
-- Now re-create all the indexes...
  FOR CR3 as a CURSOR
    WITH HOLD
    for
      ( SELECT T.STMT FROM UNNEST(INDEX_STMT) WITH ORDINALITY AS T(STMT,ID))
  DO
    EXECUTE IMMEDIATE STMT;--
  END FOR;--
END;
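For completeness, a hypothetical invocation would look like this (the object names are made up; the last argument can be a VALUES clause or a full SELECT):

CALL TESTPROC('STAGE_RESULT',
              'MYSCHEMA',
              'ROWSEQ, COL1, COL2',
              'SELECT ROW_NUMBER() OVER(), V.COL1, V.COL2 FROM MYSCHEMA.SOME_COMPLEX_VIEW V');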




I could have used files to store the index creation statements instead of a variable, but I wanted to make it generic so that it works even when new indexes are added, and I wanted to avoid the extra I/O.


Now a tip: for such tables, whose cardinality shoots up during a bulk refresh, you can mark them VOLATILE so that the optimizer keeps preferring index access even when the statistics are stale:

"ALTER TABLE TABLENAME VOLATILE"

Further performance gains can be achieved if you are ready to compromise on recoverability of the table being loaded. In that case you can activate NOT LOGGED behaviour just before the insert (the table must have been created with the NOT LOGGED INITIALLY attribute, and the ALTER has to be in the same unit of work as the insert):

"ALTER TABLE TABLENAME ACTIVATE NOT LOGGED INITIALLY"

If anyone can help me figure out what other optimizations are done by an MQT refresh, I can try to write a piece of code for that as well...

Keep optimizing...

Virtual Indexes: My 2nd Chance in DB2 Has talent 2012

I will share the story of my 2nd chance at DB2 Has Talent... A better experience, and I managed the time well... But everyone was at their best in that episode, and for a moment I thought I had got a wildcard entry into the final round of DB2 Has Talent... A lot of people discussed interesting topics, and a few discussed features/tools which are not well documented... I too had chosen a topic which is not much discussed: "Virtual Indexes"... A way by which you can evaluate your indexes without even creating them... So let me start with it (and I will keep a check on my typos)...





So how many times have you faced an issue of poorly performing queries and felt that adding an index would probably help... But then you realize one index will not be sufficient for all the queries, and you are left wondering which columns to combine for indexing... The number of permutations and combinations is generally reason enough not to try actually creating the indexes, as that takes time, and if an index is not good you need to drop it and start over...



So I made up a story... I had a table design, a pretty simple one...



Three tables, two of them mapped to each other in the third table by their key columns... The mapping table has a self-referencing parent-child relationship...



I had a query which was being executed on these tables, again a very simple query... A join between two tables through the third table, with the third table also involved in a self-join... But still the query was taking a lot of time to execute... I guessed some indexes were probably missing and thought of using the index advisor...



I could have achieved the same by setting my CURRENT EXPLAIN MODE to RECOMMEND INDEXES, running the queries, and then seeing which new indexes are advised in the ADVISE_INDEX table... But I was surprised to see the result in the table...
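The rough workflow is shown below (the tables in the sample query are made up, and the explain tables must already exist, e.g. created from EXPLAIN.DDL):

-- Ask the optimizer to capture index recommendations for the statements that follow
SET CURRENT EXPLAIN MODE RECOMMEND INDEXES;

-- Run the problem query; in this mode dynamic statements are explained,
-- not executed, and the candidate indexes land in ADVISE_INDEX
SELECT e.emp_name, d.dept_name
FROM   employee e
JOIN   emp_dept_map m ON m.emp_id  = e.emp_id
JOIN   department d   ON d.dept_id = m.dept_id;

SET CURRENT EXPLAIN MODE NO;

-- See what was suggested
SELECT NAME, TBNAME, COLNAMES, USE_INDEX FROM ADVISE_INDEX;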



I was confused; every other column in my select list or in my where clause had made its way into one advised index or the other...



I was not able to figure out which indexes I should use and which ones I should ignore... The major problem was that creating all these indexes would slow down my CRUD operations... A DBA won't have all the time in this world to try out all the indexes in every possible permutation and go into the loop of "create index"-"generate plan"-"check usage"-"drop index"-"repeat"...



I was wondering if there is a way to force the optimizer to assume an index exists and then generate a plan using that index if possible... Then I came across an option in db2advis and a value for CURRENT EXPLAIN MODE...



Once you have got the index suggestions, you can set your CURRENT EXPLAIN MODE to EVALUATE INDEXES and then run the queries which you think will benefit from the suggested indexes... Then use db2exfmt to format the explain info... Alternatively, using the -p option of db2advis you can force the advisor to store explain info for the suggested changes... Then, using db2exfmt, you can generate the explain plan and see whether the suggested indexes are used or not... But I was not satisfied with the plan generated for any of those suggestions; especially when I ran the update queries for those tables with EVALUATE INDEXES as the current explain mode, I was even more convinced not to create all of them...
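Roughly, the two routes look like this (the database, file and table names are made up):

-- Route 1: evaluate the candidate rows currently sitting in ADVISE_INDEX
SET CURRENT EXPLAIN MODE EVALUATE INDEXES;
-- run (explain) the queries you expect to benefit, e.g.
SELECT COUNT(*) FROM employee e JOIN emp_dept_map m ON m.emp_id = e.emp_id;
SET CURRENT EXPLAIN MODE NO;

-- Route 2 (from the command line): keep the explain info via db2advis -p,
-- then format the plan with db2exfmt, e.g.
--   db2advis -d MYDB -i slow_queries.sql -p
--   db2exfmt -d MYDB -1 -o plan.txt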

I took a backup of the suggested indexes, deleted the rows from ADVISE_INDEX and repopulated them from the backup in various permutations to check which combination was most effective, but none of them was convincing... So I decided to insert my own rows into ADVISE_INDEX for the indexes I thought would benefit the select queries without having much impact on updates and inserts...

Something which I could not discuss during the presentation: I had some challenges trying out my own indexes, because initially the optimizer ignored the indexes I imagined (or virtually created)... I then had to work out proper values for the statistics columns:

NLEAF: the number of leaf nodes in your index tree (roughly the number of distinct values of your column combination)
NLEVELS: the number of levels in your index tree (one of the most difficult parameters to decide; a rough equation is 2^x = number of distinct values, where x is NLEVELS)
FULLKEYCARD: the number of distinct full-key values in your index (a little easier compared to the others; a select distinct should do)
FIRSTKEYCARD/FIRST2KEYCARD/FIRST3KEYCARD: the cardinality of the first column, the first two columns and the first three columns of your index (again a little easier)
COLCOUNT: the number of columns in the index
COLNAMES: the columns which are part of the index
INDEXTYPE: the type of index (CLUS/REG/DIM/BLOK)

The SYSSTAT views can help you in deciding these values...



Finally, with a little struggle and a little time, I could decide on the best indexes for my tables... Exploring the feature further, I realized that I can also virtually drop an index and check how query performance is impacted before I actually drop it... All I have to do is insert a row for the existing index with the USE_INDEX column set to 'N' and EXISTS set to 'Y'...
I wrapped up by putting up a slide on the advantages of this (I will just put the slide here and not explain it)...



To end this blog I will just mention that my DB2 z/OS folks can use the same feature via DSN_VIRTUAL_INDEXES... Oracle users can also try virtual indexes by using the NOSEGMENT clause in index creation... I will sign off this blog entry now... It might take a few days to return, as I have to enhance my skills on HADR and DB2 internals (process and memory model)... I might have a short blog on insert enhancements and a piece of code that you can probably re-use...

My Experience @ DB2 Has Talent: DB Design Basics

Hi people, I recently participated in DB2 Has Talent and it is still going on [though now I am out of it :( ]...
Make a point to register for the rest of the episodes; I assure you of great DB2 tips...
I will be sharing my first presentation for the show!!! It was about "DB physical design"... I will avoid putting much text here, as it is more or less based on my blog series "Optimize your Tablespaces and Bufferpools"... You can check those posts here:

Optimize your Tablespaces and Bufferpools-1: Tablespace Design

Optimize your Tablespaces and Bufferpools-2: Tablespace Sizing

Optimize your Tablespaces and Bufferpools-3: Bufferpool Optimization

I could not complete the presentation, for obvious reasons (it was a bad choice, a topic you have yourself spread over 3 blogs)... Additionally there were too many typos (which I make quite often in my blogs as well)... You can read Susan's blog to get more specific presentation tips!!! I will share the slides I had used for the presentation...

Be critical of what you did n be Logical to what you gonna do

This is going to be an offbeat post!!! You might not find a very good technical tip in it, but I hope it will help you improve your attitude towards performance tuning...

Nothing, and trust me, nothing can make up for a poorly designed application...

I will just discuss some basic mistakes which people commit and which should be revisited:

1) Indexing
If you have not indexed properly, you should not expect too much out of a processor, even if it is, say, 10GHz!!! You have to live with skewed I/O.

2) Placement of tables
If you have not segregated your tables as per their nature and volume of data, using too many containers is not going to help you.

3) Level of normalization
Some redundancy won't ruin your application, especially if the data is write once, read always... I know this is going to be debated...
I heard someone say, "It is better to have something which works rather than something architecturally correct which does not work!!!"

4) Access queries
Do you expect a magical h/w solution to improve the performance of a query like "select * from my_Blogs where upper(Title) like '%CRITICAL%LOGICAL%'"?
No hardware can help you scale when the actual problem is with your application and database design...

5) Designing at a premature phase of the development life cycle
An even more basic problem... You did not care to understand the business and screwed up the DB design, and now you have to fire a count(*) to get the number of reads of a blog for a given ACCOUNT; if you had known the business, maybe you would have planned this better...

6) Improper moulding of business scenarios into your architecture
Business defines requirements, not the other way round... As a DBA, understand the business; it is absolutely normal to question business requirements...
My user requirement might ask me for new functionality, but I need to see how it fits in my current solution and its architecture...
Just do not try to solve a problem with an advanced feature when it could have been solved in a much simpler way... For instance, many people think that partitioning or MDC (on a column which has an even distribution of data) is going to get them awesome performance for tables which hold a huge volume of data...

7) Get out of conventional notions...
I have seen well-experienced people thinking that stored procs will always perform better than a prepared statement wrapped in Java code... or that application caching will always help... These are myths... Stored procs are definitely preferable in certain instances, but not always... Putting too much load on the application cache leaves too little memory for your application itself and for any data which is not cached and has to be read online...


Please, guys, these were just examples, and I do not want people to carry the notion that I have committed these blunders...

I really pity people who think that performance is more of an infrastructure and hardware issue... or that going for some advanced feature of a s/w will help them... If you are suggesting these to your clients, BEWARE: your client will curse you first for providing a bad solution and second for increasing their IT costs to support a poorly designed application...

I hope this post will help you think more rationally... So when you hit a performance bottleneck, be critical of what you have already developed and be logical about the solution you are going to propose.