Commit Priority for Transaction Resources in WebSphere!

PROBLEM

Recently we faced a situation in which DB commits that were issued before the JMS puts were not actually reflected in the database.

To elaborate further, consider this…

  1. An MQ message arrives on a queue, say Q1.
  2. The corresponding MDB (for Q1) gets the message and starts processing.
    1. It creates some DB records.
    2. It puts an MQ message with DB information, say the primary key, on another queue, Q2.
  3. The MDB for Q2 gets the message and starts processing.
    1. It fetches the DB information created in step 2.1 and FAILS with “ENTITY WITH KEY NOT FOUND IN THE DATABASE” ???

Where the hell did it go?

After some digging and retesting with enhanced logging, we concluded the following: even though the commit to the DB was issued before the MQ put, a DB failover happened at the same time. Since the DB was momentarily unavailable, WAS proceeded with the next resource commit, which in this case happened to be the MQ PUT.

What was supposed to happen?

  1. A global transaction is started.
  2. MQ resource branch …0000000000000000000000000001 is added (to GET a message from Q1).
  3. Oracle resource branch …0000000000000000000000000002 is added (for the database work).
  4. MQ resource branch …0000000000000000000000000003 is added (to PUT a message to Q2).

 

  1. WebSphere TM prepares …0000000000000000000000000003 (MQ.PUT)
  2. WebSphere TM prepares …0000000000000000000000000002 (DB.COMMIT)
  3. WebSphere TM prepares …0000000000000000000000000001 (MQ.GET)

 

  1. WebSphere TM commits …0000000000000000000000000001 (MQ.GET)
  2. WebSphere TM commits …0000000000000000000000000002 (DB.COMMIT)
  3. WebSphere TM commits …0000000000000000000000000003 (MQ.PUT)
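
To make the expected flow concrete, here is a minimal sketch of a two-phase commit over the three branches. It is a toy model only and not WebSphere code: the `Branch` class and the `two_phase_commit` function are hypothetical names made up for this illustration, and the prepare and commit orders are simply copied from the lists above.

```python
class Branch:
    """Hypothetical stand-in for one resource branch of a global transaction."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = "active"

    def prepare(self):
        # A branch votes "ok" only if it is able to commit later.
        self.state = "prepared" if self.can_prepare else "rollback-only"
        return self.can_prepare

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"


def two_phase_commit(branches):
    """Prepare every branch, then commit every branch; roll back if any vote fails."""
    log = []
    # Phase 1: prepare, in the reverse-enlistment order shown in the lists above.
    for branch in reversed(branches):
        ok = branch.prepare()
        log.append(f"prepare {branch.name}: {'ok' if ok else 'failed'}")
        if not ok:
            for other in branches:
                other.rollback()
            log.append("rollback all")
            return log
    # Phase 2: commit, in enlistment order.
    for branch in branches:
        branch.commit()
        log.append(f"commit {branch.name}")
    return log


log = two_phase_commit([Branch("MQ.GET"), Branch("DB"), Branch("MQ.PUT")])
for line in log:
    print(line)
```

With all three branches healthy, the log shows the three prepares followed by the commits in the order MQ.GET, DB, MQ.PUT.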

What ACTUALLY happened?

  1. WebSphere TM commits …0000000000000000000000000001 (MQ.GET)
  2. WebSphere TM commits …0000000000000000000000000002 (DB) -> the Oracle commit is delayed, so the TM skips it for now and moves to the next resource.
  3. WebSphere TM commits …0000000000000000000000000003 (MQ.PUT)
  4. WebSphere TM commits …0000000000000000000000000002 (DB) -> retried after a longer wait, and this time it succeeds.

The order is as expected only when the DB data source causes no Oracle commit delay. If there is a commit delay (probably because of the failover), the IBM transaction manager temporarily skips that resource (it will retry or wait later) and moves on to the next resource with the same commit order. Although we are not allowed to do so, we reverse engineered the IBM transaction manager code to avoid even further delays with IBM support, so we know this logic for sure. Because of this logic, if all the resources have the same commit order, or no commit order is specified, the commit steps you will see during an Oracle commit delay are the ones listed above.
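
The skip-and-retry behaviour described here can be sketched as a small simulation. This is only a toy model of the logic as we understood it, not the IBM transaction manager code; the function name `commit_without_priority` and the branch labels are made up for the example.

```python
from collections import deque


def commit_without_priority(branches, delayed):
    """Toy model: commit branches in order, deferring any branch whose
    resource is momentarily unavailable and retrying it after the others.

    branches: branch names in the intended commit order.
    delayed:  names of branches whose first commit attempt is delayed.
    Returns the events in the order they happen.
    """
    events = []
    pending = deque(branches)
    already_deferred = set()
    while pending:
        branch = pending.popleft()
        if branch in delayed and branch not in already_deferred:
            already_deferred.add(branch)
            events.append(f"{branch}: delayed, skipped for now")
            pending.append(branch)  # retry once the remaining branches are done
        else:
            events.append(f"{branch}: committed")
    return events


observed = commit_without_priority(["MQ.GET", "DB", "MQ.PUT"], delayed={"DB"})
for line in observed:
    print(line)
```

In this model MQ.PUT is committed before DB, so the message on Q2 becomes visible while the DB rows are still uncommitted, which is exactly the window in which the Q2 MDB fails to find the entity.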

POSSIBLE SOLUTION

Specify a commit order on your resources. More on commit priority…

(http://www.ibm.com/support/knowledgecenter/SSEQTP_8.5.5/com.ibm.websphere.base.doc/ae/cjta_rescom.html)

If you specify a commit order as described above, the IBM transaction manager does not skip the delayed commit and move on to the next resource commit; it waits and retries until the DB commit has actually gone through.

  1. WebSphere TM commits …0000000000000000000000000001 (MQ.GET)
  2. WebSphere TM commits …0000000000000000000000000002 (DB) -> the Oracle commit is delayed, but the TM waits and retries before attempting the commit on the MQ.PUT resource.
  3. WebSphere TM commits …0000000000000000000000000003 (MQ.PUT)
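
For comparison, here is a minimal sketch of the behaviour with distinct commit priorities. Again this is a toy model and not WebSphere code: `commit_with_priority` is a hypothetical function, the priority numbers are arbitrary, and the sketch assumes that a larger value means "commit earlier" (check the IBM page linked above for the actual rule in your version).

```python
def commit_with_priority(priorities, failures_before_success=None):
    """Toy model: commit strictly in priority order, blocking on a delayed
    branch (wait and retry) instead of moving on to the next one.

    priorities: dict of branch name -> commit priority.
    failures_before_success: dict of branch name -> number of delayed attempts.
    Returns the events in the order they happen.
    """
    failures_before_success = failures_before_success or {}
    events = []
    # Assumption for this sketch: the larger priority value commits first.
    for branch in sorted(priorities, key=priorities.get, reverse=True):
        for _ in range(failures_before_success.get(branch, 0)):
            events.append(f"{branch}: delayed, waiting and retrying")
        events.append(f"{branch}: committed")
    return events


with_priority = commit_with_priority(
    {"MQ.GET": 3, "DB": 2, "MQ.PUT": 1},
    failures_before_success={"DB": 2},
)
for line in with_priority:
    print(line)
```

However many retries the DB branch needs, the MQ.PUT commit only happens after the DB commit, so the Q2 MDB can no longer see the message before the rows exist.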
