Deadlock in Real World

May 4, 2010 by

Last year, I was assigned to handle some defects that would occur only when the system was being in high load. Those defects were apparently the result of some concurrent problems both in third party API and my own module. The characteristic of the problems is quite interesting because it features a couple well known problems that have already been described in many programming articles. What I like the most about these problems is that it evolved from a simple exception to a serious deadlock by just adding a few line of code. I will log about the problems here hoping that it could be any useful for other developers.

Again, like all of my programming posts, I can’t show you the real production code so I will show you an example that will simulate the problems. The usage scenarios and the nature of the problem will be just the same.

Let’s say I have an API for sending/receiving message over network. This API has been designed to be asynchronous by nature. Users can send a message to subscribe for a type of data and gets update of the data from server until users decide to unsubscribe.

public class AsyncTransport {

    private final Vector subscription = new Vector();
    private final MockUpdateMsgGenerator idGen = new MockUpdateMsgGenerator();
    private volatile DispatchingThread disp;
    private volatile boolean stop;

    public void connect(String host, String port) {
        //Mock implementation don't actually connect to any thing.
        disp = new DispatchingThread();
        disp.start();
    }

    public Subscription subScribe(Message request, AsyncMsgListener listener) {
        String subID = idGen.genIDforMessage(request);
        Subscription sub = new Subscription(subID, listener);

        subscription.add(sub);
        sendMessage(subID, request);

        return sub;
    }

    private void sendMessage(String subID, Message request) {
        //Mock implementation, just add the subID to MockUpdateMsgGenerator
        //so it can generate mock update for this subscription.
        idGen.addActiveID(subID);
    }

    public Update readNextUpdateFromNetwork() throws InterruptedException{
        //Read the next mock update
        return idGen.genMockUpdateFromSubIDList();
    }

    public void unsubScribe(Subscription sub) {
        subscription.remove(sub);

        //Tell MockUpdateMsgGenerator not to generate mock update for this id.
        idGen.removeActiveID( sub.getSubscriptionID() );
    }

    public void stop() {
        stop = true;
        this.disp.interrupt();
    }

    class DispatchingThread extends Thread {

        public DispatchingThread() {
            super("AsyncTransport Dispatching Thread");
        }

        public void run() {
            try {
                while (!stop) {
                    Update update = idGen.genMockUpdateFromSubIDList();
                    for (int i = 0; i < subscription.size(); i++) {
                        ((Subscription) subscription.get(i)).notifyIfSubscribeFor(update);
                    }
                }
            } catch (InterruptedException ex) {
                //Thread stop
            }
        }
    }//DispatchingThread
}

class Update {

    private final String ID;
    private final Message msg;

    public Update(String ID, Message msg) {
        this.ID = ID;
        this.msg = msg;
    }

    public String getSubscribtionID() {
        return ID;
    }

    public Message getMessage() {
        return msg;
    }
} 

public class Subscription{
    public final String ID;
    public final AsyncMsgListener listener;

    public Subscription(String ID, AsyncMsgListener listener) {
        this.ID = ID;
        this.listener = listener;
    }

    public String getSubscriptionID(){
        return this.ID;
    }

    public void notifyIfSubscribeFor(Update update){
        if( update.getSubscribtionID().equals(ID) ){
            listener.onMsg( update.getMessage() );
        }
    }
}

public interface AsyncMsgListener {
    public void onMsg(Message msg);
}

public class Message {
    private final String data;

    public Message(String data){
        this.data = data;
    }

    public String getData(){
        return this.data;
    }
}

I have filtered out unnecessary complexity by making the AsyncTransport a standalone class. It doesn’t really connect to anything. The class MockUpdateMsgGenerator will generate mock update for us.

Once the transport connects to server, the DispatchingThread will be started to keep listening to socket connection to see if there is any update from server. When an update arrives, the thread will iterate over subscription list and ask each subscription to fire a notification if the update contains ID match the ID of the subscription. The data structure used to store all subscriptions is Vector. Since all methods in Vector are synchronized (It’s a well known fact and stated in API doc), the designer of this API might think that it’s safe to access this subscription list concurrently. Users can subscribe/unsubscribe while the dispatching thread is iterating over the list.

                while (!stop) {
                    Update update = idGen.genMockUpdateFromSubIDList();
                    for (int i = 0; i < subscription.size(); i++) {
                        ((Subscription) subscription.get(i)).notifyIfSubscribeFor(update);
                    }
                }
 

Here is the code to test the API.

public class ASyncTester {
    public static void main(String[] args) throws InterruptedException {
        AsyncTransport transport = new AsyncTransport();
        transport.connect("localhost", "9999");

        int threadCount = 20;
        int taskSize = 100;
        ExecutorService exec = Executors.newFixedThreadPool(threadCount);
        for (int i = 0; i < taskSize; i++) {
            exec.submit(new SubscribtionCallable(transport));
        }

        exec.shutdown();
        exec.awaitTermination(1, TimeUnit.DAYS);

        transport.stop();
        System.out.println("Done");
    }
}

class SubscribtionCallable implements Callable<Void> {

    private final AsyncTransport transport;

    public SubscribtionCallable(AsyncTransport transport) {
        this.transport = transport;
    }

    public Void call() throws Exception {
        Subscription sub = transport.subScribe(new Message("ASYNC request"), new AsyncMsgListener() {
            public void onMsg(Message msg) {
                System.out.println(msg.getData());
            }
        });

        //wait to get some updates
        Thread.sleep(Math.abs(new Random().nextLong() % 100));
        transport.unsubScribe(sub);
        return null;
    }
}

We have 20 threads sharing 100 Callable tasks. All the tasks are doing the same series of actions; subscribe for data, wait for some update and then unsubscribe.

Check Then Act

The most annoying thing about concurrent problems is that they are hard to be reproduced. A kind of concurrent problem may occur only when the execution of various threads is in a certain order and we can’t just go inside JVM and arrange that order to happen. So, it’s possible that the configuration that works on my machine may not produce the same result on your machine. The configuration in this place I mean the conditions that affect concurrent execution of the program; number of running thread, task. You may want to try running the program with various configurations to see that one may make the program run just fine but one may result in an exception.

For my machine, the configuration shown in the code snippet above will make the Main class run just fine. But when I increase thread count to 50 to run 500 tasks, I will get an exception.

Exception in thread "AsyncTransport Dispatching Thread" java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 49
at java.util.Vector.get(Vector.java:694)
at deadlock.transport.AsyncTransport$DispatchingThread.run(AsyncTransport.java:62)

This is the first problem I am going to show you in this post. Although it’s not possible to corrupt the internal states of Vector by concurrently calling get(), set() or size() method, you still need to think carefully about what actually comprise your mutual exclusive section; the area of code that need to be executed without any intervention from any other threads.

In our example, the dispatching loop need both subscription.size() and subscription.get() to run in mutual exclusive section. This scenario has been known as check-then-act scenario. The dispatching thread will check if the current index is less than the vector size, if it’s true then the element at that index will be retrieved. The steps that lead to the exception are shown below.

  1. The current index is 49, the current size of vector is 50.
  2. The index has been compared to the size of the vector. Since 49 is less than 50 then the body of for loop is executed.
  3. Another thread call unsubscribe() result in removing an element of the vector. Now the vector’s size is 49. The possible index for retrieving element from this vector is now 0 – 48.
  4. The body of for loop gets to be executed and the expression trying to get the element at index 49 result in java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 49

The check-then-act problem above can be addressed in many ways. The easiest way may be using synchronized block to create a critical section.

                    synchronized(subscription){
                        for(int i=0; i<subscription.size(); i++){
                            ( (Subscription)subscription.get(i) ).notifyIfSubscribeFor( update );
                        }
                    }

I am using the vector object itself as the lock of the synchronized block so all methods call of the vector instance will not be able to run if the code in our critical section is running. The state of the vector will not be changed from other threads during the execution of the synchronized block. That means, in our dispatching loop, if subscription.size() return 50 then it’s guaranteed that there will be element with index 49 in the vector by the time subscription.get(i) get to be executed sine it’s not possible for any thread to remove an element from the vector while the dispatching thread is in the critical section.

Unfortunately, this simple fix will lead to another problem. I need to introduce you a new usage scenario of the AsyncTransport so I can demonstrate the consequence problems of the fix above.

Synchronous Requesting Model

Let’s say there are some messages in my program that is not suit to asynchronous call. They are just simple request/reply calls. I will create an API to act as an adapter to make simple synchronous call over asynchronous API.

public class SyncTransport {

    private final AsyncTransport transport;

    public SyncTransport(AsyncTransport transport) {
        this.transport = transport;
    }

    public Message request(Message request, long timeout) throws InterruptedException {
        SyncRequestor syncAdapter = new SyncRequestor(transport);
        return syncAdapter.makeRequest(request, timeout);
    }
}

class SyncRequestor implements AsyncMsgListener {

    private final AsyncTransport transport;
    private volatile Message msg;

    public SyncRequestor(AsyncTransport transport) {
        this.transport = transport;
    }

    public void onMsg(Message msg) {
        this.msg = msg;

        synchronized (this) {
            notify();
        }
    }

    public Message makeRequest(Message request, long timeout) throws InterruptedException {
        Subscription sub = transport.subScribe(request, this);
        synchronized (this) {
	wait(timeout);
            transport.unsubScribe( sub );
        }

        return this.msg;
    }
}

The request() method delegates the execution to SyncRequestor class. The class will subscribe for data and wait for the first update. The execution flow will be block here. Since it is synchronous call, SyncRequestor knows that there will be only one update from server. The update is synchronous reply. Once the first message update arrives, the requestor stores it and calls notify() to tell the waiting thread that the reply is ready. The requesting thread (the thread that call request() method ) then perform unsubscribing and return the stored reply message.

At this point, I have introduced you both synchronous and asynchronous requesting model in my example program. Let’s look at the code in MockUpdateMsgGenerator. The class is able to generate update for both requesting model. String data in Message will be used to identify the requesting model of the message.

public class MockUpdateMsgGenerator{
    public static final String SYNC_MSG_PREFIX = "SYNC";
    public static final String ASYNC_MSG_PREFIX = "ASYNC";

    private final List activeSubId = new LinkedList();
    private final List activeSyncSubId = new LinkedList();

    public String genIDforMessage(Message msg){
        String prefix =  msg.getData().startsWith(SYNC_MSG_PREFIX)? SYNC_MSG_PREFIX : ASYNC_MSG_PREFIX;
        return prefix  + ":" + randomInt();
    }

    public synchronized void addActiveID(String id){
        if(id.startsWith( SYNC_MSG_PREFIX ) ){
            activeSyncSubId.add(id);

        }else{
            activeSubId.add(id);
        }

        notify();
    }

    public synchronized void removeActiveID(String id){
        activeSyncSubId.remove(id);
        activeSubId.remove(id);
    }

    public synchronized Update genMockUpdateFromSubIDList() throws InterruptedException{
        while( activeSubId.size() == 0 && activeSyncSubId.size() == 0){
            wait();
        }

        String id;
        if( activeSyncSubId.size() != 0 ){
            //Synchronous request should get response quickly so send update for
            //this requesing model first.
            id =  (String)activeSyncSubId.remove(0);

        }else{
            //Randomly generates update for subscriber.
            int randomIndex =  randomInt() % activeSubId.size();
            id = (String)activeSubId.get(randomIndex);
        }

        return new Update(id , new Message(id + " : DummyData" ) );
    }

    public int randomInt(){
        return Math.abs(new Random().nextInt() );
    }
}

Now we are ready to run our new example.

public class SyncTester {
    public static void main(String[] args) throws InterruptedException {
        AsyncTransport transport = new AsyncTransport();
        transport.connect("localhost", "9999");

        final SyncTransport syncTransport = new SyncTransport(transport);

        ExecutorService exec = Executors.newFixedThreadPool(20);
        for (int i = 0; i < 200; i++) {
            exec.submit(new Callable<Void>() {

                public Void call() throws Exception {
                    Message resp = syncTransport.request(new Message("SYNC msg"), 1000);
                    if(resp != null){
                        System.out.println(resp.getData());
                    }else{
                        System.out.println("Timeout");
                    }

                    return null;
                }
            });
        }

        exec.shutdown();
        exec.awaitTermination(1, TimeUnit.DAYS);

        transport.stop();
        System.out.println("Done");
    }
}

Wait Leak

When I call request() with timeout 1000 milliseconds, I will get all reply messages and the program terminate properly. This seems to indicate that our MockUpdateMsgGenerator is working quite efficiently. But a strange thing happens when I specify 0 as timeout to wait forever until the reply arrives. The program prints out some reply messages and then freeze. The program doesn’t seem to proceed anytime soon. I may need to know what exactly is going on in the program. What are the parts of the program that being executed? What are the blocking threads and what there are blocking on? The jstack command is the tool I need.

Please note that my real production code is Linux based. I am just simulating the problem using the example program on my window laptop.

C:\Documents and Settings\ThinkPad>jps
3600 Jps
688 SyncTester
2668

C:\Documents and Settings\ThinkPad>jstack 688
2010-05-03 15:49:15
Full thread dump Java HotSpot(TM) Client VM (11.3-b02 mixed mode, sharing):

"pool-1-thread-5" prio=6 tid=0x02b04c00 nid=0xa48 in Object.wait() [0x02fbf000..0x02fbfb14]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x229f0020> (a deadlock.transport.sync.SyncRequestor)
at deadlock.transport.sync.SyncRequestor.makeRequest(SyncTransport.java:53)
- locked <0x229f0020> (a deadlock.transport.sync.SyncRequestor)

at deadlock.transport.sync.SyncTransport.request(SyncTransport.java:22)
at deadlock.Main2$1.call(Main2.java:23)
at deadlock.Main2$1.call(Main2.java:20)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

"pool-1-thread-4" prio=6 tid=0x02b03800 nid=0xeec in Object.wait() [0x02f6f000..0x02f6fc14]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x229e2d70> (a deadlock.transport.sync.SyncRequestor)
at deadlock.transport.sync.SyncRequestor.makeRequest(SyncTransport.java:53)
- locked <0x229e2d70> (a deadlock.transport.sync.SyncRequestor)

at deadlock.transport.sync.SyncTransport.request(SyncTransport.java:22)
at deadlock.Main2$1.call(Main2.java:23)
at deadlock.Main2$1.call(Main2.java:20)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

Actually, all 5 threads in our thread pool are blocked but I cut out just 2 threads to show here since all threads are blocked in the same execution path. We can see in the stack trace that thread “pool-1-thread-4” is waiting on an instance of SyncRequestor. The program is freeze because all working threads are blocked waiting for notification of the readiness of reply messages. The question is why the threads are still waiting. It’s unlikely that some reply messages are not make it to the onMsg() call back method since this same test with timeout 1000 milliseconds is working properly.

Actually, the notifications have been fired but the requesting threads don’t catch it in time. There are two threads working concurrently. One is the update dispatching thread, another is the requesting thread. It is possible that once the requesting thread finish executing transport.subScribe() method, the subscription has been processed in a very short time and the update dispatching thread set the reply message to SyncRequestor and call notify() before the requesting thread get a chance to all wait().

This problem is called wait-leak describing the situation that notification has been made before the target audience start waiting for it. The code snippet below is our program with slightly modification to show that some notification has already fired before the requesting thread call wait() method.

public Message makeRequest(Message request, long timeout) throws InterruptedException {
        Subscription sub = transport.subScribe(request, this);

        synchronized (this) {
            if(msg != null){
                System.out.println("Reply already arrived");
            }

            wait(timeout);
            transport.unsubScribe(sub);
        }

        return this.msg;
    }

You may try running the program with the modification above to see that if there are 3 line of “Reply already arrived” then jstack tool will report that there are 3 threads still waiting for notification.

The wait leak problem is one of the reason that it’s a best practice to call wait() in condition checking loop. Another reason we should use the practice is that there is something called spurious wake up that can make things go wrong. Below is the SyncRequestor with the checking whether the reply has already arrived.

public Message makeRequest(Message request, long timeout) throws InterruptedException {
        Subscription sub = transport.subScribe(request, this);

        synchronized (this) {
            if(timeout > 0){
                waitWithTimeOut(timeout);
            }else{
                waitForever();
            }

            transport.unsubScribe(sub);
        }

        return this.msg;
    }

    private void waitWithTimeOut(long timeout) throws InterruptedException {
        long timeoutLimit = System.currentTimeMillis() + timeout;
        while (msg == null) {
            long remainingTimeout = timeoutLimit - System.currentTimeMillis();
            if (remainingTimeout > 0) {
                wait(remainingTimeout);
            } else {
                //timeout
                break;
            }
        }
    }

    private void waitForever() throws InterruptedException{
        while (msg == null) {
            wait();
        }
    }

Deadlock

Let’s get back to our original issue; the consequence of our check-then-act fix. Now we have the check-then-act and the wait-leak problem fixed. Should SyncTransport run without any problem?

Well, with a small thread pool size like 10 and a realistic timeout like 1 or 2 second then you may run it days and nights without any problem. Concurrent problems can be very hard to be detected because the problematic code can run just fine for years if the conditions are right. And just when our system goes under an unusual circumstance, the problems show up in the worst possible time.

I’ve found that, on my laptop, the program will freeze if I run it with thread pool size of 50 and 500 millisecond timeout. To find out what went wrong, jstack is our best friend again. I don’t need to spend much time figuring out what happen. Jstack shows the problem at the bottom of its report.

Found one Java-level deadlock:
=============================
"pool-1-thread-50":
waiting to lock monitor 0x02b1a404 (object 0x22a80330, a java.util.Vector),
which is held by "AsyncTransport Dispatching Thread"
"AsyncTransport Dispatching Thread":
waiting to lock monitor 0x02a83954 (object 0x22a81670, a deadlock.transport.sync.SyncRequestor),
which is held by "pool-1-thread-17"
"pool-1-thread-17":
waiting to lock monitor 0x02b1a404 (object 0x22a80330, a java.util.Vector),
which is held by "AsyncTransport Dispatching Thread"

Java stack information for the threads listed above:
===================================================
"pool-1-thread-50":
at java.util.Vector.add(Vector.java:727)
- waiting to lock <0x22a80330> (a java.util.Vector)
at deadlock.transport.AsyncTransport.subScribe(AsyncTransport.java:22)
at deadlock.transport.sync.SyncRequestor.makeRequest(SyncTransport.java:44)
at deadlock.transport.sync.SyncTransport.request(SyncTransport.java:22)
at deadlock.Main2$1.call(Main2.java:23)
at deadlock.Main2$1.call(Main2.java:20)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
"AsyncTransport Dispatching Thread":
at deadlock.transport.sync.SyncRequestor.onMsg(SyncTransport.java:39)
- waiting to lock <0x22a81670> (a deadlock.transport.sync.SyncRequestor)
at deadlock.transport.Subscription.notifyIfSubscribeFor(Subscription.java:23)
at deadlock.transport.AsyncTransport$DispatchingThread.run(AsyncTransport.java:67)
- locked <0x22a80330> (a java.util.Vector)
"pool-1-thread-17":
at java.util.Vector.removeElement(Vector.java:593)
- waiting to lock <0x22a80330> (a java.util.Vector)
at java.util.Vector.remove(Vector.java:745)
at deadlock.transport.AsyncTransport.unsubScribe(AsyncTransport.java:40)
at deadlock.transport.sync.SyncRequestor.makeRequest(SyncTransport.java:58)
- locked <0x22a81670> (a deadlock.transport.sync.SyncRequestor)
at deadlock.transport.sync.SyncTransport.request(SyncTransport.java:22)
at deadlock.Main2$1.call(Main2.java:23)
at deadlock.Main2$1.call(Main2.java:20)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

Found 1 deadlock.

If you are good at reading stack trace then the report is straight forward. The execution order that cause deadlock is shown below.

  1. The requesting thread pool-1-thread-17 call SyncRequestor.makeRequest() and grabs lock in an instance of SyncTransport to wait for reply notification.

    at deadlock.transport.sync.SyncRequestor.makeRequest(SyncTransport.java:58)
    - locked <0x22a81670> (a deadlock.transport.sync.SyncRequestor)

            synchronized (this) {
                if(timeout > 0){
                    waitWithTimeOut(timeout);
                }else{
                    waitForever();
                }
    
                transport.unsubScribe(sub);
            }
    
  2. The timeout is too short. The requesting thread wakes up and proceeds to perform unsubscribing.
    at deadlock.transport.AsyncTransport.unsubScribe(AsyncTransport.java:40)

  3. The unsubscribing process involves removing subscription from subscription vector. Jstack shows that the requesting thread is blocked waiting to grab the lock of the vector.

    at java.util.Vector.removeElement(Vector.java:593)
    - waiting to lock <0x22a80330> (a java.util.Vector)
    at java.util.Vector.remove(Vector.java:745)

    It turns out that a thread has already grabs the lock of the vector. The requesting thread has to wait for that thread to release the lock of the vector before it can proceed.

  4. At this moment, the “AsyncTransport Dispatching Thread” which runs concurrently with the requesting thread is executing the update dispatching loop. The dispatching thread grabs the lock of subscription vector to iterate over all subscriptions.


    at deadlock.transport.AsyncTransport$DispatchingThread.run(AsyncTransport.java:67)
    - locked <0x22a80330> (a java.util.Vector)

          synchronized(subscription){
                for(int i=0; i<subscription.size(); i++){
                     ( (Subscription)subscription.get(i) ).notifyIfSubscribeFor( update );
                }
           }
    

    It’s the update dispatching thread that holds the lock of subscription vector which make the requesting thread cannot proceed.

  5. It’s a very bad coincidence that the update dispatching thread is calling onMsg() method on the to-be-unsubscribed subscription. The thread set reply message then tries to grab the lock of target SyncRequestor instance to be able to call notify().


    at deadlock.transport.sync.SyncRequestor.onMsg(SyncTransport.java:39)
    - waiting to lock <0x22a81670> (a deadlock.transport.sync.SyncRequestor)
    at deadlock.transport.Subscription.notifyIfSubscribeFor(Subscription.java:23)

           public void onMsg(Message msg) {
               this.msg = msg;
    
               synchronized (this) {
                   notify();
                }
           }
           
  6. The update dispatching thread couldn’t grab the lock of the SyncRequestor instance because, in bullet 1, pool-1-thread-17 hasn’t release it yet. Now you can see that this is a deadlock. The requesting thread is holding the lock of SyncRequestor instance and wait for the lock of subscription vector, the update dispatching thread is holding the lock of the vector and waiting for the lock of the SyncRequestor instance.

    Jstack has summed it up quite nice.

    "AsyncTransport Dispatching Thread":
    waiting to lock monitor 0x02a83954 (object 0x22a81670, a deadlock.transport.sync.SyncRequestor),
    which is held by "pool-1-thread-17"

    "pool-1-thread-17":
    waiting to lock monitor 0x02b1a404 (object 0x22a80330, a java.util.Vector),
    which is held by "AsyncTransport Dispatching Thread"

Again, there are many solutions for this problem. I choose to fix it by just move the unsubscribing method out of the synchronized block. The method doesn’t need to be executed in the synchronized block at all. It’s always a good idea to check if your synchronized block is too big and remove expressions that don’t really need to be in that critical section.

public Message makeRequest(Message request, long timeout) throws InterruptedException {
        Subscription sub = transport.subScribe(request, this);

        synchronized (this) {
            if(timeout > 0){
                waitWithTimeOut(timeout);
            }else{
                waitForever();
            }
        }

        transport.unsubScribe(sub);
        return this.msg;
}

Now, the requesting thread will release the lock of SyncRequestor instance before perform unsubscribing. This will break the circle lock acquisition and prevent deadlock in the program.

read more

Related Posts

Share This

Java Web Services Deployment Model

Apr 13, 2010 by

The product I am developing is going to come out in early access package pretty soon. My QA and Support team are still not quite get it why the web services in our product needs additional steps to add some descriptor files when it going to be deployed on Sun Java System Web Server. I also found this confusion quite commonplace among beginners who start learning JAX-WS. I would like to explain it here so the next time somebody asks me again then I can just direct him to this post.

Java EE deployment model
The final package of JAX-WS web services is an ordinary war file just like any other Servlet/JSP web application’s war file. If you look at the basic JAX-WS tutorial you will notice that the implementation can be as minimal as a simple Java class with a couple of web service annotations.

@WebService()
public class NewsService {
    public String[] getAlert(){
        //implementation
    }
}

You can just pack this class to a war file without the need of any descriptor file even web.xml and then deploy it to Java EE application server to get a nice and simple web services.

You are able to do this kind of deployment because your application server has implemented all the necessary specifications for developing/deploying web services on Java EE platform. The main specification defining potable programming model for Java EE platform is JSR-109 (Implementing Enterprise Web Services). If you search internet and mailing list, you may found many developers call web services with this kind of deployment model as “JSR-109 web services”. Actually, JSR-109 always works together with other specification like JSR-224 (JAX-WS) and JSR-181 (web services annotation) so I will choose to call this deployment model “Java EE deployment model”.

If you choose to make your web services compliant with JSR-109 specification then your war file can be deployed on any Java EE application servers that support the specification. I have developed my unit testing on Glassfish and give the same package to QA team to run system test on WebSphere.

Now, let’s look at it in the real action.
Here we are creating a web services stating from creating a web application project. To demonstrate JSR109 web services, I will choose target server to be Glassfish which is a full Java EE application server.

ChoosingContainer

Once you have created a web service (using web services wizard or manually create java class with web services annotation), you will see that there is nothing in your web.xnl that related to web services information. You can ignore the sun-web.xml, this file is specific to Glassfish and it is not descriptor in our main topic. You may want to delete all descriptor in WEB-INF and deploy your web application to see that your web services application is still able to work properly.

JSR109_IDE

Here is the default web.xml file content.

<?xml version="1.0" encoding="UTF-8"?>
<web-app version="2.5" xmlns="http://java.sun.com/xml/ns/javaee" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd">
    <session-config>
        <session-timeout>
            30
        </session-timeout>
    </session-config>
    <welcome-file-list>
        <welcome-file>index.jsp</welcome-file>
    </welcome-file-list>
</web-app>

Now you may want to look at the final war file that you can distribute to your client.
You will see that there is no web services library packed in your war file at all. Since this is JSR109 web services so it expect Java EE server to provide all necessary libraries.

D:\Program\netbeans-projects\JSR109\dist>jar -tf JSR109.war
META-INF/
META-INF/MANIFEST.MF
WEB-INF/
WEB-INF/classes/
WEB-INF/classes/ws/
WEB-INF/classes/ws/news/
WEB-INF/classes/ws/news/NewsService.class
index.jsp

Servlet Based deployment model
If you want to deploy the NewsService example above on non full Java EE server (Sun Java System Web Server, Tomcat) then you may have some concerns to think about.

The most important concern is that this kind of deployment is not portable. From the eyes of the non Java EE container, your web services war file is no different from the normal Servlt/JSP web application’s war file. If you want to do web services then you have to choose an implementation of JAX-WS and pack all the necessary libraries into your war. Now you will get a web application that happens to have some libraries that know how to do web services.

Another thing is that your web services implementation classes are not going to be automatically discovered by your container. You will have to edit some file descriptors to tell your JAX-WS implementation library to know how you want your web services to be deployed. These descriptors are specific to the library you are using. The web services in my product need sun-jaxws.xml file when it is going to be deployed on SJSWS because we are using SUN JAX-WS reference implementation for this kind of deployment.

In NetBeans, if you choose target server to be Apache Tomcat then you will get the war file structure as shown below.

ServletBased_IDE

You will see that sun-jaxws.xml file will be added to your WEB-INF directory once you create a web services. Both sun-jaxws.xml and web.xml will contain descriptive information about your web services.

web.xml

<?xml version="1.0" encoding="UTF-8"?>
<web-app version="2.5" xmlns="http://java.sun.com/xml/ns/javaee" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd">
    <listener>
        <listener-class>com.sun.xml.ws.transport.http.servlet.WSServletContextListener</listener-class>
    </listener>
    <servlet>
        <servlet-name>NewsService</servlet-name>
        <servlet-class>com.sun.xml.ws.transport.http.servlet.WSServlet</servlet-class>
        <load-on-startup>1</load-on-startup>
    </servlet>
    <servlet-mapping>
        <servlet-name>NewsService</servlet-name>
        <url-pattern>/NewsService</url-pattern>
    </servlet-mapping>
    <session-config>
        <session-timeout>
            30
        </session-timeout>
    </session-config>
    <welcome-file-list>
        <welcome-file>index.jsp</welcome-file>
    </welcome-file-list>
</web-app>

sun-jaxws.xml

<endpoints version="2.0" xmlns="http://java.sun.com/xml/ns/jax-ws/ri/runtime">
  <endpoint implementation="ws.news.NewsService" name="NewsService" url-pattern="/NewsService"/>
</endpoints>

And here is the layout of the final package.
D:\Program\netbeans-projects\ServletBasedWS\dist>jar -tf ServletBasedWS.war
META-INF/
META-INF/MANIFEST.MF
WEB-INF/
WEB-INF/classes/
WEB-INF/classes/ws/
WEB-INF/classes/ws/news/
WEB-INF/lib/
META-INF/context.xml
WEB-INF/classes/ws/news/NewsService.class
WEB-INF/lib/activation.jar
WEB-INF/lib/webservices-api.jar
WEB-INF/lib/webservices-extra-api.jar
WEB-INF/lib/webservices-extra.jar
WEB-INF/lib/webservices-rt.jar
WEB-INF/lib/webservices-tools.jar
WEB-INF/sun-jaxws.xml
WEB-INF/web.xml
index.jsp

You will see that jar files of METRO (JAX-WS RI ) have been packed along with our final war file.

read more

Related Posts

Share This

Bug of Glassfish 2.1 in clearing ServletContext’s attributes

Apr 12, 2010 by

I had spent the whole morning finding out what caused an error in a so simple class of a module. It turned out to be a defect of Glassfish Server. I would like to describe the defect here as my own reference.

Since I can’t show you the real production code so I will use this example to describe the defect

public class AppContextListener implements ServletContextListener{
    public static final String AUTHEN_ENABLED = &quot;AUTHEN_ENABLED&quot;;

    private volatile AuthenProcessor authenProsessor;

    public void contextInitialized(ServletContextEvent sce) {
        authenProsessor = createAuthenProcessor();

        if( readConfigBoolean(AUTHEN_ENABLED) ){
            sce.getServletContext().setAttribute(AUTHEN_ENABLED, new Boolean(true) );
            authenProsessor.startup();
        }
    }

public void contextDestroyed(ServletContextEvent sce) {
        Boolean authenEnabled = (Boolean)sce.getServletContext().getAttribute(AUTHEN_ENABLED);

        if( authenEnabled.booleanValue() ){
            authenProsessor.shutdown();
        }
    }
}

I think I don’t need to explain what the class does because it’s easy enough to read. The default value for AUTHEN_ENABLED configuration is true. Once you get the web application running and then tries to undeploy it you will get NullPointerException at the line authenEnabled.booleanValue().

We don’t have any Servlet or JSP yet so it’s impossible that there will be any module that removes AUTHEN_ENABLED attribute from the ServletContext. I have googled it out and found that this is a defect of Glassfish 2.1 that all attributes of ServletContext will be cleared before the contextDestroyed() method gets to be called.

You may find the detail at defect log Issue6594. Actually, the Issue of the defect is 6442 but I found the summary in Issue6594 contains more information.

This defect has been fixed in Glassfish V3. I have deploy the web application on V3 and problem has gone.

read more

Related Posts

Share This

Simple little things

Dec 10, 2009 by

A couple weeks ago, I had a chance to talk to one of my old friends who had been running his startup firm for about 2 years. He came to Bangkok for his business and stayed at my place. We talked about a lot of things especially about how his business was going. He had shared with me his story regarding how hard it was to find and hire a decent developer.  We both agreed that there were not many developers here in Thailand (compare to the number of all university graduates) that had passion in developing software.  My friend told me that when he took a look at the code written by his staff, there were many times that he felt like he could make it a lot better by just adding some simple little changes here and there.

Well, this kind of talk isn’t new. The question has been asking for long time. What is the most important virtue for a man to be successful in his professional life?  Is it the level of intelligence or the passion or the hard working? In forum, soon-to-be-graduates keep asking what are the abilities or skill that the potential employers have been looking for. Novice developers take two or three training courses to acquire knowledge about the coolest technology. There is nothing wrong about doing that. In fact, it’s quite a reasonable preparation for jobs applying.  But sometime I think that it’s not so much about having the most advance skill. In the entry level, it’s more like having great attitude and strong passion.

If you love what you are doing (I know it’s hard), there is great chance that you will spend more time or investing more effort in your work. You are doing that not because you are hoping to get promotion to the next level as soon as possible but because you care about your work itself. A module of your code may already wok just fine. But if you keep learning, one day you may find out that with just some little changes you can make a lot improvement for the module. Those changes are not so complicate that you may need higher education to figure them out. It may be just a pattern or best practice that you learned from a blog entry you has read this morning.

For example, I have seen many place that Log4J logger has been used as;

public ItemResponse processItem(ItemRequest req) throws ItemException {
String rev;
try {
rev = itemArchive.getRevision(req.getItemID() );
} catch (ItemArchiveException ex) {
logger.error(ex);
//recovery code goes here
}
//some code here
}

If you have spent even little time playing with Log4J you may already know that the above logging will not print the stack trace of  the ItemArchiveException. This can cause a headache in trouble shooting problem in real production. Let’s look at the API doc.

The second line is the version that should be used for logging exception. All the stack trace will be logged.

  • public void error(java.lang.Object message)
  • public void error(java.lang.Object message, java.lang.Throwable t)

With a little change to use logger.error(“Failed to get item revision”, ex) can result in a big benefit here.

Speaking of logging stack trace, in my experience, chained exception is very useful when it comes to investigating log file. You should always includes the cause exception in the exception you are about to throw unless you can make the about-to-be-threw exception so meaningful in itself (which I will say it’s pretty hard to do so).

Let’s look at the above list again. What if I want to throw ItemException if I couldn’t get revision value for the item?

try{
// some code here
} catch (ItemArchiveException ex) {
throw new ItemException("Couldn’t get revision for " +  req.getItemID());
}

It will be quite impossible to know what exactly went wrong by just knowing the ID of the item in trouble. Again, a little change can be really helpful

throw new ItemException(“Couldn’t get revision for ” + req.getItemID(), ex);

I have to tell you that I am not trying to exaggerate the problem of these two cases above. I have experienced both cases in real production code and it cause quite a headache for me.

Now back to my main idea of this post.
Well, I just realized that I don’t have a main idea for this post at all. It’s my rambling about how hard it is to spot developers who really enjoy writing software.

read more

Related Posts

Share This

Spurious Wakeup

Sep 14, 2008 by

I have finished reading Effective Java long time ago. It is such a great book. The more I passed through each page the more I realized how little I know about java programming. The distance between “coder” and “developer” is really far

I read most of the item listed in the book. One of the items I have skipped was Item 50: Never invoke wait outside a loop. The code below show the concept of this item

synchronized (obj) {
    while (<condition does not hold>)
        obj.wait();

     ... // Perform action appropriate to condition
 }

Looking at the name of the practice, I thought I knew all the reasons behind it so I just skipped it. Today I found an interesting post asking What is spurious wakeup. I read it and found that “threads can wake up on wait() for no reason at all”!!!!! There are quite many good references for this fact and one of them is, guess what, the item 50 of Effective Java. I would have known it for long time ago if I just read it. One of the reasons behind it stated in the item is that

The waiting thread could wake up in the absence of a notify. This is known as a spurious wakeup. Although The Java Language Specification [JLS] does not mention this possibility, many JVM implementations use threading facilities in which spurious wakeups are known to occur, albeit rarely [Posix, 11.4.3.6.1]

I thought sometimes it was OK to call wait() without condition-checking loop if the object to wait on represented just one condition, the object was shared only between waiting and notifying threads and the code’s execution order guaranteed that wait leaks would not occur. Now giving that JVM implementation can send spurious wakeup signal, the condition checking loop is A MUST

Apparently, the spurious wakeup is an issue (I doubt that it is a well known issue) that intermediate to expert developers know it can happen but it just has been clarified in JLS third edition which has been revised as part of JDK 5 development. The javadoc of wait method in JDK 5 has also been updated

A thread can also wake up without being notified, interrupted, or timing out, a so-called spurious wakeup. While this will rarely occur in practice, applications must guard against it by testing for the condition that should have caused the thread to be awakened, and continuing to wait if the condition is not satisfied. In other words, waits should always occur in loops

You may wonder (like me) why JLS designer decided to allow this kind of thing to happen. It’s not that I don’t want to use condition checking loop. The checking is the best practice that developers should always do no matter of them knowing anything about the spurious wakeup or not. But I just don’t see the benefit of allowing the wakeup for no reason. It turns out that this is something about performance as stated in Multithread Programming with Java

Due to some arcania in the hardware design of modern SMP machines, it proves to be highly convenient to define them like this. The hardware runs a little faster, and the programmer needs to reevaluate the condition anyway

read more

Related Posts

Share This