Debugger is your friend

Apr 16, 2011 by

Bottom Line: I have listed my top reasons why a debugger is really useful.

The debugger has always been a big help to me. It is one of the main tools I use in my day-to-day work, both in coding and troubleshooting. One great thing about a debugger is that you can get a big benefit from it just by using simple features like step-over or step-into. I don’t quite get why the debugger isn’t used as much as it should be (or maybe it’s just where I work that it is underused). Here are my top reasons why we should use a debugger.

1. Debugger can show you what EXACTLY is happening.

This may be the most obvious reason. The ability to see the actual current value at each execution step is great. I have seen many posts on technical forums asking why a small code snippet doesn’t work as expected. Examples of this kind of question are “Why does my loop run forever?” or “Why did I get a NullPointerException?”. The problematic program is often quite small and primitive. If you are good at reading source code, you may be able to trace from the output back to the root cause. But if you are not, or your code is just too hard to read, why not run it with a debugger and see why that expression never returns false, so your loop runs forever?

Sometimes I get frustrated with this kind of question. The answer can be found just by using a debugger. It’s more effective and faster than posting a question on a forum and waiting until the next day for a reply.
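As a small illustration of the kind of question above, here is a hypothetical loop (not taken from any real forum post) whose exit condition never becomes false. Watching x in a debugger shows why at once: x steps past 1.0 without ever being exactly equal to it. The class name, method name and the maxSteps cap are assumptions made for this sketch; the cap is only there so that it terminates.

```java
class LoopForever {

    // Returns the number of iterations executed. The loop is meant to stop
    // when x reaches 1.0, but 0.1 has no exact binary representation, so the
    // running sum never equals 1.0 and only the maxSteps cap ends the loop.
    static int countSteps(int maxSteps) {
        double x = 0.0;
        int steps = 0;
        while (x != 1.0 && steps < maxSteps) {
            x += 0.1;   // put a breakpoint here and watch x
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println("steps = " + countSteps(1000));
    }
}
```

Comparing with x < 1.0 instead of x != 1.0 would let the loop end, which is the sort of fix that becomes obvious once you have seen the actual values.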

2. Finding the Happy Path

This is my favorite reason for using a debugger. It’s useful when I have to deal with unfamiliar modules of my project or when I am trying to get some information from a third-party framework. I don’t want to fully understand everything in the code I am working with. I just need enough information to proceed. The benefits of the debugger in this category are twofold.

Stepping through components according to default configuration

A program may contain many configurations. A different set of configurations may result in different implementations or different modules being glued together at runtime. I may not be familiar with the framework that performs configuration parsing, or some configurations may be derived from several values. I don’t want to read all this code just to start up the program. I have a piece of business logic that I am interested in. I just want to know which class performs the logic when I run the program with the default configuration. When I see an abstract factory class, I may not want to know how it retrieves the real implementation. I just want to jump right to the implementation code. With a debugger, I can step over the parts that I am not interested in and step into the part that I think might lead me to the actual implementation under the default configuration.

Stepping through Happy Path
The Happy Path, in my definition, means the default path that runs successfully without anything unexpected happening. It is the execution path on which a program will spend most of its time.

A program contains a lot of conditional branches which dispatch execution to different paths. An execution path may be used only when a special component is present, when the program is in a specific state such as suspended, or when the program runs on a specific type of runtime. Again, I don’t want to understand all the possible paths. I can just step over each expression to see the Happy Path for a use case scenario. With a debugger I can filter out a lot of information that does not directly affect the usage scenario I am interested in right now.

You may think that this is not very useful because you can guess the Happy Path just by reading the code. All competent developers are good at reading source code to learn how the system works. If you see something like:

if( throwable != null){ //….}

if( processor.configureWith(Interceptor.PRE_PROSESSOR ) ){ //…}

Then it is quite easy to guess. Unfortunately, not all developers like to keep their code clean and easy to read. If the code is something like:

if( slot.length > 0 && attributedList.contains(currentSymbol) && link.nextLink != null){
    //….
}
 

You may need some time to work out when each variable is in a specific state before you know whether this code is part of the Happy Path or belongs to another request scenario.

3. Visualize code execution

Writing code needs imagination. Programmers perform some level of design in their head, then translate it to source code. Reading code sometimes needs more imagination than writing it, especially when you are not the one who developed the code base. Programmers load a chunk of code into their brains and execute it line by line, trying to figure out what the code is doing while keeping the values of all variables on the current call stack in mind at the same time. I think the ability to visualize code execution is essential. When I have to deal with code that I am not familiar with, or read complicated logic, I sometimes use a debugger to help me visualize how the code works.

For example, I am doing self-study on Continuation-Passing Style (CPS). I found a short and nice Java example program showing how to find the factorial of a number using CPS:

public class Factorial {
    public static void main(String[] args) {
        int n = 3;
        int factorial = faci(n,
                new Cont() {
                    public int k(int v) {
                        return v;
                    }
                });
        System.out.println(factorial);
    }

    static int faci(int n, Cont cont) {
        while (n != 0) {
            cont = new FacCont(n, cont);
            n--;
        }
        return cont.k(1);
    }
}

interface Cont {
    int k(int v);
}

class FacCont implements Cont {

    private final int n;
    private final Cont cont;

    public FacCont(int n, Cont cont) {
        this.n = n;
        this.cont = cont;

    }

    public int k(int v) {
        return cont.k(n * v);
    }
}

A bright, quick-thinking developer may read the example and figure everything out instantly. Sadly, I am not that kind of developer. I used the debugger to see the actual value at each step to help me understand it faster. I stepped into the faci() method and went through the execution until I was out of the while loop. What I saw at this point was a chain of FacCont objects:

The FacCont instance #64 contained variable n with value 1 and a reference to FacCont instance #63

The FacCont instance #63 contained variable n with value 2 and a reference to FacCont instance #62

It went on like this through the chain.

Then I stepped into the first call of the k() method (at the line return cont.k(1);)

1*1 = 1 so the value 1 was sent to the next FacCont in the chain.

2*1 = 2 so the value 2 was sent to the next FacCont in the chain.

3*2 = 6 so the value 6 was sent to the last continuation in the chain, the anonymous Cont created in main(), which simply returned it.

Now I knew what was going on. FacCont encapsulates the continuation: the next piece of execution that needs to be done. In the build-up code in the while loop, each continuation is passed to another continuation to form a series of executions.
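The same walk-through can be reproduced without a debugger by letting each continuation record what it does. The following is only a sketch written for this explanation, not part of the example program I found: CpsTrace and the trace list are made-up names, and it uses lambdas (Java 8 or later) in place of the FacCont class.

```java
import java.util.ArrayList;
import java.util.List;

class CpsTrace {

    interface Cont {
        int k(int v);
    }

    // Same build-up loop as faci() above, but every continuation logs its
    // multiplication into 'trace' before handing the result to the next one.
    static int faci(int n, List<String> trace) {
        Cont cont = v -> v;                 // identity continuation, like the one in main()
        while (n != 0) {
            final int captured = n;         // plays the role of FacCont.n
            final Cont next = cont;         // plays the role of FacCont.cont
            cont = v -> {
                trace.add(captured + "*" + v + " = " + (captured * v));
                return next.k(captured * v);
            };
            n--;
        }
        return cont.k(1);
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        int result = faci(3, trace);
        for (String step : trace) {
            System.out.println(step);
        }
        System.out.println("result = " + result);
    }
}
```

For n = 3 the recorded steps are the three multiplications listed above, in the same order, and the result is 6.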

Note: I am not an expert in CPS. If I have understood it wrong, feel free to comment.

4. Cross-check some facts with my brain

The human brain is incredible. Sometimes it’s effective and innovative. Sometimes it’s so stupid and unreliable. Have you ever spent a large amount of time hunting for a ridiculously simple little thing that caused a bug in your system? On a good day, when your mind is clear, you may need just 5 minutes to spot it. But on the day before the release date, or the day you hear from your manager that your bonus may not be what you expected, you may look at the root cause 10 times and still not be able to figure it out.

This brain dysfunction can occur even with a few lines of code. I remember a time when I spent half an hour trying to figure out what went wrong in a JUnit test method with 10 lines of code. It turned out that I had performed the assertion on the wrong object.

The debugger doesn’t directly solve this problem. I use it as a tool to cross-check that the things in my mind are the same as the things that are actually happening in the execution. If I have spent some time unsuccessfully looking for something in my code, I may switch to running it with the debugger to see whether my brain has played a trick on me.

Those are my top reasons for using a debugger. I am sure there are more useful usage scenarios. I think most competent developers are making heavy use of it already, so I will just encourage novices to give it a try. The debugger is your friend.


XML Special Characters

Feb 23, 2011 by

A couple of weeks ago, the QA team of my project reported a defect in a module I had developed. The module is part of our web services for distributing news data. The inbound execution flow is quite simple: the module accepts a SOAP request from the client, transforms it to a proprietary XML format of our backend server, then sends the transformed request to the backend news engine. The defect was that users could not use a string containing XML special characters as a search keyword, e.g. they could not query news headlines containing “S&P500”. The news engine could not parse the requests, causing an exception on the backend server.

It’s a pretty well-known fact that there is a set of special characters that must be properly escaped using entity references before an XML instance that contains them can be consumed by a standard XML parser. These characters are apostrophe ( ‘ ), ampersand ( & ), quotation mark ( ” ), less-than symbol ( < ) and greater-than symbol ( > ). Normally, we don’t need to inspect all characters of our string data manually. Any decent XML library should handle the task for us. For example, a JAX-WS stub internally uses JAXB for message marshalling, so you can just set “S&P500” directly as a search keyword.

NewsProvider_Service service = new NewsProvider_Service();
NewsProvider provider = service.getNewsProviderPort();
List<String> hls = provider.getNewsHeadlines("companies:S&P500");

JAXB will escape the “&” character using the “&amp;” entity reference to form a valid XML document. The actual SOAP string sent over the network will look like:

<S:Envelope xmlns:S="http://schemas.xmlsoap.org/soap/envelope/">
    <S:Body>
        <ns2:getNewsHeadlines xmlns:ns2="http://ws.news.devguli/">
            <query>companies:S&amp;P500</query>
        </ns2:getNewsHeadlines>
    </S:Body>
</S:Envelope>

The receiver at the other end of this communication must also use XML libraries to parse the message to get back the original “companies:S&P500” string.

It was not too hard to figure out that there was something wrong with the code responsible for creating the messages between my web services and the backend server. I must have done something that let the characters out unescaped. What I didn’t quite understand was why the error occurred only when the search keyword contained a “<” or “&” character. The rest of the special characters could be sent to the backend server just fine. I thought it was implementation-dependent behavior of each XML library, so I tried playing with various APIs and found that all standard Java XML libraries escape only “<”, “>” and “&” in text content. I have to admit I hadn’t noticed this before.

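import java.io.StringWriter;
import java.io.Writer;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBElement;
import javax.xml.bind.Marshaller;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class Main {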
    public static void main(String[] args) throws Exception {
        String msg = " [ < ], [ > ],  [ \" ] , [ & ], [ ' ]";
        StringWriter writer = new StringWriter();

        writeDOM(msg, writer);
        System.out.println("DOM Output = " + writer);

        writer = new StringWriter();
        writeJAXB(msg, writer);
        System.out.println("JAXB Output = " + writer);

        writer = new StringWriter();
        writeSTAX(msg, writer);
        System.out.println("StaX Output = " + writer);
    }

    public static void writeDOM(String msg, Writer writer) throws Exception{
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();

        Element data = doc.createElement("Data");
        data.appendChild( doc.createTextNode(msg) );

        doc.appendChild( data);

        Transformer tr = TransformerFactory.newInstance().newTransformer();
        tr.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        tr.transform( new DOMSource(doc.getDocumentElement()) , new StreamResult(writer) );
    }

    public static void writeJAXB(String msg, Writer writer) throws Exception{
        QName qn = new QName("Data");
        JAXBElement<String> elem = new JAXBElement<String>(qn, String.class, msg);

        JAXBContext ctx = JAXBContext.newInstance(String.class);
        Marshaller m = ctx.createMarshaller();
        m.setProperty(Marshaller.JAXB_FRAGMENT, Boolean.TRUE);
        m.marshal(elem, writer);
    }

    public static void writeSTAX(String msg, Writer writer) throws Exception {
        XMLStreamWriter xmlWriter = XMLOutputFactory.newInstance().createXMLStreamWriter(writer);

        xmlWriter.writeStartElement("Data");
        xmlWriter.writeCharacters(msg);
        xmlWriter.writeEndElement();
        xmlWriter.close();
    }
}

The above code shows that JAXB, DOM and StAX all output the same string: “ [ &lt; ], [ &gt; ], [ " ] , [ &amp; ], [ ' ]”. I looked for more information and found a question on StackOverflow where Jon Skeet (you must have heard the name if you are a regular at the site) had posted a valuable reply.

From section 2.4 of the XML 1.0 spec (5th edition)

“The ampersand character (&) and the left angle bracket (<) must not appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they must be escaped using either numeric character references or the strings “&amp;” and “&lt;” respectively. The right angle bracket (>) may be represented using the string “&gt;”, and must, for compatibility, be escaped using either “&gt;” or a character reference when it appears in the string “]]>” in content, when that string is not marking the end of a CDATA section.”

The above paragraph states that escaping “<” and “&” is a must. This explains why, in my case, the exception occurred only when the request contained a search keyword with one of those two characters, while requests with a keyword containing “>” worked just fine.

For the greater-than character, the rules seem a bit more relaxed. In a write operation, XML libraries “must” escape the “>” character if they want to produce an XML instance that is compatible with the SGML standard (XML is a subset of SGML). I can see that all standard Java XML libraries do just that, but I am not sure whether it’s because of this compatibility concern or simply because it is good practice. In a read operation, “>” characters in the raw XML string don’t need to be escaped. XML libraries can successfully parse a file containing content such as:

<Data> Special text A > B </Data>

There is one exception. If the greater-than character is part of the string “]]>” but the string doesn’t form a proper CDATA section then the “>” character must be escaped.

<Data> Special text  ]]> </Data>  // cause parsing error
<Data> Special text  ]]&gt; </Data>  // valid XML
<Data><![CDATA[ Special text]]></Data> // valid XML
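These rules can be checked against the parser that ships with the JDK. The sketch below is only an illustration: GreaterThanCheck and parses() are names invented for it, and it relies on the default DocumentBuilder implementation, so another parser could in principle behave differently.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class GreaterThanCheck {

    // Returns true if the JDK's default DOM parser accepts the given string
    // as a well-formed XML document, false if it reports a parsing error.
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<Data> Special text A > B </Data>",
            "<Data> Special text  ]]> </Data>",
            "<Data> Special text  ]]&gt; </Data>",
            "<Data><![CDATA[ Special text]]></Data>",
            "<Data>companies:S&P500</Data>"
        };
        for (String xml : samples) {
            System.out.println(parses(xml) + "  " + xml);
        }
    }
}
```

The last sample is the unescaped keyword from the defect report, included so the "&" case can be compared with the ">" cases.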

I have to say this small defect taught me a lot about escaping special characters in XML.

Root cause

I will describe the root cause of this defect here in case you are interested in how my module makes use of XML libraries but still has a hole that lets those special characters out unescaped. The reason is that the module contains a part that constructs an XML document by just appending strings together. The messages used in the web services are JAXB objects, and I have a requirement to marshal some JAXB objects to XML strings but with a different namespace. Since namespace information is an inherent property of a JAXB class and can’t be modified, I have to marshal those objects to a SAX content handler and manipulate the namespace information in the handler instead.

public static String overrideNameSpace(JAXBElement<?> jaxb)
throws JAXBException{
        Marshaller m = JAXBContext.newInstance("com.devguli").createMarshaller();

        StringBuffer xml = new StringBuffer();
        String nsPrefix = "news";

        NameSpaceOverriderHandler handler = new NameSpaceOverriderHandler (nsPrefix);
        m.marshal( jaxb , new SAXResult(handler) ); 

        xml.append("<" + nsPrefix + ":Root xmlns:" + nsPrefix + "='http://devguli.com/news'>");
        xml.append(handler.getOutputXML() );
        xml.append("</" + nsPrefix + ":Root>");

        return xml.toString();
    }

public class NameSpaceOverriderHandler extends DefaultHandler {

    private final String nsPrefix;
    private final StringBuilder outputXml;

    public NameSpaceOverriderHandler(String nsPrefix) {
        this.nsPrefix = nsPrefix;
        this.outputXml = new StringBuilder();
    }

    public void startElement(String uri, String localName, String qName, Attributes atts)
    throws SAXException {
        outputXml.append("<" + nsPrefix + ":" + localName + serializeAllAttributes(atts) + ">");
    }

    public void endElement(String uri, String localName, String qName)
    throws SAXException {
        outputXml.append("</" + nsPrefix + ":" + localName + ">");
    }

    public void characters(char[] ch, int start, int length) throws SAXException {
        outputXml.append(ch, start, length);
    }

     ………
     ………

    public String getOutputXML() {
        return outputXml.toString();
    }

}

You can see in the characters() method of SAX callback that I just appended all characters directly without checking whether the char array contained XML special characters or not.
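One way to close the hole is to escape the text before appending it. The helper below is a minimal sketch rather than the fix that went into the module: XmlTextEscaper and escape() are hypothetical names, and it handles element text only. Attribute values written by serializeAllAttributes() (not shown above) would additionally need their quote characters escaped.

```java
class XmlTextEscaper {

    // Escapes the characters that must not (or, for '>', should not) appear
    // literally in XML text content. Takes the same arguments as the SAX
    // characters() callback so it can be called directly from there.
    static String escape(char[] ch, int start, int length) {
        StringBuilder out = new StringBuilder(length + 16);
        for (int i = start; i < start + length; i++) {
            switch (ch[i]) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;");  break;
                case '>': out.append("&gt;");  break;
                default:  out.append(ch[i]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        char[] keyword = "companies:S&P500".toCharArray();
        System.out.println(escape(keyword, 0, keyword.length));
    }
}
```

With this helper, the body of characters() would become outputXml.append(XmlTextEscaper.escape(ch, start, length)).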


Setup JAX-WS RI source on NetBeans

Dec 30, 2010 by

I have been looking at the JAX-WS RI code for a while to gather some information. I have set up the implementation source code in NetBeans so I can debug the library. I believe there is no document with fine details about how to do it. The official page of the JAX-WS project says you can check out the code, run ant to build the project, then “use your favorite IDE to hack the code”. Frankly, that amount of detail is fair enough. The source directory is already a NetBeans project, which saves you from configuring source directories and libraries. The problem is how to link this project to a web services application so I can debug the JAX-WS implementation source. I will log all the steps here as a reminder and for anybody who might find it useful.

Setup JAX-WS RI project

First of all, you need an account on java.net. You may log in and see all the repository paths at http://java.net/projects/jax-ws/sources.

Checking out the whole trunk takes too long. You probably need just the jaxws-ri directory.

D:\Program\src-repo> svn checkout https://svn.java.net/svn/jax-ws~sources/trunk/jaxws-ri

The next step is to use ant to build the code base. The build process will generate some classes that need to be included for our code base to compile without errors. The video below will show you how to set up the project.

Debugging web services application

I think the hard part is the web services deployment model. If we choose to make our web services application comply with the JSR 109 specification, then the deployment mechanism for our application is tightly integrated with the application server. The application server will search for classes with web services annotations and automatically expose them as web services. I don’t know how to hook into this mechanism to change the JAX-WS implementation that an application server is using, so I have to use the Servlet-based deployment model instead. With this model, a web services application is just a normal Java web application. The web services library is just some jars bundled with the war file. If we play with library loading in the right way, we can make our web services run with our specific JAX-WS implementation.

JDK 1.6 comes with an implementation of JAX-WS. That implementation is also from the Metro project but under a different package name. We need to use the endorsed directory mechanism to override the default implementation in the JDK. That is why I said we need to play with library loading. I have captured all the steps in the video below.


JAXB Binder and XPath

May 14, 2010 by

I came across javax.xml.bind.Binder when I was reading SOA Using Java Web Services (an excellent book). I had never used this class before, so I set out to find how it could be used. I found that the class isn’t mentioned as much as classes like Marshaller or Unmarshaller, but it is very useful.

Binder is usually used to perform partial binding: unmarshalling a JAXB object from part of an XML DOM tree. The JAXB specification states three use cases for the class. Two are related to partial binding and the third is about the capability of using XPath navigation. It is the last one that interests me most, because I actually have a module in my product that can make perfect use of this technique.

Below is the XML schema I have created just to simulate the functionality of the module.

<schema xmlns="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://ws.news.com/query"
    xmlns:tns="http://ws.news.com/query"
    elementFormDefault="qualified"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    <element name="Query" type="tns:Query"/>

    <complexType name="Query">
    	<sequence>
    		<element name="TimeOut" type="int"/>
    		<element name="Hit" type="int"/>
    		<element name="Filter" type="tns:Filter"/>
    	</sequence>
    </complexType> 

    <complexType name="Filter">
		<group ref="tns:Searchable"/>
	</complexType>

	<element name="And" type="tns:BooleanExpr"/>
	<element name="Or" type="tns:BooleanExpr"/>

	<group name="Searchable">
		<choice>
			<element name="Company" type="string"/>
			<element name="Section" type="string"/>
			<element name="TitleText" type="string"/>
			<element name="TitleAndBodyText" type="string"/>
			<element ref="tns:And"/>
			<element ref="tns:Or"/>
		</choice>
	</group>

	<complexType name="BooleanExpr">
		<sequence>
			<group ref="tns:Searchable" minOccurs="2" maxOccurs="unbounded"/>
		</sequence>
	</complexType>
</schema>

The schema describes the request format of a kind of search engine. Users can search for items associated with metadata (Company/Section) or for items that contain a particular string. In my real production code, it’s a news server. The interesting thing is that the schema allows users to group searchable indexes using boolean operators like And and Or. A boolean operator can also comprise sub boolean operators, allowing the request’s content tree to grow to unlimited depth.

The functionality of the module I’ve mentioned is to extract all occurrences of the JAXB objects corresponding to a searchable index (Company, in the code later in this post), decorate the content of those indexes, then replace the original content with the newly decorated one. Below is an example of a simple request.

<Query xmlns="http://ws.news.com/query"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

	<TimeOut>10</TimeOut>
	<Hit>60</Hit>
	<Filter>
		<Or>
			<And>
				<Or>
					<Section>News</Section>
					<Section>Announcement</Section>
					<Section>Product</Section>
				</Or>
				<Company>BBL.BK</Company>
			</And>
			<And>
				<Or>
					<Section>News</Section>
					<Section>Announcement</Section>
					<Section>Product</Section>
				</Or>
				<Company>PTT.BK</Company>
			</And>
			<And>
				<Section>Trade</Section>
				<Company>SCB.BK</Company>
				<Company>MSFT.O</Company>
				<Company>IBM.N</Company>
			</And>
		</Or>
	</Filter>
</Query>

Manipulating JAXB objects is normally easier than operating on the low-level DOM. But traversing the whole JAXB object hierarchy is not much better than traversing a DOM tree, especially when the JAXB object we are working with is not straightforward. Let’s look at the generated BooleanExpr class, for example.

@XmlAccessorType(XmlAccessType.FIELD)
@XmlType(name = "BooleanExpr", propOrder = {
    "searchable"
})
public class BooleanExpr {

    @XmlElementRefs({
        @XmlElementRef(name = "Or", namespace = "http://ws.news.com/query", type = JAXBElement.class),
        @XmlElementRef(name = "TitleAndBodyText", namespace = "http://ws.news.com/query", type = JAXBElement.class),
        @XmlElementRef(name = "TitleText", namespace = "http://ws.news.com/query", type = JAXBElement.class),
        @XmlElementRef(name = "Section", namespace = "http://ws.news.com/query", type = JAXBElement.class),
        @XmlElementRef(name = "And", namespace = "http://ws.news.com/query", type = JAXBElement.class),
        @XmlElementRef(name = "Company", namespace = "http://ws.news.com/query", type = JAXBElement.class)
    })
    protected List<JAXBElement<?>> searchable;

public List<JAXBElement<?>> getSearchable() {
        if (searchable == null) {
            searchable = new ArrayList<JAXBElement<?>>();
        }
        return this.searchable;
    }
}

Data binding between Java and XML is not a perfect world. XML is a very large and complex standard. It’s very difficult, if not impossible, to define a seamless mapping between a Java representation and the whole XML information set. Some XML artifacts can’t be mapped to Java with all XML constraints 100% preserved.

The content of BooleanExpr is a choice model group, which combines with the maxOccurs=”unbounded” constraint to make the getSearchable() method not look so nice. Traversing this BooleanExpr needs some checking to see which kind of object is being operated on.

public static void handleBooleanExpr(BooleanExpr expr){
    List<JAXBElement<?>> searchableList = expr.getSearchable();
    for(JAXBElement<?> elem : searchableList){
        if( elem.getName().equals(andQname) || elem.getName().equals(orQname) ){
            // And/Or wrap a nested BooleanExpr, so recurse into it.
            handleBooleanExpr( (BooleanExpr)elem.getValue() );
        }else if( elem.getName().equals(companyQname) ){
            decorate(elem);
        }
    }
}

I am using just one choice model group in the example because I don’t want to make it too complicated. You can probably guess that the code to traverse the JAXB objects will get bloated quickly if there are three or more choice model groups.

If our JAXB object were a DOM document, XPath would be the clear choice for this kind of task. But if I want to use DOM, I have to marshal the JAXB object to DOM, apply an XPath query to perform the decoration, then unmarshal the modified DOM back to a JAXB object. I need to repeat this round trip every time I want to use XPath on the request. It would be nice if I could operate on the request both as a JAXB object and with XPath. JAXB Binder allows you to do just that.

public void decorateCompany(Query query) throws JAXBException, XPathExpressionException{
		Binder<Node> binder = _ctx.createBinder();
		Node queryDOMView = createBlankDOMDocument(true);  

		//Marshal Query object to a blank DOM document.
		//Binder will maintain the association between the two views.
		QName qname = new QName("http://ws.news.com/query", "Query");
		binder.marshal( new JAXBElement<Query>(qname, Query.class, query)  , queryDOMView);

		//Search for all occurrences of Company using XPath.
		XPath xpath = XPathFactory.newInstance().newXPath();
		xpath.setNamespaceContext( new QueryNamespaceContext());
		NodeList compList = (NodeList)xpath.evaluate("//query:Company", queryDOMView, XPathConstants.NODESET);

		//Perform decoration
		for(int i=0; i<compList.getLength(); i++){
			Node comp = compList.item(i);
			comp.setTextContent( decorate( comp.getTextContent() ));
		}

		//Synchronize the changes back to Query object.
		binder.updateJAXB(queryDOMView);

	}

	public Node createBlankDOMDocument(boolean namespaceAware) {
		DocumentBuilderFactory fact = DocumentBuilderFactory.newInstance();
		fact.setNamespaceAware(namespaceAware);
		DocumentBuilder builder;
		try {
			builder = fact.newDocumentBuilder();

		} catch (ParserConfigurationException e) {
			throw new RuntimeException(e);
		}

		return builder.newDocument();
	}

Binder maintains the association between a JAXB object and its corresponding XML information set. You can bind a Query object to a DOM document, then modify the JAXB object and push the modification to the associated DOM. Or you can modify the DOM tree and then synchronize the changes back to the JAXB object. This gives us the best of both worlds. It’s easy to get simple properties like Hit or TimeOut from the Query object, and I also have the option to use low-level XML manipulation like XPath to search for particular information across the whole Query object graph.
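The decorateCompany() method above relies on a QueryNamespaceContext class that is not shown. The following is a guess at what a minimal version could look like, together with a plain DOM and XPath demonstration that uses only JDK classes (no Binder involved). The "query" prefix comes from the XPath expression above; the class body and CompanyXPathDemo are assumptions made for this sketch.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;

import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Assumed implementation: maps the "query" prefix used in the XPath
// expression to the target namespace of the schema.
class QueryNamespaceContext implements NamespaceContext {
    static final String QUERY_NS = "http://ws.news.com/query";

    public String getNamespaceURI(String prefix) {
        return "query".equals(prefix) ? QUERY_NS : XMLConstants.NULL_NS_URI;
    }

    public String getPrefix(String namespaceURI) {
        return QUERY_NS.equals(namespaceURI) ? "query" : null;
    }

    public Iterator<String> getPrefixes(String namespaceURI) {
        String prefix = getPrefix(namespaceURI);
        return prefix == null
                ? Collections.<String>emptyIterator()
                : Collections.singletonList(prefix).iterator();
    }
}

class CompanyXPathDemo {

    // Parses the XML with a namespace-aware parser and counts the Company
    // elements in the query namespace, at any depth of And/Or nesting.
    static int countCompanies(String xml) {
        try {
            DocumentBuilderFactory fact = DocumentBuilderFactory.newInstance();
            fact.setNamespaceAware(true);
            Document doc = fact.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new QueryNamespaceContext());
            NodeList companies = (NodeList) xpath.evaluate("//query:Company", doc, XPathConstants.NODESET);
            return companies.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Query xmlns='http://ws.news.com/query'><Filter><And>"
                + "<Section>Trade</Section><Company>SCB.BK</Company><Company>MSFT.O</Company>"
                + "</And></Filter></Query>";
        System.out.println("Company elements found: " + countCompanies(xml));
    }
}
```

The namespace context matters because the request declares a default namespace: an XPath expression without a mapped prefix, such as //Company, would match nothing in a namespace-aware DOM.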


I don’t know what exactly web services are

May 11, 2010 by

Back in my days at university, if anybody asked me or my friends “what is the difference between web application and web services” then the answers would be exactly the same; “web application is human-to-machine interaction but web services are machine-to-machine interaction”. I don’t recall how I and my friends got to remember the answers but we all had the perception that it was the most preferable answer. At the time, web services were very new. Tools and concept related to the topic were not matured yet especially for Java world which was quite late for the web services train. My self-study involved a lot of reading, trying to guess what exactly web services were. The perception of web services in my mind was that it was something very advanced involving many technologies working together to overcome all those problems that had been troubling distributed computing for a long time.

My professor once gave an assignment to the class to think of use-cases that could be implemented as web services. I recalled having an argument with one of my friend about this assignment. He suggested that a simple thing like providing basic information to client could be implemented as web services. I didn’t recall what exactly he suggested but it was so simple and contrasted strongly with my perception of web services at the time. We had discussion and I kept telling him his idea was not web service. He finally asked me “what are web services then” and I didn’t know what to answer.

One characteristic associated with web services at its early state was the capability of performing dynamic integration with other systems. A service provider could look up for a potential collaborator from UDDI server and performed some kind of binding to integrate the functionality of the collaborator into the service provider. A book portal web services might look up for book store web services at runtime to find the cheapest price for a particular book order. This just-in-time integration was the reason that I hesitated to accept that the idea of my friend could be called web services. I felt like the idea was just a simple request/reply scenario. It didn’t feature the great web services vision at all.

Years have passed. The technologies around web services have become more matured.
I am now less interested in finding the definition of web services and more interested in learning what each web services technology actually does. I am well familiar with WSDL, SOAP, JAX-WS JAXB and the other java web services specifications. Having something that I can actually play with gives me more confident feeling on what web services are about. I still don’t have one universal definition for web services and don’t think I will ever able to come up with one. But I can tell you with confidence what a particular web services technology does and what it doesn’t do.

Looking back at when I started learning web services, I wonder whether the vision of just-in-time integration has ever been realized at all. How is it possible for two parties with no common understanding to dynamically integrate with each other to form business services? Let’s say I am writing a book portal program. The program dynamically looks up a collaborator and gets back BarnesAndNoble.wsdl. How can the program automatically know which operation defined in the WSDL file should be called to get the current price of a book? If the operation accepts a request message as an XML element of type string, should the program fill in the name of the book or the ISBN? It seems that this kind of integration is impossible without a shared understanding or a predefined protocol between the two parties. It is strange that these questions never entered my head while I was at university. I just thought that it could be done, and that I only needed to learn more to know how it worked.

I recently came across an interesting article: Web Services: It’s So Crazy, It Just Might Not Work. The author makes a very interesting point: “Parseable != Interoperable”. The fact that web services use XML as their data format doesn’t automatically make all web services interoperable. XML may be an open standard that can be parsed on any platform, but web services need to do more than parse XML messages; they also need to understand the meaning of those messages.

“At best, XML makes it possible for businesses or developer groups to share data, provided they agree on the semantics of that data in advance. This is not to say XML is not an enormous advance. It plainly is. However, its advance lies in aiding data interoperability where shared semantics can be assumed. It does nothing at all to create semantic interoperability.”
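To make the point concrete, here is a minimal sketch in Java. The getPrice operation, the arg0 element and both sample values are made up for illustration; they do not come from any real WSDL. Both messages parse equally well, and nothing in the XML tells the receiving program whether the string is an ISBN or a title.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ParseableNotInteroperable {

    // Extracts the text of the (hypothetical) <arg0> element of a request message.
    static String extractArgument(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("arg0").item(0).getTextContent();
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse message", ex);
        }
    }

    public static void main(String[] args) {
        // Made-up sample values: one caller sends an ISBN-like number, another a title.
        String byIsbn = "<getPrice><arg0>1234567890</arg0></getPrice>";
        String byTitle = "<getPrice><arg0>Some Book Title</arg0></getPrice>";

        // Parsing succeeds for both; deciding what the string means needs
        // an agreement that lives outside the XML.
        System.out.println(extractArgument(byIsbn));
        System.out.println(extractArgument(byTitle));
    }
}
```

The parser is satisfied in both cases, so the two parties still need to agree in advance on what the argument means.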

What I believe web services are about
I may not know exactly what web services are, but I can tell you what I believe web services are about.

System Integration: We expose our system as web services because we want other systems, on whatever platform, to integrate with our system with less pain. The ultimate goal is easy integration and nothing else, so you should keep your product manager from going out and claiming that this new technology is going to make your product run faster.

Machine-to-machine interaction: What is the reason for using web services to implement a simple use case like searching for books based on some simple criteria, given that the use case can be handled by a normal web application?

You may implement the book search system as a web application which users access via a web browser. A user sees a text field with a “Criteria” label in front of it, types in some criteria and clicks the submit button to get the search results. This communication is between a human and your web application. But if you choose to expose your application as a web service, then you expect users to write a piece of code that connects to your system and calls the search service programmatically. Such a user may be building a book price comparison system which connects to various book providers.

A lot of XML: If you are good at web services development then you are definitely good at XML programming.

Deadlock in Real World

May 4, 2010 by

Last year, I was assigned to handle some defects that occurred only when the system was under high load. Those defects were apparently the result of concurrency problems in both a third-party API and my own module. The problems are quite interesting because they feature a couple of well-known issues that have already been described in many programming articles. What I like most about them is how they evolved from a simple exception into a serious deadlock just by adding a few lines of code. I will write about the problems here, hoping that it will be useful for other developers.

Again, like all of my programming posts, I can’t show you the real production code, so I will show you an example that simulates the problems. The usage scenarios and the nature of the problems are just the same.

Let’s say I have an API for sending and receiving messages over a network. This API is asynchronous by design. Users can send a message to subscribe to a type of data and then get updates of that data from the server until they decide to unsubscribe.

public class AsyncTransport {

    private final Vector subscription = new Vector();
    private final MockUpdateMsgGenerator idGen = new MockUpdateMsgGenerator();
    private volatile DispatchingThread disp;
    private volatile boolean stop;

    public void connect(String host, String port) {
        //Mock implementation; it doesn't actually connect to anything.
        disp = new DispatchingThread();
        disp.start();
    }

    public Subscription subScribe(Message request, AsyncMsgListener listener) {
        String subID = idGen.genIDforMessage(request);
        Subscription sub = new Subscription(subID, listener);

        subscription.add(sub);
        sendMessage(subID, request);

        return sub;
    }

    private void sendMessage(String subID, Message request) {
        //Mock implementation, just add the subID to MockUpdateMsgGenerator
        //so it can generate mock update for this subscription.
        idGen.addActiveID(subID);
    }

    public Update readNextUpdateFromNetwork() throws InterruptedException{
        //Read the next mock update
        return idGen.genMockUpdateFromSubIDList();
    }

    public void unsubScribe(Subscription sub) {
        subscription.remove(sub);

        //Tell MockUpdateMsgGenerator not to generate mock update for this id.
        idGen.removeActiveID( sub.getSubscriptionID() );
    }

    public void stop() {
        stop = true;
        this.disp.interrupt();
    }

    class DispatchingThread extends Thread {

        public DispatchingThread() {
            super("AsyncTransport Dispatching Thread");
        }

        public void run() {
            try {
                while (!stop) {
                    Update update = idGen.genMockUpdateFromSubIDList();
                    for (int i = 0; i < subscription.size(); i++) {
                        ((Subscription) subscription.get(i)).notifyIfSubscribeFor(update);
                    }
                }
            } catch (InterruptedException ex) {
                //Thread stop
            }
        }
    }//DispatchingThread
}

class Update {

    private final String ID;
    private final Message msg;

    public Update(String ID, Message msg) {
        this.ID = ID;
        this.msg = msg;
    }

    public String getSubscribtionID() {
        return ID;
    }

    public Message getMessage() {
        return msg;
    }
} 

public class Subscription{
    public final String ID;
    public final AsyncMsgListener listener;

    public Subscription(String ID, AsyncMsgListener listener) {
        this.ID = ID;
        this.listener = listener;
    }

    public String getSubscriptionID(){
        return this.ID;
    }

    public void notifyIfSubscribeFor(Update update){
        if( update.getSubscribtionID().equals(ID) ){
            listener.onMsg( update.getMessage() );
        }
    }
}

public interface AsyncMsgListener {
    public void onMsg(Message msg);
}

public class Message {
    private final String data;

    public Message(String data){
        this.data = data;
    }

    public String getData(){
        return this.data;
    }
}

I have filtered out unnecessary complexity by making AsyncTransport a standalone class. It doesn’t really connect to anything. The MockUpdateMsgGenerator class generates mock updates for us.

Once the transport connects to the server, the DispatchingThread is started and keeps listening on the socket connection for updates from the server. When an update arrives, the thread iterates over the subscription list and asks each subscription to fire a notification if the update’s ID matches the ID of the subscription. The data structure used to store all subscriptions is a Vector. Since all methods of Vector are synchronized (a well-known fact, stated in the API doc), the designer of this API might think that it’s safe to access the subscription list concurrently, so that users can subscribe and unsubscribe while the dispatching thread is iterating over the list.

                while (!stop) {
                    Update update = idGen.genMockUpdateFromSubIDList();
                    for (int i = 0; i < subscription.size(); i++) {
                        ((Subscription) subscription.get(i)).notifyIfSubscribeFor(update);
                    }
                }
 

Here is the code to test the API.

public class ASyncTester {
    public static void main(String[] args) throws InterruptedException {
        AsyncTransport transport = new AsyncTransport();
        transport.connect("localhost", "9999");

        int threadCount = 20;
        int taskSize = 100;
        ExecutorService exec = Executors.newFixedThreadPool(threadCount);
        for (int i = 0; i < taskSize; i++) {
            exec.submit(new SubscribtionCallable(transport));
        }

        exec.shutdown();
        exec.awaitTermination(1, TimeUnit.DAYS);

        transport.stop();
        System.out.println("Done");
    }
}

class SubscribtionCallable implements Callable<Void> {

    private final AsyncTransport transport;

    public SubscribtionCallable(AsyncTransport transport) {
        this.transport = transport;
    }

    public Void call() throws Exception {
        Subscription sub = transport.subScribe(new Message("ASYNC request"), new AsyncMsgListener() {
            public void onMsg(Message msg) {
                System.out.println(msg.getData());
            }
        });

        //wait to get some updates
        Thread.sleep(Math.abs(new Random().nextLong() % 100));
        transport.unsubScribe(sub);
        return null;
    }
}

We have 20 threads sharing 100 Callable tasks. All the tasks perform the same series of actions: subscribe for data, wait for some updates and then unsubscribe.

Check Then Act

The most annoying thing about concurrency problems is that they are hard to reproduce. A particular problem may occur only when the threads execute in a certain order, and we can’t just go inside the JVM and arrange for that order to happen. So it’s possible that a configuration that works on my machine may not produce the same result on yours. By configuration I mean the conditions that affect the concurrent execution of the program, such as the number of running threads and tasks. You may want to try running the program with various configurations; one may make the program run just fine while another results in an exception.

On my machine, the configuration shown in the code snippet above makes the test program run just fine. But when I increase the thread count to 50 and run 500 tasks, I get an exception.

Exception in thread "AsyncTransport Dispatching Thread" java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 49
at java.util.Vector.get(Vector.java:694)
at deadlock.transport.AsyncTransport$DispatchingThread.run(AsyncTransport.java:62)

This is the first problem I am going to show you in this post. Although it’s not possible to corrupt the internal state of a Vector by concurrently calling its get(), set() or size() methods, you still need to think carefully about what actually makes up your mutually exclusive section: the area of code that needs to be executed without interference from any other thread.

In our example, the dispatching loop needs both subscription.size() and subscription.get() to run in one mutually exclusive section. This is known as the check-then-act scenario. The dispatching thread checks whether the current index is less than the vector size and, if so, retrieves the element at that index. The steps that lead to the exception are shown below.

  1. The current index is 49 and the current size of the vector is 50.
  2. The index is compared to the size of the vector. Since 49 is less than 50, the loop is allowed to enter its body.
  3. Another thread calls unsubScribe(), which removes an element from the vector. Now the vector’s size is 49, and the valid indexes for retrieving an element are 0 to 48.
  4. The body of the for loop is executed, and the expression that tries to get the element at index 49 results in java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 49
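Because the real interleaving depends on thread timing, it can help to replay the four steps in a single thread. The sketch below is not part of the original program; the class name and the way the “other thread” is simulated (a remove() call placed between the check and the act) are assumptions made purely for illustration.

```java
import java.util.Vector;

public class CheckThenActDemo {

    // Replays steps 1-4 in one thread: the interfering thread is simulated by
    // removing an element between the size() check and the get() call.
    static String replay() {
        Vector<String> subscription = new Vector<String>();
        for (int n = 0; n < 50; n++) {
            subscription.add("sub-" + n);
        }

        int i = 49;
        if (i < subscription.size()) {      // check: 49 < 50 is true
            subscription.remove(0);         // simulated unsubScribe(): size drops to 49
            try {
                subscription.get(i);        // act: index 49 is no longer valid
            } catch (ArrayIndexOutOfBoundsException ex) {
                return ex.getClass().getSimpleName();
            }
        }
        return "no exception";
    }

    public static void main(String[] args) {
        System.out.println(replay());
    }
}
```

Each call on the vector is individually thread-safe here; the failure comes only from the gap between the two calls.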

The check-then-act problem above can be addressed in many ways. The easiest way may be to use a synchronized block to create a critical section.

                    synchronized(subscription){
                        for(int i=0; i<subscription.size(); i++){
                            ( (Subscription)subscription.get(i) ).notifyIfSubscribeFor( update );
                        }
                    }

I am using the vector object itself as the lock of the synchronized block, so no other thread can run any method of the vector instance while the code in our critical section is running. The state of the vector cannot be changed by other threads during the execution of the synchronized block. That means, in our dispatching loop, if subscription.size() returns 50 then it’s guaranteed that there will be an element at index 49 by the time subscription.get(i) is executed, since no thread can remove an element from the vector while the dispatching thread is in the critical section.

Unfortunately, this simple fix leads to another problem. I need to introduce a new usage scenario for AsyncTransport so I can demonstrate the consequences of the fix above.

Synchronous Requesting Model

Let’s say some messages in my program are not suited to asynchronous calls. They are just simple request/reply calls. I will create an API that acts as an adapter to make simple synchronous calls over the asynchronous API.

public class SyncTransport {

    private final AsyncTransport transport;

    public SyncTransport(AsyncTransport transport) {
        this.transport = transport;
    }

    public Message request(Message request, long timeout) throws InterruptedException {
        SyncRequestor syncAdapter = new SyncRequestor(transport);
        return syncAdapter.makeRequest(request, timeout);
    }
}

class SyncRequestor implements AsyncMsgListener {

    private final AsyncTransport transport;
    private volatile Message msg;

    public SyncRequestor(AsyncTransport transport) {
        this.transport = transport;
    }

    public void onMsg(Message msg) {
        this.msg = msg;

        synchronized (this) {
            notify();
        }
    }

    public Message makeRequest(Message request, long timeout) throws InterruptedException {
        Subscription sub = transport.subScribe(request, this);
        synchronized (this) {
            wait(timeout);
            transport.unsubScribe( sub );
        }

        return this.msg;
    }
}

The request() method delegates the execution to the SyncRequestor class. The class subscribes for data and waits for the first update; the execution flow blocks here. Since it is a synchronous call, SyncRequestor knows that there will be only one update from the server: the synchronous reply. Once that first update arrives, the requestor stores it and calls notify() to tell the waiting thread that the reply is ready. The requesting thread (the thread that called the request() method) then unsubscribes and returns the stored reply message.

At this point, I have introduced both the synchronous and the asynchronous requesting models in my example program. Let’s look at the code in MockUpdateMsgGenerator. The class is able to generate updates for both requesting models. The string data in a Message is used to identify the requesting model of the message.

public class MockUpdateMsgGenerator{
    public static final String SYNC_MSG_PREFIX = "SYNC";
    public static final String ASYNC_MSG_PREFIX = "ASYNC";

    private final List activeSubId = new LinkedList();
    private final List activeSyncSubId = new LinkedList();

    public String genIDforMessage(Message msg){
        String prefix =  msg.getData().startsWith(SYNC_MSG_PREFIX)? SYNC_MSG_PREFIX : ASYNC_MSG_PREFIX;
        return prefix  + ":" + randomInt();
    }

    public synchronized void addActiveID(String id){
        if(id.startsWith( SYNC_MSG_PREFIX ) ){
            activeSyncSubId.add(id);

        }else{
            activeSubId.add(id);
        }

        notify();
    }

    public synchronized void removeActiveID(String id){
        activeSyncSubId.remove(id);
        activeSubId.remove(id);
    }

    public synchronized Update genMockUpdateFromSubIDList() throws InterruptedException{
        while( activeSubId.size() == 0 && activeSyncSubId.size() == 0){
            wait();
        }

        String id;
        if( activeSyncSubId.size() != 0 ){
            //Synchronous request should get response quickly so send update for
            //this requesting model first.
            id =  (String)activeSyncSubId.remove(0);

        }else{
            //Randomly generates update for subscriber.
            int randomIndex =  randomInt() % activeSubId.size();
            id = (String)activeSubId.get(randomIndex);
        }

        return new Update(id , new Message(id + " : DummyData" ) );
    }

    public int randomInt(){
        return Math.abs(new Random().nextInt() );
    }
}

Now we are ready to run our new example.

public class SyncTester {
    public static void main(String[] args) throws InterruptedException {
        AsyncTransport transport = new AsyncTransport();
        transport.connect("localhost", "9999");

        final SyncTransport syncTransport = new SyncTransport(transport);

        ExecutorService exec = Executors.newFixedThreadPool(20);
        for (int i = 0; i < 200; i++) {
            exec.submit(new Callable<Void>() {

                public Void call() throws Exception {
                    Message resp = syncTransport.request(new Message("SYNC msg"), 1000);
                    if(resp != null){
                        System.out.println(resp.getData());
                    }else{
                        System.out.println("Timeout");
                    }

                    return null;
                }
            });
        }

        exec.shutdown();
        exec.awaitTermination(1, TimeUnit.DAYS);

        transport.stop();
        System.out.println("Done");
    }
}

Wait Leak

When I call request() with a timeout of 1000 milliseconds, I get all the reply messages and the program terminates properly. This seems to indicate that our MockUpdateMsgGenerator is working quite efficiently. But a strange thing happens when I specify 0 as the timeout, to wait forever until the reply arrives. The program prints out some reply messages and then freezes, and it doesn’t look like it will proceed anytime soon. I need to know what exactly is going on in the program. Which parts of the program are being executed? Which threads are blocked, and what are they blocked on? The jstack command is the tool I need.

Please note that my real production code is Linux-based. I am just simulating the problem using the example program on my Windows laptop.

C:\Documents and Settings\ThinkPad>jps
3600 Jps
688 SyncTester
2668

C:\Documents and Settings\ThinkPad>jstack 688
2010-05-03 15:49:15
Full thread dump Java HotSpot(TM) Client VM (11.3-b02 mixed mode, sharing):

"pool-1-thread-5" prio=6 tid=0x02b04c00 nid=0xa48 in Object.wait() [0x02fbf000..0x02fbfb14]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x229f0020> (a deadlock.transport.sync.SyncRequestor)
at deadlock.transport.sync.SyncRequestor.makeRequest(SyncTransport.java:53)
- locked <0x229f0020> (a deadlock.transport.sync.SyncRequestor)

at deadlock.transport.sync.SyncTransport.request(SyncTransport.java:22)
at deadlock.Main2$1.call(Main2.java:23)
at deadlock.Main2$1.call(Main2.java:20)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

"pool-1-thread-4" prio=6 tid=0x02b03800 nid=0xeec in Object.wait() [0x02f6f000..0x02f6fc14]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x229e2d70> (a deadlock.transport.sync.SyncRequestor)
at deadlock.transport.sync.SyncRequestor.makeRequest(SyncTransport.java:53)
- locked <0x229e2d70> (a deadlock.transport.sync.SyncRequestor)

at deadlock.transport.sync.SyncTransport.request(SyncTransport.java:22)
at deadlock.Main2$1.call(Main2.java:23)
at deadlock.Main2$1.call(Main2.java:20)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

Actually, all 5 threads in our thread pool are blocked, but I show just 2 of them here since all threads are blocked in the same execution path. We can see in the stack trace that thread “pool-1-thread-4” is waiting on an instance of SyncRequestor. The program freezes because all worker threads are blocked waiting for the notification that their reply messages are ready. The question is why the threads are still waiting. It’s unlikely that some reply messages never made it to the onMsg() callback method, since the same test with a timeout of 1000 milliseconds works properly.

Actually, the notifications have been fired but the requesting threads didn’t catch them in time. There are two threads working concurrently: one is the update dispatching thread, the other is the requesting thread. It is possible that, once the requesting thread finishes executing transport.subScribe(), the subscription is processed in a very short time, and the update dispatching thread sets the reply message on the SyncRequestor and calls notify() before the requesting thread gets a chance to call wait().

This problem is called a wait leak (it is also commonly described as a missed signal): a notification is fired before its intended recipient starts waiting for it. The code snippet below is our program, slightly modified to show that some notifications have already been fired before the requesting thread calls wait().

public Message makeRequest(Message request, long timeout) throws InterruptedException {
        Subscription sub = transport.subScribe(request, this);

        synchronized (this) {
            if(msg != null){
                System.out.println("Reply already arrived");
            }

            wait(timeout);
            transport.unsubScribe(sub);
        }

        return this.msg;
    }

You can try running the program with the modification above: if “Reply already arrived” is printed 3 times, jstack will report that 3 threads are still waiting for notification.

The wait leak problem is one of the reasons why it’s a best practice to call wait() inside a condition-checking loop. Another reason is something called a spurious wakeup: a thread can return from wait() without having been notified. Below is the SyncRequestor with a check for whether the reply has already arrived.
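The underlying behaviour can also be seen in isolation. The sketch below is separate from the example program, and its class and method names are hypothetical: a notify() issued while nobody is waiting is simply lost, so a wait() that starts afterwards blocks until its timeout expires.

```java
public class LostNotificationDemo {

    // Returns roughly how many milliseconds wait(timeoutMillis) blocked after a
    // notify() that was issued before anyone was waiting.
    static long blockedMillisAfterEarlyNotify(long timeoutMillis) {
        Object lock = new Object();

        synchronized (lock) {
            lock.notify();                  // nobody is waiting yet, so this has no effect
        }

        long start = System.nanoTime();
        synchronized (lock) {
            try {
                lock.wait(timeoutMillis);   // the earlier notify() is not remembered
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        }
        return (System.nanoTime() - start) / 1000000L;
    }

    public static void main(String[] args) {
        System.out.println("Blocked for about " + blockedMillisAfterEarlyNotify(200) + " ms");
    }
}
```

With a timeout of 0 instead of 200, the same wait() would block indefinitely, which matches the freeze seen in the thread dump.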

public Message makeRequest(Message request, long timeout) throws InterruptedException {
        Subscription sub = transport.subScribe(request, this);

        synchronized (this) {
            if(timeout > 0){
                waitWithTimeOut(timeout);
            }else{
                waitForever();
            }

            transport.unsubScribe(sub);
        }

        return this.msg;
    }

    private void waitWithTimeOut(long timeout) throws InterruptedException {
        long timeoutLimit = System.currentTimeMillis() + timeout;
        while (msg == null) {
            long remainingTimeout = timeoutLimit - System.currentTimeMillis();
            if (remainingTimeout > 0) {
                wait(remainingTimeout);
            } else {
                //timeout
                break;
            }
        }
    }

    private void waitForever() throws InterruptedException{
        while (msg == null) {
            wait();
        }
    }
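As an aside, the same “don’t lose the signal” guarantee can be obtained from java.util.concurrent instead of a hand-written wait loop. The sketch below is one possible alternative, not the code used in this post: LatchReplyHolder, onReply() and awaitReply() are hypothetical names, and the sketch does not reproduce the “timeout 0 means wait forever” convention of Object.wait().

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchReplyHolder {

    private final CountDownLatch replyArrived = new CountDownLatch(1);
    private volatile String reply;

    // Called by the dispatching side, like onMsg() in SyncRequestor.
    public void onReply(String data) {
        reply = data;
        replyArrived.countDown();   // the latch stays open even if nobody waits yet
    }

    // Called by the requesting side; returns null if the timeout expires first.
    public String awaitReply(long timeoutMillis) {
        try {
            if (replyArrived.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return reply;
            }
            return null;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        LatchReplyHolder holder = new LatchReplyHolder();
        holder.onReply("SYNC:1 : DummyData");           // reply arrives first
        System.out.println(holder.awaitReply(1000));    // still delivered
    }
}
```

Unlike notify(), a latch remembers that it has been counted down, so it does not matter whether the reply or the waiting thread gets there first.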

Deadlock

Let’s get back to our original issue: the consequence of our check-then-act fix. Now we have both the check-then-act and the wait leak problems fixed. Should SyncTransport run without any problem?

Well, with a small thread pool size like 10 and a realistic timeout like 1 or 2 seconds, you may run it day and night without any problem. Concurrency problems can be very hard to detect because the problematic code can run just fine for years if the conditions are right. Then, just when our system comes under unusual circumstances, the problems show up at the worst possible time.

I’ve found that, on my laptop, the program freezes if I run it with a thread pool size of 50 and a 500 millisecond timeout. To find out what went wrong, jstack is our best friend again. I don’t need to spend much time figuring out what happened: jstack shows the problem at the bottom of its report.

Found one Java-level deadlock:
=============================
"pool-1-thread-50":
waiting to lock monitor 0x02b1a404 (object 0x22a80330, a java.util.Vector),
which is held by "AsyncTransport Dispatching Thread"
"AsyncTransport Dispatching Thread":
waiting to lock monitor 0x02a83954 (object 0x22a81670, a deadlock.transport.sync.SyncRequestor),
which is held by "pool-1-thread-17"
"pool-1-thread-17":
waiting to lock monitor 0x02b1a404 (object 0x22a80330, a java.util.Vector),
which is held by "AsyncTransport Dispatching Thread"

Java stack information for the threads listed above:
===================================================
"pool-1-thread-50":
at java.util.Vector.add(Vector.java:727)
- waiting to lock <0x22a80330> (a java.util.Vector)
at deadlock.transport.AsyncTransport.subScribe(AsyncTransport.java:22)
at deadlock.transport.sync.SyncRequestor.makeRequest(SyncTransport.java:44)
at deadlock.transport.sync.SyncTransport.request(SyncTransport.java:22)
at deadlock.Main2$1.call(Main2.java:23)
at deadlock.Main2$1.call(Main2.java:20)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
"AsyncTransport Dispatching Thread":
at deadlock.transport.sync.SyncRequestor.onMsg(SyncTransport.java:39)
- waiting to lock <0x22a81670> (a deadlock.transport.sync.SyncRequestor)
at deadlock.transport.Subscription.notifyIfSubscribeFor(Subscription.java:23)
at deadlock.transport.AsyncTransport$DispatchingThread.run(AsyncTransport.java:67)
- locked <0x22a80330> (a java.util.Vector)
"pool-1-thread-17":
at java.util.Vector.removeElement(Vector.java:593)
- waiting to lock <0x22a80330> (a java.util.Vector)
at java.util.Vector.remove(Vector.java:745)
at deadlock.transport.AsyncTransport.unsubScribe(AsyncTransport.java:40)
at deadlock.transport.sync.SyncRequestor.makeRequest(SyncTransport.java:58)
- locked <0x22a81670> (a deadlock.transport.sync.SyncRequestor)
at deadlock.transport.sync.SyncTransport.request(SyncTransport.java:22)
at deadlock.Main2$1.call(Main2.java:23)
at deadlock.Main2$1.call(Main2.java:20)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

Found 1 deadlock.

If you are good at reading stack traces then the report is straightforward. The execution order that causes the deadlock is shown below.

  1. The requesting thread pool-1-thread-17 calls SyncRequestor.makeRequest() and grabs the lock of a SyncRequestor instance to wait for the reply notification.

    at deadlock.transport.sync.SyncRequestor.makeRequest(SyncTransport.java:58)
    - locked <0x22a81670> (a deadlock.transport.sync.SyncRequestor)

            synchronized (this) {
                if(timeout > 0){
                    waitWithTimeOut(timeout);
                }else{
                    waitForever();
                }
    
                transport.unsubScribe(sub);
            }
    
  2. The timeout is too short. The requesting thread wakes up, holding the SyncRequestor lock again (wait() re-acquires it before returning), and proceeds to unsubscribe.
    at deadlock.transport.AsyncTransport.unsubScribe(AsyncTransport.java:40)

  3. Unsubscribing involves removing the subscription from the subscription vector. Jstack shows that the requesting thread is blocked waiting to grab the lock of the vector.

    at java.util.Vector.removeElement(Vector.java:593)
    - waiting to lock <0x22a80330> (a java.util.Vector)
    at java.util.Vector.remove(Vector.java:745)

    It turns out that another thread has already grabbed the lock of the vector. The requesting thread has to wait for that thread to release the lock before it can proceed.

  4. At this moment, the “AsyncTransport Dispatching Thread”, which runs concurrently with the requesting thread, is executing the update dispatching loop. The dispatching thread has grabbed the lock of the subscription vector to iterate over all subscriptions.


    at deadlock.transport.AsyncTransport$DispatchingThread.run(AsyncTransport.java:67)
    - locked <0x22a80330> (a java.util.Vector)

          synchronized(subscription){
                for(int i=0; i<subscription.size(); i++){
                     ( (Subscription)subscription.get(i) ).notifyIfSubscribeFor( update );
                }
           }
    

    It’s the update dispatching thread that holds the lock of the subscription vector, which prevents the requesting thread from proceeding.

  5. By a very bad coincidence, the update dispatching thread is calling the onMsg() method on the very subscription that is about to be unsubscribed. The thread sets the reply message and then tries to grab the lock of the target SyncRequestor instance so that it can call notify().


    at deadlock.transport.sync.SyncRequestor.onMsg(SyncTransport.java:39)
    - waiting to lock <0x22a81670> (a deadlock.transport.sync.SyncRequestor)
    at deadlock.transport.Subscription.notifyIfSubscribeFor(Subscription.java:23)

           public void onMsg(Message msg) {
               this.msg = msg;
    
               synchronized (this) {
                   notify();
                }
           }
           
  6. The update dispatching thread can’t grab the lock of the SyncRequestor instance because pool-1-thread-17, from step 1, hasn’t released it yet. Now you can see that this is a deadlock. The requesting thread is holding the lock of the SyncRequestor instance and waiting for the lock of the subscription vector, while the update dispatching thread is holding the lock of the vector and waiting for the lock of the SyncRequestor instance.

    Jstack has summed it up quite nicely.

    "AsyncTransport Dispatching Thread":
    waiting to lock monitor 0x02a83954 (object 0x22a81670, a deadlock.transport.sync.SyncRequestor),
    which is held by "pool-1-thread-17"

    "pool-1-thread-17":
    waiting to lock monitor 0x02b1a404 (object 0x22a80330, a java.util.Vector),
    which is held by "AsyncTransport Dispatching Thread"
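The same kind of cycle can be detected from inside a running JVM through the standard ThreadMXBean interface. The sketch below is a standalone illustration, not part of the example program: the two plain lock objects merely stand in for the Vector and the SyncRequestor, and the thread names are made up. Two daemon threads take the locks in opposite order, and the JVM is polled until it reports them as deadlocked.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class DeadlockDetectionDemo {

    // Creates a deliberate two-lock deadlock and returns how many threads the
    // JVM reports as deadlocked on monitors (0 if none is found in about 5 seconds).
    static int countDeadlockedThreads() {
        final Object vectorLock = new Object();     // stands in for the Vector
        final Object requestorLock = new Object();  // stands in for the SyncRequestor
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        startDaemon("demo-dispatcher", vectorLock, requestorLock, bothHoldFirstLock);
        startDaemon("demo-requester", requestorLock, vectorLock, bothHoldFirstLock);

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            long[] ids = threads.findMonitorDeadlockedThreads();   // null when no deadlock
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    private static void startDaemon(String name, final Object first, final Object second,
                                    final CountDownLatch latch) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                synchronized (first) {
                    latch.countDown();
                    try {
                        latch.await();      // wait until the other thread holds its first lock
                    } catch (InterruptedException ex) {
                        return;
                    }
                    synchronized (second) {
                        // never reached: the other thread holds this lock
                    }
                }
            }
        }, name);
        t.setDaemon(true);                  // lets the JVM exit despite the deadlock
        t.start();
    }

    public static void main(String[] args) {
        System.out.println("Deadlocked threads: " + countDeadlockedThreads());
    }
}
```

A check like this could run periodically in a monitoring thread, although jstack remains the quicker tool when you can reach the machine.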

Again, there are many solutions to this problem. I chose to fix it by simply moving the unsubscribing call out of the synchronized block. The call doesn’t need to be executed inside the synchronized block at all. It’s always a good idea to check whether your synchronized block is too big and to remove statements that don’t really need to be in the critical section.

public Message makeRequest(Message request, long timeout) throws InterruptedException {
        Subscription sub = transport.subScribe(request, this);

        synchronized (this) {
            if(timeout > 0){
                waitWithTimeOut(timeout);
            }else{
                waitForever();
            }
        }

        transport.unsubScribe(sub);
        return this.msg;
}

Now the requesting thread releases the lock of the SyncRequestor instance before it unsubscribes. This breaks the circular lock acquisition and prevents the deadlock in the program.
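Another of the possible solutions attacks the other side of the cycle: the dispatching thread does not have to hold the vector’s lock while it calls listeners. The sketch below shows that idea with java.util.concurrent.CopyOnWriteArrayList, whose iterators work on a snapshot of the list. It is only an illustration under assumed names (SnapshotDispatchDemo, Listener, dispatch()), not the fix I applied to the real code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class SnapshotDispatchDemo {

    interface Listener {
        void onMsg(String msg);
    }

    private final List<Listener> listeners = new CopyOnWriteArrayList<Listener>();

    public void subscribe(Listener l) { listeners.add(l); }

    public void unsubscribe(Listener l) { listeners.remove(l); }

    // Iterates over the snapshot taken when the loop starts. No lock is held
    // while listeners run, and concurrent add/remove calls cannot break the loop.
    public void dispatch(String msg) {
        for (Listener l : listeners) {
            l.onMsg(msg);
        }
    }

    static List<String> run() {
        final SnapshotDispatchDemo transport = new SnapshotDispatchDemo();
        final List<String> received = new ArrayList<String>();

        // A listener that unsubscribes itself from inside the callback.
        transport.subscribe(new Listener() {
            public void onMsg(String msg) {
                received.add("first:" + msg);
                transport.unsubscribe(this);
            }
        });
        transport.subscribe(new Listener() {
            public void onMsg(String msg) {
                received.add("second:" + msg);
            }
        });

        transport.dispatch("update-1");
        transport.dispatch("update-2");
        return received;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The trade-off is that a listener may still receive one update shortly after it has been unsubscribed, because a dispatch loop already in progress keeps using its snapshot.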
