RenameTo and Poor Man’s File Based Cache

Jun 17, 2011 by

I have been assigned to investigate a performance problem in one of the servers in my project and I found an interesting concurrent issue that caused a noticeable drop of throughput figure in performance test. The main business logic of the server is to generate image chart for a Servlet front-end module. Let me use an oversimplified example code below to show the pattern of the problem.

class ChartServer{

    public String generateChart(ChartParameters params) throws IOException{

        String cacheFile = computeCacheFileName(params);
        if( isInterDayRequest(params) && isCacheFileExist(cacheFile) ){
            return new File(cacheDir, cacheFile).getAbsolutePath();
        }

        BufferedImage img = generateImageChart();
        File output = new File(cacheDir, cacheFile);
        synchronized( cacheLock ){
            ImageIO.write(img, "PNG", output );
        }

        return output.getAbsolutePath();
    }
}

All request parameters will be used to compute the cache key. Each cache image will be saved to file using this key as the file name. The server can just return the existing cache file for the subsequent request that contains the exact same set of parameters.

To prevent each worker thread from writing to the same file concurrently, all write to cache directory will block on cacheLock object. This block cause a big bottleneck since all thread will have to block on the same object. The question is that, in order to get rid of the global write-lock, how to make a worker thread be able to lock only the file it is going to write

I have some ideas in approaching the problem but I end up not choosing them.

- Apply the cache file name to a hash function to choose a worker thread so the write to the same file will be done sequentially in a thread.

Problem How can I have a good hash function to distribute tasks to all worker threads evenly?

- Create a lock object for each cache file and store it in HashMap. Each thread uses file name as key to get the corresponding lock (in a thread safe manner) from the HashMap and then block on the lock before writing. Now two threads writing on different files will block on different locks. All thread still block on the same HashMap object but let’s hope retrieving object from map will be very fast.

Problem Will all lock be stored in the HashMap for the whole life of process? If the system employs 16 worker threads then there will be 16 files that are being written at a time but each thread must find the lock for its target file in a HashMap that contains all lock objects in cache system.

It would be great if there is a way to fix this without too many changes in the old execution flow.

The FileChannel locking can’t be applied with this case

Since JDK 1.4, Java has provided a way to perform file locking with the use of lock() and tryLock() methods of class FileChannel. Unfortunately, those methods are meant to be used in the process-to-process level. The mechanism can’t be used to coordinate accesses between threads. The code below will result in java.nio.channels.OverlappingFileLockException when the second FileChannel try to get the lock already hold by the first FileChannel.

public static void concurrentWrite() throws IOException {
        FileChannel f1 = new FileOutputStream(new File("data.txt")).getChannel();
        FileChannel f2 = new FileOutputStream(new File("data.txt")).getChannel();

        FileLock lock1 = f1.tryLock();
        System.out.println("lock1 = " + (lock1 == null ? null : " acquired"));

        FileLock lock2 = f2.tryLock();
        System.out.println("lock2 = " + (lock2 == null ? null : " acquired"));
    }

In contrast, if I start the ConcurrentWriter thread below in 2 different processes, the file locking mechanism will work.

class ConcurrentWriter extends Thread {
    private final File output;

    public ConcurrentWriter(File output) {
        this.output = output;
    }

    @Override
    public void run() {
        try {
            writeInLoop();

        } catch (InterruptedException ex) {
            //Let this thread stop;
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }//run

    public void writeInLoop()
    throws InterruptedException, FileNotFoundException, IOException{

        while (true) {
            FileChannel fs = new FileOutputStream(output).getChannel();
            FileLock lock = tryLock(fs);
            try{
                writeData(fs);
            }finally{
                lock.release();
                fs.close();
            }

            Thread.sleep(100);
        }
    }

    public FileLock tryLock(FileChannel fs)
    throws InterruptedException, IOException {

        FileLock lock = null;
        while ((lock = fs.tryLock()) == null) {
            System.out.println(getName() + ": the file is locked, sleep then try again later");
            Thread.sleep(200);
        }
        System.out.println(getName() + ": got the file lock");
        return lock;
    }

    public void writeData(FileChannel fs)
    throws IOException, InterruptedException {
        //write data
    }
}

The output of a writer thread is shown below.

W1: got the file lock
W1: got the file lock
W1: the file is locked, sleep then try again later
W1: the file is locked, sleep then try again later
W1: the file is locked, sleep then try again later
W1: the file is locked, sleep then try again later
W1: the file is locked, sleep then try again later
W1: the file is locked, sleep then try again later
W1: the file is locked, sleep then try again later
W1: the file is locked, sleep then try again later
W1: got the file lock
W1: the file is locked, sleep then try again later
W1: the file is locked, sleep then try again later
W1: got the file lock
W1: got the file lock

Save by renaming

I came across this approach when I was searching about file locking. Each worker thread can just write to a temporary file with unique file name; may be the real target file name appended with worker thread’s id. Now each thread is exclusively writing to a temporary file. The temporary file will be renamed to be the real target file after it has been successfully written. The only step that is in high contention is the renaming operation. The approach will wok only if the renaming operation is atomic. Although I found various developers claiming that this operation is atomic in most file system, I haven’t found the good solid reference to support the claim yet. I have tested it on my performance environment with 16 cores machine and I didn’t see any corrupt data or IO exception. All responses contained the correct content size and the content in target cache file was updated so I am quite confident this technique practically works.

With this technique, I can get rid of the global write-lock which causes the contented synchronization point and improve the throughput figure dramatically. The technique may be implemented as the code shown below.

public class FileUtil {
    private FileUtil(){}

    public static boolean saveByRename(BufferedImage img, String format, File destination)
    throws FileNotFoundException, IOException{
        String tmpFileName = destination.getName() + getUniqueSuffixPerThread();
        File tmpFile = new File( destination.getParent(),tmpFileName);

        FileOutputStream out = new FileOutputStream(tmpFile);
        try{
            ImageIO.write(img, format, tmpFile);

        }finally{
            out.close();
        }

        return tmpFile.renameTo(destination);
    }

    private static String getUniqueSuffixPerThread(){
        return Thread.currentThread().getId() + ".tmp";
    }
}

Note: On window, a file can’t be renamed to replace an existing file. There will be no error but the renameTo() method just return false to indicate fail attempt.

What I personally like the most about this approach is that I doesn’t require much change. I don’t need to change the execution flow and it doesn’t involve complicated logic which may require a lot of test cases.

Related Posts

Share This

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>