REMIX - Online Detection and Repair of Cache Contention for the JVM Accepted PLDI 2016 IntroductionAbstractAs ever more computation shifts onto multicore architectures, it is increasingly critical to find effective ways of dealing with multithreaded performance bugs like true and false sharing. Previous approaches to fixing false sharing in unmanaged languages have had to resort to highly-invasive runtime program modification. We observe that managed language runtimes, with garbage collection and JIT code compilation, present unique opportunities to repair such bugs directly, mirroring the techniques used in manual repairs. We present REMIX, a modified version of the Oracle HotSpot8 JVM which can detect cache contention bugs and repair false sharing at runtime. REMIX's detection mechanism leverages recent performance counter improvements on Intel(R) platforms (PEBS), which allow for precise, unobtrusive monitoring of cache contention at the hardware level. REMIX can detect and repair known false sharing issues in the LMAX Disruptor high-performance inter-thread messaging library and the Spring Reactor event-processing framework, automatically providing 1.5-2x speedups over unoptimized code and matching the performance of hand-optimization. REMIX also finds a new false sharing bug in SPECjvm2008, and uncovers a true sharing bug in the HotSpot JVM that, when fixed, improves the performance of three NAS Parallel Benchmarks by 7-25x. REMIX incurs no statistically-significant performance overhead on other benchmarks that do not exhibit cache contention, making REMIX practical for always-on use. RequirementsREMIX Requirements:
REMIX was tested on single- and dual-socket 8-core Haswell (model 63) processors, but should run on any Intel CPU with model numbers 62, 63, 69, 70,71, 78, 86, 94 (Models of the aforementioned processor families available at the time of writing).
To check the CPU model, run awk '/model\s*:/{print $3}' /proc/cpuinfo
You can verify your kernel version and architecture with PaperAccepted version of the paper: remix-final.pdf. TutorialInstallationREMIX source code can be found at https://github.com/upenn-acg/REMIX (GPLv2 License). Precompiled binaries available at remix-0.91-x64-jdk.tar.xz. tar xvfJ remix-0.91-x64-jdk.tar.xz REMIX is executed like regular java, with the addition of the -remix command line parameter. remix/jdk/bin/java -remix ... or, if added to the PATH: export PATH=~/path/to/remix/jdk/bin:$PATH java -remix ... To compile from sources, download the source version, and then: git clone https://github.com/upenn-acg/REMIX.git cd REMIX/jvm-remix/ mkdir build cd build ../openjdk/configure make More information on compiling JDK8 can be found at http://hg.openjdk.java.net/build-infra/jdk8/raw-file/tip/README-builds.html. Simple ExampleConsider the following java file, BaseTest.java: public final class BaseTest implements Runnable { public final static class VolatileLong { public volatile long value = 0L; } public final static int NUM_THREADS = 8; public final static long ITERATIONS = 2000L * 1000L * 1000L; private final int arrayIndex; private static VolatileLong[] longs = new VolatileLong[NUM_THREADS]; static { for (int i = 0; i < longs.length; i++) { longs[i] = new VolatileLong(); } } public BaseTest(final int arrayIndex) { this.arrayIndex = arrayIndex; } public static void main(final String[] args) throws Exception { final long start = System.nanoTime(); runTest(); System.out.println("BaseTest Time " + (System.nanoTime() - start) + "ns"); } private static void runTest() throws InterruptedException { Thread[] threads = new Thread[NUM_THREADS]; for (int i = 0; i < threads.length; i++) threads[i] = new Thread(new BaseTest(i)); for (Thread t : threads) t.start(); for (Thread t : threads) t.join(); } public void run() { long i = ITERATIONS + 1; VolatileLong vl = longs[arrayIndex]; long sum = 0; while (0 != --i) { vl.value = i; sum += vl.value; } } } First we run without REMIX: >> ~/remix/jdk/bin/java BaseTest BaseTest Time 366717292370ns The test took 366.7 seconds. Now let's compare With REMIX: >> ~/remix/jdk/bin/java -remix -XX:+REMIXVerbose BaseTest REMIX: SamplingRatio is 1:10000. REMIX: System tests passed. REMIX: relayout phase starting. REMIX: Considering padding for this=0x7f82d4000e40 klass=LBaseTest$VolatileLong; (direct=1, old=24 0x800060230, backup=0x800060600, parent=(nil)) REMIX: relayout phase done. BaseTest Time 17621340141ns REMIX: Transformed 1 klasses, moved 8 instances. We specified >> ~/remix/jdk/bin/java -remix:detect -XX:REMIXSamplingRatio=10000000 BaseTest BaseTest Time 297897616371ns REMIX: Processed 1 classes: REMIX: Hits for class 'BaseTest$VolatileLong': REMIX: - Hit BaseTest$VolatileLong::value[16]: 1008:0 REMIX: Transformed 0 klasses, moved 0 instances. We can see that the field VolatileLong.value, at offset 16, was the source of 1008 HITM events. Multiplying this latter value by the sampling rate, we see that the hardware generated approximately a billion events during the program's execution. We can compare REMIX's performance to that of manual padding, by padding the VolatileLong class with 7 more longs: public final static class VolatileLong { public volatile long p0, p1, p2, p3, p4, p5, p6; public volatile long value = 0L; } Running this version without REMIX results in a total runtime of 14.6 seconds. The three second delay is mostly the delay before REMIX identifies the false sharing. REMIX OptionsREMIX supports all standard Java8 options, but online repair requires the use of a stop-the-world GC such as the ParNewGC. REMIX java accepts the following additional options: java -remix[:online|:detect] - Enable REMIX support for the JVM, with online repair or detection only. Omitting the coloin defaults to online repair -XX:+REMIXVerbose - Enable verbose reporting. -XX:+REMIXHitVerbose - Enable verbose HITM event reporting. -XX:REMIXSamplingRatio=<ratio> - Control REMIX sampling ratio. Default value is 10000, e.g. only one event out of 10,000 is sampled. -XX:+REMIXDebug - Enable debug logging. REMIX changes the default GC to the ParNewGC. REMIX can also be used in conjunction with the serial GC: ~/remix/jdk/bin/java -XX:REMIXLevel=3 -XX:+UseSerialGC ... instead of specifying Compiling a custom kernel with PERF supportREMIX requires the following settings to be enabled in the kernel CONFIG_PERF_EVENTS_INTEL_UNCORE=y CONFIG_CGROUP_PERF=y CONFIG_HAVE_PERF_EVENTS=y CONFIG_PERF_EVENTS=y CONFIG_HAVE_PERF_EVENTS_NMI=y CONFIG_HAVE_PERF_REGS=y You can check if your current kernel supports these options by looking for them in We supply precompiled Linux 4.4.1 binaries for Ubuntu 15.04 : linux-image-4.4.1_4.4.1-10.00.Custom_amd64.deb, /linux-headers-4.4.1_4.4.1-10.00.Custom_amd64.deb. |