Long running Java process resource consumption monitoring , leak detection and GC tuning


For easily monitoring the JVM metrics there is no better tool than Java VisualVM or its older counterpart JConsole; These two tools comes with the Java JDK. So it is absolutely necessary that you copy a JDK version to some temp directory in which your server runs. {Make sure that you do not install the JDK / put the java executable in the path.}

Before going in JVM monitoring, it is essential to understand a little about Java's memory model.

Note - This article is written with  HotSpot JVM in mind,which is one of the most commonly used one ; implemented by OpenJDK and Oracle - formerly from Sun. This is not the only one , notable being JRockit and others; and some details would change with a different JVM.

When you are running a Java Process in a Server you allocate the Memory for the Java Virtual Machine. This memory is split to process heap { what you give withe the Xmx Xms command}where it is again subdivided on generational basis (Young, Old)  and the PermGen - which is gradually getting removed in Java 7 . Assume that you have not specified the stack size  here.


Have a look at the common parameters usually given below for a JVM running in the server mode,and find the relevance of each. Here you see that the Java heap is given as 31g +  Perm as 1g . So total JVM memory given is 32 g.


java -D[Server:testserver] -XX:PermSize=1024m -XX:MaxPermSize=1024m -Xms31232m -Xmx31232m -server -Xloggc:/opt/gclogs/testserver_gc.log -verbose:gc -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=5M -XX:+PrintGCCause -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintClassHistogramAfterFullGC -XX:G1HeapRegionSize=32m -XX:HeapDumpPath=/opt/testserver/heapdump -XX:+HeapDumpOnOutOfMemoryError -XX:+DoEscapeAnalysis -XX:+UseCompressedOops -XX:+DisableExplicitGC -XX:+UseG1GC -XX:MaxGCPauseMillis=1000 -XX:StringTableSize=3000000 -XX:OnOutOfMemoryError=kill -9 %p -XX:+PrintAdaptiveSizePolicy -XX:+ParallelRefProcEnabled -XX:GCHeapFreeLimit=15 -XX:GCTimeLimit=80 -Djavax.net.ssl.keyStore=/opt/certs/keystore.jks -Djava.net.preferIPv4Stack=true  -Djavax.net.ssl.trustStore=/opt/certs/trustore.jks -Djava.io.tmpdir=/opt/process/tmp -Djava.util.prefs.systemRoot=/opt/oodee/jboss-eap/.java/.prefs

Output of top for the above

PID   USER  PR  NI  VIRT  RES  SHR  S %CPU %MEM    TIME+  COMMAND
13168 root  20   0  41.0g  33g  65m S 97.7 78.5   8124:53 java

The JVM itself needs memory for managing the Java process, threads, garbage collection etc. So in the top command for the above mentioned Java process you can see the Resident memory as 33g and virtual memory as 41 g.  You need not be overly concerned about the Virtual memory. Linux is good in managing virtual memory. You need to keep a check only for the resident memory.

So the JVM process is asking Linux for 41g or 8g to do its work and really using(resident memory) about 33g currently.

The resident memory is the RAM occupied by objects in the Heap (biggest part), JVM internal data , meta data of classes and things like Stringtable -PermGen and Thread Stacks. You cannot map the resident memory directly only to heap usage of the application.

We can dig deeper and see how much heap is used by using tools like JConsole or JVisualVM.

There are also command line utilities from the JDK by which you can get the same information, However theses do not plot the values over time, so it is hard to get the trend from these utilities. However they are very useful for JVM heap/ garbage collection introspection,

You can also use nice utilities like JvmTop to get this information.
 JvmTop 0.8.0 alpha - 11:35:39,  amd64, 12 cpus, Linux 2.6.32-35, load avg 0.46

 http://code.google.com/p/jvmtop

 PID MAIN-CLASS      HPCUR HPMAX NHCUR NHMAX    CPU     GC    VM      USERNAME   #T DL
13168 oss-modules.jar 10172m 31232m  627m        1072m  8.80%  0.00% O7U71  root  1343

Currently the process is only using 10 g of the 31g allocated heap. So the rest of the 20g is permgen + threads *stacksize + jvms internal data.

As you can see JVM is no light weight process. Once upon a time when RAM was dear this would have been an issue; not now, at least not unless you start a lot of JVMs in one node.

The same or more indepth information can be got from the JDK tools.

These are jstack- to get Stack trace -for debugging CPU consuming threads, or thread locks; jmap to get the heap dump or heap histogram  and the jstat utility which gives the generational usage, and also the live GC events as they happen in the JVM.


./jstat -gccapacity 13168 20000

 NGCMN    NGCMX     NGC    S0C   S1C       EC      OGCMN      OGCMX       OGC         OC      PGCMN    PGCMX     PGC       PC     YGC    FGC
0.0 31981568.0 19333120.0  0.0 655360.0 18677760.0    0.0 31981568.0 12648448.0 12648448.0 1048576.0 1048576.0 1048576.0 1048576.0   1745     1
     0.0 31981568.0 19333120.0    0.0 655360.0 18677760.0        0.0 31981568.0 12648448.0 12648448.0 1048576.0 1048576.0 1048576.0 1048576.0   1745     1

./jstat -gccause 13168 20000
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
  0.00 100.00  30.70  67.18  58.30   1745 1513.719     1   20.213 1533.932 G1 Evacuation Pause  No GC
  0.00 100.00  32.46  67.18  58.30   1745 1513.719     1   20.213 1533.932 G1 Evacuation Pause  No GC
./jstat -gcutil 13168 20000
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  0.00 100.00  34.56  67.18  58.30   1745 1513.719     1   20.213 1533.932

Now moving on to the common use of the tools for other purposes.For heap dumps

su  proceeuser
 /bin/jmap -dump:format=b,file=/tmp/2930javaheap.hrpof 2930(pid of process)
For dumping a big heap make sure you have that much disk space , efficient FTP and powerful machine to analyze; Else anything over maybe 12 to 16 GB is useless Eclipse MAT is a good tool to analyze the heap dump; Also if you cannot transfer ,you can run Jhat on the heap on the same node and you will be able to use the web browser to browse the results generated ; Depending on the heap size and host system resources these tools take some time to analyze

/jdk/jdk1.6.0_38/bin/jhat -J-d64  -stack false -refs false -port 9191 /tmp/45133heap.bin
/jdk/jdk1.6.0_38/bin/jhat -J-d64  -port 9191 /tmp/45133heap.bin
A lighter version of this is class histogram; With the live option it will trigger a full GC and collect the class histogram; So after all tasks are done if you take a histogram and repeat it for one or two cycles and see which are the instances which go up ; can be very helpful in identifying leaks (see below a python script which can help you compare two histograms)

/jdk/jdk1.6.0_38/bin/jmap -histo:live 60030 > /tmp/60030istolive1330.txt

Output will be of the form

 015-06-07T10:58:00.653+0300: 179686.836: [Class Histogram (after full gc):
 num     #instances         #bytes  class name
----------------------------------------------
   1:       6591148     2747828352  [C
   2:        298130      237694584  [B
   3:       6572509      157740216  java.lang.String
   4:       5097003      151094560  [Ljava.lang.Object;
   5:       3810491      121935712  java.util.HashMap$Entry
   6:        617568       89067520  
   7:        617568       79061744  


Another important tool; which will help you see the GC process ongoing in the live JRE is jstat; This also has very low overhead

jboss-as@backendnode root]$  /jdk/jdk1.6.0_38/bin/jstat -gccause 60030 (jboss server pid)  5000 (will repeat every 5 seconds)
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
  0.00   0.00   9.54  15.05  46.86   4076  668.310    29  164.007  832.317 Heap Inspection Initiated GC No GC

Further; certain run time parameters can be added to the JRE to get more information; Some of these can be used in production; GC logs are one such ; Currently it is configured to be printed to opt/gclogs Do a ls –lrt and see the last modified file and tail it to see the GC logging happening Note – JDK has the Xrunhorf with options that I have been trying out in standalone clients; This prints after application stop or after cntrl break;

Any queries please do post; Here is a Python script to compare two histogram dumps;
histogramparser.py
__author__ = 'acp'

import re
import fileinput
import operator
import sys

objectschanged={}

def create_object_list(line2,mapofObjects,instance):
    container = line2.split()
    #print(container)
    if container[3] in mapofObjects:
        val=mapofObjects[container[3]]
        #print(container[3],"difference=",abs(int(container[2])-val))
        objectschanged[container[3]]=(int(container[instance])-val)
    else:
        mapofObjects[container[3]]=int(container[instance])


#Call the main function

print("A Python Script to parse the Jmap generated Histograms ")
print("Author - Alex.Punnen")
print("Usage- jmaphistoparser.py  filename1 filename2")
#print ("Where M is the difference in memory and I difference in isntance and fo")

instance=1 #for instance difference
#instance=2 #for memeory difference

mapofObjects={}
content=[]
for line in fileinput.input():
    content.append(line)

count=0
#skip lines
while len(content) >0 :
    line=content.pop()
    if(':' in  line):
        count+=1
        create_object_list(line,mapofObjects,instance)

print("------------------------Objects Changed Between Heaps-------------------------------------------")
sorted_x = sorted(objectschanged.items(), key=operator.itemgetter(1))
for(x,y) in sorted_x:
   print(x,y)


Comments

Popular posts from this blog

OpenLayers Advanced Clustering and Setting dynamic images for OpenLayer Styles via Ajax

JavaScript (Prototypal) Inheritance with new and Object.create