A few simple notes about the Java Heap.

Version : 0.3
Date : 15/01/2012
By : Albert van der Sel
For who : For anyone who wishes a quick and simple answer on this subject



1. Introduction:

This very short note tries to explain some essentials of the "Java Heap", of the Java Virtual Machine (JVM).

Actually, there are 2 main memory areas under control of (or usable by) the JVM, which are the "Java heap"
and the "native heap". Both are used for different purposes.
Most people just speak of "The Heap", which of course refers to the "Java heap".
As it is, the "native" heap (or system heap) is provided by the Operating System, and the Java heap
(after the JVM has initialized) will be part of the native heap.

In short:
  • The Java heap will hold all running java objects. Indeed, this area is our primary concern and object for study.
  • The native heap (which holds the java heap), still needs some reserved memory, for example for JIT
    (just in time compiling) and thread management.
As you can probably guess, the native heap is not the main "concern", although in some situations, for example
with Java applications that contain many JIT-compiled methods, we should not focus on the Java heap alone.
But in usual tuning exercises (like using the -Xmx parameter for the max heap size, and the -Xms parameter
for the initial heap size), we are indeed focusing on the regular "Java heap".

Before we turn our attention to those heaps, first of all, what exactly is the JVM?

The Java Virtual Machine, usually abbreviated to "JVM", is the environment for running Java programs.
Those programs are compiled to "bytecode", which is interpreted, or (partly) compiled at runtime, and executed by the JVM.
It's quite important to understand that whenever you run Java programs, a JVM is involved which supports the execution
of the programs.
If you wish, you may visualize the JVM as a "container" holding all program objects.

As the JVM truly is a virtual machine, it is also designed to insulate us from any host machine's peculiarities.

Since JVMs are available for almost all platforms (Unix, Windows, mainframes etc.), Java programs are highly portable
across platforms. But that's really in large part "thanks to the JVM".

Another important acronym often heard is the "JRE" or "Java Runtime Environment".
Actually, a running instance of a JRE is a Java Virtual Machine.
That's why, on a certain computer system, you will probably find binaries and support files in some
"jre" or "java" directory. When that stuff "inside there" is running, you have a running JVM.
In most circumstances, the Java application launcher is the "java" executable.

But it's probably not sufficient to talk about the JVM alone. The "Java platform" in which your Java app will live
comes with a set of APIs or (class) libraries as well.

You could easily have multiple JREs installed on your system, in different directories. Since many programs,
for example on Unix, will use 'environment variables' (say shell variables), you could have multiple running programs
in different JVMs. It's often just a matter of which "path" is used, and which environment variables are set.
This is true for Windows too. Programs may just read a ".conf" file first, or read environment variables, or take
values from the Registry etc.
In fact, any time you start the java executable, you have a JVM.
Sometimes it's not wise to have multiple JVMs (each with their own heap), but sometimes it is quite clever to do so.
It just depends on the situation. More on this subject later.
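
If you are not sure which of the installed JREs a program actually ended up in, you can simply let the JVM
tell you. The following is only a minimal sketch: the class name "WhichJvm" is made up for this note,
while the three system properties it reads are standard ones.

```java
public class WhichJvm {

    // Returns a one-line description of the JVM that is executing this code.
    static String describe() {
        return System.getProperty("java.vm.name")
                + " " + System.getProperty("java.version")
                + " from " + System.getProperty("java.home");
    }

    public static void main(String[] args) {
        // The output differs per installation, so no fixed output is shown here.
        System.out.println(describe());
    }
}
```

Starting this class with two different "java" executables (two different paths) should show two different
"java.home" directories, which is exactly the "multiple JVMs" situation described above.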


2. Logical structure of the Java Heap:

Garbage Collector process:

The Java Heap is also under the control of the "Garbage collector" (GC). This is an important process for managing
objects in the heap. For example, since Java does not have an explicit destructor method to delete an object
from memory, the Garbage collector runs periodically to identify, finalize, and reclaim obsolete objects.
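
To get a feeling for how busy the GC is, the standard "java.lang.management" API can list the collectors
of the running JVM. This is just a sketch: the class name "GcReport" is made up, and the collector names
and counts you will see depend on your JVM and its settings.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcReport {

    // Builds one line per collector: name, number of collections, total time in ms.
    static List<String> collectorLines() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        return lines;
    }

    public static void main(String[] args) {
        // Allocate some short-lived arrays. Whether this triggers a collection
        // depends on the heap size and on the collector in use.
        for (int i = 0; i < 100_000; i++) {
            byte[] scratch = new byte[1024];
        }
        collectorLines().forEach(System.out::println);
    }
}
```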

Logical view of the Heap with "Generations" regions:

From a logical view, the Java Heap has the following structure:

Fig 1. Organization of the Java Heap



New objects are created in the "Eden" section. As the GC periodically "sweeps" the Heap, it discovers
that some objects may be cleared, while others have a longer life expectancy. The longer living objects
may be transferred to the "Survivor" section.
In figure 1, the region "Eden+Survivor" is also called the "Young Generation", since all objects living there
are relatively young.

As more and more visits of the GC occur, objects are destroyed if they are not referenced anymore, and
those which still are referenced are relocated to a region called the "Old Generation".

The "Permanent Generation" region contains things such as "metadata" (data about classes and methods),
metadata about the regions, as well as some internal JVM objects.

Now, some articles and Java experts only call the collection of all regions up to and including the "Old Generation"
the "Java heap". Indeed, this is what you can set as the max heap size using the -Xmx parameter.
In figure 1, this is illustrated by the "red arrow" (the middle arrow).
Other experts call all regions, up to and including the "Permanent Generation", the "Java heap".
I would say: don't worry too much about it. Fact is that the size of the Permanent Generation can be set as well,
using the "-XX:MaxPermSize" parameter (max size), and the "-XX:PermSize" parameter (initial size).
Since you can influence all regions up to and including the Permanent one, I would say that the Java Heap is all
added up, including the Permanent Generation.
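
If you want to see the regions of your own JVM, the standard "java.lang.management" API can list the
memory pools, and tells for each pool whether the JVM itself counts it as "heap" or "non-heap".
The sketch below only prints what the JVM reports. The class name "HeapRegions" is made up, and the pool
names differ per JVM vendor, version and garbage collector, so do not expect them to match figure 1 literally.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

public class HeapRegions {

    // Builds one line per memory pool, prefixed with "heap" or "non-heap".
    static List<String> poolLines() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String kind = (pool.getType() == MemoryType.HEAP) ? "heap" : "non-heap";
            lines.add(kind + "  " + pool.getName());
        }
        return lines;
    }

    public static void main(String[] args) {
        poolLines().forEach(System.out::println);
    }
}
```

Which of the two views from the text your JVM takes can be read from the "heap" / "non-heap" label
of each pool.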

Anyway, what you can conclude from the text above is that, in some cases, some tuning really might be needed
before an application runs smoothly.

One or more JVM's?:

No single universal answer is possible. If you use a large J2EE (Java) Application Server, you typically
create "server" objects/instances, and deploy applications on the server instances.
So, a single server instance might support for example 3 Business applications, and it all uses one JVM.

It might also be possible that you have two (or more) server instances, where each supports its Business apps, and
each has its own JVM (and thus each server uses its own Heap).

Such Application Servers usually have a "rich" Administrative Console (web-based) where you can set many options
per Server instance, like independent JVMs, the heap size parameters, how the GC should log etc.


3. Setting memory parameters of the Java Heap:

Let's consider one Java Heap, and see how we can obtain a "good" sizing for your Java apps.
However, one answer covering all cases is not possible, of course.
Evidently, we are dealing with memory sizing of the different sections as shown in section 2.

In advanced Java documents, you will find many parameters in order to "tweak" the Java Heap.
However, adjusting just a few important ones can already have a huge impact on performance
and on avoiding the dreaded "OutOfMemoryError" message.

Some important parameters are:

-XmsSIZE: Initial Java heap size (Young generation+Old generation),
-XmxSIZE: Maximum Java heap size (Young generation+Old generation),
-XX:MaxPermSize=SIZE: Max size of the Permanent Generation,
-XX:PermSize=SIZE: Initial size of the Permanent Generation

Note that some parameters can be issued from the (startup) commandline, while others are usually
"Environment variables". Also, certain apps will simply read a ".conf" file in which
the parameters are specified. From that .conf file (or similar file), a startup command is then constructed.
Other possibilities exist as well, for example storing parameters in some .xml file,
or storing parameters as Registry values in Windows.

Here is a simple example of starting a JVM and your application:

C:\apps\jre\bin> java -Xms768m -Xmx1248m myapp

Where we have an initial heap of 768M and a max heap of 1248M.
So, is it all that simple? No, there may be a few other parameters, out of dozens,
which might prove to be critical for your environment.
But in general, the parameters listed above can already have a large impact.
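
To check whether the parameters really reached the JVM, you can ask the running JVM itself.
The following is a minimal sketch: the class name "HeapSettings" and its helper methods are made up for this
note, while "Runtime" and "ManagementFactory" are standard Java classes.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class HeapSettings {

    // Converts a number of bytes to whole mebibytes (rounding down).
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Returns the -Xms / -Xmx arguments this JVM was started with (possibly none).
    static List<String> heapFlags() {
        List<String> flags = new ArrayList<>();
        for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
            if (arg.startsWith("-Xms") || arg.startsWith("-Xmx")) {
                flags.add(arg);
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("heap flags on the commandline: " + heapFlags());
        System.out.println("max heap the JVM will try to use: " + toMiB(rt.maxMemory()) + " MiB");
        System.out.println("heap currently reserved:          " + toMiB(rt.totalMemory()) + " MiB");
        System.out.println("free within the reserved heap:    " + toMiB(rt.freeMemory()) + " MiB");
    }
}
```

Started as "java -Xms768m -Xmx1248m HeapSettings", the first line should list both flags. The reported
maximum may be somewhat lower than the -Xmx value, depending on the JVM and the collector.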

A few words on available memory on a 32bit and 64bit system:

Let's start with this question: What OS do you use: 32 bit or 64 bit? Is it Unix, Windows etc.?

Especially a 32bit system can be a bit of a "pain" if the application pressure is high.
Whether you have a large Database system, or large application pressure, it's very relevant to have
the resources (like cpu, memory, disksystem, network) scaled properly.
Especially with respect to available memory, the difference between a 32bit and a 64bit system is like
running Java on "wooden shoes" or on "running shoes".

⇒ 32bit system:

Depending on the OS, for example on Unix systems, using some smart environment variables
and/or kernel parameters, you can go as far as (say) a 3.2G heap size.

For 32bit Windows, it is just a maximum of 2GB (1.5G - 1.8G actually). There are several considerations as to why
this is true. One of them is this one: As you know, under the virtual memory system,
any process gets a virtual address space of 4GB. However, the OS itself claims 2GB (for example for all Win libraries),
which means that any process has a max of 2GB memory space (not regarding any boot switches).
So, here you end up with a max of (actually a bit less than) 2GB.
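
The arithmetic behind those numbers is simple enough to write down. The sketch below just redoes the
calculation from the paragraph above; the class name "AddressSpace" is made up, and the 2GB claimed by the
OS is the figure used in this note, not something the code measures.

```java
public class AddressSpace {

    static final long GIB = 1L << 30;   // 1 GiB = 2^30 bytes

    // Number of bytes that can be addressed with pointers of the given width.
    // Only meaningful for widths below 63 bits, since the result is a Java long.
    static long addressableBytes(int pointerBits) {
        return 1L << pointerBits;
    }

    // What is left for a process once the OS has claimed its share.
    static long userSpaceBytes(int pointerBits, long claimedByOs) {
        return addressableBytes(pointerBits) - claimedByOs;
    }

    public static void main(String[] args) {
        long total = addressableBytes(32);
        long user = userSpaceBytes(32, 2 * GIB);   // assumption: the OS claims 2 GiB
        System.out.println(total / GIB + " GiB total, " + user / GIB + " GiB for the process");
        // prints "4 GiB total, 2 GiB for the process"
    }
}
```

The Java heap, the native heap, the thread stacks and the loaded libraries all have to fit in that
remaining 2GB, which is why the practical maximum for the heap is lower still.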

If the machine is shared with other apps as well (like a Database server), the overall tuning
can be a daunting task (like avoiding swapping etc.).

⇒ 64bit system:

Although the address space is very large, in some cases tuning is still required. This, however,
generally is OS dependent.


4. OutOfMemoryError:

Different sorts of errors may arise with your application. We will spend a few words on only one of those.
In some cases, you might be confronted with the dreaded JVM "OutOfMemoryError".
If no other cause, like for example a memory leak, is troubling your application, then maybe we should take the
message literally: insufficient memory.

If you take a look at figure 1 again, there are two "main" regions, namely The Young and The Old generations
on one hand, and the Permanent Generation on the other hand.
So, actually maybe just one of them, or both, are a bit too small.

Which one is too small might be distilled from a logfile. Maybe you can find a clue like:

OutOfMemoryError: PermGen space
    which probably indicates that the size of the Permanent Generation is too small
OutOfMemoryError: Java heap space
    which probably indicates that the size of the Young and Old generations (the heap) is too small

From section 3, you then know which parameter you can try: -Xmx for the heap, and -XX:MaxPermSize for the Permanent generation.
That the heap may run out of memory might not be a surprise. Maybe it was not sized properly, and maybe at a certain moment
more objects needed to be instantiated than the GC could reclaim.
This is why the heap (Young and Old generations) should certainly be a bit "oversized".

Also, the Permanent Generation stores class metadata and other long-lived JVM data. The amount of such data might
increase as time passes (for example as more classes get loaded), which might even result in an OutOfMemoryError.
So, the Permanent region should be a bit oversized as well.

So keep in mind that you might tune one out of two regions, or maybe even both.
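
The "which message points to which parameter" reasoning above can be sketched as a tiny helper.
The class "OomHint" and its method are made up for this note and are not part of any JVM tool; the helper only
looks for the two message texts discussed above and answers "unknown" for anything else.

```java
public class OomHint {

    // Maps the text of an OutOfMemoryError message to the parameter to look at first.
    static String flagFor(String message) {
        if (message == null) {
            return "unknown";
        }
        if (message.contains("PermGen space")) {
            return "-XX:MaxPermSize";   // Permanent Generation too small
        }
        if (message.contains("Java heap space")) {
            return "-Xmx";              // Young + Old generations too small
        }
        return "unknown";               // other causes: dig deeper
    }

    public static void main(String[] args) {
        System.out.println(flagFor("java.lang.OutOfMemoryError: PermGen space"));
        // prints "-XX:MaxPermSize"
        System.out.println(flagFor("java.lang.OutOfMemoryError: Java heap space"));
        // prints "-Xmx"
    }
}
```

Keep in mind that this only tells you where to look first. If a memory leak is the real cause, a larger
region will only postpone the error.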

Note: Some parameters change the behaviour of the GC, which could also have an effect on the occurrence
of this particular error. So, if changing the memory parameters did not help, you have to dig deeper.