Monday, June 6, 2022

JVM Details

Whether you have used Java to develop programs or not, you might have heard about the Java Virtual Machine (JVM) at some point or another.

JVM is the core of the Java ecosystem, and makes it possible for Java-based software programs to follow the "write once, run anywhere" approach. You can write Java code on one machine, and run it on any other machine using the JVM.

JVM was initially designed to support only Java. However, over the time, many other languages such as Scala, Kotlin and Groovy were adopted on the Java platform. All of these languages are collectively known as JVM languages.

In this article, we will learn more about the JVM, how it works, and the various components that it is made of.


What is a Virtual Machine?




Before we jump into the JVM, let's revisit the concept of a Virtual Machine (VM).

A virtual machine is a virtual representation of a physical computer. We can call the virtual machine the guest machine, and the physical computer it runs on is the host machine.

A single physical machine can run multiple virtual machines, each with their own operating system and applications. These virtual machines are isolated from each other.

What is the Java Virtual Machine?


In programming languages like C and C++, the code is first compiled into platform-specific machine code. These languages are called compiled languages.

On the other hand, in languages like JavaScript and Python, the computer executes the instructions directly without having to compile them. These languages are called interpreted languages.

Java uses a combination of both techniques. Java code is first compiled into byte code to generate a class file. This class file is then interpreted by the Java Virtual Machine for the underlying platform. The same class file can be executed on any version of JVM running on any platform and operating system.

Similar to virtual machines, the JVM creates an isolated space on a host machine. This space can be used to execute Java programs irrespective of the platform or operating system of the machine.

Java Virtual Machine Architecture


The JVM consists of three distinct components:

  1. Class Loader
  2. Runtime Memory/Data Area
  3. Execution Engine

Class Loader


When you compile a .java source file, it is converted into byte code as a .class file. When you try to use this class in your program, the class loader loads it into the main memory.

The first class to be loaded into memory is usually the class that contains the main() method.

There are three phases in the class loading process: loading, linking, and initialization.

  1. Bootstrap ClassLoader: This is the first classloader which is the super class of Extension classloader. It loads the rt.jar file which contains all class files of Java Standard Edition like java.lang package classes, java.net package classes, java.util package classes, java.io package classes, java.sql package classes etc.
  2. Extension ClassLoader: This is the child classloader of Bootstrap and parent classloader of System classloader. It loades the jar files located inside $JAVA_HOME/jre/lib/ext directory.
  3. System/Application ClassLoader: This is the child classloader of Extension classloader. It loads the classfiles from classpath. By default, classpath is set to current directory. You can change the classpath using "-cp" or "-classpath" switch. It is also known as Application classloader.

Runtime Data Area


The JVM spec defines certain run-time data areas that are needed during the execution of the program. Some of them are created while the JVM starts up. Others are local to threads and are created only when a thread is created (and destroyed when the thread is destroyed). These are listed below −

PC (Program Counter) Register

It is local to each thread and contains the address of the JVM instruction that the thread is currently executing.

Stack

It is local to each thread and stores parameters, local variables and return addresses during method calls. A StackOverflow error can occur if a thread demands more stack space than is permitted. If the stack is dynamically expandable, it can still throw OutOfMemoryError.

Heap

It is shared among all the threads and contains objects, classes’ metadata, arrays, etc., that are created during run-time. It is created when the JVM starts and is destroyed when the JVM shuts down. You can control the amount of heap your JVM demands from the OS using certain flags (more on this later). Care has to be taken not to demand too less or too much of the memory, as it has important performance implications. Further, the GC manages this space and continually removes dead objects to free up the space.

Method Area

This run-time area is common to all threads and is created when the JVM starts up. It stores per-class structures such as the constant pool (more on this later), the code for constructors and methods, method data, etc. The JLS does not specify if this area needs to be garbage collected, and hence, implementations of the JVM may choose to ignore GC. Further, this may or may not expand as per the application’s needs. The JLS does not mandate anything with regard to this.

Run-Time Constant Pool

The JVM maintains a per-class/per-type data structure that acts as the symbol table (one of its many roles) while linking the loaded classes.

Native Method Stacks

When a thread invokes a native method, it enters a new world in which the structures and security restrictions of the Java virtual machine no longer hamper its freedom. A native method can likely access the runtime data areas of the virtual machine (it depends upon the native method interface), but can also do anything else it wants.

Garbage Collection

The JVM manages the entire lifecycle of objects in Java. Once an object is created, the developer need not worry about it anymore. In case the object becomes dead (that is, there is no reference to it anymore), it is ejected from the heap by the GC using one of the many algorithms – serial GC, CMS, G1, etc.

During the GC process, objects are moved in memory. Hence, those objects are not usable while the process is going on. The entire application has to be stopped for the duration of the process. Such pauses are called ‘stop-the-world’ pauses and are a huge overhead. GC algorithms aim primarily to reduce this time. We shall discuss this in great detail in the following chapters.

Thanks to the GC, memory leaks are very rare in Java, but they can happen. We will see in the later chapters how to create a memory leak in Java.


Execution Engine


Once the bytecode has been loaded into the main memory, and details are available in the runtime data area, the next step is to run the program. The Execution Engine handles this by executing the code present in each class.

However, before executing the program, the bytecode needs to be converted into machine language instructions. The JVM can use an interpreter or a JIT compiler for the execution engine.

Execution Engine helps in executing the byte code by converting it into machine code and also interact with the memory area. So here are the components involved in executing:

  • Interpreter: It is responsible to read, interpret, and execute java program line by line. So if any method is called multiple times new interpretation required which is the main disadvantage of it.
  • JIT Compiler (Just In Time): The concept of the JIT compiler works only for a repeated method not for every method. To overcome the problem that the interpreter faced while interpreting the byte code, whenever it finds the repeated code, JIT Compiler compiles the entire bytecode and changes it to machine code. This machine code will be used directly for repeated method calls.
  • Intermediate Code Generator
  • Code Optimizer
  • Target Code Generator
  • Profiler

Note: Profiler is there to identify hotspot repeated methods inside JIT.

In order to provide native libraries to the Execution Engine, we have JNI that connects with Native libraries and the Execution Engine.

JNI(Java Native Interface): It acts as a bridge between Execution Engine and Native Method Libraries for providing libraries while executing.

Native Method Libraries: This contains the libraries which are required by the execution engine while executing the byte code.


No comments:

Post a Comment

JIT Details

Every programming language uses a compiler to convert the high-level language code into machine level binary code, because the system unders...