Understanding JVM Architecture | How does JVM works?

We know that the main job of the JVM is to execute Java bytecode, converting it into machine code along the way. But what black-box magic is happening inside the Java Virtual Machine? The quest to answer that question led me to write this article. The JVM is a specification, and different vendors may implement it differently, but here we will look at the architecture defined by the JVM specification.

Class Loader Subsystem
The main role of the class loader subsystem is to read class files and store the corresponding binary information in the Method area. This includes the fully qualified name of the class, its method, variable, constructor, and modifier information, and whether the file represents a class, enum, or interface. The class loader subsystem has three main tasks: loading, linking, and initialization.

Class loading brings a compiled .class file into memory. There are three types of class loaders in Java:
Bootstrap Class Loader: It is the parent of all other class loaders and loads the core Java API classes. In Java 8 and earlier these come from rt.jar, which is located in the …jre/lib/ directory. Examples include java.lang.String, java.lang.Thread, and java.util.ArrayList. This class loader is implemented in a native language like C/C++.
Extension Class Loader: It first delegates each class loading request to its parent, the Bootstrap class loader, and if that is unsuccessful it loads the class from the ../jre/lib/ext location. It is a child of the Bootstrap Class Loader.
System/Application Class Loader: It is a child of the Extension class loader. It loads application-specific classes from the system class path.
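
The parent chain described above can be inspected from a running program. The following is a minimal sketch, not a definitive tool: ClassLoaderDemo and loaderName are illustrative names, and the exact loader class names printed depend on the JDK version (on Java 9 and later the middle loader is reported as a platform class loader rather than an extension class loader). The bootstrap class loader is represented as null by the standard API.

```java
class ClassLoaderDemo {
    // Returns a readable name for the loader that defined the given class.
    // The standard API reports the bootstrap class loader as null.
    static String loaderName(Class<?> type) {
        ClassLoader loader = type.getClassLoader();
        return loader == null ? "bootstrap" : loader.getClass().getName();
    }

    public static void main(String[] args) {
        // Core API classes are defined by the bootstrap class loader.
        System.out.println("java.lang.String    -> " + loaderName(String.class));
        System.out.println("java.util.ArrayList -> " + loaderName(java.util.ArrayList.class));
        // Application classes are defined by a loader further down the chain.
        System.out.println("ClassLoaderDemo     -> " + loaderName(ClassLoaderDemo.class));

        // Walk up the parent chain, starting from the system (application) class loader.
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        while (loader != null) {
            System.out.println("loader: " + loader.getClass().getName());
            loader = loader.getParent();
        }
        System.out.println("loader: bootstrap (reported as null)");
    }
}
```

Running it prints the chain from the application class loader up to the bootstrap loader, which mirrors the delegation order: a request travels up to the parent first and only comes back down if the parent cannot find the class.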

After a class is loaded, linking verifies and prepares it as necessary. Linking occurs in three main stages:
Verification: Ensures that the binary representation of a class is structurally correct: that the .class file was generated by a valid compiler, that it is properly formatted, that methods respect access control keywords, and that final methods and classes are not overridden or subclassed. If verification fails, the JVM throws an error at run time (java.lang.VerifyError).
Preparation: The JVM allocates memory for class-level static variables and sets them to their default values. The values written in the code are assigned later, in the initialization phase.
Resolution: The symbolic references used in the class (program-friendly names of classes, fields, and methods) are replaced with direct references into the Method area.

Initialization is the final stage: all static variables are assigned the values defined in the code, and any static blocks are executed. This stage runs the static initialization logic of each loaded class or interface. Instance constructors are not part of it; they run only when an object is created.
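
The difference between preparation (default values) and initialization (values from the code) can be observed with a small experiment. This is a sketch under stated assumptions: InitOrderDemo, Config, LOG, and currentPort are hypothetical names invented for the illustration. It relies on the standard Class.forName(name, initialize, loader) overload, which can load a class without initializing it.

```java
import java.util.ArrayList;
import java.util.List;

class InitOrderDemo {
    // Records the initialization steps in the order they run.
    static final List<String> LOG = new ArrayList<>();

    static class Config {
        static {
            // Runs before the assignment below, so 'port' still holds
            // the default value it received during preparation.
            LOG.add("before assignment: port=" + currentPort());
        }

        static int port = 8080;

        static {
            LOG.add("after assignment: port=" + port);
        }

        static int currentPort() {
            return port;
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        ClassLoader loader = InitOrderDemo.class.getClassLoader();
        String name = InitOrderDemo.class.getName() + "$Config"; // binary name of the nested class

        // Load the class but skip initialization: no static code has run yet.
        Class.forName(name, false, loader);
        System.out.println("loaded, not initialized: " + LOG);

        // Now request initialization: static assignments and blocks run in textual order.
        Class.forName(name, true, loader);
        System.out.println("initialized: " + LOG);
    }
}
```

The first line printed shows an empty log, and the second shows port going from 0 to 8080, which matches the split between the preparation and initialization phases.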

Runtime Data Area
The runtime data areas are the memory areas assigned to the JVM while it runs on the OS. Some are created when the JVM starts, while others are created when a thread is created and destroyed when that thread exits. These memory areas are:
Method Area
As discussed earlier, the class loader subsystem stores the binary information for each class inside the method area: the fully qualified name of the class, its method, variable, constructor, and modifier information, and whether the file represents a class, enum, or interface. All JVM threads share the same method area.
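
The kind of per-class information listed above is what the reflection API exposes at run time. The sketch below is only an illustration of that metadata, not a view into the method area itself; ClassMetadataDemo and describe are hypothetical names, and the member counts vary between JDK versions.

```java
class ClassMetadataDemo {
    // Summarizes the class-level metadata that reflection makes visible.
    static String describe(Class<?> type) {
        String kind = type.isEnum() ? "enum"
                : type.isInterface() ? "interface"
                : "class";
        return kind + " " + type.getName()
                + " [methods=" + type.getDeclaredMethods().length
                + ", fields=" + type.getDeclaredFields().length
                + ", constructors=" + type.getDeclaredConstructors().length + "]";
    }

    public static void main(String[] args) {
        System.out.println(describe(String.class));       // a class
        System.out.println(describe(Runnable.class));     // an interface
        System.out.println(describe(Thread.State.class)); // an enum
    }
}
```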

Heap Area
Like the method area, the heap is a shared resource, so all threads share the same heap area. The heap stores all objects, together with their instance variables, and arrays.

Stack Area
Unlike the heap and method areas, the stack area is not shared. Each thread has a private JVM stack. Whenever a method is invoked, one entry, called a stack frame, is pushed onto the JVM stack, and the frame is destroyed when the method invocation completes. Each stack frame holds the method's local primitive values and references to the heap objects the method uses.
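
The contrast between the shared heap and the per-thread stack can be sketched in a few lines. The names StackAndHeapDemo, extraFrames, and sharedCounterAfter are made up for this illustration. The first part counts stack frames through Thread.getStackTrace(), and the second lets several threads update one heap object while each keeps its loop counter in its own stack frame.

```java
import java.util.concurrent.atomic.AtomicInteger;

class StackAndHeapDemo {
    // Recurses 'remaining' times, then reports how many frames are on this thread's stack.
    static int depthAt(int remaining) {
        if (remaining == 0) {
            return Thread.currentThread().getStackTrace().length;
        }
        return depthAt(remaining - 1);
    }

    // Each nested call pushes one frame, so the difference equals the number of extra calls.
    static int extraFrames(int calls) {
        return depthAt(calls) - depthAt(0);
    }

    static int sharedCounterAfter(int threads, int incrementsPerThread) throws InterruptedException {
        AtomicInteger shared = new AtomicInteger(); // one object on the heap, visible to all threads
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                // 'local' lives in this worker thread's own stack frame.
                for (int local = 0; local < incrementsPerThread; local++) {
                    shared.incrementAndGet();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return shared.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("extra frames for 10 nested calls: " + extraFrames(10));
        System.out.println("shared counter: " + sharedCounterAfter(4, 1_000));
    }
}
```

Every worker sees the same counter object because it lives on the heap, while the loop variable is never shared because each thread has its own stack.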

PC Registers
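Each thread has its own PC (program counter) register, created when the thread starts. It holds the address of the JVM instruction that the thread is currently executing; while the thread is running a native method, its value is undefined.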

Native Method Stack
Native method stacks are allocated per thread. They hold the state of native method invocations. A native method is a method written in a language other than the Java programming language.
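
Native methods are marked with the native modifier, which reflection can detect. The sketch below assumes an OpenJDK-based runtime, where Object.hashCode() is declared native and String.length() is ordinary Java code; other JVM implementations may differ. NativeMethodDemo and isNative are illustrative names.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class NativeMethodDemo {
    // Looks up a public no-argument method and checks its 'native' modifier.
    static boolean isNative(Class<?> type, String methodName) throws NoSuchMethodException {
        Method method = type.getMethod(methodName);
        return Modifier.isNative(method.getModifiers());
    }

    public static void main(String[] args) throws NoSuchMethodException {
        // Assumption: on OpenJDK-based runtimes Object.hashCode() is implemented natively.
        System.out.println("Object.hashCode native? " + isNative(Object.class, "hashCode"));
        // String.length() is written in Java, so it is not native.
        System.out.println("String.length native?   " + isNative(String.class, "length"));
    }
}
```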

Execution Engine
The execution engine communicates with the different memory areas of the JVM and executes the instructions. Also, each thread running in an application is a distinct instance of the execution engine. The execution engine has three main subcomponents:
Interpreter: The interpreter reads the bytecode and executes the instructions one by one. Its major drawback is that when a method is called multiple times, it is interpreted again on every call, which reduces performance. The JIT compiler was introduced to overcome this problem.
Just-In-Time (JIT) Compiler: The JIT compiler compiles bytecode to native code (machine code) at run time, typically focusing on code that is executed repeatedly, so that it no longer has to be reinterpreted on each call. It is used to improve the performance of the JVM.
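
Whether a JIT compiler is active, and how much time it has spent compiling, can be queried through the standard java.lang.management API. The following is a rough sketch rather than a benchmark: JitDemo, sumOfSquares, and jitSummary are hypothetical names, the loop counts are arbitrary, and the reported compiler name and timings depend entirely on the JVM in use.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

class JitDemo {
    // A small method that is called often enough to become a candidate for JIT compilation.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    // Describes the JIT compiler of the running JVM, if it reports one.
    static String jitSummary() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null) {
            return "no JIT compiler reported (the JVM may be running in interpreter-only mode)";
        }
        String summary = "JIT compiler: " + jit.getName();
        if (jit.isCompilationTimeMonitoringSupported()) {
            summary += ", total compilation time so far: " + jit.getTotalCompilationTime() + " ms";
        }
        return summary;
    }

    public static void main(String[] args) {
        long checksum = 0;
        // Call the same method many times so the JVM can treat it as frequently executed code.
        for (int round = 0; round < 20_000; round++) {
            checksum += sumOfSquares(1_000);
        }
        System.out.println("checksum: " + checksum);
        System.out.println(jitSummary());
    }
}
```

Running the same program with the standard -Xint option forces interpreter-only execution, which gives a feel for the difference the JIT compiler makes.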