This is going to be a short post about an issue that I faced in my office project.
Context: I was working on a project that generates an output file from data read from input files. The size of the input files can vary, and in my case it went up to 50GB. When the program executed in the cloud environment (I'll call it ENV for the rest of the post), there wasn't enough space left in the PVC (Persistent Volume Claim) for this file, and the program crashed because the PVC was full. Since delivering the output file was a priority, I pulled the input file locally and tried to generate the output using an integration test. For local generation, I wrote a test that stubs the file paths that my batchClient downloads.
// Stub the batch client so it returns the path of the locally downloaded
// main input file instead of fetching it over the network.
BatchClient client = mock(BatchClient.class);
when(client.getFirstFile()).thenReturn(firstFileLocalPath());
This handles stubbing the main file. Processing the main file also requires some constant data, so a data provider was stubbed as well: its stub reads the constant data from local files and exposes methods that serve it.
ConstantDataProvider dp = mock(ConstantDataProvider.class);
when(dp.getData(anyString(), anyString())).thenAnswer(invocation -> {
    // Answer the stub from a map prepared by reading the local constant data files.
    String param1 = invocation.getArgumentAt(0, String.class);
    String param2 = invocation.getArgumentAt(1, String.class);
    return localMap.get(param1, param2);
});
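The post does not show how localMap is built; as a rough sketch only, assuming the constant files are simple delimited text and a hypothetical LocalConstantMap wrapper exposing the two-argument get(...) used above, it could look like this:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Assumed helper, not the original code: a two-level lookup built from the
// local constant data files, so the stub can answer without any network call.
class LocalConstantMap {
    private final Map<String, Map<String, String>> data = new HashMap<>();

    // Hypothetical file format: each line is "param1,param2,value".
    void load(Path constantFile) throws IOException {
        for (String line : Files.readAllLines(constantFile)) {
            String[] parts = line.split(",", 3);
            data.computeIfAbsent(parts[0], k -> new HashMap<>()).put(parts[1], parts[2]);
        }
    }

    String get(String param1, String param2) {
        Map<String, String> inner = data.get(param1);
        return inner == null ? null : inner.get(param2);
    }
}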
With these stubs in place, my test was isolated from all network-related tasks and had everything it needed to generate the file locally.
Some numbers to gauge the scenario:
- System RAM: 12GB total (~10GB available to the JVM)
- Main file size: 50GB (~780K lines)
- Constant files, cumulative size: ~500MB
Problem:
When the test ran locally, it resulted in an OutOfMemoryError (Java heap space) at roughly line 700K. When I hit this error, my first thought was that there was a memory leak somewhere in my code: I was reading a big 50GB file, so there might be a leak somewhere during processing. (The code reads and processes the file line by line, and all objects associated with processing a single line become eligible for GC once that line is processed, so theoretically there should not be any leak; a minimal sketch of this pattern follows the list below.) Meanwhile, while I was trying to generate the file locally, the ENV volume was cleared, so I triggered the file-generation workflow there as well. Surprisingly, the file was successfully generated on ENV. This added to the confusion and led to deductions/questions like:
- On ENV there is OpenJDK while locally it's Oracle JDK, so maybe JVM differences?
- Could the persistent volume be behaving differently on ENV than locally?
- Were objects not being garbage collected (GCed) locally but GCed on ENV?
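For reference, here is the minimal sketch of the line-by-line processing pattern mentioned above. It is an assumed structure, not the actual project code; the point is only that everything created for a line goes out of scope at the end of that iteration:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Assumed structure of the generator, for illustration only.
public class LineByLineGenerator {

    public static void generate(Path input, Path output, ConstantDataProvider dp) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(input);
             BufferedWriter writer = Files.newBufferedWriter(output)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Everything created here is scoped to this iteration and is
                // eligible for GC once the processed line has been written out.
                String processed = process(line, dp);
                writer.write(processed);
                writer.newLine();
            }
        }
    }

    private static String process(String line, ConstantDataProvider dp) {
        // Hypothetical per-line transformation that looks up constant data.
        return line + "|" + dp.getData("someKey", "someOtherKey");
    }
}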
Exploring the issue:
For the investigation, I took a heap dump of the program using jvisualvm and analysed it with JHAT (Java Heap Analysis Tool). Both jvisualvm and jhat ship in jdk/bin. Upon analysing the heap dump, I saw a long list of objects recording the invocations of the dp.getData() method that I had mocked. Looking further, I found that Mockito stores the invocation history of every mock so that calls can later be verified. In my case this method was called ~100 times per line, and there were ~780K lines; keeping all of these invocation details in memory is what caused the test to fail with the OOM error.
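This behaviour is easy to reproduce in isolation. Below is a small, self-contained sketch (not from the project) showing that a default Mockito mock keeps a record of every call it receives; in the failing test that would have been roughly 100 calls per line times ~780K lines, i.e. tens of millions of recorded invocations:

import static org.mockito.Mockito.*;

import java.util.List;

public class InvocationHistoryDemo {
    public static void main(String[] args) {
        @SuppressWarnings("unchecked")
        List<String> mockList = mock(List.class);
        when(mockList.get(anyInt())).thenReturn("constant");

        for (int i = 0; i < 1_000_000; i++) {
            mockList.get(i % 10); // each call is stored by Mockito as an Invocation object
        }

        // All of those calls are still referenced by the mock, which is exactly
        // what keeps the heap growing in a long-running test.
        System.out.println(mockingDetails(mockList).getInvocations().size());
    }
}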
Solution:
While initialising the mock object, specify stubOnly(), which prevents Mockito from storing any invocation history.
ConstantDataProvider dp = mock(ConstantDataProvider.class, withSettings().stubOnly());
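Putting it together, the constant-data stub from earlier can be recreated with stub-only settings (sketch mirroring the code above). The trade-off, per the Mockito documentation, is that a stubOnly() mock cannot be verified, because the invocation history that verify() relies on is no longer recorded:

// The earlier stub, recreated as a stub-only mock.
ConstantDataProvider dp = mock(ConstantDataProvider.class, withSettings().stubOnly());
when(dp.getData(anyString(), anyString())).thenAnswer(invocation -> {
    String param1 = invocation.getArgumentAt(0, String.class);
    String param2 = invocation.getArgumentAt(1, String.class);
    return localMap.get(param1, param2);
});
// verify(dp, ...) is no longer possible on this mock; not keeping that
// history is precisely the memory saving we want here.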