High memory usage with Files.lines












1















I've found a few other questions on SO that are close to what I need, but I can't figure this out. I'm reading a text file line by line and getting an out-of-memory error. Here's the code:



System.out.println("Total memory before read: " + Runtime.getRuntime().totalMemory()/1000000 + "MB");
String wp_posts = new String();
try (Stream<String> stream = Files.lines(path, StandardCharsets.UTF_8)) {
    wp_posts = stream
        .filter(line -> line.startsWith("INSERT INTO `wp_posts`"))
        .collect(StringBuilder::new, StringBuilder::append,
                 StringBuilder::append)
        .toString();
} catch (Exception e1) {
    System.out.println(e1.getMessage());
    e1.printStackTrace();
}

try {
    System.out.println("wp_posts Mega bytes: " + wp_posts.getBytes("UTF-8").length/1000000);
} catch (UnsupportedEncodingException e) {
    e.printStackTrace();
}
System.out.println("Total memory after read: " + Runtime.getRuntime().totalMemory()/1000000 + "MB");


The output is like this (when run in an environment with more memory):



Total memory before read: 255MB
wp_posts Mega bytes: 18
Total memory after read: 1035MB


Note that in my production environment, I cannot increase the memory heap.



I've tried explicitly closing the stream, doing a gc, and putting stream in parallel mode (consumed more memory).



My questions are:
  1. Is this amount of memory usage expected?
  2. Is there a way to use less memory?




























  • 7  You should expect memory usage to be correlated with the size of the content you're loading in memory. If it's a problem to load the entire file in memory the way you're doing, you may need to simply process the file in smaller chunks/batches. – ernest_k, Nov 16 '18 at 18:10

  • 1  I suspect the StringBuilder is causing the issue. Also how big is your file exactly? – Nicholas K, Nov 16 '18 at 18:10

  • 1  @ScottBrodersen You need to redesign your method to make it process reasonable numbers of lines at a time. Maybe including the actual processing of the string builder in the question can get you practical answers/solutions. – ernest_k, Nov 16 '18 at 18:19

  • 3  It does indeed read the file line by line. You are then accumulating all of those lines into a StringBuilder, so naturally they continue to take up memory. – VGR, Nov 16 '18 at 18:36

  • 2  @ScottBrodersen Files.lines reads the file line by line indeed, but you are collecting the entire content in an in-memory buffer, which will end up using memory for the entire file. – ernest_k, Nov 16 '18 at 18:36




















Tags: java






asked Nov 16 '18 at 18:05









ScottBro
1559








3 Answers


















1














Your problem is in collect(StringBuilder::new, StringBuilder::append, StringBuilder::append). When you append something to a StringBuilder and its internal array is not big enough, the builder doubles the array and copies the contents of the previous one into it.

Use new StringBuilder(int capacity) to predefine the size of the internal array.

The second problem is that you have a big file, but you put the whole result into a StringBuilder. That looks very strange to me; it is effectively the same as reading the whole file into a String without using a Stream.
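A minimal, self-contained sketch of the presizing idea (the sample file contents and the use of the file size as a capacity upper bound are illustrative assumptions, not the asker's actual data):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Stream;

public class PresizedCollect {
    public static void main(String[] args) throws IOException {
        // Build a tiny sample "dump" file so the sketch is runnable on its own.
        Path path = Files.createTempFile("wp_dump", ".sql");
        Files.write(path, Arrays.asList(
                "INSERT INTO `wp_posts` VALUES (1, 'a');",
                "INSERT INTO `wp_users` VALUES (2, 'b');",
                "INSERT INTO `wp_posts` VALUES (3, 'c');"), StandardCharsets.UTF_8);

        // Presize the builder to a rough upper bound (here the file size in
        // bytes, an overestimate for ASCII content) so it never reallocates.
        StringBuilder sb = new StringBuilder((int) Files.size(path));
        try (Stream<String> stream = Files.lines(path, StandardCharsets.UTF_8)) {
            stream.filter(line -> line.startsWith("INSERT INTO `wp_posts`"))
                  .forEach(sb::append);
        }
        System.out.println("Collected " + sb.length() + " chars from matching lines");
        Files.delete(path);
    }
}
```

Presizing avoids the transient double-plus-copy allocations during growth, though it does nothing about the fact that the full result still ends up in memory.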






answered Nov 16 '18 at 19:48
oleg.cherednik
  • I'm filtering the lines before adding to the StringBuilder, so only a subset of the lines are being stored. I changed the code to process each line as it is read instead of storing them in the StringBuilder, but this had no effect. As soon as the stream is done with the file, the used memory doubles.

    – ScottBro
    Nov 19 '18 at 14:25



















0














Your Runtime.totalMemory() calculation is pointless if you are allowing the JVM to resize the heap. Java will allocate heap memory as needed as long as it doesn't exceed the -Xmx value. Since the JVM is smart, it won't allocate heap memory one byte at a time, because that would be very expensive. Instead, the JVM requests a larger amount of memory at a time (the actual amount is platform- and JVM-implementation-specific).

Your code is currently loading the content of the file into memory, so there will be objects created on the heap. Because of that, the JVM will most likely request memory from the OS and you will observe an increased Runtime.totalMemory() value.

Try running your program with a strictly sized heap, e.g. by adding the -Xms300m -Xmx300m options. If you don't get an OutOfMemoryError, decrease the heap until you do. However, you also need to pay attention to GC cycles; these things go hand in hand and are a trade-off.



Alternatively you can create a heap dump after the file is processed and then explore the data with MemoryAnalyzer.
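The experiment might look like this (the jar name is a placeholder; the heap-dump flags are standard HotSpot options that pair well with the MemoryAnalyzer suggestion):

```shell
# Pin the heap to 300 MB and capture a dump automatically if it overflows:
java -Xms300m -Xmx300m \
     -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/dump.hprof \
     -jar app.jar
```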






edited Nov 16 '18 at 22:02
answered Nov 16 '18 at 21:57
Karol Dowbecki
  • I'm running this in my test env -- spitting out the totalMemory is only to get an idea of how much memory is being used and when. I indicated that in my production environment I cannot adjust the heap size (sorry if that wasn't clear). Thank you for the suggestion of analyzing the heap dump.

    – ScottBro
    Nov 17 '18 at 15:34



















0














The way you calculated memory is incorrect, for the following reasons:




  1. You have taken the total memory (not the used memory). The JVM allocates memory lazily, and when it does, it does so in chunks. So, when it needs an additional 1 byte of memory, it may allocate 1 MB (provided the total does not exceed the configured max heap size). Thus a good portion of the allocated heap memory may remain unused. Therefore, you need to calculate the used memory: Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()

  2. A good portion of the memory you see with the above formula may be ready for garbage collection. The JVM would definitely do a garbage collection before throwing OutOfMemoryError. Therefore, to get an idea, you should do a System.gc() before calculating the used memory. Of course, you don't call gc in production, and calling it does not guarantee that the JVM would indeed trigger garbage collection. But for testing purposes, I think it works well.

  3. You got the OutOfMemoryError while the stream processing was in progress. At that time the String was not yet formed and the StringBuilder held a strong reference. You should call the capacity() method of the StringBuilder to get the actual number of char elements in its internal array, and then multiply it by 2 to get the number of bytes, because Java internally uses UTF-16, which needs 2 bytes to store an ASCII character.

  4. Finally, the way your code is written (i.e. not specifying a big enough initial size for the StringBuilder), every time the StringBuilder runs out of space it doubles the size of its internal array by creating a new array and copying the content. This means that, momentarily, triple the size of the actual String is allocated. You cannot measure this because it happens within the StringBuilder class, and when control comes out of the class the old array is ready for garbage collection. So there is a high chance that when you get the OutOfMemoryError, you get it at the point where the StringBuilder tries to allocate a double-sized array, or more specifically in the Arrays.copyOf method.
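Point 1's used-memory formula, combined with the best-effort System.gc() from point 2, can be sketched like this (the 10 MB allocation is only an illustration):

```java
public class UsedMemoryDemo {
    // Used heap = total heap the JVM has claimed minus the free part of it.
    static long usedMemory() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        System.gc(); // best-effort hint, as the answer notes (point 2)
        long before = usedMemory();
        byte[] blob = new byte[10_000_000]; // allocate ~10 MB and keep it reachable
        long after = usedMemory();
        System.out.println("Used-memory delta ~ " + (after - before) / 1_000_000 + " MB");
        System.out.println("blob holds " + blob.length + " bytes");
    }
}
```

Because the JVM allocates lazily and collects asynchronously, the delta is an estimate, not an exact figure.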


How much memory is expected to be consumed by your program as is? (A rough estimate)



Let's consider a program similar to yours.



public static void main(String[] args) {
    // Initialize the list to emulate a
    // file with 32 lines, each containing
    // 1000 ASCII characters
    List<String> strList = new ArrayList<>(32);
    for (int i = 0; i < 32; i++) {
        strList.add(String.format("%01000d", i));
    }

    StringBuilder str = new StringBuilder();
    strList.stream().map(element -> {
        // Print the number of chars
        // reserved by the StringBuilder
        System.out.print(str.capacity() + ", ");
        return element;
    }).collect(() -> {
        return str;
    }, (response, element) -> {
        response.append(element);
    }, (response, element) -> {
        response.append(element);
    }).toString();
}


Here after every append, I'm printing the capacity of the StringBuilder.



The output of the program is as follows:



16, 1000, 2002, 4006, 4006, 8014, 8014, 8014, 8014, 
16030, 16030, 16030, 16030, 16030, 16030, 16030, 16030,
32062, 32062, 32062, 32062, 32062, 32062, 32062, 32062,
32062, 32062, 32062, 32062, 32062, 32062, 32062,


If your file has "n" lines (where n is a power of 2) and each line has an average "m" ASCII characters, the capacity of the StringBuilder at the end of the program execution will be: (n * m + 2 ^ (a + 1) ) where (2 ^ a = n).



E.g. if your file has 256 lines and an average of 1500 ASCII characters per line, the total capacity of the StringBuilder at the end of program will be: (256 * 1500 + 2 ^ 9) = 384512 characters.



Assuming you have only ASCII characters in your file, each character will occupy 2 bytes in UTF-16 representation. Additionally, every time the StringBuilder array runs out of space, a new array twice the size of the original is created (see the capacity growth numbers above) and the content of the old array is copied into it. The old array is then left for garbage collection. Therefore, if you add another 2 ^ (a+1) or 2 ^ 9 characters, the StringBuilder would create a new array to hold (n * m + 2 ^ (a + 1)) * 2 + 2 characters and start copying the content of the old array into the new one. Thus, there will be two big arrays inside the StringBuilder while the copying activity goes on.



Thus the total memory will be: 384512 * 2 + (384512 * 2 + 2) * 2 = 2,307,076 bytes ≈ 2.2 MB, to hold only 0.7 MB of data.



I have ignored the other memory-consuming items (array headers, object headers, references, etc.) as those will be negligible or constant compared to the array size.



So, in conclusion, 256 lines with 1500 characters each consume approximately 2.2 MB to hold only 0.7 MB of data; the data is only a third of the memory used.



If you had initialized the StringBuilder with the size 384,512 at the beginning, you could have accommodated the same number of characters in one-third of the memory, and there would also have been much less work for the CPU in terms of array copying and garbage collection.



What you may consider doing instead



Finally, for this kind of problem, you may want to work in chunks: write the content of your StringBuilder to a file or database as soon as it has accumulated, say, 1000 records, clear the StringBuilder, and start over for the next batch of records. That way you would never hold more than 1000 records' worth of data in memory.
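The chunked approach might look like this sketch (the batch size, sample input, and file-to-file flushing are assumptions for illustration; the real destination could just as well be a database):

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Stream;

public class ChunkedProcessor {
    // Stream matching lines from `in` to `out`, flushing every `batchSize`
    // lines so at most one batch is ever held in memory.
    static void extractPosts(Path in, Path out, int batchSize) throws IOException {
        StringBuilder buffer = new StringBuilder();
        int[] count = {0}; // mutable counter usable from the lambda
        try (Stream<String> lines = Files.lines(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            lines.filter(l -> l.startsWith("INSERT INTO `wp_posts`"))
                 .forEach(l -> {
                     buffer.append(l).append('\n');
                     if (++count[0] % batchSize == 0) {
                         try {
                             writer.write(buffer.toString()); // flush the batch
                             buffer.setLength(0);             // reuse the buffer
                         } catch (IOException e) {
                             throw new UncheckedIOException(e);
                         }
                     }
                 });
            writer.write(buffer.toString()); // flush the final partial batch
        }
    }

    public static void main(String[] args) throws IOException {
        // Tiny self-contained demo using temp files.
        Path in = Files.createTempFile("dump", ".sql");
        Path out = Files.createTempFile("posts", ".sql");
        Files.write(in, Arrays.asList(
                "INSERT INTO `wp_posts` VALUES (1);",
                "INSERT INTO `wp_users` VALUES (2);"), StandardCharsets.UTF_8);
        extractPosts(in, out, 1000);
        System.out.println(Files.readAllLines(out, StandardCharsets.UTF_8));
    }
}
```

Clearing with setLength(0) keeps the already-grown internal array, so a steady-state batch causes no further reallocation.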






share|improve this answer


























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53343196%2fhigh-memory-usage-with-files-lines%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    3 Answers
    3






    active

    oldest

    votes








    3 Answers
    3






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    1














    Your problem is in collect(StringBuilder::new, StringBuilder::append, StringBuilder::append). When you add smth to the StringBuilder and it has not enough internal array, then it double it and copy part from previous one.



    Do new StringBuilder(int size) to predefine size of internal array.



    Second problem, is that you have a big file, but as result you put it into a StringBuilder. This is very strange to me. Actually this is same as read whole file into a String without using Stream.






    share|improve this answer
























    • I'm filtering the lines before adding to the StringBuilder, so only a subset of the lines are being stored. I changed the code to process each line as it is read instead of storing them in the StringBuilder, but this had no effect. As soon as the stream is done with the file, the used memory doubles.

      – ScottBro
      Nov 19 '18 at 14:25
















    1














    Your problem is in collect(StringBuilder::new, StringBuilder::append, StringBuilder::append). When you add smth to the StringBuilder and it has not enough internal array, then it double it and copy part from previous one.



    Do new StringBuilder(int size) to predefine size of internal array.



    Second problem, is that you have a big file, but as result you put it into a StringBuilder. This is very strange to me. Actually this is same as read whole file into a String without using Stream.






    share|improve this answer
























    • I'm filtering the lines before adding to the StringBuilder, so only a subset of the lines are being stored. I changed the code to process each line as it is read instead of storing them in the StringBuilder, but this had no effect. As soon as the stream is done with the file, the used memory doubles.

      – ScottBro
      Nov 19 '18 at 14:25














    1












    1








    1







    Your problem is in collect(StringBuilder::new, StringBuilder::append, StringBuilder::append). When you add smth to the StringBuilder and it has not enough internal array, then it double it and copy part from previous one.



    Do new StringBuilder(int size) to predefine size of internal array.



    Second problem, is that you have a big file, but as result you put it into a StringBuilder. This is very strange to me. Actually this is same as read whole file into a String without using Stream.






    share|improve this answer













    Your problem is in collect(StringBuilder::new, StringBuilder::append, StringBuilder::append). When you add smth to the StringBuilder and it has not enough internal array, then it double it and copy part from previous one.



    Do new StringBuilder(int size) to predefine size of internal array.



    Second problem, is that you have a big file, but as result you put it into a StringBuilder. This is very strange to me. Actually this is same as read whole file into a String without using Stream.







    share|improve this answer












    share|improve this answer



    share|improve this answer










    answered Nov 16 '18 at 19:48









    oleg.cherednikoleg.cherednik

    7,20921219




    7,20921219













    • I'm filtering the lines before adding to the StringBuilder, so only a subset of the lines are being stored. I changed the code to process each line as it is read instead of storing them in the StringBuilder, but this had no effect. As soon as the stream is done with the file, the used memory doubles.

      – ScottBro
      Nov 19 '18 at 14:25



















    • I'm filtering the lines before adding to the StringBuilder, so only a subset of the lines are being stored. I changed the code to process each line as it is read instead of storing them in the StringBuilder, but this had no effect. As soon as the stream is done with the file, the used memory doubles.

      – ScottBro
      Nov 19 '18 at 14:25

















    I'm filtering the lines before adding to the StringBuilder, so only a subset of the lines are being stored. I changed the code to process each line as it is read instead of storing them in the StringBuilder, but this had no effect. As soon as the stream is done with the file, the used memory doubles.

    – ScottBro
    Nov 19 '18 at 14:25





    I'm filtering the lines before adding to the StringBuilder, so only a subset of the lines are being stored. I changed the code to process each line as it is read instead of storing them in the StringBuilder, but this had no effect. As soon as the stream is done with the file, the used memory doubles.

    – ScottBro
    Nov 19 '18 at 14:25













    0














    Your Runtime.totalMemory() calculation is pointless if you are allowing JVM to resize the heap. Java will allocate heap memory as needed as long as it doesn't exceed -Xmx value. Since JVM is smart it won't allocate heap memory by 1 byte at a time because it would be very expensive. Instead JVM will request a larger amount of memory at a time (actual value is platform and JVM implementation specific).



    Your code is currently loading the content of the file into memory so there will be objects created on the heap. Because of that JVM most likely will request memory from the OS and you will observer increased Runtime.totalMemory() value.



    Try running your program with strictly sized heap e.g. by adding -Xms300m -Xmx300m options. If you won't get OutOfMemoryError then decrease the heap until you get it. However you also need to pay attention to GC cycles, these things go hand in had and are a trade off.



    Alternatively you can create a heap dump after the file is processed and then explore the data with MemoryAnalyzer.






    share|improve this answer


























    • I'm running this in my test env -- spitting out the totalMemory is only to get an idea of how much memory is being used and when. I indicated that in my production environment I cannot adjust the heap size (sorry if that wasn't clear). Thank you for the suggestion of analyzing the heap dump.

      – ScottBro
      Nov 17 '18 at 15:34
















    0














    Your Runtime.totalMemory() calculation is pointless if you are allowing JVM to resize the heap. Java will allocate heap memory as needed as long as it doesn't exceed -Xmx value. Since JVM is smart it won't allocate heap memory by 1 byte at a time because it would be very expensive. Instead JVM will request a larger amount of memory at a time (actual value is platform and JVM implementation specific).



    Your code is currently loading the content of the file into memory so there will be objects created on the heap. Because of that JVM most likely will request memory from the OS and you will observer increased Runtime.totalMemory() value.



    Try running your program with strictly sized heap e.g. by adding -Xms300m -Xmx300m options. If you won't get OutOfMemoryError then decrease the heap until you get it. However you also need to pay attention to GC cycles, these things go hand in had and are a trade off.



    Alternatively you can create a heap dump after the file is processed and then explore the data with MemoryAnalyzer.






    share|improve this answer


























    • I'm running this in my test env -- spitting out the totalMemory is only to get an idea of how much memory is being used and when. I indicated that in my production environment I cannot adjust the heap size (sorry if that wasn't clear). Thank you for the suggestion of analyzing the heap dump.

      – ScottBro
      Nov 17 '18 at 15:34














    0












    0








    0







    Your Runtime.totalMemory() calculation is pointless if you allow the JVM to resize the heap. Java will allocate heap memory as needed as long as it doesn't exceed the -Xmx value. Since the JVM is smart, it won't allocate heap memory one byte at a time because that would be very expensive; instead it requests a larger amount of memory at once (the actual amount is platform and JVM implementation specific).

    Your code is currently loading the content of the file into memory, so objects will be created on the heap. Because of that, the JVM will most likely request memory from the OS and you will observe an increased Runtime.totalMemory() value.

    Try running your program with a strictly sized heap, e.g. by adding the -Xms300m -Xmx300m options. If you don't get an OutOfMemoryError, decrease the heap until you do. However, you also need to pay attention to GC cycles; these things go hand in hand and are a trade-off.

    Alternatively, you can create a heap dump after the file is processed and then explore the data with MemoryAnalyzer.
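    For instance, the suggested experiment could look like this on the command line (the class name and heap sizes are placeholders; -XX:+HeapDumpOnOutOfMemoryError additionally captures a dump you can open in MemoryAnalyzer when the error is hit):

    ```shell
    # Placeholder class name and sizes; shrink -Xms/-Xmx until OutOfMemoryError appears
    java -Xms300m -Xmx300m -XX:+HeapDumpOnOutOfMemoryError MyApp
    ```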
    edited Nov 16 '18 at 22:02
    answered Nov 16 '18 at 21:57









    Karol Dowbecki













    • I'm running this in my test env -- spitting out the totalMemory is only to get an idea of how much memory is being used and when. I indicated that in my production environment I cannot adjust the heap size (sorry if that wasn't clear). Thank you for the suggestion of analyzing the heap dump.

      – ScottBro
      Nov 17 '18 at 15:34
    The way you calculated memory is incorrect, for the following reasons:


    1. You have taken the total memory (not the used memory). The JVM allocates memory lazily, and when it does, it does so in chunks. So when it needs an additional 1 byte of memory, it may allocate 1 MB (provided the total does not exceed the configured max heap size). A good portion of the allocated heap may therefore remain unused. To measure what is actually in use, calculate: Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()

    2. A good portion of the memory you see with the above formula may be ready for garbage collection. The JVM will definitely garbage-collect before throwing OutOfMemoryError. Therefore, to get an idea, you should call System.gc() before calculating used memory. Of course, you don't call gc in production, and calling it does not guarantee that the JVM will actually trigger garbage collection, but for testing purposes it works well enough.

    3. You got the OutOfMemoryError while the stream processing was in progress. At that point the String had not yet been formed and the StringBuilder held a strong reference. You should call the capacity() method of the StringBuilder to get the actual number of char elements in its internal array, then multiply by 2 to get the number of bytes, because Java internally uses UTF-16, which needs 2 bytes to store an ASCII character.

    4. Finally, because your code does not give the StringBuilder a big enough initial size, every time the StringBuilder runs out of space it doubles the size of its internal array by creating a new array and copying the content. This means that, at peak, up to three times the size of the actual String may be allocated at once. You cannot measure this because it happens inside the StringBuilder class, and by the time control leaves that class the old array is ready for garbage collection. So there is a high chance that when you get the OutOfMemoryError, you get it at the point where the StringBuilder tries to allocate the doubled array, or more specifically in the Arrays.copyOf method.
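    Points 1 and 2 can be folded into a small helper; this is only a sketch (the System.gc() call is just a hint to the JVM, so treat the numbers as rough):

    ```java
    public class UsedMemory {
        // Used heap = total allocated by the JVM minus the free part of it
        static long usedBytes() {
            Runtime rt = Runtime.getRuntime();
            return rt.totalMemory() - rt.freeMemory();
        }

        public static void main(String[] args) {
            System.gc(); // only a hint; good enough for a rough test measurement
            System.out.println("Used memory: " + usedBytes() / 1_000_000 + " MB");
        }
    }
    ```
    
    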


    How much memory is expected to be consumed by your program as is? (A rough estimate)



    Let's consider a program similar to yours.



    public static void main(String[] args) {
        // Initialize the ArrayList to emulate a
        // file with 32 lines, each containing
        // 1000 ASCII characters
        List<String> strList = new ArrayList<>(32);
        for (int i = 0; i < 32; i++) {
            strList.add(String.format("%01000d", i));
        }

        StringBuilder str = new StringBuilder();
        strList.stream().map(element -> {
            // Print the number of chars
            // reserved by the StringBuilder
            System.out.print(str.capacity() + ", ");
            return element;
        }).collect(() -> str,
                   (response, element) -> response.append(element),
                   (response, element) -> response.append(element))
          .toString();
    }


    Here, just before each element is appended, I print the current capacity of the StringBuilder.



    The output of the program is as follows:



    16, 1000, 2002, 4006, 4006, 8014, 8014, 8014, 8014, 
    16030, 16030, 16030, 16030, 16030, 16030, 16030, 16030,
    32062, 32062, 32062, 32062, 32062, 32062, 32062, 32062,
    32062, 32062, 32062, 32062, 32062, 32062, 32062,


    If your file has "n" lines (where n is a power of 2) and each line has an average "m" ASCII characters, the capacity of the StringBuilder at the end of the program execution will be: (n * m + 2 ^ (a + 1) ) where (2 ^ a = n).



    E.g. if your file has 256 lines and an average of 1500 ASCII characters per line, the total capacity of the StringBuilder at the end of program will be: (256 * 1500 + 2 ^ 9) = 384512 characters.



    Assuming you have only ASCII characters in your file, each character will occupy 2 bytes in the UTF-16 representation. Additionally, every time the StringBuilder's array runs out of space, a new array twice the size of the original is created (see the capacity growth numbers above) and the content of the old array is copied into it. The old array is then left for garbage collection. Therefore, if you add another 2 ^ (a + 1), i.e. 2 ^ 9, characters, the StringBuilder would create a new array holding (n * m + 2 ^ (a + 1)) * 2 + 2 characters and start copying the content of the old array into the new one. Thus there will be two big arrays inside the StringBuilder while the copying is in progress.



    Thus the total memory will be: 384512 * 2 + (384512 * 2 + 2) * 2 = 2,307,076 bytes, i.e. roughly 2.3 MB, to hold only about 0.77 MB of data.



    I have ignored other memory-consuming items like array headers, object headers, references, etc., as those are negligible or constant compared to the array size.



    So, in conclusion, 256 lines of 1500 characters each consume roughly 2.3 MB to hold only about 0.77 MB of data (the data is one-third of the peak consumption).



    If you had initialized the StringBuilder with a size of 384,512 at the beginning, you could have accommodated the same number of characters in one-third of the memory, and there would also have been much less CPU work in terms of array copying and garbage collection.
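    To see the effect, here is a small demo contrasting a default-sized builder with a pre-sized one for the 256 × 1500 example above (the exact final capacity of the growing builder depends on the JDK's growth policy, so the demo just prints both):

    ```java
    public class PresizeDemo {
        // Builds 256 lines of 1500 chars with a default-capacity StringBuilder
        static int grownCapacity() {
            StringBuilder sb = new StringBuilder(); // starts at capacity 16
            for (int i = 0; i < 256; i++) {
                sb.append(String.format("%01500d", i));
            }
            return sb.capacity(); // reached via repeated array doubling and copying
        }

        // Same content, but the builder is sized up front
        static int presizedCapacity() {
            StringBuilder sb = new StringBuilder(384_512);
            for (int i = 0; i < 256; i++) {
                sb.append(String.format("%01500d", i));
            }
            return sb.capacity(); // never grows: 256 * 1500 <= 384_512
        }

        public static void main(String[] args) {
            System.out.println("default-sized: " + grownCapacity());
            System.out.println("pre-sized:     " + presizedCapacity());
        }
    }
    ```
    
    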



    What you may consider doing instead



    Finally, for this kind of problem, you may want to work in chunks: write the content of your StringBuilder to a file or database as soon as it has processed, say, 1000 records, clear the StringBuilder, and start over with the next batch of records. That way you never hold more than 1000 (say) records' worth of data in memory.
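    A sketch of that chunked approach applied to the original wp_posts filter (the chunk size, file paths, and the choice of writing to a file rather than a database are assumptions):

    ```java
    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.stream.Stream;

    public class ChunkedExtract {
        // Streams the input, buffers matching lines, and flushes every chunkLines
        // lines so the StringBuilder never holds more than one chunk in memory.
        public static void extract(Path in, Path out, int chunkLines) throws IOException {
            StringBuilder buf = new StringBuilder();
            int[] count = {0}; // mutable counter usable from inside the lambda
            try (Stream<String> lines = Files.lines(in, StandardCharsets.UTF_8);
                 BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
                lines.filter(l -> l.startsWith("INSERT INTO `wp_posts`"))
                     .forEach(l -> {
                         buf.append(l).append('\n');
                         if (++count[0] % chunkLines == 0) {
                             try {
                                 w.write(buf.toString());
                             } catch (IOException e) {
                                 throw new UncheckedIOException(e);
                             }
                             buf.setLength(0); // reuse the builder; keeps memory bounded
                         }
                     });
                w.write(buf.toString()); // flush the final partial chunk
            }
        }
    }
    ```
    
    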






        edited Nov 24 '18 at 7:24
        answered Nov 23 '18 at 19:44









    Saptarshi Basu