“java.lang.OutOfMemoryError: Java heap space” when running “ga.nlp.annotate” using GraphAware NLP











Windows 10
32 GB RAM
8-core Xeon processor at 3.4 GHz

Neo4j 3.4.7
Neo4j Browser 3.2.13
apoc-3.4.0.3.jar
graphaware-nlp-3.4.7.52.13.jar
graphaware-server-community-all-3.4.7.52.jar
nlp-stanfordnlp-3.4.7.52.13.jar
stanford-english-corenlp-2018-10-05-models.jar


Hi. I am trying to annotate all the text fields in my database. There are 25,532 nodes with text values.



I'm using the following query to do this:



CALL apoc.periodic.iterate(
"MATCH (n:FreeTextResponse) WHERE NOT (n)-[:HAS_ANNOTATED_TEXT]->() RETURN n",
"CALL ga.nlp.annotate({text: n.fullSentenceString, id: id(n), checkLanguage: false})
YIELD result MERGE (n)-[:HAS_ANNOTATED_TEXT]->(result)", {batchSize:1, iterateList:false})


...and am getting the following error:



java.lang.OutOfMemoryError: Java heap space


I'm sure this is just a settings change somewhere, but I'm not sure what or where. Sorry if this is a bit of a newbie question!
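For reference, a quick sketch of a check on how many nodes still lack an annotation (it simply reuses the same pattern as the batch query above):

// count FreeTextResponse nodes that do not yet have an annotated text attached
MATCH (n:FreeTextResponse)
WHERE NOT (n)-[:HAS_ANNOTATED_TEXT]->()
RETURN count(n) AS remaining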










Tags: neo4j






asked Nov 10 at 17:13 by Doug
























1 Answer






The default heap configuration is 512 MB, and that is not enough for the models used by Stanford NLP. As suggested here:

https://github.com/graphaware/neo4j-nlp

change your neo4j.conf file in the following way:

dbms.memory.heap.initial_size=3000m
dbms.memory.heap.max_size=5000m

Given your available RAM, though, I would suggest 5 GB for both values.
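For example, a minimal sketch of the corresponding neo4j.conf entries with a fixed 5 GB heap, as suggested above (the exact figures are an assumption; adjust them against the rest of your memory budget):

# neo4j.conf heap sizing sketch: fixed 5 GB heap for the Stanford CoreNLP models
dbms.memory.heap.initial_size=5g
dbms.memory.heap.max_size=5g
# restart Neo4j after editing neo4j.conf so the new heap settings take effect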






answered Nov 11 at 11:28 by Alessandro Negro (accepted answer, score 6)























• Thank you very much, greatly appreciated. That seems to have done the trick. I am now processing the cosine similarity, having successfully processed the annotations. The annotations took about 5 hours and created around 21,000 annotated-text nodes. I set the similarity computation running and it is still going 17 hours later, using 100% of the CPU capacity. Does that sound normal to you? Thanks very much indeed!
  – Doug
  Nov 12 at 12:21










• Yes, it is normal. In the Enterprise Edition we are developing optimizations for the cosine similarity computation. Consider that in the "basic" version it is computing 21000 x 21000 similarities. Even though we have improved it quite a lot, that is a huge number of computations.
  – Alessandro Negro
  Nov 12 at 12:39










• Thanks, that's good to know. I'll just cross my fingers and hope it completes without issues then :) Any idea how long I might expect it to take, or is that a how-long-is-a-piece-of-string question? It's a great plugin, by the way! Thank you.
  – Doug
  Nov 12 at 13:54










• Just following this question up, @Alessandro, as my UPS is having problems and I am unsure whether to kill the query or leave it running. The project deadline has now passed and it has been running for 58 hours. Is there any way to 1) know how long the query will take or how far through it is, or 2) stop the query, restart the computer and continue where it left off? Thanks!
  – Doug
  Nov 13 at 23:40










• In the neo4j log there should be something useful for understanding what's going on.
  – Alessandro Negro
  Nov 13 at 23:45
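Regarding the follow-up comments about checking on or stopping the long-running similarity computation, here is a sketch using Neo4j's built-in procedures for inspecting and terminating queries (the query id below is a placeholder):

// list the queries currently running, including the apoc.periodic.iterate / ga.nlp call
CALL dbms.listQueries();

// if you decide to stop it, terminate it by the queryId shown in the listing
// CALL dbms.killQuery('query-123');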










