“java.lang.OutOfMemoryError: Java heap space” when running “ga.nlp.annotate” using GraphAware NLP
Windows 10
32Gb RAM
8 core Xeon processor at 3.4GHz
Neo4j 3.4.7
Neo4j Browser 3.2.13
apoc-3.4.0.3.jar
graphaware-nlp-3.4.7.52.13.jar
graphaware-server-community-all-3.4.7.52.jar
nlp-stanfordnlp-3.4.7.52.13.jar
stanford-english-corenlp-2018-10-05-models.jar
Hi. I am trying to annotate all the text fields in my database. There are 25532 nodes with text values.
I'm using the following query to do this:
CALL apoc.periodic.iterate(
"MATCH (n:FreeTextResponse) WHERE NOT (n)-[:HAS_ANNOTATED_TEXT]->() RETURN n",
"CALL ga.nlp.annotate({text: n.fullSentenceString, id: id(n), checkLanguage: false})
YIELD result MERGE (n)-[:HAS_ANNOTATED_TEXT]->(result)", {batchSize:1, iterateList:false})
...and am getting the following error:
java.lang.OutOfMemoryError: Java heap space
I'm sure this is just a settings change somewhere, but I'm not sure what or where. Sorry if this is a bit of a newbie question!
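For reference, the set of nodes still waiting to be processed can be checked with a query along these lines (a sketch using the same label and relationship as the batch query above, not part of the original question):

```cypher
// Count FreeTextResponse nodes that have not been annotated yet
MATCH (n:FreeTextResponse)
WHERE NOT (n)-[:HAS_ANNOTATED_TEXT]->()
RETURN count(n) AS remaining
```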
neo4j
edited Nov 11 at 11:31
Zoe
asked Nov 10 at 17:13
Doug
1 Answer
The default heap configuration is 512MB, which is not enough for the models used by Stanford NLP.
As suggested here:
https://github.com/graphaware/neo4j-nlp
change your neo4j.conf file as follows:
dbms.memory.heap.initial_size=3000m
dbms.memory.heap.max_size=5000m
Although, considering your RAM availability, I would suggest 5GB for both values.
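After restarting Neo4j, the effective heap settings can be verified from the browser; in Neo4j 3.x something along these lines should work (a sketch, not from the original answer):

```cypher
// List the heap-related settings Neo4j is actually running with
CALL dbms.listConfig('dbms.memory.heap')
YIELD name, value
RETURN name, value
```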
Thank you very much. Greatly appreciated. That seems to have done the trick. Having successfully processed the annotations, I am now computing the cosine similarity. The annotations took about 5 hours, creating around 21,000 AnnotatedText nodes. I set the similarity computation running and it is still going 17 hours later, using 100% of the CPU capacity. Does that sound normal to you? Thanks very much indeed!
– Doug
Nov 12 at 12:21
Yes, it is normal. In the Enterprise Edition we are developing an optimization for the cosine similarity computation. Consider that in the "basic" version it is computing 21000x21000 similarities. Even though we have improved it quite a lot, it is a huge number of computations.
– Alessandro Negro
Nov 12 at 12:39
Thanks, that's good to know. I'll just cross my fingers and hope it completes without issues then :) Any idea how long I might expect it to take, or is that a how-long-is-a-piece-of-string question? It's a great plugin, by the way! Thank you.
– Doug
Nov 12 at 13:54
Just following this question up @Alessandro, as my UPS is having problems and I am unsure whether to kill the query or leave it running. The project deadline has now passed and it's been running for 58 hours. Is there any way to 1) know how long the query will take / how far through it is, or 2) stop the query, restart the computer, and continue it where it left off? Thanks!
– Doug
Nov 13 at 23:40
In the neo4j log there should be something useful to understand what's going on.
– Alessandro Negro
Nov 13 at 23:45
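To put the "21000x21000 similarities" comment in scale: even if each unordered pair were computed only once, the number of comparisons is in the hundreds of millions. A quick back-of-the-envelope calculation (plain Python; the node count is taken from the comments above):

```python
# Number of AnnotatedText nodes reported in the comments above
n = 21_000

# Full pairwise comparison, as described ("21000x21000")
full_matrix = n * n              # 441,000,000 comparisons

# If each unordered pair is computed once and self-pairs are skipped
unique_pairs = n * (n - 1) // 2  # 220,489,500 comparisons

print(full_matrix, unique_pairs)
```

Either way, a multi-day run on a single machine is plausible for a workload of that size.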
edited Nov 11 at 11:30
Zoe
answered Nov 11 at 11:28
Alessandro Negro