“java.lang.OutOfMemoryError: Java heap space” when running “ga.nlp.annotate” using GraphAware NLP
Windows 10
32Gb RAM
8 core Xeon processor at 3.4GHz
Neo4j 3.4.7
Neo4j Browser 3.2.13
apoc-3.4.0.3.jar
graphaware-nlp-3.4.7.52.13.jar
graphaware-server-community-all-3.4.7.52.jar
nlp-stanfordnlp-3.4.7.52.13.jar
stanford-english-corenlp-2018-10-05-models.jar
Hi. I am trying to annotate all the text fields in my database. There are 25532 nodes with text values.
I'm using the following query to do this:
CALL apoc.periodic.iterate(
"MATCH (n:FreeTextResponse) WHERE NOT (n)-[:HAS_ANNOTATED_TEXT]->() RETURN n",
"CALL ga.nlp.annotate({text: n.fullSentenceString, id: id(n), checkLanguage: false})
YIELD result MERGE (n)-[:HAS_ANNOTATED_TEXT]->(result)", {batchSize:1, iterateList:false})
...and am getting the following error:
java.lang.OutOfMemoryError: Java heap space
I'm sure this is just a settings change somewhere, but I'm not sure what or where. Sorry if this is a bit of a newbie question!
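For reference, the set of nodes still waiting to be processed can be checked with a query along these lines (a sketch using the same label and relationship as the batch query above, not part of the original question):

```cypher
// Count FreeTextResponse nodes that have not been annotated yet
MATCH (n:FreeTextResponse)
WHERE NOT (n)-[:HAS_ANNOTATED_TEXT]->()
RETURN count(n) AS remaining
```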
neo4j
edited Nov 11 at 11:31
Zoe
asked Nov 10 at 17:13
Doug
1 Answer
The default heap configuration is 512MB, which is not enough for the models used by Stanford NLP.
As suggested here:
https://github.com/graphaware/neo4j-nlp
change your neo4j.conf file as follows:
dbms.memory.heap.initial_size=3000m
dbms.memory.heap.max_size=5000m
Although, considering your RAM availability, I would suggest 5GB for both values.
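After restarting Neo4j, the effective heap settings can be verified from the browser; in Neo4j 3.x something along these lines should work (a sketch, not from the original answer):

```cypher
// List the heap-related settings Neo4j is actually running with
CALL dbms.listConfig('dbms.memory.heap')
YIELD name, value
RETURN name, value
```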
Thank you very much. Greatly appreciated. That seems to have done the trick. Having successfully processed the annotations, I am now computing the cosine similarity. The annotations took about 5 hours, creating around 21,000 AnnotatedText nodes. I set the similarity computation running and it is still going 17 hours later, using 100% of the CPU capacity. Does that sound normal to you? Thanks very much indeed!
– Doug
Nov 12 at 12:21
Yes, it is normal. In the Enterprise Edition we are developing an optimization for the cosine similarity computation. Consider that in the "basic" version it is computing 21000x21000 similarities. Even though we have improved it quite a lot, it is a huge number of computations.
– Alessandro Negro
Nov 12 at 12:39
Thanks, that's good to know. I'll just cross my fingers and hope it completes without issues then :) Any idea how long I might expect it to take, or is that a how-long-is-a-piece-of-string question? It's a great plugin, by the way! Thank you.
– Doug
Nov 12 at 13:54
Just following this question up @Alessandro, as my UPS is having problems and I am unsure whether to kill the query or leave it running. The project deadline has now passed and it's been running for 58 hours. Is there any way to 1) know how long the query will take / how far through it is, or 2) stop the query, restart the computer, and continue it where it left off? Thanks!
– Doug
Nov 13 at 23:40
In the neo4j log there should be something useful to understand what's going on.
– Alessandro Negro
Nov 13 at 23:45
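To put the "21000x21000 similarities" comment in scale: even if each unordered pair were computed only once, the number of comparisons is in the hundreds of millions. A quick back-of-the-envelope calculation (plain Python; the node count is taken from the comments above):

```python
# Number of AnnotatedText nodes reported in the comments above
n = 21_000

# Full pairwise comparison, as described ("21000x21000")
full_matrix = n * n              # 441,000,000 comparisons

# If each unordered pair is computed once and self-pairs are skipped
unique_pairs = n * (n - 1) // 2  # 220,489,500 comparisons

print(full_matrix, unique_pairs)
```

Either way, a multi-day run on a single machine is plausible for a workload of that size.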
edited Nov 11 at 11:30
Zoe
answered Nov 11 at 11:28
Alessandro Negro