Bulk loading: Making sure all BulkProcessor jobs are completed (Java Client API)
I want to make a process that bulk loads data to ES so that
- There are two indices: index_1, index_2 and an alias that points to index_1 or index_2
- The data is bulk loaded to index_1 or index_2
- If all data is loaded without failures, the alias is changed
I'm using the Java Client API.
I would like to be sure that, after I have added all the data to the BulkProcessor, it has completed all of its jobs before I go on to evaluate whether there were any failures. I keep track of failures in BulkProcessor.Listener.afterBulk.
In my current test implementation, once all data has been pushed to the BulkProcessor, I call BulkProcessor.flush() and then wait for a fixed timeout (just to be sure) before I check whether afterBulk has recorded any failures.
But the question is: what can I do to make sure the BulkProcessor has no jobs left and all pushed IndexRequests have completed?
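One way to block until every pushed request has been reported back is to count requests in and out yourself. Below is a minimal sketch: the class and method names are made up, and the hook-up points to the BulkProcessor (shown in comments) are assumptions about where you would call them, not part of the official API.

```java
// Sketch: count every request in and every request out, then block until
// the counts match. afterBulk may run on a different thread than the one
// adding requests, so access to the counter is synchronized.
class PendingRequests {
    private long pending = 0;

    // call just before bulkProcessor.add(indexRequest)
    synchronized void onAdd() {
        pending++;
    }

    // call once per item reported back by BulkProcessor.Listener.afterBulk
    synchronized void onComplete() {
        pending--;
        if (pending == 0) {
            notifyAll();
        }
    }

    // number of requests added but not yet reported back
    synchronized long pending() {
        return pending;
    }

    // call after the final flush(); blocks until every added request completed
    synchronized void awaitAll() throws InterruptedException {
        while (pending > 0) {
            wait();
        }
    }
}
```

If shutting the processor down afterwards is acceptable, BulkProcessor.awaitClose(timeout, unit) also flushes any buffered requests and waits for outstanding bulk executions before returning.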
elasticsearch
Did you try setting the refresh parameter to true? elastic.co/guide/en/elasticsearch/reference/current/…
– Behzad Dadashpour
Nov 13 '18 at 8:07
@BehzadDadashpour: I think that option has to do with "when changes made by this request are made visible to search". I'd like to know if the BulkProcessor has any work left.
– Esko Piirainen
Nov 13 '18 at 9:41
In my case, I get the count of inserted data after the bulk insertion; after this change (refresh = true) I get the right value. Is your bulk data very big?
– Behzad Dadashpour
Nov 13 '18 at 10:15
@BehzadDadashpour OK, that is a possible solution; I happen to know the count. I was also considering keeping a set of all ids to be stored and, in afterBulk, removing all successful ids from the set -> when the set is empty, all jobs are completed.
– Esko Piirainen
Nov 13 '18 at 10:37
But anyway, I was wondering if there is something more specific that could be done. It is a bit strange if there is no support for this, since I think it should be a quite common requirement.
– Esko Piirainen
Nov 13 '18 at 10:39
asked Nov 13 '18 at 7:49 by Esko Piirainen
1 Answer
There is no mechanism in the Java Client API (v <= 7.0) to check the size of the bulk queue. You can keep track of added ids and ids marked completed (in afterBulk) yourself.
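The id-tracking approach can be sketched like this. The class name is made up, and the BulkProcessor hook points shown in the comments are assumptions about where you would wire the calls in:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the id-tracking idea: record every id handed to the
// BulkProcessor, remove ids as afterBulk reports them back, and treat an
// empty set as "all jobs completed". A concurrent set is used because
// afterBulk may run on a different thread than the one adding requests.
class BulkCompletionTracker {
    private final Set<String> pending = ConcurrentHashMap.newKeySet();
    private volatile boolean failures = false;

    // call just before bulkProcessor.add(indexRequest)
    void added(String id) {
        pending.add(id);
    }

    // call once per item from BulkProcessor.Listener.afterBulk,
    // iterating over the items of the BulkResponse
    void completed(String id, boolean failed) {
        if (failed) {
            failures = true;
        }
        pending.remove(id);
    }

    boolean allCompleted() {
        return pending.isEmpty();
    }

    boolean hadFailures() {
        return failures;
    }
}
```

When allCompleted() is true and hadFailures() is false, it is safe to switch the alias to the freshly loaded index.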
answered Dec 17 '18 at 13:55 by Esko Piirainen