Bulk loading: Making sure all BulkProcessor jobs are completed (Java Client API)

I want to build a process that bulk-loads data into ES as follows:




  1. There are two indices: index_1, index_2 and an alias that points to index_1 or index_2

  2. The data is bulk loaded to index_1 or index_2

  3. If all data is loaded without failures, the alias is changed
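The three steps amount to a blue/green index swap. Below is a minimal sketch of only the decision logic in plain Java, with every Elasticsearch call left out. `AliasSwapPlan` and its methods are hypothetical names used for illustration; they are not part of the Java Client API.

```java
/**
 * Hypothetical helper (not part of the Elasticsearch client): picks the index to
 * load into and decides where the alias should point once the load is over.
 */
final class AliasSwapPlan {
    static final String INDEX_1 = "index_1";
    static final String INDEX_2 = "index_2";

    private final String currentTarget; // index the alias points to right now
    private final String loadTarget;    // index that receives the new bulk load

    AliasSwapPlan(String currentTarget) {
        if (!INDEX_1.equals(currentTarget) && !INDEX_2.equals(currentTarget)) {
            throw new IllegalArgumentException("unexpected alias target: " + currentTarget);
        }
        this.currentTarget = currentTarget;
        // Load into the index the alias does NOT point to, so searches are unaffected.
        this.loadTarget = INDEX_1.equals(currentTarget) ? INDEX_2 : INDEX_1;
    }

    String loadTarget() {
        return loadTarget;
    }

    /**
     * Returns the index the alias should point to after the load: the freshly
     * loaded index only if every request completed and none of them failed.
     */
    String aliasTargetAfterLoad(boolean allCompleted, long failedItems) {
        return (allCompleted && failedItems == 0) ? loadTarget : currentTarget;
    }
}
```

In a real load, `allCompleted` and `failedItems` would come from whatever completion tracking is wired into afterBulk. The switch itself is usually sent as a single aliases request containing a remove action for the old index and an add action for the new one, so readers never see the alias missing; check the aliases documentation for your Elasticsearch version.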


I'm using the Java Client API.



After adding data to the BulkProcessor, I want to be sure it has completed all jobs before I go on to check whether there were any failures. I keep track of failures in BulkProcessor.Listener.afterBulk.



In my current test implementation, once all data has been pushed to the BulkProcessor, I call BulkProcessor.flush() and then wait for a fixed timeout (just to be sure) before checking whether afterBulk has recorded any failures.



So the question is: how can I make sure that the BulkProcessor has no jobs left and that all pushed IndexRequests have completed?
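One way to replace the fixed timeout after flush() is to count bulks in flight: increment in beforeBulk, decrement in either afterBulk callback, and wait until the count drops to zero. The following is a minimal plain-Java sketch; `InFlightBulkTracker` and its methods are hypothetical names, and the `on*` methods are meant to be called from the three BulkProcessor.Listener callbacks. It assumes that by the time flush() returns, beforeBulk has already been called for every bulk that flush() submitted, which is worth verifying for your client version.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not part of the Elasticsearch client): counts bulk executions
 * that have started but not yet finished, plus any failures reported along the way.
 */
final class InFlightBulkTracker {
    private long inFlight;    // beforeBulk seen, afterBulk not yet seen
    private long failedItems; // failed items inside otherwise successful bulk responses
    private long failedBulks; // bulks that failed as a whole (afterBulk with a Throwable)

    /** Call from Listener.beforeBulk(executionId, request). */
    synchronized void onBeforeBulk(long executionId) {
        inFlight++;
    }

    /** Call from Listener.afterBulk(executionId, request, response) with the number of failed items. */
    synchronized void onAfterBulk(long executionId, int failedItemCount) {
        failedItems += failedItemCount;
        finishOne();
    }

    /** Call from Listener.afterBulk(executionId, request, failure). */
    synchronized void onAfterBulkFailure(long executionId, Throwable failure) {
        failedBulks++;
        finishOne();
    }

    // Caller already holds this object's monitor.
    private void finishOne() {
        inFlight--;
        notifyAll(); // wake up threads blocked in awaitIdle
    }

    /** Blocks until no bulk is in flight; returns false on timeout or interruption. */
    synchronized boolean awaitIdle(long timeout, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        try {
            while (inFlight > 0) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return false;
                }
                TimeUnit.NANOSECONDS.timedWait(this, remaining); // releases the monitor while waiting
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
            return false;
        }
    }

    synchronized boolean hadFailures() {
        return failedItems > 0 || failedBulks > 0;
    }
}
```

Roughly: after the last add(), call flush(), then awaitIdle(...), and switch the alias only if it returns true and hadFailures() is false. In the response callback, the failed-item count can be obtained by iterating the BulkResponse and counting items whose isFailed() is true. If closing the processor at that point is acceptable, BulkProcessor.awaitClose(long, TimeUnit) is also worth checking in the Javadoc for your client version, since it flushes and waits for outstanding bulks.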

  • Did you try setting the refresh parameter to true? elastic.co/guide/en/elasticsearch/reference/current/…

    – Behzad Dadashpour
    Nov 13 '18 at 8:07

  • @BehzadDadashpour: I think that option is about "when changes made by this request are made visible to search". I'd like to know whether the BulkProcessor has any work left.

    – Esko Piirainen
    Nov 13 '18 at 9:41

  • In my case, I get the count of inserted documents after the bulk insertion, and after this change (refresh = true) I get the correct value. Is your bulk data too big?

    – Behzad Dadashpour
    Nov 13 '18 at 10:15

  • @BehzadDadashpour OK, that is a possible solution; I happen to know the count. I was also considering keeping a set of all ids to be stored and, in afterBulk, removing every successfully stored id from the set: when the set is empty, all jobs are completed.

    – Esko Piirainen
    Nov 13 '18 at 10:37

  • Still, I wonder whether there is something more specific that could be done. It would be a bit strange if there were no support for this, since I think it is quite a common requirement.

    – Esko Piirainen
    Nov 13 '18 at 10:39
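When the number of documents is known in advance, completion can also be detected from the processor's own callbacks instead of querying the index: count how many items afterBulk has reported an outcome for, and stop waiting once that reaches the expected total. A minimal plain-Java sketch; `ExpectedCountTracker` and its methods are hypothetical names, and it assumes each document is added to the processor exactly once.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not part of the Elasticsearch client): waits until afterBulk
 * has reported an outcome (success or failure) for a known number of documents.
 */
final class ExpectedCountTracker {
    private final long expected;
    private long succeeded;
    private long failed;

    ExpectedCountTracker(long expected) {
        this.expected = expected;
    }

    /** Call from afterBulk(executionId, request, response) with per-item outcome counts. */
    synchronized void onItems(int succeededItems, int failedItems) {
        succeeded += succeededItems;
        failed += failedItems;
        notifyAll();
    }

    /** Call from afterBulk(executionId, request, failure): every action in that bulk failed. */
    synchronized void onBulkFailure(int actionsInBulk) {
        failed += actionsInBulk;
        notifyAll();
    }

    /** Returns true once every expected document has an outcome, false on timeout or interruption. */
    synchronized boolean awaitAll(long timeout, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        try {
            while (succeeded + failed < expected) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return false;
                }
                TimeUnit.NANOSECONDS.timedWait(this, remaining);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    synchronized long failed() {
        return failed;
    }
}
```

In the exception callback, the number of actions in the failed bulk could be taken from the BulkRequest (for example numberOfActions()). Unlike refresh plus a count query, this only tells you that the processor has finished, not that the documents are already visible to search.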

asked Nov 13 '18 at 7:49

Esko Piirainen

1 Answer

There is no mechanism in the Java Client API (versions up to 7.0) to check the size of the bulk queue. You can keep track yourself of the ids you have added and the ids reported as done in afterBulk.
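A minimal plain-Java sketch of that bookkeeping, assuming document ids are unique within one load. `PendingIdTracker` and its methods are hypothetical names; register() is meant to be called just before each add() on the processor, and the mark methods from afterBulk.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not part of the Elasticsearch client): remembers the id of every
 * document handed to the processor and forgets it once afterBulk reports an outcome.
 */
final class PendingIdTracker {
    private final Set<String> pending = new HashSet<>();
    private final Map<String, String> failures = new HashMap<>(); // id -> failure message

    /** Call BEFORE processor.add(request), so a fast afterBulk cannot run ahead of it. */
    synchronized void register(String id) {
        pending.add(id);
    }

    /** Call from afterBulk for each item that succeeded. */
    synchronized void markSucceeded(String id) {
        pending.remove(id);
        notifyAll();
    }

    /** Call from afterBulk for each failed item, or for every id when the whole bulk failed. */
    synchronized void markFailed(String id, String reason) {
        // Failed ids leave the pending set too; otherwise it would never become empty.
        pending.remove(id);
        failures.put(id, reason);
        notifyAll();
    }

    synchronized boolean isComplete() {
        return pending.isEmpty();
    }

    /** Returns true once every registered id has an outcome, false on timeout or interruption. */
    synchronized boolean awaitEmpty(long timeout, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        try {
            while (!pending.isEmpty()) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return false;
                }
                TimeUnit.NANOSECONDS.timedWait(this, remaining);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    synchronized Map<String, String> failures() {
        return Collections.unmodifiableMap(new HashMap<>(failures));
    }
}
```

In the response callback, the per-item data would come from iterating the BulkResponse (for example BulkItemResponse.getId(), isFailed() and getFailureMessage(); check these against your client version). Once awaitEmpty(...) returns true, an empty failures() map means the alias can be switched.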

        answered Dec 17 '18 at 13:55

        Esko Piirainen
