How to use different loss functions at different time steps of an LSTM in Keras












I have a set of sentences and their scores, and I would like to train a marking system that predicts the score of a given sentence. One example looks like this:

(X = Tomorrow is a good day, Y = 0.9)

I would like to build this marking system with an LSTM so that it takes the sequential relationship between the words of the sentence into account. The training example above is therefore transformed as follows:

(x1=Tomorrow, y1=is) (x2=is, y2=a) (x3=a, y3=good) (x4=day, y4=0.9)

When training this LSTM, I would like the first three time steps to use a softmax classifier and the final step to use MSE. The overall loss is therefore composed of two different loss functions, and Keras does not seem to provide a direct way to express this. I am also not sure whether my approach to building the marking system is correct.

















      keras lstm














asked Nov 13 '18 at 8:43 by Kevin Sun, edited Nov 13 '18 at 9:23 by Amir
























1 Answer
































Keras supports multiple loss functions as well, one per model output:



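    from keras.models import Model

    # inputs, lang_model and sent_model are tensors defined elsewhere:
    # lang_model is the next-token softmax output, sent_model the score output.
    model = Model(inputs=inputs,
                  outputs=[lang_model, sent_model])

    # Losses and loss weights are matched to the outputs by position.
    # Accuracy is only meaningful for the classification output.
    model.compile(optimizer='sgd',
                  loss=['categorical_crossentropy', 'mse'],
                  metrics=['accuracy'], loss_weights=[1., 1.])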


Based on your explanation, I think you need a model that first predicts each token from the previous tokens (in NLP this is usually called a language model) and then computes a score, which I assume is a sentiment score (the same approach applies to other domains).
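To make the combined objective concrete, here is a minimal pure-Python sketch (not Keras code) of a weighted sum of a cross-entropy term over the word-prediction steps and a squared-error term on the score. The function name `combined_loss`, the toy probabilities, and the equal weights are illustrative assumptions; the weighted sum mirrors what `loss_weights` does when Keras adds up the per-output losses.

```python
import math


def combined_loss(token_probs, target_ids, predicted_score, true_score,
                  weights=(1.0, 1.0)):
    """Weighted sum of mean cross-entropy (word steps) and squared error (score)."""
    # Cross-entropy: negative log-probability assigned to the correct next word,
    # averaged over the word-prediction steps.
    cross_entropy = -sum(
        math.log(probs[target]) for probs, target in zip(token_probs, target_ids)
    ) / len(target_ids)
    # Squared error on the single sentence score.
    squared_error = (predicted_score - true_score) ** 2
    return weights[0] * cross_entropy + weights[1] * squared_error


# Three word-prediction steps over a toy 3-word vocabulary, then one score.
token_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.25, 0.25, 0.5]]
target_ids = [0, 1, 2]
loss = combined_loss(token_probs, target_ids, predicted_score=0.8, true_score=0.9)
print(loss)
```

Changing `weights` shifts the balance between the two tasks, which is the main knob to try if one loss dominates training.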



To do so, you can train your language model with an LSTM and use the LSTM's last output for your ranking task. You then need to define two loss functions: categorical_crossentropy for the language model and MSE for the ranking task.
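A minimal sketch of such a two-headed model could look as follows. It assumes `tensorflow.keras`; the layer sizes, the variable names, and the random stand-in data are all illustrative assumptions, and no padding or masking is handled. It also swaps in `sparse_categorical_crossentropy` so that the next-word targets can be integer ids rather than one-hot vectors, and it assumes the targets are the input shifted by one word, with some end-of-sentence id at the last step.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative sizes (assumptions, not taken from the question).
vocab_size, max_len, embed_dim, hidden_units = 50, 4, 16, 32

inputs = keras.Input(shape=(max_len,), dtype="int32")
x = layers.Embedding(vocab_size, embed_dim)(inputs)

# return_sequences=True yields the output at every time step;
# return_state=True also returns the final hidden state, which
# (without masking) equals the output of the last time step.
seq, last_h, _ = layers.LSTM(
    hidden_units, return_sequences=True, return_state=True
)(x)

# Language-model head: a next-word distribution at every time step.
lang_model = layers.Dense(vocab_size, activation="softmax", name="lang_model")(seq)
# Score head: a single regression value from the last time step.
sent_model = layers.Dense(1, name="sent_model")(last_h)

model = keras.Model(inputs=inputs, outputs=[lang_model, sent_model])
model.compile(
    optimizer="sgd",
    loss=["sparse_categorical_crossentropy", "mse"],
    loss_weights=[1.0, 1.0],
)

# Random stand-in data, only to show the expected shapes.
rng = np.random.default_rng(0)
x_train = rng.integers(0, vocab_size, size=(8, max_len)).astype("int32")
y_next = rng.integers(0, vocab_size, size=(8, max_len)).astype("int32")
y_score = rng.random((8, 1)).astype("float32")

model.fit(x_train, [y_next, y_score], epochs=1, batch_size=4, verbose=0)
```

Both heads share the embedding and LSTM weights, so the gradients of both losses flow into the same recurrent layer.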



This tutorial may be helpful: https://www.pyimagesearch.com/2018/06/04/keras-multiple-outputs-and-multiple-losses/
































• Hi Amir, thanks very much for your reply. Does the "token" in your response mean the features of the sentence, i.e., the input to the softmax at the last time step?

  – Kevin Sun
  Nov 13 '18 at 20:21

• @KevinSun I mean the things that you pass to your LSTM, usually word vectors (GloVe or word2vec).

  – Amir
  Nov 13 '18 at 21:04

• Thanks very much again. As I understand the tutorial you referred to, the multiple losses are built on different output layers that have no connections between them. For example, if we want two losses, they are built on two layers named layer1 and layer2, and in the tutorial layer1 and layer2 are not connected to each other. In my problem, the losses are built on the outputs of layer3 and layer4, while the input of layer4 is the output of layer3. In that case, could I still use these multiple losses?

  – Kevin Sun
  Nov 13 '18 at 21:28

• You're welcome. You could, but I am unsure whether the model would converge.

  – Amir
  Nov 13 '18 at 21:44

• Does your first reply to my question mean the same thing as what I asked in my previous comment?

  – Kevin Sun
  Nov 13 '18 at 21:53











answered Nov 13 '18 at 9:07 by Amir, edited Nov 13 '18 at 21:04












