Why does model.fit() raise ValueError with tf.train.AdamOptimizer using categorical_crossentropy loss...












I'm following the TensorFlow basic classification example with the Keras API provided in the "Getting Started" docs. I get through the tutorial as-is just fine, but if I change the loss function from sparse_categorical_crossentropy to categorical_crossentropy, the code below:



model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])

model.compile(optimizer=tf.train.AdamOptimizer(),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.fit(train_images, train_labels, epochs=5)


fails during the training/fitting step with the following error:



ValueError: Error when checking target: expected dense_1 to have shape (10,) but got array with shape (1,)


The documentation on the loss functions doesn't say much about their expected inputs and outputs. There is obviously a dimensionality mismatch here, but could someone explain in detail what it is about this loss function (or any other loss function) that raises this ValueError?










      python tensorflow machine-learning keras neural-network














edited Nov 13 '18 at 7:02 by today

asked Nov 13 '18 at 6:29 by nmurthy
























          1 Answer














sparse_categorical_crossentropy expects the labels to be integers such as 0, 1, 2 and so on, where each integer identifies a class. For example, class 0 might be dogs, class 1 cats and class 2 lions. categorical_crossentropy, on the other hand, expects one-hot encoded labels such as [1,0,0], [0,1,0], [0,0,1], where the index of the 1 indicates the class of the sample. For example, [0,0,1] means the sample belongs to class 2 (i.e. lions). Further, since the output of a classification model is usually a probability distribution produced by a softmax layer, a one-hot label is itself a probability distribution and has the same shape as the model's output. Again, [0,0,1] means that the sample belongs to class 2 with probability one. That is the source of the error: the tutorial's labels are integers, so each target has shape (1,), whereas categorical_crossentropy needs a target of shape (10,) to match the output of the final Dense(10) layer.
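As a rough illustration of the two label formats, here is a minimal NumPy sketch. The label values are made up for the example, and `int_labels` is only a stand-in for the tutorial's `train_labels`; it is not taken from the question.

```python
import numpy as np

num_classes = 10

# Integer class indices: the format sparse_categorical_crossentropy expects.
# (Hypothetical values standing in for the tutorial's train_labels.)
int_labels = np.array([9, 0, 3])

# One-hot encoding: row i has a single 1 in column int_labels[i].
# This is the format categorical_crossentropy expects.
one_hot_labels = np.eye(num_classes, dtype="float32")[int_labels]

print(int_labels.shape)      # (3,)    -> one integer per sample
print(one_hot_labels.shape)  # (3, 10) -> one vector of length 10 per sample
```

In Keras the same conversion is available as `keras.utils.to_categorical(train_labels, num_classes=10)`, so passing the converted labels to `model.fit` should make the shapes agree with `categorical_crossentropy`.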



sparse_categorical_crossentropy is essentially a convenient way to use categorical_crossentropy as the loss function: Keras (or its backend) handles the integer labels internally, so you don't need to convert them to one-hot form manually. However, if the labels you provide are already one-hot encoded, you must use categorical_crossentropy as the loss function.
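To see that the two losses compute the same quantity from different label formats, here is a small NumPy sketch. The two functions below are hand-written stand-ins for illustration only, not the Keras implementations, and the probabilities and labels are invented for the example.

```python
import numpy as np

def categorical_crossentropy(one_hot, probs):
    # Sum over classes of -y * log(p); only the true class contributes.
    return -np.sum(one_hot * np.log(probs), axis=-1)

def sparse_categorical_crossentropy(labels, probs):
    # Pick the predicted probability of the true class directly by index.
    return -np.log(probs[np.arange(len(labels)), labels])

# Hypothetical softmax outputs for two samples and three classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
labels = np.array([0, 2])        # integer labels
one_hot = np.eye(3)[labels]      # the same labels, one-hot encoded

dense_loss = categorical_crossentropy(one_hot, probs)
sparse_loss = sparse_categorical_crossentropy(labels, probs)
print(np.allclose(dense_loss, sparse_loss))  # True
```

The sparse version simply skips building the one-hot vectors, which is why it accepts the integer labels as they are.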



You might also be interested in this answer, where I briefly explain the activation functions, loss functions and label formats used in different kinds of classification tasks.






edited Nov 13 '18 at 10:11

answered Nov 13 '18 at 6:49 by today





























