What is the difference between SGD and back-propagation?

Can you please tell me the difference between Stochastic Gradient Descent (SGD) and back-propagation?

Tags: machine-learning, artificial-intelligence, difference, backpropagation, gradient-descent






3 Answers

Backpropagation is an efficient method of computing gradients in directed computation graphs, such as neural networks. It is not a learning method in itself, but rather a computational trick that is often used inside learning methods. It is essentially an implementation of the chain rule of derivatives, which lets you compute all the required partial derivatives in time linear in the size of the graph (whereas naive gradient computation scales exponentially with depth).

SGD is one of many optimization methods, namely a first-order optimizer, meaning that it is based on the gradient of the objective. Consequently, for neural networks it is usually applied together with backprop to make efficient updates. You could also apply SGD to gradients obtained in a different way (from sampling, numerical approximation, etc.). Symmetrically, you can use other optimization techniques with backprop as well: anything that can consume a gradient or Jacobian.

The common confusion comes from the fact that, for simplicity, people sometimes say "trained with backprop", which (if they do not specify the optimizer) actually means "trained with SGD, using backprop as the gradient-computation technique". Also, in older textbooks you can find terms like the "delta rule" and other somewhat confusing names that describe exactly the same thing (the neural network community was for a long time fairly independent of the general optimization community).

Thus you have two layers of abstraction, sketched in code after the list:

• gradient computation - where backprop comes into play
• optimization level - where techniques like SGD, Adam, Rprop, BFGS, etc. come into play, and which (if they are first order or higher) use the gradient computed above
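The split is easy to see in code. The sketch below is not from the answer; it is a minimal illustration of my own in NumPy, with a toy 2-2-1 network and hand-written gradients (the helper names backprop and sgd_step are mine). backprop only produces gradients via the chain rule, and sgd_step only consumes gradients, so either piece can be swapped independently, e.g. replacing sgd_step with an Adam update, or feeding the optimizer numerically estimated gradients.

    # Illustration only (not part of the original answer): a toy 2-2-1 network
    # with hand-written gradients, separating the two levels of abstraction.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] * X[:, 1] > 0).astype(float)      # toy binary targets

    W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)  # hidden layer parameters
    W2, b2 = rng.normal(size=2), 0.0               # output layer parameters

    def backprop(x, t):
        """Gradient level: forward pass, then the chain rule backwards."""
        h = np.tanh(x @ W1 + b1)                    # hidden activations
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))    # output probability
        d_out = p - t                               # d(cross-entropy)/d(logit)
        gW2, gb2 = d_out * h, d_out
        d_h = d_out * W2 * (1.0 - h ** 2)           # chain rule through tanh
        gW1, gb1 = np.outer(x, d_h), d_h
        return [gW1, gb1, gW2, gb2]

    def sgd_step(params, grads, lr=0.1):
        """Optimization level: plain SGD; Adam, Rprop, etc. would slot in here."""
        return [p - lr * g for p, g in zip(params, grads)]

    for epoch in range(50):
        for i in rng.permutation(len(X)):           # one example per update
            grads = backprop(X[i], y[i])            # backprop's job: the gradient
            W1, b1, W2, b2 = sgd_step([W1, b1, W2, b2], grads)  # SGD's job: the update

Replacing sgd_step with any other first-order update rule would not touch backprop at all, which is exactly the separation described above.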






Stochastic gradient descent (SGD) is an optimization method, used e.g. to minimize a loss function.

In SGD, at each iteration you use one example to update the weights of your model, based on the error on that example, instead of using the average of the errors over all examples at each iteration (as in "simple", full-batch gradient descent). To do so, SGD needs the gradient of your model's loss.

Backpropagation is an efficient technique to compute this gradient, which SGD then uses.
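To make the contrast with "simple" (full-batch) gradient descent concrete, here is a small sketch of my own, not taken from the answer, assuming a linear least-squares model; grad_one is a hypothetical helper that returns the gradient for a single example.

    # Illustration only (not part of the original answer): one full-batch
    # gradient-descent step versus per-example SGD updates.
    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 4))
    y = X @ np.array([2.0, -1.0, 0.5, 3.0]) + 0.1 * rng.normal(size=500)
    w = np.zeros(4)
    lr = 0.05

    def grad_one(w, x, t):
        """Gradient of the squared error 0.5 * (x @ w - t)**2 for one example."""
        return (x @ w - t) * x

    # "Simple" gradient descent: average the gradient over all examples, then one update.
    full_grad = np.mean([grad_one(w, X[i], y[i]) for i in range(len(X))], axis=0)
    w_batch = w - lr * full_grad

    # SGD: one update per example, using only that example's gradient.
    w_sgd = w.copy()
    for i in rng.permutation(len(X)):
        w_sgd -= lr * grad_one(w_sgd, X[i], y[i])

In a neural network, grad_one would be exactly the quantity that backpropagation computes.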






Back-propagation is just a method for calculating the multi-variable derivatives (the gradient) of your model's loss, whereas SGD is a method for locating the minimum of that loss/cost function using those derivatives.
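One way to see that back-propagation is "just" a derivative calculator is to replace it with a different derivative calculator and leave SGD untouched. The sketch below is my own illustration, not part of the answer: a central finite difference (the hypothetical helper numerical_grad) stands in for back-propagation, and the SGD loop does not change at all.

    # Illustration only (not part of the original answer): SGD does not care how
    # the derivatives are obtained; finite differences stand in for backprop here.
    import numpy as np

    def loss(w, x, t):
        return 0.5 * (x @ w - t) ** 2

    def numerical_grad(w, x, t, eps=1e-6):
        """Multi-variable derivative by central finite differences (slow stand-in for backprop)."""
        g = np.zeros_like(w)
        for j in range(len(w)):
            e = np.zeros_like(w)
            e[j] = eps
            g[j] = (loss(w + e, x, t) - loss(w - e, x, t)) / (2 * eps)
        return g

    rng = np.random.default_rng(2)
    X = rng.normal(size=(100, 3))
    y = X @ np.array([1.0, 2.0, -1.0])
    w = np.zeros(3)
    for i in rng.permutation(len(X)):       # the SGD loop is identical either way
        w -= 0.05 * numerical_grad(w, X[i], y[i])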





