How to pre-train a deep neural network (or RNN) with unlabeled data?












Recently, I was asked how to pre-train a deep neural network with unlabeled data; that is, instead of initializing the model weights with small random numbers, we set the initial weights from a model pre-trained on unlabeled data.

Intuitively, I kind of get it: it probably helps with the vanishing gradient issue and shortens training time when not much labeled data is available. But I still don't really know how it is done. How can you train a neural network with unlabeled data? Is it something like a SOM or a Boltzmann machine?

Has anybody heard of this? If so, can you provide some links to sources or papers? I am curious. Greatly appreciated!







Tags: neural-network deep-learning






asked Nov 14 '18 at 23:33 (edited Nov 14 '18 at 23:47)

Aaron_Geng













  • I've answered, but this question may be more appropriate on a site like cross-validated. Would not be surprised to see it migrated.

    – user3390629
    Nov 14 '18 at 23:54



















1 Answer
There are lots of ways to deep-learn from unlabeled data. Layerwise pre-training was developed back in the 2000s by Geoff Hinton's group, though that's generally fallen out of favor.



More modern unsupervised deep learning methods include auto-encoders, variational auto-encoders, and generative adversarial networks. I won't dive into the details of all of them, but the simplest of these, auto-encoders, work by compressing an unlabeled input into a low-dimensional, real-valued representation and then using this compressed representation to reconstruct the original input. Intuitively, a compressed code that can effectively be used to recreate an input is likely to capture some useful features of that input. See here for an illustration and more detailed description. There are also plenty of examples implemented in your deep learning library of choice.
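
To make that concrete, here is a minimal, illustrative sketch of training an auto-encoder on unlabeled data. This is my own example, not taken from any particular source: Keras is just one library choice, and the layer sizes, code dimension, and the x_unlabeled placeholder are assumptions for illustration.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

input_dim = 784   # e.g. flattened 28x28 inputs (assumption)
code_dim = 32     # size of the compressed representation (assumption)

# Encoder: compresses the input into a low-dimensional code.
encoder = keras.Sequential([
    layers.Input(shape=(input_dim,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(code_dim, activation="relu"),
], name="encoder")

# Decoder: reconstructs the original input from the code.
decoder = keras.Sequential([
    layers.Input(shape=(code_dim,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(input_dim, activation="sigmoid"),
], name="decoder")

autoencoder = keras.Sequential([encoder, decoder], name="autoencoder")
autoencoder.compile(optimizer="adam", loss="mse")

# No labels needed: the training target is the input itself.
# x_unlabeled is a stand-in for your real unlabeled data.
x_unlabeled = np.random.rand(1000, input_dim).astype("float32")
autoencoder.fit(x_unlabeled, x_unlabeled, epochs=10, batch_size=64)

After training, the encoder half holds weights learned purely from unlabeled data.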



I guess in some sense any of the listed methods could be used as pre-training, e.g., for preparing a network for a discriminative task like classification, though I'm not aware of that being a particularly common practice. Modern initialization methods, activation functions, and other optimization tricks are generally good enough that networks train well without more complicated initialization procedures.
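
Continuing the sketch above, if you did want to use the auto-encoder as pre-training for a classifier, one way (again my own illustration, under the same assumptions; num_classes, x_labeled, and y_labeled are placeholders) would be to re-use the trained encoder as the lower layers of a new network and fine-tune on the small labeled set:

num_classes = 10   # assumption for illustration

# Re-use the encoder from the previous snippet; its weights act as the
# "pre-trained" initialization instead of small random numbers.
classifier = keras.Sequential([
    encoder,
    layers.Dense(64, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])

# Optionally freeze the pre-trained layers at first and unfreeze them later.
encoder.trainable = False

classifier.compile(optimizer="adam",
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])

# x_labeled / y_labeled stand in for the (small) labeled dataset.
x_labeled = np.random.rand(100, input_dim).astype("float32")
y_labeled = np.random.randint(0, num_classes, size=(100,))
classifier.fit(x_labeled, y_labeled, epochs=5, batch_size=32)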







answered Nov 14 '18 at 23:53









user3390629

  • Thanks, it sounds weird but makes sense. So it means using auto-encoders to train each layer separately and then taking those weights as the initial weights of the original neural network?

    – Aaron_Geng
    Nov 15 '18 at 0:15











  • If you were to use an auto-encoder as a pre-training technique, I don't think you would use it to train each layer separately. Rather you would train an auto-encoder, then grab the encoder portion of the network and re-use those layers in another architecture. In that case, those encoder layers have been trained jointly, not separately. Again, I'm not sure I've ever seen a paper take that approach, but it wouldn't surprise me.

    – user3390629
    Nov 15 '18 at 14:55











  • Yeah, you are right.

    – Aaron_Geng
    Nov 15 '18 at 23:36


















