How to pre-train a deep neural network (or RNN) with unlabeled data?
Recently, I was asked how to pre-train a deep neural network with unlabeled data, meaning that, instead of initializing the model weights with small random numbers, we set the initial weights from a model pretrained on unlabeled data.
Intuitively, I kind of get it: it probably helps with the vanishing-gradient issue and shortens training time when there is not much labeled data available. But I still don't really know how it is done. How can you train a neural network with unlabeled data? Is it something like an SOM or a Boltzmann machine?
Has anybody heard about this? If so, can you provide some links to sources or papers? I am curious. Greatly appreciated!
neural-network deep-learning
asked Nov 14 '18 at 23:33, edited Nov 14 '18 at 23:47
– Aaron_Geng
I've answered, but this question may be more appropriate on a site like Cross Validated. I would not be surprised to see it migrated.
– user3390629
Nov 14 '18 at 23:54
1 Answer
There are lots of ways to deep-learn from unlabeled data. Layerwise pre-training was developed back in the 2000s by Geoff Hinton's group, though that's generally fallen out of favor.
More modern unsupervised deep learning methods include auto-encoders, variational auto-encoders, and generative adversarial networks. I won't dive into the details of all of them, but the simplest of these, the auto-encoder, works by compressing an unlabeled input into a low-dimensional, real-valued representation and then using that compressed representation to reconstruct the original input. Intuitively, a compressed code that can effectively be used to recreate an input is likely to capture some useful features of said input. See here for an illustration and a more detailed description. There are also plenty of examples implemented in your deep learning library of choice.
I guess in some sense any of the listed methods could be used as pre-training, e.g. for preparing a network for a discriminative task like classification, though I'm not aware of that being a particularly common practice. Initialization methods, activation functions, and other optimization tricks are generally advanced enough to do well without more complicated initialization procedures.
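Not part of the original answer, but to make the auto-encoder idea concrete, here is a minimal PyTorch sketch of training one on unlabeled data alone; the input dimension, layer sizes, and random placeholder data are assumptions for illustration.

```python
# A minimal sketch, not from the original answer: a dense auto-encoder trained
# purely on unlabeled data by reconstructing its own input. The 784-dim input,
# layer sizes, and the random placeholder array are illustrative assumptions.
import torch
import torch.nn as nn

input_dim, code_dim = 784, 32   # e.g. flattened 28x28 images -> 32-dim code

# Encoder: compress the input into a low-dimensional real-valued code.
encoder = nn.Sequential(
    nn.Linear(input_dim, 128), nn.ReLU(),
    nn.Linear(128, code_dim), nn.ReLU(),
)

# Decoder: reconstruct the input from the code.
decoder = nn.Sequential(
    nn.Linear(code_dim, 128), nn.ReLU(),
    nn.Linear(128, input_dim), nn.Sigmoid(),
)

autoencoder = nn.Sequential(encoder, decoder)

x_unlabeled = torch.rand(1000, input_dim)   # placeholder for real unlabeled data
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(10):
    optimizer.zero_grad()
    reconstruction = autoencoder(x_unlabeled)
    loss = loss_fn(reconstruction, x_unlabeled)   # target is the input itself: no labels needed
    loss.backward()
    optimizer.step()
```

After training, `encoder` alone maps inputs to the compressed code; that is the part worth keeping for any downstream task.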
answered Nov 14 '18 at 23:53
– user3390629
Thanks. It sounds weird but makes sense. So it means using auto-encoders to train each layer separately and then taking those weights as the initial weights of the original neural network?
– Aaron_Geng
Nov 15 '18 at 0:15
If you were to use an auto-encoder as a pre-training technique, I don't think you would use it to train each layer separately. Rather, you would train an auto-encoder, then grab the encoder portion of the network and re-use those layers in another architecture (see the sketch after this comment thread). In that case, those encoder layers have been trained jointly, not separately. Again, I'm not sure I've ever seen a paper take that approach, but it wouldn't surprise me.
– user3390629
Nov 15 '18 at 14:55
Yeah, you are right.
– Aaron_Geng
Nov 15 '18 at 23:36
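Again, not from the thread itself, but here is a hedged sketch of the procedure user3390629 describes in the comment above, continuing from the previous snippet: the trained encoder is reused to initialize a classifier, which is then fine-tuned on a small labeled set. The class count and the placeholder labeled data are assumptions.

```python
# A minimal sketch (an assumption about how one might wire it up, not code from
# the thread), continuing from the auto-encoder snippet above: keep the trained
# `encoder`, attach a new classification head, and fine-tune on the labeled set.
import torch
import torch.nn as nn

num_classes = 10                                    # illustrative assumption

classifier = nn.Sequential(
    encoder,                            # weights initialized by the auto-encoder training above
    nn.Linear(code_dim, num_classes),   # new, randomly initialized head
)

x_labeled = torch.rand(200, input_dim)              # small labeled set (placeholder)
y_labeled = torch.randint(0, num_classes, (200,))

optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(10):
    optimizer.zero_grad()
    logits = classifier(x_labeled)
    loss = loss_fn(logits, y_labeled)
    loss.backward()
    optimizer.step()                    # encoder and head are fine-tuned jointly
```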