How do the loss weights work in Tensorflow?












I am training a recurrent binary classifier on a significantly underrepresented target class. Let's say our target class 1 represents <1% of all the training data we have and class 0 >99%. In order to punish the model more for mispredicting the minority class, I'd like to use weights in the loss function. For each minibatch, I create a corresponding minibatch of weights where the target class gets a weight scalar >1.0 and the majority class gets one <1.0. For example, the comment in the code below shows 2.0 for class 1 and 0.5 for class 0.



import tensorflow as tf  # TensorFlow 1.x API (tf.losses)

loss_sum = 0.0
for t, o, tw in zip(self._targets_uns, self._logits_uns, self._targets_weight_uns):
    # t -- targets tensor [batchsize x 1], tw -- weights tensor [batchsize x 1]
    # e.g. [0, 0, 0, 0, 1, 1, 0] -- [0.5, 0.5, 0.5, 0.5, 2.0, 2.0, 0.5]
    _loss = tf.losses.sigmoid_cross_entropy(t, o, weights=tw, label_smoothing=0,
                                            scope="sigmoid_cross_entropy",
                                            loss_collection=tf.GraphKeys.LOSSES)
    loss_sum += _loss


Once the model is trained, I check the prediction accuracy and find that it is slightly lower than the accuracy without weights. I continued experimenting with weight pairs of [1.4, 0.8], [1.6, 0.4], [4.0, 0.1], [3.0, 1.0], and so on. However, I am not getting any improvement over the unweighted training; the results differ only marginally and are typically 2-3% lower. OK, maybe I misunderstood the docs for the tf.losses.sigmoid_cross_entropy function:




weights acts as a coefficient for the loss. If a scalar is provided, then the loss is simply scaled by the given value. If weights is a tensor of shape [batch_size], then the loss weights apply to each corresponding sample.




I then reversed the pairs and used a higher weight for class 0 and a lower one for class 1: [0.5, 2.0], [0.8, 1.3], [0.2, 1.0], and so on. This also does not provide any improvement; it is slightly worse than the unweighted version.



Can somebody please explain to me the behaviour of a weighted loss? Am I doing it correctly and what should I do to upweight the minority class?



































  • Maybe your weights are not big enough. Depending on the case, a more significant difference between the weights may be necessary. Try something exaggerated (like 1000 for the underrepresented class and 1 for the rest) and see if that actually biases the model.

    – jdehesa
    Nov 13 '18 at 14:38
















python tensorflow machine-learning














asked Nov 13 '18 at 14:18 by minerals; edited Nov 13 '18 at 14:48

























1 Answer
































Weighting is a general mathematical technique used when solving an over-specified system of equations of the form Wx = y, where x is the input vector, y is the output vector, and W is the transformation matrix you wish to find. Such problems are often solved with techniques such as SVD, which finds W by minimizing the least-squares error of the over-specified system. TensorFlow is basically solving a similar problem through its minimization process.



In your case, you have roughly 1 sample of class A for every 99 samples of class B. Because the solving process works to minimize the overall error, class B contributes to the solution by a factor of 99 to class A's 1. To fix this, adjust your weights so that classes A and B contribute evenly to the solution, i.e., scale class B's weight down to about 0.01 relative to class A's.
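To make the 99-to-1 contribution concrete, here is a minimal sketch in plain Python. It is not TensorFlow's implementation; the helper names `sigmoid_cross_entropy` and `class_totals` and the toy batch are assumptions made up for illustration. Every sample gets a logit of 0.0, so each has the same unweighted loss and only the class counts and weights matter.

```python
import math

def sigmoid_cross_entropy(label, logit):
    """Per-sample sigmoid cross-entropy in a numerically stable form."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def class_totals(labels, logits, weights):
    """Sum the weighted per-sample losses separately for class 0 and class 1."""
    totals = {0: 0.0, 1: 0.0}
    for y, z, w in zip(labels, logits, weights):
        totals[y] += w * sigmoid_cross_entropy(y, z)
    return totals

# Hypothetical toy batch: 1 minority sample (class 1), 99 majority samples (class 0).
labels = [1] + [0] * 99
logits = [0.0] * 100

# Without weights, the majority class dominates the summed loss.
unweighted = class_totals(labels, logits, [1.0] * 100)

# Scaling the majority class down by 1/99 evens out the two contributions.
balanced = class_totals(labels, logits, [1.0 if y == 1 else 1.0 / 99 for y in labels])

print(unweighted[0] / unweighted[1])  # share of class 0 relative to class 1
print(balanced[0], balanced[1])
```

With real logits the per-sample losses differ, so the balance is only approximate, but the scaling effect of the weights is the same.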



More generally, you can compute the weights from the class counts:



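# num_A, num_B: number of training samples in class A (minority) and class B (majority)
ratio = num_B / (num_A + num_B)
# weights[0] applies to class A, weights[1] to class B
weights = [ratio, 1.0 - ratio]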
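As a sketch of how the ratio rule above could be turned into a per-sample weight vector like the one the question passes as `weights`, the following plain-Python example uses the class counts mentioned in the comments on this answer. The function names `class_weights` and `per_sample_weights`, and the convention that label 1 is the minority class, are assumptions for illustration.

```python
def class_weights(num_a, num_b):
    """Return (weight for class A, weight for class B) from the class counts."""
    ratio = num_b / (num_a + num_b)
    return ratio, 1.0 - ratio

def per_sample_weights(labels, weight_a, weight_b, minority_label=1):
    """Map each label in a minibatch to the weight of its class."""
    return [weight_a if y == minority_label else weight_b for y in labels]

# Class A (minority) has 61,000 samples, class B (majority) has 6,000,000.
weight_a, weight_b = class_weights(num_a=61_000, num_b=6_000_000)

# One weight per sample, in the same order as the minibatch labels.
batch_weights = per_sample_weights([0, 0, 0, 0, 1, 1, 0], weight_a, weight_b)

print(round(weight_a, 2), round(weight_b, 2))
```

With these weights, `num_a * weight_a` and `num_b * weight_b` are equal, which is the "even contribution" the answer describes.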































  • So to make it clear, class B=6,000,000 and class A=61,000, 61000/(6000000+61000) and weights = [0.01,0.99]?

    – minerals
    Nov 13 '18 at 15:49













  • Looks backwards. I think you want [0.99, 0.01]

    – bivouac0
    Nov 13 '18 at 16:06











  • Even though I understand the intuition, setting weights for target classes as [0.99, 0.01] made the overall model worse by 3% and I couldn't beat the unweighted system.

    – minerals
    Nov 14 '18 at 16:23











  • This method "should" be equivalent to training with N extra copies of the class A samples. You could try making about 100x copies of those samples so that there is an equivalent amount of class A and class B data. If that gives you about the same results, then I think you've verified that balancing the data isn't going to help.

    – bivouac0
    Nov 14 '18 at 17:26
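The equivalence claimed in the last comment can be checked on a summed loss with fixed predictions. The sketch below is plain Python with made-up labels and logits, and the helper names are assumptions. It compares giving the single minority sample a weight of N against repeating that sample N times with weight 1.

```python
import math

def sigmoid_cross_entropy(label, logit):
    """Per-sample sigmoid cross-entropy in a numerically stable form."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def summed_loss(labels, logits, weights):
    """Weighted sum of per-sample losses (no averaging)."""
    return sum(w * sigmoid_cross_entropy(y, z)
               for y, z, w in zip(labels, logits, weights))

# Hypothetical data: the first sample is the minority class.
labels = [1, 0, 0, 0]
logits = [-1.5, -2.0, 0.3, -0.7]
copies = 5

# Option 1: weight the minority sample by `copies`.
weighted = summed_loss(labels, logits, [copies if y == 1 else 1 for y in labels])

# Option 2: repeat the minority sample `copies` times, all weights equal to 1.
dup_labels = [1] * copies + labels[1:]
dup_logits = [logits[0]] * copies + logits[1:]
duplicated = summed_loss(dup_labels, dup_logits, [1] * len(dup_labels))

print(weighted, duplicated)
```

The two totals agree here because the loss is a plain sum. During minibatch training, the composition of each batch and the way the loss is averaged can differ between the two approaches, so the trained models need not match exactly.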











answered Nov 13 '18 at 15:09 by bivouac0; edited Nov 13 '18 at 15:30












