Generalized dice loss for multi-class segmentation #9395
Hey guys, I just implemented the generalised dice loss (multi-class version of dice loss), as described in ref:
(my targets are defined as: (batch_size, image_dim1, image_dim2, image_dim3, nb_of_classes))
But something must be wrong. I'm working with 3D images that I have to segment into 4 classes (1 background class and 3 object classes; I have an imbalanced dataset). First odd thing: while my train loss and accuracy improve during training (and converge really fast), my validation loss/accuracy stay constant through epochs (see image). Second, when predicting on test data, only the background class is predicted: I get a constant volume.
I used the exact same data and script but with categorical cross-entropy loss and got plausible results (object classes are segmented). That means something is wrong with my implementation. Any idea what it could be?
Plus, I believe it would be useful to the keras community to have a generalised dice loss implementation, as it seems to be used in most recent semantic segmentation tasks (at least in the medical image community).
PS: it seems odd to me how the weights are defined; I get values around 10^-10. Has anyone else tried to implement this? I also tested my function without the weights but got the same problems.
Comments
Hi, I came across a similar issue. The dataset is imbalanced, and a region which is small compared to the whole image cannot be well segmented. I think it has nothing to do with your loss function; maybe it is due to the patch-based approach. How large did you choose your patch size?
And I think the problem with your loss function is that the weights are not normalized. I think normalized weights are what you want. And w = 1/(w**2+0.00001) should maybe be rewritten as something like w = w/(np.sum(w)+0.00001). Otherwise, the generalized loss is not 'balanced': a region which takes a larger portion of the image accounts for a relatively small part of the total loss.
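For illustration, a tiny numpy sketch of the two weightings (the class counts are made up):

import numpy as np

counts = np.array([9.0e5, 5.0e4, 1.2e4, 8.0e3])  # hypothetical voxels per class

w = 1 / (counts**2 + 0.00001)   # inverse squared volume: values around 1e-12 to 1e-8
w = w / (np.sum(w) + 0.00001)   # normalized variant: the weights now sum to ~1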
Hey @xychenunc, thanks for your answer! I tried normalizing the weights, but it didn't make any difference. While it is true that the weight values are more interpretable (instead of values around 10^-10 I now have values between 0 and 1), numerically it does not change the loss behaviour. Why are you asking about the patch size? Aren't the weights supposed to cope with the class imbalance problem? Anyway, my patch size is 56x56x56 voxels, and my objects have a diameter of 10 voxels. In my patches, on average 7% of voxels are labeled as object; the rest is background.
I am trying something similar for a 2D semantic segmentation project with 10 categories (label 0 is background). Before trying dice, I was using sparse categorical crossentropy with very good results. However, because label 0 was being included in the loss calculation, both training and validation accuracy were artificially high (> 0.98). My implementation of dice is based on this: https://github.com/Lasagne/Recipes/issues/99. y_true has shape (batch,m,n,1) and y_pred has shape (batch,m,n,10). Here is my version of dice:
from keras import backend as K

def dice_coef_9cat(y_true, y_pred, smooth=1e-7):
    '''
    Dice coefficient for 10 categories. Ignores background pixel label 0
    Pass to model as metric during compile statement
    '''
    y_true_f = K.flatten(K.one_hot(K.cast(y_true, 'int32'), num_classes=10)[..., 1:])
    y_pred_f = K.flatten(y_pred[..., 1:])
    intersect = K.sum(y_true_f * y_pred_f, axis=-1)
    denom = K.sum(y_true_f + y_pred_f, axis=-1)
    return K.mean((2. * intersect / (denom + smooth)))

def dice_coef_9cat_loss(y_true, y_pred):
    '''
    Dice loss to minimize. Pass to model as loss during compile statement
    '''
    return 1 - dice_coef_9cat(y_true, y_pred)
A model trained with the above implementation of dice tends to predict 4 out of the 9 categories and the segmentation is less than ideal and much worse than I got with sparse categorical crossentropy.
However, when I convert the segmentation task into a binary decision (merge all categories into one), the segmentation is pretty good. Here is the loss function changed for a binary problem:
def dice_coef_binary(y_true, y_pred, smooth=1e-7):
    '''
    Dice coefficient for 2 categories. Ignores background pixel label 0
    Pass to model as metric during compile statement
    '''
    y_true_f = K.flatten(K.one_hot(K.cast(y_true, 'int32'), num_classes=2)[..., 1:])
    y_pred_f = K.flatten(y_pred[..., 1:])
    intersect = K.sum(y_true_f * y_pred_f, axis=-1)
    denom = K.sum(y_true_f + y_pred_f, axis=-1)
    return K.mean((2. * intersect / (denom + smooth)))

def dice_coef_binary_loss(y_true, y_pred):
    '''
    Dice loss to minimize. Pass to model as loss during compile statement
    '''
    return 1 - dice_coef_binary(y_true, y_pred)
Not sure if the binary results are better because it is an 'easier' task or because my dice loss function is wrong.
Sorry for my late reply. The email you sent to me went to junk mail.
In my understanding, your problem is the result of class imbalance. A common way to solve it is choosing a better sampling approach.
Hope it helps!
Xiaoyang
@xychenunc thanks for your answer, I also realised that the problem is class imbalance. I just don't understand why, because the weights in the loss function are supposed to compensate for that. @jpcenteno80 does sparse categorical crossentropy work better than normal categorical crossentropy in the case of segmentation? I tried it out myself, but I'm getting an error concerning array shapes. Concerning your loss function dice_coef_9cat_loss: I don't think it is a good idea to ignore the background. Examples of "non-objects" are as important as examples of "objects"; the problem is class imbalance. If you completely ignore the background, chances are that you'll get a lot of false positives. Check out the ref I cited in my original post; they describe how to implement dice loss for multiple imbalanced classes. Maybe your implementation will work, because mine doesn't and I don't understand why.
When using sparse categorical crossentropy, you don't need to one-hot encode your target. There is a discussion comparing crossentropy vs sparse crossentropy on Stack Overflow (google: "TensorFlow: what's the difference between sparse_softmax_cross_entropy_with_logits and softmax_cross_entropy_with_logits?"); copying the link wasn't working for me. I ended up getting decent results with the dice_coef_9cat_loss function after all (neglecting the background label). It just took longer training and starting with lower learning rates (Nadam, 1e-5 or 1e-4). I also tried cyclical learning rates, which I think helped: https://github.com/bckenstler/CLR. I decided to neglect the background in the loss calculation because my class imbalance was pretty large and I could not figure out in Keras how to use sample_weight in the fit method with 2D arrays. I'm also transitioning to pytorch and I like that it seems more flexible in terms of setting up custom metrics or loss functions. I am using the tiramisu architecture for semantic segmentation, which uses negative log likelihood as the loss (implementation here: https://github.com/bfortuner/pytorch_tiramisu). The results so far are great. I highly recommend this architecture for semantic segmentation. I have not tried it in 3D though.
I've taken the same approach as @jpcenteno80, as I am also unable to successfully implement the generalised dice loss. I would rather avoid using temporal sample weights.
Hey guys, I found a way to implement multi-class dice loss, and I get satisfying segmentations now. I implemented the loss as explained in ref: the paper describes the Tversky loss, a generalised form of dice loss, which is identical to dice loss when alpha = beta = 0.5. Here is my implementation, for 3D images:
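A sketch of what such a Tversky loss can look like, assuming one-hot targets and softmax predictions of shape (batch_size, dim1, dim2, dim3, nb_of_classes); with alpha = beta = 0.5 it reduces to dice loss:

from keras import backend as K

def tversky_loss(y_true, y_pred):
    alpha = 0.5  # penalty for false positives
    beta = 0.5   # penalty for false negatives; alpha = beta = 0.5 gives dice loss
    ones = K.ones(K.shape(y_true))
    p0 = y_pred         # probability that voxels belong to class i
    p1 = ones - y_pred  # probability that voxels do not belong to class i
    g0 = y_true
    g1 = ones - y_true
    num = K.sum(p0 * g0, (0, 1, 2, 3))
    # K.epsilon() guards against division by zero for empty classes
    den = num + alpha * K.sum(p0 * g1, (0, 1, 2, 3)) + beta * K.sum(p1 * g0, (0, 1, 2, 3)) + K.epsilon()
    T = K.sum(num / den)  # sum over the class axis; T lies in [0, nb_of_classes]
    nb_of_classes = K.cast(K.shape(y_true)[-1], 'float32')
    return nb_of_classes - T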
I would be curious to know if this works for your applications. To adapt from 3D images to 2D images, you should change all sum(..., (0,1,2,3)) to sum(..., (0,1,2)).
@lazyleaf, I just stumbled upon this. I am doing 3D multi-class segmentation. I will definitely try the proposed method and see how it works. However, I also have another solution that has worked for me in the past:
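A sketch of that approach, assuming channels-first 5D tensors of shape (batch, n_labels, dim1, dim2, dim3); numLabels and the smoothing value are illustrative:

from keras import backend as K

def dice_coef(y_true, y_pred, smooth=1.):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

def dice_coef_multilabel(y_true, y_pred, numLabels=5):
    dice = 0
    for index in range(numLabels):
        # negate and accumulate the dice score of every label, background included
        dice -= dice_coef(y_true[:, index, :, :, :], y_pred[:, index, :, :, :])
    return dice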
This simply calculates the dice score for each individual label and then sums them together, background included. The best dice score you will ever get is equal to the number of labels times -1, since each label's dice score is subtracted.
@lazyleaf, I was also struggling to implement this loss function. But with some inspiration from your code, here is my take on it (for 2D images).
@lazyleaf thank you for pointing to the tversky loss. I implemented your code (I had to change K.shape --> K.int_shape), but it still complains that "TypeError: long() argument must be a string or a number, not 'NoneType'". Do you know why this is happening, and do you see it in your own code?
@lkshrsch To remove the compilation error I have replaced "ones = K.ones(K.shape(y_true))" with "ones = K.ones_like(y_true)".
@kroskal Thanks for the implementation. Did you try training a network with this loss yet?
I now have generalized_dice_coef and generalized_dice_loss working between [0, 1] for 2D images. I normalized the weights by the presence of each class in the entire dataset instead of just the batch, using the following code:
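A sketch of that idea, assuming one-hot 2D masks of shape (batch, height, width, n_classes); class_counts (per-class pixel counts over the whole training set) is a precomputed placeholder:

import numpy as np
from keras import backend as K

def generalized_dice_loss_from_counts(class_counts, smooth=1e-7):
    # inverse squared class volume, computed once from dataset-wide counts
    w = 1. / (np.square(class_counts.astype('float64')) + smooth)
    w = w / np.sum(w)  # normalize so the weights sum to 1
    w = K.constant(w, dtype='float32')

    def generalized_dice_loss(y_true, y_pred):
        # sum over batch and spatial axes, keeping the class axis
        intersect = K.sum(y_true * y_pred, axis=(0, 1, 2))
        denom = K.sum(y_true + y_pred, axis=(0, 1, 2))
        gdc = 2. * K.sum(w * intersect) / (K.sum(w * denom) + smooth)
        return 1. - gdc

    return generalized_dice_loss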
Has anyone successfully trained a network with @lazyleaf's implementation of the tversky loss yet?
I trained an RGB U-Net with 2 output classes using @lazyleaf's tversky loss and it worked great: I got 96% accuracy in only 32 epochs. I am using Tensorflow-GPU for the backend; if you use Theano, the tensors are in a different order and it may work differently. It is critical that your inputs are arranged in the correct order, so beware of reshapes that might not work how you expect.
Has anyone successfully trained a network with generalized dice loss for 5-class segmentation? It is strange that I can only segment out one of the five classes.
@lazyleaf
@jpcenteno80 Can you tell me what activation function you used in the model with this dice_coef_binary_loss loss function, softmax? And what is the model's last-layer output shape?
@lazyleaf Hi, thank you for your code. But when I try to apply this loss to my multi-class segmentation, which has 3 classes, the output contains only two classes. Do you know what may cause the problem?
@lazyleaf Hi, thank you for this excellent general code, which acts as dice loss or as IoU loss depending on the alpha and beta values you choose. I previously tried manually assigning higher weights to classes with fewer pixels. That does solve the problem of certain classes not appearing in the prediction, but the method causes some false positives. After I tried this dice loss, it works great: all classes get predicted and there are no false positives.
Closing as this is resolved
@lazyleaf is
@lazyleaf, when I write this (since I am implementing it for Caffe, I have to write the gradients manually), is it okay that I have only calculated
I am trying to perform semantic segmentation in TensorFlow 1.10 with eager execution, using the generalized dice loss function:
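For reference, a minimal sketch of a generalized dice loss of this kind (not the poster's exact code), assuming one-hot labels and softmax probabilities of shape (batch, height, width, n_classes):

import tensorflow as tf

def generalized_dice_loss(y_true, y_pred, eps=1e-7):
    # per-class weights: inverse squared class volume in the batch
    w = 1. / (tf.reduce_sum(y_true, axis=(0, 1, 2)) ** 2 + eps)
    numerator = tf.reduce_sum(w * tf.reduce_sum(y_true * y_pred, axis=(0, 1, 2)))
    denominator = tf.reduce_sum(w * tf.reduce_sum(y_true + y_pred, axis=(0, 1, 2)))
    return 1. - 2. * numerator / (denominator + eps)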
However, I am struggling to get any meaningful loss that isn't always 1. What am I doing wrong here? After the initial weights (one for each class) are calculated, they contain many small values, which seems fine to me. The numerators also look reasonable, since they're basically the labels' respective sizes times the network's certainty about them (which is likely low at the beginning of training). The denominators are large, but that is to be expected, since the class probabilities of a pixel sum to 1, and therefore the sum of these denominators should more or less equal the number of pixels with ground truth. However, summing the numerators gives a very small sum (~0.001, though occasionally it's in the single-digit range) while the denominators sum to very large values. This results in my final loss being exclusively 1, or something really close to that. Does anyone know how I can mitigate this effect and obtain stable gradients?
@gattia your solution worked for me, thanks.
@samra-irshad, glad that it worked out for you! It is a simple method of doing it, but it makes sense and seems to work.
Since this thread seems to be quite active despite being closed, and it is where I ended up from google, I'll link my solution to the same problem on stack overflow.
@gattia how would you modify your code for a 2D application? My images are 2D with channels last and have 4 classes. Thanks in advance for any insight you can provide. For reference, this is your original 3D code:
@mptorr thanks for the interest. In the line dice -= dice_coef(y_true[:,index,:,:,:], y_pred[:,index,:,:,:]), the indices for y_true and y_pred are (batch, label, dim1, dim2, dim3). So, if you want a 2D image with channels last, the shape is (batch, dim1, dim2, label), the indexing of y_true and y_pred becomes y_true[:,:,:,index] and y_pred[:,:,:,index],
and the line of code that needs to change becomes:
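(a sketch, assuming the channels-last 2D layout just described, with index and dice_coef as in the loop above)

dice -= dice_coef(y_true[:, :, :, index], y_pred[:, :, :, index])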
Hope that helps.
@gattia thanks so much, that's extremely helpful!
I don't know what's wrong, but with this implementation I get a negative loss that keeps moving away from 0, even though the dice score keeps increasing.
@gattia @naivomah3 I also got a negative loss moving away from 0. However, the OA and mIoU indicated that the implementation works. So, is the negative loss right?
Negative dice is correct. This is being done in the code at dice -= dice_coef(y_true[:,index,:,:,:], y_pred[:,index,:,:,:]). It is purposely subtracting each of the calculated dice scores, so a more negative number is better.
@gattia
def loss_gt(e=1e-8): return loss_gt_
@sneh-debug it's hard to follow the code and other copied items. It looks like there are defs within other defs, but I can't really tell. If you format the code (proper indentation etc.) I will try to review it. Initially, looking at the loss in the training printout, something seems wrong. It looks like the code tries to do 1 - dice to get the loss, so the max dice should be 1 and lower loss should be better (lowest possible = 0). However, you are getting a dice loss > 1, so this seems like a problem.
@gattia with indentation:
y_true_f = K.flatten(y_true)
def loss_gt(e=1e-8):
def loss_VAE(input_shape, z_mean, z_var, weight_L2=0.1, weight_KL=0.1):
model.compile(
I noticed that this implementation of multi-class dice loss can lead to sub-optimal performance in some cases (probably depending on the specific dataset and/or architecture design). That is, for some minority class(es), the prediction can be nothing. Have you encountered this problem in your work, and how did you solve it? Thanks
This can sometimes be resolved by tuning other parameters (learning rate, optimizer, etc.). If you have many imbalanced classes, the loss can run into issues or weight them unequally. In this case, you can create weighting schemes that can sometimes help.
I've implemented a bunch of binary and multi-class loss functions for image segmentation (one-hot encoded masks) that you might find useful: https://github.com/maxvfischer/keras-image-segmentation-loss-functions
Hello - I am working on a 4-class segmentation problem, so I have 4 labels. I am able to get combined dice scores and losses using the functions below:
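For context, a sketch of typical combined (all-label) dice functions of this kind (not the poster's exact code), assuming one-hot masks of shape (batch, height, width, 4):

from keras import backend as K

def dice_coef(y_true, y_pred, smooth=1e-7):
    # flatten everything so that all 4 labels contribute to one score
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

def dice_coef_loss(y_true, y_pred):
    return 1. - dice_coef(y_true, y_pred)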
How can I get a dice coefficient and dice loss per label instead of a combined dice coefficient?
@rohan19250 I don't know what optimizer you're using during training, but presuming that you're using a gradient-based optimizer like SGD or Adam, you want a single loss value to be able to optimize the network. That being said, if you still want to compute per-label dice coefficients as metrics, you could probably try to add something like this (NOTE: I have not tested this code. Think of it as pseudo-code):

from typing import Callable
import tensorflow as tf
from tensorflow.keras import backend as K

def dice_coef_single_label(class_idx: int, name: str, epsilon=1e-6) -> Callable[[tf.Tensor, tf.Tensor], tf.Tensor]:
    def dice_coef(y_true: tf.Tensor, y_pred: tf.Tensor) -> tf.Tensor:
        # Extract single class to compute dice coef
        y_true_single_class = y_true[..., class_idx]
        y_pred_single_class = y_pred[..., class_idx]
        intersection = K.sum(K.abs(y_true_single_class * y_pred_single_class))
        return (2. * intersection) / (K.sum(K.square(y_true_single_class)) + K.sum(K.square(y_pred_single_class)) + epsilon)
    dice_coef.__name__ = f"dice_coef_{name}"  # Set name used to log metric
    return dice_coef

Then compile your model in this fashion:

# The order needs to be the same as the order in your target/label tensor.
# In this case, you need to have (None, <IMG_HEIGHT>, <IMG_WIDTH>, 4)
classes = ['dog', 'cat', 'horse', 'bird']
model.compile(optimizer=<YOUR_OPTIMIZER>,
              loss=<YOUR_LOSS>,
              metrics=[
                  *[dice_coef_single_label(class_idx=idx, name=class_name)
                    for idx, class_name in enumerate(classes)]
              ])
Thanks a lot @maxvfischer! This worked. So we are not summing over the last axis (axis=-1 excluded) in this function for individual labels. Could you explain this a bit?
axis=-1 makes K.sum reduce over the last axis, which for the full tensor is the class axis. But in my code, we're extracting the true values and the predictions for a single class first:

y_true_single_class = y_true[..., class_idx]
y_pred_single_class = y_pred[..., class_idx]

The shape will go from (batch, height, width, n_classes) to (batch, height, width). If you would keep axis=-1 after that, you would be summing over what is now the width axis instead of over all pixels. Hope it explains why I removed it.
@maxvfischer Got it! This is really helpful. My Y_val has shape (2880, 192, 192, 4). I am using the combined dice coefficient loss function for the overall network, and just calculating individual dice coefficients. These are the functions I defined for the individual classes, based on your code above:
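A sketch of what those definitions might look like, reusing dice_coef_single_label from above; the label names are placeholders:

# Hypothetical per-label metrics for the 4-label problem above
classes = ['background', 'label_1', 'label_2', 'label_3']
per_label_metrics = [dice_coef_single_label(class_idx=idx, name=name)
                     for idx, name in enumerate(classes)]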
It seems the combined validation dice coefficient is good, but the individual class dice coefficients are not that good (first few epochs shown). Perhaps I should explore other loss functions and data augmentation (Y_train is ~13000 images)?
@rohan19250 It's impossible for me to answer from the information you've provided. I would probably think more about your problem.
@maxvfischer - I have trained for 40 epochs (Adam optimizer, lr=1e-5). The individual dice coefficients converge to ~0.29 for the first class, ~0.46 for the second class, and ~0.42 for the third class. Also, I am able to run predictions and see the ground truth and the predicted labels for some of the test images (which is a bit satisfying), but I am concerned that the individual dice scores are not good, which definitely could be improved. I will explore the other suggestions you provided.
@rohan19250 For interpretability, you might want to use intersection over union / Jaccard index (https://en.wikipedia.org/wiki/Jaccard_index) for each class as a metric instead of the dice coefficient:

from typing import Callable
import tensorflow as tf
from tensorflow.keras import backend as K

def iou_metric(class_idx: int, name: str = 'iou') -> Callable[[tf.Tensor, tf.Tensor], tf.Tensor]:
    def iou(y_true: tf.Tensor, y_pred: tf.Tensor) -> tf.Tensor:
        # Array with True where prob. is highest and False elsewhere
        y_pred_bool = (K.max(x=y_pred, axis=-1, keepdims=True) == y_pred)
        # Generate one-hot vector
        y_pred = tf.where(y_pred_bool, 1., 0.)
        # Extract single class to compute IoU over
        y_true_single_class = y_true[..., class_idx]
        y_pred_single_class = y_pred[..., class_idx]
        # Compute IoU
        intersection = K.sum(y_true_single_class * y_pred_single_class)
        union = K.sum(y_true_single_class) + K.sum(y_pred_single_class) - intersection
        # union = 0 means that we predicted nothing and the ground truth contained no labels.
        # This will be treated as a perfect IoU score = 1.
        return K.switch(K.equal(union, 0.), 1., intersection / union)
    iou.__name__ = f"iou_{name}"  # Set name used to log metric
    return iou

Good luck!
Won't there be a weight for each label multiplied into dice_coef, a weight that reflects the number of pixels for that label?
What you are describing is one version of the DSC that has been called the generalized DSC. I think they inversely weighted based on the number of pixels labeled as that class:
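A sketch of that weighting, assuming the channels-first layout and the dice_coef helper from the multi-label loop earlier in the thread; generalized_dice_coef_multilabel is an illustrative name:

def generalized_dice_coef_multilabel(y_true, y_pred, numLabels=5):
    dice = 0
    for index in range(numLabels):
        # inverse of the class volume in the ground truth; K.epsilon() avoids division by zero
        weight = 1. / (K.sum(y_true[:, index, :, :, :]) + K.epsilon())
        dice -= weight * dice_coef(y_true[:, index, :, :, :], y_pred[:, index, :, :, :])
    return dice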
The above would replace the code in the for loop, to inversely weight each term based on the number of pixels labeled as that particular class (in the ground truth).