DiegoOrtego/LabelNoiseDRPL

Abstract:

Noisy labels are an unavoidable consequence of labeling processes, and detecting them is an important step towards preventing performance degradation in Convolutional Neural Networks. Discarding noisy labels avoids harmful memorization, while the associated image content can still be exploited in a semi-supervised learning (SSL) setup. Clean samples are usually identified using the small loss trick, i.e., they exhibit a low loss. However, we show that different noise distributions make the application of this trick less straightforward, and we propose to continuously relabel all images to reveal a loss that is discriminative against multiple distributions. SSL is then applied twice: once to improve the clean-noisy detection and again to train the final model. We design an experimental setup based on ImageNet32/64 to better understand the consequences of representation learning under differing label noise distributions, finding that non-uniform out-of-distribution noise better resembles real-world noise and that, in most cases, intermediate features are not affected by label noise corruption. Experiments on CIFAR-10/100, ImageNet32/64, and WebVision (real-world noise) demonstrate that the proposed label noise Distribution Robust Pseudo-Labeling (DRPL) approach gives substantial improvements over the recent state of the art.

Examples to run our method:

  • CIFAR-10/100: the "cifar10" and "cifar100" folders contain the code to run our method with two different label noise distributions: uniform noise ("random_in" noise type) and non-uniform noise ("real_in" noise type). We provide example scripts for both noise types: "RunScripts_cifar10.sh" and "RunScripts_cifar100.sh". Both datasets are downloaded automatically when setting "--download True" and have to be placed in the cifar10/data/ folder (this is done automatically); see the sketch after this list for example invocations.
  • ImageNet32/64: the "ImageNet32_64" folder contains the code to run our method with four different label noise distributions: uniform and non-uniform noise, each for both in-distribution and out-of-distribution noise. We provide an example script, "RunScripts_Im32.sh", to run the method on ImageNet32. To run it on ImageNet64, change the dataset argument from "ImageNet32" to "ImageNet64". ImageNet32 and ImageNet64 require downloading from http://www.image-net.org and have to be placed in the ImageNet32_64/data/ folder. To facilitate selecting the same 100 in-distribution classes used in our experiments, we provide txt files listing the in-distribution and out-of-distribution classes and image indexes; these can be used in place of the ones the dataset class selects randomly.
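
As a quick orientation, here is a minimal sketch of what such runs might look like. The entry-point script name ("train.py") and the exact flag spellings, other than "--download True" and the noise-type names, are assumptions; the provided RunScripts_*.sh files contain the authoritative commands.

    # Hypothetical invocations; see RunScripts_cifar10.sh, RunScripts_cifar100.sh
    # and RunScripts_Im32.sh for the exact entry point and arguments.

    # CIFAR-10 with uniform (random_in) and non-uniform (real_in) noise;
    # "--download True" fetches the dataset into cifar10/data/ automatically.
    python train.py --dataset cifar10 --noise_type random_in --download True
    python train.py --dataset cifar10 --noise_type real_in --download True

    # ImageNet32; switch the dataset argument to ImageNet64 for ImageNet64
    # (both must first be downloaded manually into ImageNet32_64/data/).
    python train.py --dataset ImageNet32 --noise_type random_in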

Main requirements:

  • Python 3.7.7
  • PyTorch 1.5.1 (torchvision 0.6.1)
  • NumPy 1.18.5
  • scikit-learn 0.23.1
  • CUDA 9.2
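
For reference, a minimal environment setup sketch using the pinned versions above. The "+cu92" wheel tags and the archived wheel index URL are assumptions about PyTorch's historical distribution channel; adapt them to your package manager if these builds are no longer served.

    # Install the CUDA 9.2 builds of PyTorch 1.5.1 / torchvision 0.6.1
    # from PyTorch's archived wheel index (assumed still available).
    pip install torch==1.5.1+cu92 torchvision==0.6.1+cu92 \
        -f https://download.pytorch.org/whl/torch_stable.html

    # Remaining pinned dependencies.
    pip install numpy==1.18.5 scikit-learn==0.23.1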

Examples of noisy samples detected in WebVision


Test Accuracy (%)

Non-uniform noise    0%      10%     30%     40%
CIFAR-10             94.47   95.70   93.65   93.14
CIFAR-100            72.27   72.40   69.30   65.86

Uniform noise        0%      20%     40%     60%     80%
CIFAR-10             94.47   94.20   92.92   89.21   64.35
CIFAR-100            72.27   71.25   73.13   68.71   53.04

Please consider citing the following paper if you find this work useful for your research.

 @article{2020_arXiv_DRPL,
   title   = {Towards Robust Learning with Different Label Noise Distributions},
   author  = {Diego Ortego and Eric Arazo and Paul Albert and Noel E. O'Connor and Kevin McGuinness},
   year    = {2020},
   journal = {arXiv preprint arXiv:2007.11866},
 }

Diego Ortego, Eric Arazo, Paul Albert, Noel E. O'Connor, Kevin McGuinness. "Towards Robust Learning with Different Label Noise Distributions", arXiv preprint arXiv:2007.11866, 2020.
