Citation Link: https://doi.org/10.25819/ubsi/10505
On the robustness and generalization of deep learning approaches for image classification and reconstruction
Alternate Title
Zur Robustheit und Generalisierung von Deep-Learning-Ansätzen zur Bildklassifizierung und Rekonstruktion
Source Type
Doctoral Thesis
Institute
Issue Date
2024
Abstract
As deep learning models begin to be deployed in real-world applications, characterizing their vulnerabilities and improving their robustness are critical to ensuring reliable performance. This thesis addresses several aspects of the robustness and generalization of deep learning models for image classification and reconstruction.
We first address the problem of robustness and invariance of neural networks to spatial transformations that can be represented as group actions. We propose a simple strategy to achieve provable invariance with respect to group actions by choosing a unique element from the orbit of the transformation group. Such an orbit mapping can be combined with any standard network architecture while still achieving the desired invariance. We investigate robustness with respect to image rotations, as well as provable orientation and scaling invariance for 3D point cloud classification. We demonstrate the advantages of our method, in terms of robustness and computational efficiency, over approaches that incorporate invariance through training or network architecture.
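As an illustration of the orbit-mapping idea (not the thesis's exact construction), the following toy sketch canonicalizes a 3D point cloud with respect to rotations: it centers the cloud, rotates it into its PCA eigenbasis, and fixes the sign ambiguity of each axis with a third moment, so that every element of a rotation orbit maps to the same representative before being fed to a standard classifier.

```python
import numpy as np

def canonicalize(points):
    """Map a 3D point cloud to a unique representative of its rotation orbit.

    Centers the cloud, rotates it into its PCA eigenbasis, and fixes the
    +/- sign ambiguity of each eigenvector via the third moment of the
    projections, so the canonical pose is unique for clouds without
    exact symmetries. Any network applied to the output is then
    provably rotation-invariant.
    """
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered
    _, vecs = np.linalg.eigh(cov)      # principal axes, eigenvalues ascending
    proj = centered @ vecs             # coordinates in the PCA frame
    signs = np.sign((proj ** 3).sum(axis=0))   # resolve each axis's sign
    signs[signs == 0] = 1.0
    return proj * signs
```

Because the covariance of a rotated cloud has the same eigenvalues and rotated eigenvectors, the canonical form is identical for every rotation of the same cloud.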
Next, we investigate the robustness of classical and deep learning approaches to ill-posed image recovery problems, with a focus on image deblurring and computed tomography reconstruction. We demonstrate the susceptibility of reconstruction networks to untargeted, targeted, and localized adversarial attacks using norm-constrained additive perturbations, and study the transferability of attacks. We find that incorporating model knowledge can, but does not always, result in improved robustness. Further, localized attacks that modify semantic meaning can still maintain high consistency with the original measurement, which could be exploited to deal with the ill-posedness of image recovery.
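A minimal sketch of such a norm-constrained attack, assuming a differentiable reconstruction map and using projected gradient ascent with an L-infinity budget (a toy stand-in for the attacks studied in the thesis):

```python
import numpy as np

def pgd_attack(grad_fn, y, eps=0.05, step=0.01, iters=40):
    """Untargeted PGD sketch: find an additive perturbation delta with
    ||delta||_inf <= eps that maximally changes the reconstruction of the
    measurement y. grad_fn(y, delta) must return the gradient of the
    reconstruction-change objective with respect to delta."""
    rng = np.random.default_rng(0)
    delta = rng.uniform(-eps, eps, size=y.shape)
    for _ in range(iters):
        delta += step * np.sign(grad_fn(y, delta))   # signed ascent step
        delta = np.clip(delta, -eps, eps)            # project onto the ball
    return delta
```

For a linear toy "network" `x = W @ y`, the gradient of `0.5 * ||W(y+delta) - W y||^2` with respect to `delta` is simply `W.T @ W @ delta`, which makes the sketch easy to verify end to end.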
While deep neural networks are successful in many image recovery tasks, they are typically trained for a specific forward measurement process and therefore do not generalize to even small changes in the forward model. To address this, we explore the use of generative model priors for flexible image reconstruction tasks. We develop a generative autoencoder for light fields conditioned on the central view, and utilize this model as a prior for light field recovery. We optimize in the latent space of the conditional generator to minimize the data discrepancy with the measurement, and optimize the latent code and the central view simultaneously when the latter is unavailable. We demonstrate the applicability of this approach to generic light field recovery.
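The latent-space optimization can be sketched as follows, assuming a pretrained generator `G` and a known forward operator `A`; for verifiability the example uses a linear toy generator, whereas the thesis uses a conditional autoencoder for light fields:

```python
import numpy as np

def latent_recovery(G, G_grad, A, y, z_dim, lr, iters=500):
    """Recover an image as G(z*) where z* minimizes the measurement
    discrepancy ||A G(z) - y||^2 by gradient descent in latent space.
    G_grad(z, v) must return the generator's Jacobian-transpose applied
    to v (backpropagation through G in practice)."""
    z = np.zeros(z_dim)
    for _ in range(iters):
        r = A @ G(z) - y            # residual in measurement space
        z -= lr * G_grad(z, A.T @ r)  # chain rule through the generator
    return G(z)
```

Because the forward operator `A` only enters through the data term, the same prior can be reused for different measurement processes without retraining, which is the flexibility the paragraph above refers to.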
Finally, we demonstrate the use of recently proposed text-conditioned image diffusion models for generic image restoration and manipulation. We achieve flexible image manipulation using simple deterministic forward and reverse processes, with the reverse diffusion conditioned on a target text. For consistent image restoration, we modify the reverse diffusion process of a text-to-image diffusion model to analytically enforce data consistency of the solution, and explore diverse contents of its null space using text guidance. This yields diverse solutions that are simultaneously consistent with the input text and the degraded inputs.
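The analytic data-consistency step can be illustrated with a range/null-space decomposition, assuming a known linear degradation `A`: the range-space component of the current diffusion estimate is replaced by the pseudo-inverse solution, while the null-space component, which text guidance is free to vary, is kept. This is a generic sketch of the principle, not the thesis's full sampler.

```python
import numpy as np

def enforce_consistency(x_hat, A, y):
    """Project a diffusion estimate x_hat onto the solutions of A x = y:
    keep the null-space content of x_hat and replace the range-space
    part with the pseudo-inverse solution A^+ y."""
    A_pinv = np.linalg.pinv(A)
    return A_pinv @ y + (np.eye(A.shape[1]) - A_pinv @ A) @ x_hat
```

Applying this projection at each reverse diffusion step guarantees that every sample exactly reproduces the measurement, while the diffusion model and the text prompt determine the unconstrained null-space content, which is what produces diverse yet consistent restorations.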
File(s)
Name
Dissertation_Gandikota_Kanchana_Vaishnavi.pdf
Size
106.85 MB
Format
Adobe PDF
Checksum
(MD5):237a1720e7070f350fe156b8800d53a1