Citation Link: https://doi.org/10.25819/ubsi/10505
On the robustness and generalization of deep learning approaches for image classification and reconstruction
Alternate Title
Zur Robustheit und Generalisierung von Deep-Learning-Ansätzen zur Bildklassifizierung und Rekonstruktion
Source Type
Doctoral Thesis
Institute
Issue Date
2024
Abstract
As deep learning models begin to be deployed in real-world applications, characterizing their vulnerabilities and improving their robustness are critical to ensuring reliable performance. This thesis addresses several aspects of the robustness and generalization of deep learning models for image classification and reconstruction.
We first address the problem of robustness and invariance of neural networks to spatial transformations that can be represented as group actions. We propose a simple strategy to achieve provable invariance with respect to group actions by choosing a unique element from the orbit of the transformation group. Such an orbit mapping can be combined with any standard network architecture while still achieving the desired invariance. We investigate robustness with respect to image rotations, as well as provable orientation and scaling invariance for 3D point cloud classification. We demonstrate the advantages of our method, in terms of robustness and computational efficiency, over approaches that incorporate invariance through training or network architecture.
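As an illustration of the orbit-mapping idea (not the thesis's exact construction), the following toy sketch canonicalizes a 3D point cloud with respect to rotations: it centers the cloud, rotates it into its PCA eigenbasis, and fixes the sign ambiguity of each axis with a third moment, so that every element of a rotation orbit maps to the same representative before being fed to a standard classifier.

```python
import numpy as np

def canonicalize(points):
    """Map a 3D point cloud to a unique representative of its rotation orbit.

    Centers the cloud, rotates it into its PCA eigenbasis, and fixes the
    +/- sign ambiguity of each eigenvector via the third moment of the
    projections, so the canonical pose is unique for clouds without
    exact symmetries. Any network applied to the output is then
    provably rotation-invariant.
    """
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered
    _, vecs = np.linalg.eigh(cov)      # principal axes, eigenvalues ascending
    proj = centered @ vecs             # coordinates in the PCA frame
    signs = np.sign((proj ** 3).sum(axis=0))   # resolve each axis's sign
    signs[signs == 0] = 1.0
    return proj * signs
```

Because the covariance of a rotated cloud has the same eigenvalues and rotated eigenvectors, the canonical form is identical for every rotation of the same cloud.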
Next, we investigate the robustness of classical and deep learning approaches to ill-posed image recovery problems, with a focus on image deblurring and computed tomography reconstruction. We demonstrate the susceptibility of reconstruction networks to untargeted, targeted, and localized adversarial attacks using norm-constrained additive perturbations, and study the transferability of attacks. We find that incorporating model knowledge can, but does not always, result in improved robustness. Further, localized attacks that modify semantic meaning can still maintain high consistency with the original measurement, which could be exploited to deal with the ill-posedness of image recovery.
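A minimal sketch of such a norm-constrained attack, assuming a differentiable reconstruction map and using projected gradient ascent with an L-infinity budget (a toy stand-in for the attacks studied in the thesis):

```python
import numpy as np

def pgd_attack(grad_fn, y, eps=0.05, step=0.01, iters=40):
    """Untargeted PGD sketch: find an additive perturbation delta with
    ||delta||_inf <= eps that maximally changes the reconstruction of the
    measurement y. grad_fn(y, delta) must return the gradient of the
    reconstruction-change objective with respect to delta."""
    rng = np.random.default_rng(0)
    delta = rng.uniform(-eps, eps, size=y.shape)
    for _ in range(iters):
        delta += step * np.sign(grad_fn(y, delta))   # signed ascent step
        delta = np.clip(delta, -eps, eps)            # project onto the ball
    return delta
```

For a linear toy "network" `x = W @ y`, the gradient of `0.5 * ||W(y+delta) - W y||^2` with respect to `delta` is simply `W.T @ W @ delta`, which makes the sketch easy to verify end to end.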
While deep neural networks are successful in many image recovery tasks, they are typically trained for a specific forward measurement process and therefore do not generalize to even small changes in the forward model. To address this, we explore the use of generative model priors for flexible image reconstruction tasks. We develop a generative autoencoder for light fields conditioned on the central view, and utilize this model as a prior for light field recovery. We optimize in the latent space of the conditional generator to minimize the data discrepancy with the measurement, and optimize the latent code and the central view simultaneously when the latter is unavailable. We demonstrate the applicability of this approach to generic light field recovery.
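The latent-space optimization can be sketched as follows, assuming a pretrained generator `G` and a known forward operator `A`; for verifiability the example uses a linear toy generator, whereas the thesis uses a conditional autoencoder for light fields:

```python
import numpy as np

def latent_recovery(G, G_grad, A, y, z_dim, lr, iters=500):
    """Recover an image as G(z*) where z* minimizes the measurement
    discrepancy ||A G(z) - y||^2 by gradient descent in latent space.
    G_grad(z, v) must return the generator's Jacobian-transpose applied
    to v (backpropagation through G in practice)."""
    z = np.zeros(z_dim)
    for _ in range(iters):
        r = A @ G(z) - y            # residual in measurement space
        z -= lr * G_grad(z, A.T @ r)  # chain rule through the generator
    return G(z)
```

Because the forward operator `A` only enters through the data term, the same prior can be reused for different measurement processes without retraining, which is the flexibility the paragraph above refers to.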
Finally, we demonstrate the use of recently proposed text-conditioned image diffusion models for generic image restoration and manipulation. We achieve flexible image manipulation using simple deterministic forward and reverse processes, with the reverse diffusion conditioned on a target text. For consistent image restoration, we modify the reverse diffusion process of a text-to-image diffusion model to analytically enforce data consistency of the solution, and explore diverse contents of its null space using text guidance. This yields diverse solutions that are simultaneously consistent with the input text and the degraded inputs.
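The analytic data-consistency step can be illustrated with a range/null-space decomposition, assuming a known linear degradation `A`: the range-space component of the current diffusion estimate is replaced by the pseudo-inverse solution, while the null-space component, which text guidance is free to vary, is kept. This is a generic sketch of the principle, not the thesis's full sampler.

```python
import numpy as np

def enforce_consistency(x_hat, A, y):
    """Project a diffusion estimate x_hat onto the solutions of A x = y:
    keep the null-space content of x_hat and replace the range-space
    part with the pseudo-inverse solution A^+ y."""
    A_pinv = np.linalg.pinv(A)
    return A_pinv @ y + (np.eye(A.shape[1]) - A_pinv @ A) @ x_hat
```

Applying this projection at each reverse diffusion step guarantees that every sample exactly reproduces the measurement, while the diffusion model and the text prompt determine the unconstrained null-space content, which is what produces diverse yet consistent restorations.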
File(s)
Name
Dissertation_Gandikota_Kanchana_Vaishnavi.pdf
Size
106.85 MB
Format
Adobe PDF
Checksum
(MD5):237a1720e7070f350fe156b8800d53a1