Learning deep visual features for Minimum Cost Multicut Problem

Ho, Kalun

doi:10.25819/ubsi/10296

Citation Link: https://doi.org/10.25819/ubsi/10296

Alternate Title

Lernen tiefer visueller Merkmale für Minimum Cost Multicut Problem

Source Type

Doctoral Thesis

Author

Ho, Kalun

Institute

Department Elektrotechnik - Informatik

Subjects

Unsupervised learning

Minimum cost multicut problem

Computer Vision

DDC

004 Informatik

GHBS-Classes

TVVC

TVUC

TUH

Issue Date

2023

Abstract

Image clustering is one of the most important tasks of unsupervised learning in computer vision. Deep learning approaches allow models to be trained on large datasets. In this thesis, image clustering objectives are evaluated in the context of an embedding space induced by the Triplet Loss. Specifically, a simplification of the well-known Triplet Loss is proposed for learning an embedding space from data; this loss function is designed for the Minimum Cost Multicut Problem. Furthermore, we highlight scalability as a key issue of the Minimum Cost Multicut Problem and propose a novel approach to overcome it. We show empirically that the proposed algorithm achieves a significant speedup while preserving clustering accuracy. The algorithm is able to cluster a dataset of approximately 100,000 images in under one minute using 40 computing threads, where the embedding space is trained with the simplified Triplet Loss. We then apply the proposed loss function to multiple person tracking. This problem is treated as a clustering problem on an imbalanced dataset, where each unique individual in the scene is considered one cluster. We compare the tracking performance of two approaches: the proposed Triplet Loss and an AutoEncoder architecture with a reconstruction loss. Experiments show the effectiveness of clustering in a tracking application.
Finally, we provide an empirical study of embedding spaces learned by classification models. Various state-of-the-art models are evaluated against image corruptions. Our key finding suggests that clustering can be used as a predictor of model robustness.
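The abstract does not spell out the simplified Triplet Loss or the scalable multicut algorithm, so the following is only a minimal sketch of the general pipeline it describes (embed images, turn pairwise distances into edge costs, solve a multicut) under stated assumptions. It uses the standard hinge Triplet Loss with squared Euclidean distances, not the thesis's simplified variant; a hypothetical linear mapping from distance to edge cost (`threshold - d`); and the well-known greedy additive edge contraction (GAEC) heuristic in place of the thesis's proposed algorithm. All function names and parameters are hypothetical.

```python
from itertools import combinations


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard hinge Triplet Loss: max(0, d(a, p) - d(a, n) + margin).

    Baseline form only; NOT the simplified variant proposed in the thesis.
    """
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def edge_costs(embeddings, threshold=1.0):
    """Hypothetical mapping from embedding distances to multicut edge costs.

    Pairs closer than `threshold` get a positive (attractive) cost,
    pairs farther apart get a negative (repulsive) cost.
    """
    return {
        (i, j): threshold - sq_dist(embeddings[i], embeddings[j])
        for i, j in combinations(range(len(embeddings)), 2)
    }


def gaec(n, costs):
    """Greedy additive edge contraction heuristic for the multicut problem.

    Repeatedly merges the two clusters joined by the largest positive total
    cost; parallel edges are merged by summing their costs. Stops when no
    positive edge remains. Returns one cluster label per node.
    """
    clusters = {i: {i} for i in range(n)}  # representative -> member nodes
    adj = {i: {} for i in range(n)}        # representative -> {neighbor: cost}
    for (i, j), c in costs.items():
        adj[i][j] = adj[i].get(j, 0.0) + c
        adj[j][i] = adj[j].get(i, 0.0) + c
    while True:
        best = None
        for u in adj:
            for v, c in adj[u].items():
                if u < v and c > 0 and (best is None or c > best[2]):
                    best = (u, v, c)
        if best is None:
            break
        u, v, _ = best
        # Contract v into u: redirect v's edges to u, summing parallel edges.
        for w, c in adj.pop(v).items():
            del adj[w][v]
            if w != u:
                adj[u][w] = adj[u].get(w, 0.0) + c
                adj[w][u] = adj[u][w]
        clusters[u] |= clusters.pop(v)
    labels = [0] * n
    for label, members in enumerate(clusters.values()):
        for node in members:
            labels[node] = label
    return labels


def multicut_cost(costs, labels):
    """Multicut objective: sum of costs of edges cut between different clusters."""
    return sum(c for (i, j), c in costs.items() if labels[i] != labels[j])


if __name__ == "__main__":
    # Toy 2-D "embeddings": two well-separated groups of points.
    points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
    costs = edge_costs(points, threshold=1.0)
    labels = gaec(len(points), costs)
    print(labels)  # [0, 0, 0, 1, 1]
    print(multicut_cost(costs, labels))
```

With this sign convention, minimizing the summed cost of cut edges favors keeping close pairs together and separating distant ones, which is why the greedy step only contracts edges with positive cost. The number of clusters is not fixed in advance; it follows from the edge costs.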

DOI

10.25819/ubsi/10296

URN

nbn:de:hbz:467-24893

URI

https://dspace.ub.uni-siegen.de/handle/ubsi/2489

License

http://creativecommons.org/licenses/by/4.0/
