Minimum cost multicuts for image and motion segmentation

Kardoost, Amirhossein

doi:10.25819/ubsi/10287

Citation Link: https://doi.org/10.25819/ubsi/10287

Alternate Title

Minimum Cost Multicuts für Bild- und Bewegungssegmentierung

Publication Type

Doctoral Thesis

Author

Kardoost, Amirhossein

Institute

Department Elektrotechnik - Informatik

Subjects

Image segmentation

Motion segmentation

Minimum cost multicut

Uncertainty estimation

Self-supervised learning

DDC

004 Computer science

GHBS-Classes

TVVC

TDB

TUH

Issue Date

2023

Abstract

This dissertation studies clustering and its applications in computer vision, including image, mesh, video, and motion segmentation. Clustering entities plays a crucial role in higher-level tasks such as action recognition, robot navigation, scene understanding, and 3D reconstruction. One well-known and widely used clustering framework is the minimum cost lifted multicut problem, which has recently found many applications, such as image and mesh decomposition and multiple object tracking. It formulates clustering as a graph-based model in which real-valued costs are assigned to the edges between entities, so that the minimum cost cut decomposes the graph into an optimal number of segments. Solving the multicut problem is NP-hard and computationally expensive. We therefore propose two variants of a primal feasible heuristic solver that greedily generate solutions within bounded time. Driven by a probabilistic formulation of minimum cost multicuts, we provide a measure of the uncertainty of the decisions made during optimization. We argue that access to such uncertainties is crucial for many practical applications and evaluate the proposed uncertainty measure on image and motion segmentation.
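As an illustration of the graph-based model described above, the following minimal Python sketch evaluates a multicut objective and runs a generic greedy edge-contraction heuristic. It is not the solver proposed in the thesis. The function names (`multicut_cost`, `greedy_contraction`), the toy graph, and the sign convention (positive cost means the endpoints prefer to share a segment, negative cost means they prefer to be separated) are assumptions made for this example only.

```python
from typing import Dict, FrozenSet, List, Tuple

Edge = Tuple[int, int]


def multicut_cost(costs: Dict[Edge, float], labels: List[int]) -> float:
    """Sum the costs of all edges whose endpoints lie in different segments."""
    return sum(c for (u, v), c in costs.items() if labels[u] != labels[v])


def greedy_contraction(num_nodes: int, costs: Dict[Edge, float]) -> List[int]:
    """Greedily merge the pair of clusters joined by the largest positive cost.

    Assumes an undirected graph without self-loops. Stops once no positive
    (attractive) cost remains between any two clusters, so the number of
    segments is not fixed in advance.
    """
    # Cluster id -> nodes currently merged into that cluster.
    members = {i: {i} for i in range(num_nodes)}
    # Accumulated cost between each pair of current clusters.
    between: Dict[FrozenSet[int], float] = {}
    for (u, v), c in costs.items():
        key = frozenset((u, v))
        between[key] = between.get(key, 0.0) + c

    while between:
        best = max(between, key=between.get)
        if between[best] <= 0.0:
            break  # merging any remaining pair would not lower the cut cost
        a, b = sorted(best)
        members[a] |= members.pop(b)
        del between[best]
        # Redirect edges that touched b to a, summing parallel edges.
        for key in [k for k in between if b in k]:
            c = between.pop(key)
            (other,) = key - {b}
            new_key = frozenset((a, other))
            between[new_key] = between.get(new_key, 0.0) + c

    # Convert the clusters into one segment label per node.
    labels = [0] * num_nodes
    for segment, (_, nodes) in enumerate(sorted(members.items())):
        for node in nodes:
            labels[node] = segment
    return labels


# Hypothetical toy graph with four nodes.
example_costs = {(0, 1): 2.0, (1, 2): -3.0, (2, 3): 1.5, (0, 2): 0.5, (1, 3): -1.0}
example_labels = greedy_contraction(4, example_costs)
example_value = multicut_cost(example_costs, example_labels)
```

On this toy graph the heuristic merges nodes 0 and 1, then nodes 2 and 3, and stops because the accumulated cost between the two resulting clusters is negative. Such a greedy procedure always returns a valid decomposition but carries no optimality guarantee.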

To track object masks in video, we use low-level cues such as optical flow and image boundaries and study the importance of these cues for obtaining competitive, high-quality results. While high-end computer vision methods for this task rely on sequence-specific training of dedicated Convolutional Neural Network (CNN) architectures, we show the potential of a variational model based on generic video information from motion and color. Optical flow is also used for motion segmentation, where observable motion in videos gives rise to the definition of objects moving with respect to the scene. This problem is usually tackled either by aggregating motion information in long, sparse point trajectories or by directly producing dense segmentations per frame, which relies on large amounts of training data. In this dissertation, we address the problem using sparse motion trajectories and emphasize that generic cues such as optical flow and image boundaries are crucial for this and similar tasks. Complex motion patterns, such as out-of-plane rotation or scaling of objects, add ambiguities to the segmentation problem. Using hypergraphs resolves such ambiguities by modeling motion beyond translation, as Euclidean or affine transformations. We evaluate our proposed methods on well-known datasets for the addressed tasks and show that integrating low-level cues improves results on the higher-level tasks.

DOI

10.25819/ubsi/10287

URN

nbn:de:hbz:467-24806

URI

https://dspace.ub.uni-siegen.de/handle/ubsi/2480
