Citation Link: https://doi.org/10.25819/ubsi/10287
Minimum cost multicuts for image and motion segmentation
Alternate Title
Minimum Cost Multicuts für Bild- und Bewegungssegmentierung
Source Type
Doctoral Thesis
Author
Institute
Issue Date
2023
Abstract
Clustering and its applications in computer vision, such as image, mesh, video, and motion segmentation, are the main topics we discuss in this dissertation. Clustering entities plays a crucial role in higher-level tasks such as action recognition, robot navigation, scene understanding, and 3D reconstruction. One well-known and widely used clustering framework is the minimum cost lifted multicut problem. This framework has recently found many applications, such as image and mesh decomposition or multiple object tracking. It addresses such problems in a graph-based model, where real-valued costs are assigned to the edges between entities such that the minimum-cost multicut decomposes the graph into an optimal number of segments. Solving the multicut problem is NP-hard and computationally expensive. Therefore, we propose two variants of a heuristic solver (a primal feasible heuristic) that greedily generate solutions within a bounded time. Driven by a probabilistic formulation of minimum cost multicuts, we provide a measure of the uncertainty of the decisions made during optimization. We argue that access to such uncertainties is crucial for many practical applications, and we evaluate the proposed uncertainty measure on image and motion segmentation.
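The greedy primal-feasible idea described above can be illustrated with a minimal sketch in the spirit of greedy additive edge contraction (this is an illustrative toy, not the solver proposed in the thesis). The convention here: we minimize the total cost of cut edges, so an edge with positive cost prefers its endpoints joined, and an edge with negative cost prefers them cut; the heuristic repeatedly contracts the most attractive remaining edge.

```python
import heapq

def greedy_multicut(num_nodes, edges):
    """Greedily contract attractive edges (GAEC-style sketch).

    edges: iterable of (u, v, cost). Positive cost means the endpoints
    prefer to stay in one component (cutting them is penalised).
    Returns a list mapping each node to a component label.
    """
    # Accumulated cost between current components, plus neighbour sets.
    cost, nbrs = {}, {i: set() for i in range(num_nodes)}
    for u, v, c in edges:
        key = (min(u, v), max(u, v))
        cost[key] = cost.get(key, 0.0) + c
        nbrs[u].add(v); nbrs[v].add(u)

    parent = list(range(num_nodes))
    def find(x):  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Max-heap (via negated costs) over attractive inter-component edges.
    heap = [(-c, k) for k, c in cost.items() if c > 0]
    heapq.heapify(heap)
    while heap:
        neg_c, (a, b) = heapq.heappop(heap)
        if cost.get((a, b)) != -neg_c:
            continue  # stale heap entry
        # Contract: merge component b into a, summing parallel edges.
        parent[b] = a
        del cost[(a, b)]
        nbrs[a].discard(b); nbrs[b].discard(a)
        for n in nbrs[b]:
            c_bn = cost.pop((min(b, n), max(b, n)))
            new_key = (min(a, n), max(a, n))
            cost[new_key] = cost.get(new_key, 0.0) + c_bn
            nbrs[n].discard(b); nbrs[n].add(a); nbrs[a].add(n)
            if cost[new_key] > 0:
                heapq.heappush(heap, (-cost[new_key], new_key))
        nbrs[b] = set()
    return [find(i) for i in range(num_nodes)]
```

For example, on a four-node graph with strong attraction inside `{0, 1}` and `{2, 3}` but repulsion between them, the heuristic returns exactly those two components. Because it only ever contracts edges, every intermediate state is a feasible decomposition, which is what makes this family of heuristics "primal feasible" and lets them stop within a bounded time.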
To track object masks in video, we use low-level cues such as optical flow and image boundaries and study the importance of such cues in providing competitive, high-quality results. While high-end computer vision methods for this task rely on sequence-specific training of dedicated Convolutional Neural Network (CNN) architectures, we show the potential of a variational model based on generic video information from motion and color. The optical flow information is also used for the motion segmentation task, where observable motion in videos can give rise to the definition of objects moving with respect to the scene. This problem is usually tackled either by aggregating motion information in long, sparse point trajectories or by directly producing dense segmentations per frame, relying on large amounts of training data. In this dissertation, we address the problem with sparse motion trajectories and emphasize that generic cues such as optical flow and image boundaries are crucial for this and similar tasks. Complex motion patterns, such as out-of-plane rotation or scaling of objects, add ambiguities to the segmentation problem. Utilizing hypergraphs resolves such ambiguities by extending the motion model from translations to Euclidean or affine transformations. We evaluate our proposed methods on well-known datasets for the addressed tasks and show that integrating the low-level cues improves the results on the higher-level tasks.
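The ambiguity from rotation or scaling can be made concrete with a small sketch (hypothetical helpers, not the thesis implementation): points on a rigidly rotating object share no common translation, so a purely translational pairwise cost wrongly separates them, while a Euclidean motion model fitted to the group of trajectories explains them with near-zero residual, which is the intuition behind moving to hyperedges over trajectory groups.

```python
import math

def rigid_residual(src, dst):
    """Fit a 2D Euclidean motion (rotation + translation) mapping the
    points src onto dst, and return the mean leftover error.
    Closed-form 2D Kabsch: centre both point sets, then the optimal
    rotation angle is atan2 of the summed cross/dot products."""
    n = len(src)
    cx = sum(p[0] for p in src) / n; cy = sum(p[1] for p in src) / n
    dx = sum(p[0] for p in dst) / n; dy = sum(p[1] for p in dst) / n
    num = den = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        ax, ay, bx, by = ax - cx, ay - cy, bx - dx, by - dy
        num += ax * by - ay * bx
        den += ax * bx + ay * by
    c, s = math.cos(math.atan2(num, den)), math.sin(math.atan2(num, den))
    err = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        rx = c * (ax - cx) - s * (ay - cy) + dx  # rotate about centroid,
        ry = s * (ax - cx) + c * (ay - cy) + dy  # then translate
        err += math.hypot(rx - bx, ry - by)
    return err / n

def translational_residual(src, dst):
    """Best single translation for all points (the simpler model)."""
    n = len(src)
    tx = sum(b[0] - a[0] for a, b in zip(src, dst)) / n
    ty = sum(b[1] - a[1] for a, b in zip(src, dst)) / n
    return sum(math.hypot(a[0] + tx - b[0], a[1] + ty - b[1])
               for a, b in zip(src, dst)) / n
```

For three points rotated rigidly by 30 degrees, `rigid_residual` is essentially zero while `translational_residual` stays large: the translational model sees one coherent object as several differently moving parts, exactly the ambiguity that richer motion models resolve.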
File(s)
Name
Dissertation_Kardoost_Amirhossein.pdf
Size
31.11 MB
Format
Adobe PDF
Checksum
(MD5): 8a4f8ba5553fb4928e8652bad8e00e6b
Owning collection