Multidimensional Knowledge Representation through Integrative Text Mining

Zenkert, Johannes

doi:10.25819/ubsi/10255

Citation Link: https://doi.org/10.25819/ubsi/10255

Alternate Title

Mehrdimensionale Wissensrepräsentation mittels Integrativem Text Mining

Alternate Title

a knowledge base framework for extracted information from text

Publication Type

Doctoral Thesis

Author

Zenkert, Johannes

Institute

Institut für Wissensbasierte Systeme und Wissensmanagement

Subjects

Knowledge Representation

Integrative Text Mining

Visualization

Information Extraction

DDC

004 Informatik

GHBS-Classes

TVUC

TVVK

TUH

Issue Date

2022

Abstract

Natural language processing and text mining methods can be used to identify and extract valuable information from unstructured texts. Methodically extracted data provide helpful results, but they can be difficult to interpret in isolation and cannot be used directly as knowledge. Humans are cognitively able to process unstructured data such as natural language text, filter the extracted information, classify it semantically, and interpret it. Computer systems cannot do this unaided, because it requires meaningful processing and combination of data and information. Knowledge-based approaches attempt to solve this problem by providing appropriate representations for data and information and, when implemented as expert systems, make it possible to reach conclusions through inference over the knowledge base.
A methodology for structuring and representing acquired information, which can lead to the transformation of data and information from text to knowledge, is conceptualized, implemented, and evaluated in case studies in this dissertation.
The developed approach is called Multidimensional Knowledge Representation (MKR), because individual text mining approaches, so-called pipelines, are applied and the results of their different analysis dimensions are combined into a common representation structure. The results of text analysis and the facets of knowledge acquisition are stored multidimensionally in a document-oriented database, which can serve as the basis for a knowledge base in knowledge-based applications.
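As a rough illustration of combining pipeline results into one representation structure, the following minimal sketch merges the outputs of two pipelines into a single JSON-like document of the kind a document-oriented database could store. The toy pipelines, the field names (`_id`, `dimensions`), and the sample sentence are assumptions made for this example and are not taken from the dissertation or from KB:mkr.

```python
import json

# Hypothetical stand-ins for real text mining pipelines.
# Each function yields the result of one analysis dimension.
def toy_entities(text):
    # Naive placeholder: capitalized tokens that are not sentence-initial.
    tokens = text.split()
    return sorted({t.strip(".,") for t in tokens[1:] if t[0].isupper()})

def toy_sentiment(text):
    # Naive placeholder: count of positive words minus count of negative words.
    positive = {"good", "great", "helpful"}
    negative = {"bad", "poor", "difficult"}
    words = [w.strip(".,").lower() for w in text.split()]
    return float(sum(w in positive for w in words) - sum(w in negative for w in words))

# Assumed mapping of dimension name to pipeline.
PIPELINES = {"entities": toy_entities, "sentiment": toy_sentiment}

def build_record(doc_id, text, pipelines=PIPELINES):
    """Run every pipeline once and keep all results side by side in one document."""
    record = {"_id": doc_id, "text": text, "dimensions": {}}
    for name, pipeline in pipelines.items():
        record["dimensions"][name] = pipeline(text)
    return record

record = build_record("doc-1", "The service in Siegen was great.")
print(json.dumps(record, indent=2))
```

Keeping every dimension in the same document is the point of the sketch: a further pipeline would add another key under `dimensions` without changing the existing ones.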
Current systems and tools for text mining are mostly one-dimensional in their application and focus on one specific evaluation per analysis. They usually provide insights for a previously defined question, which is methodically investigated within the text data as a linear process. In this context, the various perspectives and interpretations of the pipelines can be described as individual analysis dimensions. After pre-processing, information such as the named entities, the prevailing topic, contained semantic relations, or the sentiment can be extracted from the text.
The methods of knowledge extraction, such as named entity recognition, topic detection, or sentiment analysis, are mostly applied individually using trained methods and deliver a result that is then interpreted. If the analysis question changes, current state-of-the-art approaches often execute the modified pipeline again. In contrast, the core idea of MKR is to support multi-perspective questions by providing the results of all analysis dimensions in the knowledge base. For example, a complex question such as the sentiment over time about a selected entity in a topic area can be answered efficiently by accessing the relevant data in the knowledge base.
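The example question above (sentiment over time about a selected entity in a topic area) can be sketched as a single query over stored records. This is only an illustration under stated assumptions: the record layout, the field names, the entity and topic values, and the sentiment scores are all invented for the example and do not come from the dissertation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: each one already holds several analysis dimensions.
RECORDS = [
    {"date": "2021-01", "topic": "energy", "entities": ["ACME"], "sentiment": 0.5},
    {"date": "2021-01", "topic": "energy", "entities": ["ACME", "Beta"], "sentiment": -0.5},
    {"date": "2021-02", "topic": "energy", "entities": ["ACME"], "sentiment": 1.0},
    {"date": "2021-02", "topic": "sports", "entities": ["ACME"], "sentiment": -1.0},
]

def sentiment_over_time(records, entity, topic):
    """Average sentiment per period for one entity within one topic area."""
    by_period = defaultdict(list)
    for r in records:
        # Filter on two dimensions (topic, entity), aggregate a third (sentiment).
        if r["topic"] == topic and entity in r["entities"]:
            by_period[r["date"]].append(r["sentiment"])
    return {period: mean(scores) for period, scores in sorted(by_period.items())}

result = sentiment_over_time(RECORDS, "ACME", "energy")
print(result)
```

Because the dimensions are stored together, changing the question (another entity, another topic, another grouping) only changes the query, not the analysis pipelines.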
In addition to the theoretical foundations of the dissertation project, which lead to the conceptualization and modeling of MKR, the implementation as KB:mkr Knowledge Base Maker is presented. Using specially created text corpora in German and English, the representation structure is evaluated in an exploratory and case-based manner in various application and project examples from academic and industrial contexts.

DOI

10.25819/ubsi/10255

URN

nbn:de:hbz:467-24487

URI

https://dspace.ub.uni-siegen.de/handle/ubsi/2448