Citation Link: https://doi.org/10.25819/ubsi/10255
Multidimensional Knowledge Representation through Integrative Text Mining
Alternate Title
Mehrdimensionale Wissensrepräsentation mittels Integrativem Text Mining
Source Type
Doctoral Thesis
Author
Zenkert, Johannes
Issue Date
2022
Abstract
Natural language processing and text mining methods can be used to identify and extract valuable information from unstructured texts. Methodically extracted data provide helpful results, but they can be difficult to interpret in isolation and cannot be used directly as knowledge. Cognitively, we as humans are able to process unstructured data such as natural language text, filter the extracted information, classify it semantically, and interpret it. Computer systems cannot do this without help, because it requires meaningful processing and combination of the data and information. Knowledge-based approaches attempt to solve this problem by providing appropriate representations for data and information. When implemented as expert systems, they make it possible to draw conclusions through inference over the knowledge base.
This dissertation conceptualizes a methodology for structuring and representing acquired information that can transform data and information from text into knowledge, implements it, and evaluates it in case studies.
The developed approach is called Multidimensional Knowledge Representation (MKR) because it applies individual text mining approaches, so-called pipelines, and combines the results of their different analysis dimensions into a common representation structure. The results of text analysis and the facets of knowledge acquisition are stored multidimensionally in a document-oriented database, which can serve as the basis of a knowledge base for knowledge-based applications.
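The idea of running several independent pipelines and storing each result as its own dimension of one document record can be sketched as follows. This is a minimal illustration under stated assumptions, not the KB:mkr schema: the field names (`dimensions`, `entities`, `topic`, `sentiment`) are hypothetical, the three toy functions stand in for trained NER, topic detection, and sentiment models, and a plain Python list stands in for the document-oriented database.

```python
from typing import Callable, Dict, List

# Hypothetical gazetteer standing in for a trained named entity recognizer.
KNOWN_ENTITIES = {"Tesla", "BMW", "Siegen"}


def entity_pipeline(text: str) -> List[str]:
    # Toy NER: return known entity names that occur in the text, sorted.
    return sorted(e for e in KNOWN_ENTITIES if e in text)


def topic_pipeline(text: str) -> str:
    # Toy topic detection via a single keyword rule.
    return "mobility" if "car" in text.lower() else "other"


def sentiment_pipeline(text: str) -> float:
    # Toy lexicon-based sentiment score in [-1, 1].
    lowered = text.lower()
    pos = sum(w in lowered for w in ("good", "great"))
    neg = sum(w in lowered for w in ("bad", "poor"))
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


# Each pipeline contributes exactly one analysis dimension.
PIPELINES: Dict[str, Callable[[str], object]] = {
    "entities": entity_pipeline,
    "topic": topic_pipeline,
    "sentiment": sentiment_pipeline,
}


def build_record(doc_id: str, date: str, text: str) -> dict:
    """Run every pipeline once and store each result as a separate dimension."""
    record = {"_id": doc_id, "date": date, "text": text, "dimensions": {}}
    for name, pipeline in PIPELINES.items():
        record["dimensions"][name] = pipeline(text)
    return record


# A list of records stands in for a document-oriented database collection.
knowledge_base = [build_record("d1", "2021-03-01", "The Tesla car is great.")]
print(knowledge_base[0]["dimensions"])
```

Because every dimension is kept side by side in the same record, adding a new pipeline only adds a new key rather than changing the existing ones.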
Current text mining systems and tools are mostly one-dimensional in their application and focus on one specific evaluation. They usually provide insights for a previously defined question, which is investigated methodically within the text data as a linear process. In this context, the different perspectives and interpretations offered by the pipelines can be described as individual analysis dimensions. After pre-processing, for example, the named entities, the prevailing topic, the semantic relations, or the sentiment can be extracted from a text.
Knowledge extraction methods such as named entity recognition, topic detection, or sentiment analysis are mostly applied individually using trained models, and each delivers a result that is then interpreted. In current state-of-the-art approaches, a change in the analysis question often means that the modified pipeline must be executed again. In contrast, the core idea of MKR is to support multi-perspective questions by providing the results of each analysis dimension in the knowledge base. Complex questions, such as how the sentiment about a selected entity within a topic area develops over time, can then be answered efficiently by accessing the relevant data already stored in the knowledge base.
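A question like "sentiment over time about a selected entity in a topic area" can be sketched as a pure filter-and-aggregate query over stored dimensions, with no pipeline re-executed. This is an illustrative sketch, not the thesis's query interface: the record layout, the sample records, and the function name `sentiment_over_time` are all hypothetical, and monthly averaging is just one possible choice of time granularity.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

# Hypothetical pre-computed records, as a document-oriented store might hold them.
RECORDS: List[dict] = [
    {"date": "2021-01-15", "dimensions": {"entities": ["Tesla"], "topic": "mobility", "sentiment": 0.5}},
    {"date": "2021-01-20", "dimensions": {"entities": ["Tesla"], "topic": "mobility", "sentiment": -0.5}},
    {"date": "2021-02-03", "dimensions": {"entities": ["Tesla"], "topic": "mobility", "sentiment": 1.0}},
    {"date": "2021-02-10", "dimensions": {"entities": ["Tesla"], "topic": "finance", "sentiment": -1.0}},
    {"date": "2021-02-11", "dimensions": {"entities": ["BMW"], "topic": "mobility", "sentiment": -1.0}},
]


def sentiment_over_time(records: List[dict], entity: str, topic: str) -> Dict[str, float]:
    """Average stored sentiment per month for one entity within one topic.

    No pipeline is re-run: the query only filters on the entity and topic
    dimensions and aggregates the sentiment dimension.
    """
    by_month: Dict[str, List[float]] = defaultdict(list)
    for rec in records:
        dims = rec["dimensions"]
        if entity in dims["entities"] and dims["topic"] == topic:
            by_month[rec["date"][:7]].append(dims["sentiment"])  # key "YYYY-MM"
    return {month: mean(scores) for month, scores in sorted(by_month.items())}


print(sentiment_over_time(RECORDS, "Tesla", "mobility"))
# {'2021-01': 0.0, '2021-02': 1.0}
```

Changing the question (another entity, topic, or time granularity) only changes the query parameters; the underlying analysis results stay as they are.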
In addition to the theoretical foundations of the dissertation project, which lead to the conceptualization and modeling of MKR, the thesis presents its implementation, KB:mkr (Knowledge Base Maker). Using specially created text corpora in German and English, the representation structure is evaluated in an exploratory, case-based manner through various application and project examples in academic and industrial contexts.
File(s)
Name
Dissertation_Zenkert_Johannes.pdf
Size
18.3 MB
Format
Adobe PDF
Checksum
(MD5):a5e71a47a14c593c3be1cbd95a45cc80