Smart Speaker im Dialog. Sprachliche Praktiken mit Voice User Interfaces

Author: Hector, Tim Moritz
Date: 2025-08-29
Year: 2025
Handle: https://dspace.ub.uni-siegen.de/handle/ubsi/2920
Title (original): Smart Speaker im Dialog. Sprachliche Praktiken mit Voice User Interfaces
Title (English): Smart Speakers in Dialogue. Linguistic Practices with Voice User Interfaces
Type: Doctoral Thesis
Contributor: Stephan Habscheid
Language: de
License: Attribution 4.0 International, http://creativecommons.org/licenses/by/4.0/
DOI: 10.1515/9783111574332
URN: urn:nbn:de:hbz:467-29203
Subject classification: 400 Language, Linguistics
Keywords: Voice assistants; Media appropriation; Media linguistics; Voice User Interfaces; Human-machine interaction; Voice-controlled user interfaces

Abstract

The monograph examines linguistic practices in the use of stationary voice assistant systems (smart speakers) such as Amazon's Alexa, Google Home, or Apple's Siri. These devices carry so-called Voice User Interfaces (VUIs), that is, voice-based interfaces between humans and machines. Empirically, the study draws on video and audio recordings from real households in which smart speakers are set up and used. Methodologically, the work is grounded in conversation analysis, complemented by multimodal video-based interaction analysis and ethnographic perspectives. The analysis covers both dyadic dialogues between a human and a VUI and more complex multi-party interactions.

A praxeological understanding of language and media is central to the analysis: language is understood as an integral part of social practices, and interfaces are conceptualized as situationally produced boundaries between users and digital infrastructure. Drawing on the domestication approach, the study also captures how users deal with the devices linguistically, integrate them into everyday life, and adapt to them, and how the devices in turn shape user practices.

The key findings are that users orient to established routines of conversational organization when dealing with smart speakers. At the same time, specific modifications emerge: rigid sequential structures must be followed for the desired action to succeed, which affects the function of address terms, the linguistic design of turn-taking, and the performance of repairs. Users adapt to these structures, so that linguistic practices become interface practices, even though no distinct linguistic register develops. Nevertheless, especially in multi-party interactions, users frequently treat VUIs like conversational partners at the linguistic surface. This attribution proves fragile, however, and can shift from moment to moment. In some cases, talk that is nominally addressed to the smart speaker is in fact directed at the human participants. The users studied demonstrate the linguistic competence to distinguish between addressing a human and addressing a machine.