A Modern Little Mermaid Dilemma: Is it worth losing one’s voice by using AI if one gains human acceptance?

Invited AI Aura Article. Written by: Vanessa Preast, PhD

Image generated by CoPilot.

It was unnerving. I listened to the podcast that my workshop leaders had assigned as preparation for the next day’s session. Had I not been told in advance, I doubt I would have realized the audio was entirely AI-generated. I listened over and over because I couldn’t put my finger on why I found it troubling. I was so caught up in analyzing the vocal tones, the rhythms, and the way the two voices interacted that I had difficulty paying attention to the content.

     I was hearing the voices of a man and a woman presenting the content in a conversational style. I could easily have been listening to an NPR Planet Money broadcast. These voices built off each other’s points and made encouraging noises like “um hmm.” I heard verbal filler such as “um,” “like,” or “right,” and the voices occasionally talked over each other. They would even add personal commentary on the content, saying, for example, “that’s a mouthful” after announcing the lengthy title of the research paper they were discussing. These were all characteristics I had noticed in radio chat show conversations. Are these the characteristics that make discussion between speakers seem “natural”?

     When I uploaded a public-domain book into NotebookLM to create an audio file to help me understand the content, the thrall these AI-generated podcasts once held over me began to fade. The spell dissipated as I noticed strange turns of phrase, odd noises from partial utterances, and an unnatural pattern to the conversation. Something was off. It was formulaic in some not-quite-definable way. The voices in my podcast sounded too similar to those in the instructor’s audio file. Perhaps I was experiencing a disconnect between the tone of the voices, the depth of the content, and the pace of delivery. I noticed I felt tense while listening, possibly because I questioned how accurately the audio represented the actual content. I usually find educational podcasts to be relaxing and enjoyable, but this experience felt like work.

     I wondered how other people would react to the audio clips. The AI-generated imitation of a human radio show had almost fooled me into believing that two humans were communicating with each other. Would the average neurotypical person notice the voices were not real people? If so, which characteristics of the audio were and were not resonating with them? With these questions, I began to see AI-generated podcasts less as a way of understanding or presenting the content I upload and more as a tool to explore the features of “natural” and “normal” conversation.

     I simultaneously felt like the wrong and the right person to analyze this phenomenon. I have no specialized training in linguistics, and I had no intention of engaging in a formal study. Yet my entire life had been a careful, informal study of human behavior. As a fish in disguise, I was swimming among humans while trying to make sense of their perplexing social world. My attempts at mimicking observed human social interactions and enacting advice from self-help books have generally resulted in a passable facsimile of a normal human, but too often I’d still get it very wrong. Through years of trial and error, I’d developed some sense of things being “off” in a social situation, but I often lacked the ability to articulate why. Painful life experiences drew my attention to human communication styles in ways that those passing seamlessly and confidently through social situations were unlikely to question. If generative AI is an algorithm predicting the most common word patterns, I wondered what it could teach someone who ran on social scripts about the subtleties of neurotypical conversation. It seemed reasonable to expect that a generative AI trained on a mass of neurotypical conversations–far more than any I could hope to analyze in my lifetime–might better approximate the most socially acceptable human response to a situation than I could with my limited life experiences and mental processing capabilities.

     Realizing that an AI system could mimic social scripts in a way that sounds convincingly human, I planned to use Microsoft CoPilot to “translate” some critical messages to be more palatable to the recipient. I wondered if I could compare what I wrote to the more empathetic language produced by the AI to find patterns that I could adapt into my own social algorithms and better connect with other humans. Whenever I had to contact someone about a touchy subject, I turned to AI for suggested language, knowing from prior experience that my own words would be inadequate. The results were almost-instantaneous draft emails expressing concern with a firm, yet empathetic reinforcement of the policies. I edited lightly, and I sent them off. The recipients responded well. If I had tried to do this work unaided, it would have taken me a couple of anxiety-filled hours to get the wording right, and I would have still felt uncertain that my words would have the desired impact.

     Unfortunately, several iterations of feeding my draft into CoPilot and observing what emerged failed to provide the specific insights I needed to achieve the same results unaided. I did notice two patterns. First, CoPilot insisted on opening each email with an irksome “I hope this message finds you well.” Second, the AI-generated messages added more filler language that “softened” my tone. Neither of these was my natural style. How did I feel about this?

     The AI did help me quickly and effectively connect with another human being in an asynchronous communication format, which was my foremost goal. If I could train my voice to more automatically resemble what the AI produced and this resulted in greater human connection, would this be a lamentable sacrifice of my unique voice, a laudable demonstration of growth as a writer, or something in between? I wondered what characteristics determined the range of my “authentic voice” and the degree to which AI was changing my language beyond context-based code-switching into a realm that was no longer my voice. How much is my authentic voice worth to me and to others? Abundant personal experience taught me that I had been more successful in connecting with other humans when I spoke or wrote in a style that was not natural to me. Ultimately, if I must choose between retaining my voice or enjoying human connection, sacrificing my voice to an AI sea-witch is a small price to pay for love. 

Acknowledgement: minor edits done using CoPilot.