
Beyond Words

AI is Learning the Secret Language of Human Interaction

May 2024

by Anthony Capone

Have you ever chatted with a virtual assistant or received help from an online chatbot? AI communication technology is rapidly transforming how we interact with the digital world. From fielding questions and controlling smart devices to assisting with customer service, AI virtual assistants and chatbots are becoming increasingly familiar in our daily lives. Yet as commonplace as this technology has become, it still has a long way to go as it evolves further into the realm of human communication.


Forget simply understanding what people say. We now live in a world where machines can "read" our hidden messages: the flicker of an eye, a tremor in the voice, a subtle shift in posture. This is not science fiction. AI is rapidly evolving, moving beyond Natural Language Processing (NLP) into the fascinating world of non-linguistic communication.


AI is learning to "listen" through vocal cues and "see" through facial expressions and gestures, leading to a more comprehensive understanding of human emotions, intentions, and mental states. From revolutionizing customer service interactions to predicting mental health concerns, this research into non-linguistic communication holds vast potential across various fields. By deciphering the silent language accompanying spoken words, AI can unlock new avenues for human-computer interaction and a deeper understanding of ourselves.



AI is learning to decode vocal nuances 


AI goes beyond understanding the literal meaning of words in human communication. To achieve this, it relies on paralinguistic analysis: extracting information about a speaker's emotional state, intent, and personality traits from the vocal cues that accompany speech. By analyzing paralinguistic features such as tone, pitch, rhythm, and speaking style, AI can pick up on signs of sarcasm, deception, and shifts in emotion.
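As a rough illustration of what extracting paralinguistic features can look like, the sketch below estimates two of the cues mentioned above, pitch and loudness, from a short audio frame. It is a minimal, standard-library Python sketch and not the method of any particular product: the function names, the synthetic test tones, and the 75 to 400 Hz search range are assumptions made for the example, and real systems work on recorded speech with far more robust signal processing.

```python
import math


def synth_tone(freq_hz, sample_rate=8000, duration_s=0.1, amplitude=0.5):
    """Generate a pure sine tone as a stand-in for one voiced audio frame."""
    n = int(sample_rate * duration_s)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]


def estimate_pitch_hz(samples, sample_rate, fmin=75.0, fmax=400.0):
    """Estimate pitch as the lag with the strongest autocorrelation."""
    min_lag = int(sample_rate / fmax)  # shortest period considered
    max_lag = int(sample_rate / fmin)  # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Compare the frame with a copy of itself shifted by `lag` samples.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def rms_energy(samples):
    """Root-mean-square amplitude, a simple proxy for loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# Two hypothetical frames: a lower, quieter voice and a higher, louder one.
calm = synth_tone(120.0, amplitude=0.3)
strained = synth_tone(200.0, amplitude=0.6)

for name, frame in (("calm", calm), ("strained", strained)):
    pitch = estimate_pitch_hz(frame, sample_rate=8000)
    print(f"{name}: pitch ~{pitch:.0f} Hz, energy {rms_energy(frame):.2f}")
```

Numbers like these say nothing about emotion on their own. In practice they would be tracked over time and fed, together with many other features, into a trained model.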


This technology has many possible applications, some of which are already in use today. Customer service call centers use this AI during calls to gain real-time insights into how a customer feels. For example, Cogito is a company that provides call center AI solutions that combine NLP with paralinguistic analysis. This allows the AI tool to offer calming prompts or de-escalation techniques if a customer's voice becomes strained during the call. According to their website, their AI technology “drives better conversation outcomes and better customer experience. It reduces average handling times (AHT) and increases first-call resolutions (FCR). It eventually reduces your employee stress, contributing to their well-being.”[1]


“Cogito supplies call centers with machine learning software that aids managers in measuring agent performance across the board, identifies their call-related challenges/strengths, and helps guide them through their everyday customer interactions.”[1]


This new technology has other uses beyond customer service. Mental health professionals can gain valuable insights from subtle changes in pitch, tone, or speaking rate. By analyzing these vocal patterns, AI can identify markers of depression, anxiety, and other mental health conditions [2, 7].


In the realm of market research, non-linguistic AI can provide a more objective perspective by analyzing vocal cues during focus groups or customer interviews. This allows researchers to gauge genuine emotional responses to products, services, or marketing messages, leading to a more accurate understanding of consumer sentiment.



AI is learning to listen with its eyes


AI is not just listening to vocal cues; it is also going beyond words by learning to visually interpret facial expressions [3] and human gestures.


Facial Expression Recognition


Similar to paralinguistic analysis, where AI reads vocal cues, facial expression recognition (FER) AI models are taught to read facial cues to identify and interpret human emotions. These machine-learning algorithms are trained on huge datasets of images, enabling them to discern subtle differences in facial features and to detect and classify expressions such as anger, joy, sadness, confusion, and more [3].
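To make the idea of training on labeled examples concrete, here is a deliberately tiny sketch of the train-then-classify loop in Python. It swaps the deep learning models used in real FER systems for a nearest-centroid classifier, and it assumes each face has already been reduced to two made-up numeric features (how far the mouth corners lift and how far the brows lower, each on a 0 to 1 scale). The feature names, sample values, and function names are all hypothetical.

```python
import math
from collections import defaultdict


def train_centroids(examples):
    """Average the feature vectors of each label into one centroid per label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for k, value in enumerate(features):
            sums[label][k] += value
        counts[label] += 1
    return {label: [v / counts[label] for v in total]
            for label, total in sums.items()}


def classify(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    return min(centroids,
               key=lambda label: math.dist(centroids[label], features))


# Hypothetical labeled data: (mouth_corner_lift, brow_lowering) -> expression.
training_data = [
    ((0.9, 0.1), "joy"),
    ((0.8, 0.2), "joy"),
    ((0.1, 0.9), "anger"),
    ((0.2, 0.8), "anger"),
    ((0.1, 0.3), "sadness"),
    ((0.2, 0.2), "sadness"),
]

centroids = train_centroids(training_data)
print(classify(centroids, (0.75, 0.10)))
print(classify(centroids, (0.20, 0.95)))
```

A real system learns its features directly from pixels and needs far more data, but the overall shape is the same: labeled examples go in, and a model that maps new faces to expression labels comes out.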


Developing AI-powered FER technology has not been a walk in the park. Accuracy is a major hurdle: lighting changes, head positions, masks, and other factors can throw facial expression recognition off. Researchers have tackled this challenge head-on with deep learning, feeding massive, diverse datasets into these algorithms and training them to identify unique facial features and match them to the expressions cataloged in vast databases. Despite the progress, maintaining accuracy is an ongoing battle, and researchers are constantly refining existing models and expanding datasets to keep FER sharp [3].


Applications for FER could span many industries, from customer service to security. The education sector, for instance, has found exciting uses for FER technology. In one set of experiments, “a biometric sensor network (BSN) consisting of web cameras, a wall-mounted camera, and a high-performance computing machine was designed to capture students’ head position, eye gaze, body movement, and facial emotion. These low-level features are used to train an AI-based model to estimate the behavioral and emotional engagement in the class environment.” [4] By measuring student engagement, instructors can tailor how they present material in class and learn which methods engage students and which do not. They can also identify students who are not engaged and could use some individual attention.


Companies in the retail, hospitality, and entertainment sectors are conducting trials and proof-of-concept projects to test the feasibility of using FER models to measure customer satisfaction and personalize services. Security companies are combining facial recognition systems with expression analysis to identify suspicious behavior or emotional states. This AI “has emerged as a valuable tool in the early detection and prediction of mental health disorders. These technologies, whether analyzing speech, text, facial expressions, or electronic health records, are transforming how mental health is diagnosed and managed.” [6]


Despite its numerous benefits, the widespread adoption of FER technology raises ethical considerations and privacy concerns regarding data collection, consent, and potential misuse. Companies must prioritize transparency, accountability, and user consent when implementing FER systems to ensure compliance with data protection regulations and mitigate risks of unintended consequences, such as bias or discrimination in algorithmic decision-making.



Gesture Recognition


Facial expressions are just one piece of the puzzle when it comes to non-linguistic communication. AI is also making significant progress in deciphering the meaning behind human gestures, posture shifts, and body language. 


Building powerful gesture recognition AI demands robust algorithms that can handle the chaos of the real world. Seamless human-to-robot interaction requires lightning-fast processing, which is where efficient algorithms and hardware come in, but striking the balance between accuracy and speed is tricky. Creating top-notch machine learning models involves choosing the right tools and training them on diverse datasets. Keep in mind that gestures come in all shapes and sizes: different hand sizes, finger positions, and occlusions can throw a wrench in the works. Constant innovation is required to tackle these challenges and build gesture recognition systems that are both reliable and efficient [5].
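One simple way to cope with different hand sizes is to normalize hand landmarks before comparing them, so that a small hand and a large hand making the same shape produce the same coordinates. The Python sketch below illustrates the idea under simplifying assumptions: landmarks are 2D points that some upstream hand tracker has already produced, the first point is the wrist, and the second is a reference knuckle. The point layout, the sample values, and the function names are hypothetical, and this is not the method of the work cited above.

```python
import math


def normalize_landmarks(points, wrist_index=0, reference_index=1):
    """Shift the wrist to the origin and scale wrist-to-reference to length 1."""
    wx, wy = points[wrist_index]
    rx, ry = points[reference_index]
    scale = math.hypot(rx - wx, ry - wy)
    if scale == 0:
        raise ValueError("wrist and reference landmarks coincide")
    return [((x - wx) / scale, (y - wy) / scale) for x, y in points]


def shape_distance(points_a, points_b):
    """Mean distance between corresponding landmarks after normalization."""
    norm_a = normalize_landmarks(points_a)
    norm_b = normalize_landmarks(points_b)
    return sum(math.dist(p, q) for p, q in zip(norm_a, norm_b)) / len(norm_a)


# Hypothetical landmarks: wrist, reference knuckle, index fingertip, thumb tip.
small_hand = [(0.0, 0.0), (0.0, 2.0), (1.0, 4.0), (-1.0, 3.0)]
# The same hand shape, 2.5 times larger and located elsewhere in the image.
large_hand = [(10.0 + 2.5 * x, 20.0 + 2.5 * y) for x, y in small_hand]
# A different shape: the index finger is curled back toward the palm.
curled = [(0.0, 0.0), (0.0, 2.0), (0.5, 2.5), (-1.0, 3.0)]

print(shape_distance(small_hand, large_hand))  # same shape, different size
print(shape_distance(small_hand, curled))      # different shape
```

This normalization only removes differences in position and scale. Rotation, occlusion, and camera angle would still need to be handled, which is part of why the problem remains hard.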


Despite the many difficulties that must be overcome to develop gesture recognition AI, this new technology has opened the doors to a multitude of exciting applications. In the realm of human-robot interaction, imagine robots that can not only understand spoken commands but also respond to hand signals or subtle postural cues. This paves the way for more seamless and intuitive collaboration between humans and machines. The world of entertainment is also being transformed; games become more immersive as players control characters or navigate virtual worlds through natural gestures. 


Perhaps the most impactful application lies in bridging communication gaps. AI-powered sign language interpretation can offer real-time translation, fostering greater accessibility and inclusivity [8]. These are just a few examples, and as AI's understanding of body language continues to evolve, the possibilities are limitless.




Future of non-linguistic communication AI


There are many more applications for these emerging AI technologies on the horizon:


  • Social Assistive Technology: AI-powered tools can provide support and companionship to people with autism spectrum disorder (ASD) or social anxiety by interpreting nonverbal cues and facilitating communication.

  • Real-Time Sentiment Analysis: Public spaces like airports or transportation hubs could use AI to analyze crowd emotions and respond to potential safety concerns or disruptions.

  • Personalized Learning: Educational tools can be designed to customize learning experiences for individual students by leveraging non-verbal cues to account for their emotional states and learning styles.

  • Advanced Emotion Recognition in Healthcare: AI can be used to assess pain levels, monitor patient recovery, or detect early signs of dementia based on facial expressions and vocal patterns.

  • Biometric Authentication: Combining facial recognition with other biometric data, such as gait analysis or heart rate, could create more secure and reliable authentication systems.



The journey into non-linguistic communication is just beginning


The ability of AI to decode non-linguistic communication marks a transformative shift in human-computer interaction. By delving into the hidden language of vocal cues, facial expressions, and gestures, AI unlocks a deeper understanding of human emotions, intentions, and mental states. This newfound ability has the potential to revolutionize fields ranging from healthcare and education to customer service and security. Imagine a future where AI-powered tutors tailor their approach based on a student's emotional engagement, or healthcare professionals leverage AI to detect early signs of illness through subtle vocal changes. The possibilities are boundless.


However, this journey is just beginning. As AI technology continues to evolve, groundbreaking applications will continue to emerge. Perhaps AI will one day decipher the complex tapestry of human connection, fostering deeper empathy and understanding between humans and machines. The potential for this technology to reshape our world and unlock a deeper understanding of ourselves is truly awe-inspiring.



 


About us: mXa, on the 20+ year foundation of Method360, was founded to intentionally serve fast-growth companies and the unique challenges they face. We understand that inorganic and organic growth provokes change, ambiguity, and uncertainty that can deeply burden the organizations involved. By seeking to understand the human element in M&A and fast growth environments, mXa embraces a unique, contrarian approach in advising clients that seeks to realize maximum value for them in alignment with business objectives.


Interested in learning more about our capabilities or discussing your M&A or AI story? We’re here to help.


References:

  1. https://cogitocorp.com/

  2. Depression Speech Recognition With a Three-Dimensional Convolutional Network

  3. Facial emotion recognition through artificial intelligence

  4. An Experimental Platform for Real-Time Students Engagement Measurements from Video in STEM Classrooms

  5. Real-Time Hand Gesture Recognition Based on Deep Learning YOLOv3 Model

  6. Automatic Recognition of Posed Facial Expression of Emotion in Individuals with Autism Spectrum Disorder

  7. Unveiling the sound of the cognitive status: Machine Learning-based speech analysis in the Alzheimer’s disease spectrum

  8. https://research.google/blog/on-device-real-time-hand-tracking-with-mediapipe

