Suppose someone asked you to think about artificial intelligence. What would be the first thoughts to pop into your mind? Would you imagine robots capable of understanding and responding to human speech? How about robot assistants that can pull up information or perform an action on command? Would you think robots today can hold a full-length conversation? Today, a growing number of AI assistants on the market can interact with humans via speech. The Amazon Echo, for example, sold 53.9 million units in 2020, 65 million units in 2021, and 82 million units in 2022 globally. The components that made Conversational AI possible stem from advances in Natural Language Processing (NLP), the field of AI that enables computers to engage with human language, and from two accompanying areas of study: Natural Language Understanding (NLU), which extracts the meaning in text, and Natural Language Generation (NLG), which produces a response. When computers process conversations, they first take in human speech and use speech-to-text transformation to convert it into text. They then extract the meaning through text-to-text topic extraction, generate a response using masked language text-to-text processing, and finally convert the response back to spoken language using text-to-speech conversion.
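The four stages described above can be sketched as a chain of functions. This is only a minimal illustration under stated assumptions: the class name, the function names, and the toy stand-ins are hypothetical, and a real assistant would plug a trained model into each stage rather than the placeholder logic shown here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ConversationPipeline:
    """Hypothetical wrapper chaining the four stages named in the text."""
    speech_to_text: Callable[[bytes], str]
    extract_topic: Callable[[str], str]
    generate_reply: Callable[[str, str], str]
    text_to_speech: Callable[[str], bytes]

    def respond(self, audio: bytes) -> bytes:
        text = self.speech_to_text(audio)        # speech -> text
        topic = self.extract_topic(text)         # text -> meaning (NLU)
        reply = self.generate_reply(text, topic) # meaning -> response (NLG)
        return self.text_to_speech(reply)        # text -> speech


# Toy stand-ins (assumptions for illustration only): "audio" is just
# UTF-8 bytes, and topic extraction is a keyword lookup.
KNOWN_TOPICS = {"weather", "music"}


def toy_speech_to_text(audio: bytes) -> str:
    return audio.decode("utf-8")


def toy_extract_topic(text: str) -> str:
    for word in text.lower().split():
        if word in KNOWN_TOPICS:
            return word
    return "unknown"


def toy_generate_reply(text: str, topic: str) -> str:
    return f"You asked about {topic}."


def toy_text_to_speech(reply: str) -> bytes:
    return reply.encode("utf-8")


pipeline = ConversationPipeline(
    speech_to_text=toy_speech_to_text,
    extract_topic=toy_extract_topic,
    generate_reply=toy_generate_reply,
    text_to_speech=toy_text_to_speech,
)

print(pipeline.respond(b"what is the weather today"))
```

Keeping each stage behind a plain callable means any one of them can be swapped for a real model without touching the others.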
Human beings possess the innate ability to comprehend and respond to one another's communication quickly. Can this ability be replicated in machines? The discipline of social robotics has arisen to tackle this question. Social robots are designed to facilitate seamless human-robot interaction. In the initial stages of development, social robots were equipped with rudimentary capabilities that required minimal feedback from their human counterparts. For example, the Paro robot, developed by Takanori Shibata in the early 2000s, only needed to respond to touch. Today, social robots can hold full-length conversations and take in both audio and visual input. We can turn to Embodied as an example of how far we have come.
Embodied is a social robotics company that aims to teach necessary soft skills to children across the neurodiverse spectrum. It operates business-to-business, through partnerships with educational institutions and healthcare providers, and business-to-consumer, offering social robots directly to families seeking educational and therapeutic support for their children. The company’s flagship product is Moxie, billed as the world’s first and most advanced AI robot friend. Moxie aims to help neurodiverse individuals develop necessary real-world skills by assigning daily missions as a user interacts with the robot. Although Moxie was originally launched directly to consumers, Embodied has seen growing demand from schools and clinics and is beginning to lean into it.
Moxie’s head is packed with advanced microphone and camera technology, while its body has a range of sensors that gather data in real time, allowing the robot to employ machine-learning algorithms to respond verbally and non-verbally to users. Powered by the SocialX™ platform, Moxie is able to perceive, process, and respond to natural conversation, eye contact, facial expressions, and other behaviors. The robot can recognize and recall people, places, and things to create a personalized learning experience. The platform has an app that allows parents to see their child’s progress and developmental needs. SocialXChat™ is a subcomponent of the SocialX™ platform that augments conversations between the user and Moxie. It uses computer vision, NLU, and NLG to help Moxie reply to questions from children. Visual feedback from the child tailors the emotive expressions Moxie makes, and audio feedback primes the robot’s responses, steering the direction of the conversation. Note that accessing Moxie’s full capabilities requires a subscription.
Image: Moxie by Embodied, Inc.
With the exception of automated speech recognition, all data is processed by Moxie’s onboard processor. As Moxie becomes more familiar with the child's face, speech patterns, and developmental needs, each interaction becomes more sophisticated. Moxie achieves this by leveraging advances in reinforcement learning to adapt its own responses. This multimodal capability is a game changer because it brings together advances from the NLU, NLG, computer vision, emotion AI, and reinforcement learning disciplines.
Moxie is currently sold directly to consumers in the U.S. market. The purchase price, which covers the high-end hardware, is comparable to that of an iPhone. The software, which is where the magic happens, requires a subscription to access the full learning platform after the first month of use. The subscription includes 12 different social-emotional themes and 50+ missions, and it costs around the same as a streaming service. Consumers can also rent Moxie. This option includes the full learning platform but requires a one-year commitment.
Moxie’s capability set is pertinent to multiple markets beyond the U.S., with a particular emphasis on the U.K. and India. Of the 796 companies in the NLP space on Crunchbase, the top three countries by number of NLP startups listed are the U.S. with 304, India with 82, and the U.K. with 60; Israel is fourth with 38. Using total funding as a proxy for how VCs see the U.K. and Indian markets, we further see Embodied’s expansion strategy as aligned with its investment partners. Among the 277 startups citing funding amounts, India is in 4th place with total funding of around $118M and the U.K. is in 5th place with $108M already deployed. For reference, the U.S. is in 1st place with $1.5B, China is 2nd with $465M, and Canada is 3rd with $213M. Moving into India and the U.K. gives Embodied penetration into the Asian and European markets, where exciting work is being done. If this expansion is fully realized, Embodied can benefit from the technological spillovers originating from the startups growing adjacently.
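To put the startup counts above in proportion, each country's share of the 796 listed companies can be computed directly. The snippet below is a small worked calculation using only the figures quoted in the text; the variable and function names are illustrative.

```python
# Startup counts by country, as quoted in the text (Crunchbase NLP listing).
TOTAL_NLP_COMPANIES = 796
startups_by_country = {"U.S.": 304, "India": 82, "U.K.": 60, "Israel": 38}


def share_of_total(count: int, total: int = TOTAL_NLP_COMPANIES) -> float:
    """Return a count as a percentage of all listed companies, to one decimal."""
    return round(100 * count / total, 1)


shares = {country: share_of_total(n) for country, n in startups_by_country.items()}
top_four_share = share_of_total(sum(startups_by_country.values()))

for country, pct in shares.items():
    print(f"{country}: {pct}%")
print(f"Top four combined: {top_four_share}%")
```

On these figures the U.S. accounts for roughly 38% of listed NLP companies, India for about 10%, and the U.K. for about 7.5%, so the top four countries together hold around 61% of the listing.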
Among the 312 startups citing their last funding type, a majority are at the Seed or Series A stage of their journey. The number of early-stage NLP startups popping up in these markets signals a growing appetite for NLP products. In further support of this, check sizes from VCs have been increasing over the years, signaling optimism for the industry.