A few important terms used in this blog post:
1- Droid - (in science fiction) a robot.
2- IRL – In Real Life
3- Wake Words – The gateway between users and their AI assistants. Our cars, phones and watches respond to a spoken wake word (for example, “Alexa” or “Hey Siri”), which triggers them into act-now mode.
4- XiaoIce (pronounced shiow-ice) – Cortana’s little sister, XiaoIce is a social assistant that people can add as a friend on several major Chinese social networking services, including Weibo.
The movie The Bourne Identity tells the story of a man (Matt Damon) who is rescued, near death, from the ocean by an Italian fishing boat. He wakes up with no memory; left without any identity or background, his only reminders of the past are his peerless talents in fighting and self-defense and his impeccable command of several languages. Driven by a deep urge to know who he is, he sets out on a desperate search to discover his identity and why he is being mortally pursued by assassins who always seem to know where he is and what his next move will be.
Now, imagine yourself in a Matt Damon-like situation, which is quite possible: you may not suffer from total amnesia, but someone somewhere knows everything about you and what your next move is going to be. Add to that the uneasy feeling that you are sharing your bedroom with someone whose physical identity you cannot establish, but who you know for certain is there, recording your every move and every conversation. This may become a reality very soon, and we may all find ourselves, like Matt Damon, trying to escape an abstract spy-bot whose core mission is to record our every move in our personal and professional spaces. Would you allow this to happen to you, or would you resist, like anyone who loves freedom as much as life itself? To find out more, read on as this post explores Digital Assistants and how they are evolving to acquire human-like attributes.
Digital Assistants, Alexa included, tend to be one-sided and unemotional when interacting with humans in a two-way conversation. Reacting well to what is being asked depends not only on what one says but on how one says it, without sounding monotonous or uninterested. In any human conversation, a flat, disengaged reply is considered RUDE by the listener and may be dismissed as irrelevant “noise pollution”, producing feelings of frustration.
We all remember the gossipy, conversational and very humanlike droid that featured in “Rogue One: A Star Wars Story” (2016). Compare him with the GNK power droid, or “Gonk” droid, which first appeared in “Star Wars” (1977), and one cannot miss the obvious, groundbreaking technological enhancements in the later droids. Gonk had very limited screen time and lacked the signature characteristics of the droid series that followed. My favorite is C1-10P, nicknamed Chopper, a bit of a jerk; it was with this series that droids started to establish a strong sense of individuality. Chopper would protest with forceful barks and burps the moment he felt slightly underestimated or annoyed. Droids in the Star Wars series have evolved a lot since R5-D4, one of the many astromech droids of the first film, which surely deserves some praise for being a cool toy. Today, powered by AI, robots and Digital Assistants are superior versions of R5-D4, are only getting better with time, and will someday be as good as humans.
If talking and answering queries is the purpose of most Digital Assistants, e.g. Alexa, Siri, Google Assistant and the rest, then they meet that fundamental requirement with ease. But when it comes to being conversational and answering any question, Microsoft’s latest Digital Assistant technology may beat them all: Microsoft claims to have achieved a major breakthrough in conversational AI by introducing “full duplexing”, something it says has never been accomplished before. If Microsoft’s claims are true, this will be a complete game changer for Digital Assistants.
Digital Assistants can grasp context only to a limited extent, and based on this limited understanding they can respond to short sequences of related queries. The limitations become evident when interacting with any Digital Assistant: they are clearly unable to imitate real-life scenarios. Their responses lack proper intonation and context awareness, making the conversation stilted and unreal, because their contextual awareness does not survive a query that drifts slightly off track, even when it is still related to the main subject of the conversation. Google Voice Search has been answering follow-up questions since as early as 2013. The question can be anything that makes sense, and Google Voice Search returns a valid, accurate response; but it fails the moment you slip into the human-like mode of asking casual follow-up questions.
If the query is “Who is Michael Jordan?”, you will get the answer “American retired professional basketball player.” Ask a follow-up question, “Why is he so famous?”, and the answer is “He led the Chicago Bulls to six National Basketball Association championships and earned the NBA’s Most Valuable Player Award five times.” But when the next follow-up query is more personal and closer to natural conversation, such as “Who is your favorite, Michael Jackson or Michael Jordan?”, Google Voice Search falters. This time the response lacks any human-like attributes and returns no answer. There is pin-drop silence, as if you had said something offensive, even though the query is perfectly reasonable; Google Voice Search simply has no answer registered for it in its database.
Nevertheless, despite these handicaps, there has been progress, and we are getting closer to making Digital Assistants sound natural and conversational. In a real human conversation, both parties are active listeners and speakers, depending on the need and the situation. One party, while saying something, may also have to hear what the other party says in between and then respond, adjusting facial expressions and intonation, using interjections if required, and a lot more. So, technically, humans can listen, comprehend and answer based on what the other party said, even while they are speaking. This is where Digital Assistants fail completely, because they have always had one limitation: they cannot listen to what the user is saying while they are already saying something. In other words, “it is unable to register input while it is speaking.” This is a major roadblock to making Digital Assistants as conversational and engaging as a human conversation.
Real human conversations also involve both parties often listening and speaking at the same time. Interruptions are a normal part of any natural conversation, casual or professional, and noticing the other person’s tone, expressions, the essence of a query and the overall context of the conversation leads either party to adjust accordingly. Both parties are active listeners and active speakers at once. It is this ability to listen actively while speaking that is totally missing from Digital Assistants and has kept them from sounding human or creating human-like conversations.
Perhaps it is now more accurate to say this aspect “was totally missing”, because Microsoft recently announced that it has redefined conversational AI by introducing “Full Duplexing.” If Microsoft’s next generation of AI-based Digital Assistants can hold a verbal conversation in which they speak and listen simultaneously, it will certainly make history, because that is what major tech companies like Amazon, Google, Apple and many others have been striving to achieve for years. With this achievement, the need for Wake Words could cease, redefining the Digital Assistant space with a major, multifaceted impact!
Full Duplexing opens new technological avenues in which interacting with Digital Assistants will be a lot more natural. You can give additional instructions while the AI is in the middle of answering or executing something else: while the Digital Assistant is reading a few extracts from your favorite novel, for example, you can ask it to turn off the TV. There is no denying that the conversations will still retain a robotic element and feel mechanical, but they will be a lot closer to an IRL scenario. It would therefore be wise to set proper expectations when it comes to engaging in a conversation with a Digital Assistant like Alexa or its more advanced counterparts from Microsoft. It will still take many years before Full Duplexing delivers a truly human-like approach to conversation.
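To make the idea of speaking and listening at the same time a little more concrete, here is a toy simulation in Python. It is a minimal sketch, not Microsoft’s implementation: it assumes that “speaking” and “listening” can be modeled as two concurrent asyncio tasks sharing an event log, and the function names, simulated timings and the “turn off the TV” command table are all hypothetical. The point is only that the listening task keeps running while the assistant reads, which a half-duplex assistant that listens only between turns cannot do.

```python
import asyncio

# Toy full-duplex simulation (hypothetical; not any vendor's real API).
# "Speaking" and "listening" run as two concurrent tasks, so the
# assistant can hear and act on a request without stopping its reading.


async def speak(sentences, log, seconds_per_sentence=0.05):
    """Simulate reading sentences aloud, one after another."""
    for sentence in sentences:
        log.append(f"SAY: {sentence}")
        # Pretend it takes a fixed time to say each sentence.
        await asyncio.sleep(seconds_per_sentence)


async def listen(utterances, log, commands):
    """Simulate the microphone staying open while the assistant speaks.

    `utterances` is a list of (gap_in_seconds, text) pairs: each user
    utterance arrives `gap_in_seconds` after the previous one.
    """
    for gap, text in utterances:
        await asyncio.sleep(gap)
        log.append(f"HEARD: {text}")
        action = commands.get(text)
        if action is not None:
            log.append(action())  # act immediately, mid-speech


async def full_duplex_session(sentences, utterances, commands):
    """Run speaking and listening at the same time and return the event log."""
    log = []
    await asyncio.gather(
        speak(sentences, log),
        listen(utterances, log, commands),
    )
    return log


if __name__ == "__main__":
    novel = ["Chapter one.", "It was a dark and stormy night.", "The end."]
    # The user speaks 0.075 s in, while the second sentence is being "read".
    user = [(0.075, "turn off the TV")]
    commands = {"turn off the TV": lambda: "ACTION: TV switched off"}
    for event in asyncio.run(full_duplex_session(novel, user, commands)):
        print(event)
```

In a typical run, the HEARD and ACTION lines appear between the second and third SAY lines: the TV request is handled while the reading carries on. A real assistant would replace the simulated timings with audio streams, speech recognition and echo cancellation, none of which this sketch attempts.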
It is also important to examine the privacy aspect of Full Duplexing. Digital Assistants with AI capable of Full Duplexing will be active listeners and may hear and record a lot of details that fall outside the scope of the actual conversation. This is exactly what is worrisome: will this data be stored? If so, where and how? And does it help the Digital Assistant become more proficient at human-like conversation? It means the user or owner of a Digital Assistant will have to live with the fact that he/she is being listened to all the time, whether or not the context is relevant. The privacy implications are significant: it is almost like sleeping in your bedroom knowing that someone anonymous is sleeping next to you and there is nothing you can do about it.
Many advocates of AI and Digital Assistant technologies strongly believe that “to enjoy exceptional convenience, users will have to make a few compromises, and they will still have the authority to choose privacy over convenience or the other way around.” Whichever way it may be, one thing is certain: future technologies will be driven by data, plenty of it, and whoever generates that data should be able to allow, disallow or limit its use in AI-driven technologies. In the future, just like smartphones today, everyone will need AI-driven Digital Assistants, and many will choose privacy over convenience until they are convinced their data is not being manipulated in any way and that they will not have to chase ghosts in the dark, like Matt Damon in “The Bourne Identity.”
As of today, none of the technology leaders in the Digital Assistant space claims to have fully implemented Full Duplexing, and Microsoft has not built the technology into Cortana either. However, it does claim to have applied it to XiaoIce, a chatbot that delivers weather reports on news stations, reads extracts from books, sings songs and a lot more.
At present, China is at the forefront of exploring our future with highly sophisticated, AI-driven chatbots. Microsoft has also partnered with Xiaomi to integrate XiaoIce with several smart home devices, and the chatbot already has more than 185 million users. If Microsoft’s claims are to be believed, one user had a 4-hour conversation with XiaoIce, the Digital Assistant equivalent of breaking the sound barrier.
Digital Assistants and chatbots now have transcripts, Wake Words, Full Duplexing and AI as major drivers helping them create IRL-like conversations without losing sight of the main context. Nevertheless, AI will need major advances before Full Duplexing can deliver optimal results, and many privacy-related concerns will have to be addressed before these exceedingly sensible, chatty chatbots gain broad commercial acceptance.
For now, Microsoft has a very obvious advantage over its competitors in the Digital Assistant space; what remains to be seen is how far it can take Full Duplexing in pushing AI to its peak performance. At the end of the day, it is less about the conversation and more about the context, and to master this aspect of conversation, chatbots and Digital Assistants need to be aware of and understand many other contributing elements. But as long as the bots are only chatting, I am sure we will very soon be conversing with Digital Assistants as trusted, engagingly chatty and emotionally sensitive companions.
While Digital Assistants are getting smarter, HireHere is expanding its pool of highly competent, experienced AI freelancers, simply because this is where the future of technology, freelancing and, above all, the world’s workforce is headed.
Come and experience the future of technology with HireHere.