Inside Project Debater Speech By Crowd
IBM’s Effort to Extend NLP from Basic Conversations to Debates
Jesus Rodriguez, Jan 8

Conversational applications are becoming part of our everyday lives through channels such as digital assistants or bots.
Despite the unquestionable progress made in natural language processing (NLP) in the last few years, most AI-driven conversational applications today rarely resemble human dialogs and instead focus on basic question-answering or command-action patterns.
A good share of human conversations deviate from the pattern of digital assistants and focus on expressing opinions about a specific subject.
Last year, IBM Research unveiled Project Debater as one of the first artificial intelligence (AI) agents that can debate complex topics at a human level.
Yesterday, IBM announced the first application of Project Debater: Speech by Crowd, which crowdsources opinions about a specific argument and formulates complex points of view about both sides of a debate.
Speech by Crowd is the materialization of an impressive body of AI research across different NLP areas.
Debates are the ultimate expression of human dialog.
For a society, expert debates surface different perspectives on a specific subject.
From school competitions to presidential races, debates are an intellectual duel of minds that helps build the collective knowledge around a given topic.
The contrast between debates and the semi-rigid conversation patterns built into digital assistants and chatbots is pretty obvious.

Speech by Crowd
Speech by Crowd is one of IBM’s highlights at this year’s Consumer Electronics Show (CES).
Conceptually, Speech by Crowd is a platform for crowdsourcing decision support.
The platform collects text-based arguments from large audiences and formulates persuasive arguments on both sides of the debate.
For instance, on the subject of whether social media is harmful to society, Speech by Crowd formulated exhaustively rich arguments on both sides.
Speech by Crowd leverages the capabilities of Project Debater in a very clever way to create complex arguments from crowdsourced text-based opinions.
At a high level, the workflow behind Speech by Crowd can be structured in the following stages:
The following video clearly illustrates the previous workflow in a very intuitive way.
The workflow behind Speech by Crowd might seem intuitive from a conceptual standpoint because it resembles human reasoning.
However, from the AI and NLP standpoint, Project Debater — Speech by Crowd is able to achieve some impressive capabilities that many thought were unique to human cognition.
To enable those capabilities, Speech by Crowd builds on years of AI and NLP research.
The research references of Project Debater include dozens of publications which, in turn, reference hundreds of other publications.
From that extensive list, there are three research building blocks that I think are particularly relevant:

Concept Abstraction
In “Learning Concept Abstractness Using Weak Supervision”, IBM researchers introduce a weakly supervised learning method that can assess how abstract the concepts in text-based expressions are, in the absence of labeled data.
Specifically, the paper introduces the concept of Abstractness Indicators as a quantifier of the degree to which an expression denotes a specific entity.
The techniques proposed in this paper go beyond simple topic detection and use the syntactic and contextual representation of words in order to evaluate the level of abstractness.
For instance, the paper looks for suffixes like -ity, which forms nouns from adjectives (e.g., capability), or -tion, which is typical of nouns derived from verbs (e.g., action).
The paper evaluates different classification methods, such as Naïve Bayes, Nearest Neighbor, and Recurrent Neural Networks (RNNs), to quantify the abstractness of concepts.
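The suffix idea can be illustrated with a minimal Python sketch. This is not the paper's implementation: the suffix list, the minimum stem length, and the helper names `has_abstract_suffix` and `weak_labels` are all assumptions made for illustration. The sketch only shows how a cheap rule can produce noisy labels without any manual annotation.

```python
# Hypothetical suffix list, chosen for illustration only (not taken from the paper).
ABSTRACT_SUFFIXES = ("ity", "tion", "ness", "ism")

# Assumed minimum stem length, so that short words like "city" are not matched.
MIN_STEM_LENGTH = 2


def has_abstract_suffix(word: str) -> bool:
    """Return True if the word ends with one of the assumed indicator suffixes."""
    w = word.lower()
    return any(
        w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM_LENGTH
        for suffix in ABSTRACT_SUFFIXES
    )


def weak_labels(words):
    """Map each word to a noisy label: 1 (likely abstract) or 0 (no indicator found)."""
    return {w: int(has_abstract_suffix(w)) for w in words}


labels = weak_labels(["capability", "action", "table", "city", "happiness"])
print(labels)
```

Labels produced this way are noisy by design. In a weak supervision setup they would serve as training signal for a classifier such as the ones listed above, rather than as a final answer.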

Argumentative Content Search
The paper “Towards an argumentative content search engine using weak supervision” proposes a search technique for finding claims in large text corpora.
By claims we are referring to assertions that are intended to be proved in a specific argument.
Detecting claims in textual information-sets is a key part of the argumentative process.
For small text datasets, this process is relatively trivial, but most of those techniques become impractical when applied to large volumes of unstructured, argumentative content.
In their paper, IBM researchers rely on deep neural networks (DNNs) to identify claim sentences (CS), which are sentences that affirm or rebut a given assertion.
Specifically, the CS technique looks for sections in sentences that include the token ‘that’ as a sign of a precursor to a claim followed by some restrictions about a specific concept.
For instance, among the following sentences, only S1 qualifies as a CS, as it contains an assertion about nuclear power introduced by the token ‘that’ and followed by the qualifier ‘obsolete’:
· S1: He believed that nuclear power would become obsolete, to be replaced by clean energy sources.
· S2: The author concludes that wind energy has the greatest potential for near-term expansion.
· S3: As Buckley writes, “If atheism was unacceptable, superstition and fanaticism were even more so”.
· S4: Any form of corporal punishment is barbaric and has no place in a civilized polity.
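The ‘that’ pattern described above can be approximated with a few lines of Python. This is a deliberately simplified sketch and not the paper's method, which relies on DNNs. The function name `is_claim_candidate` and the rule itself (the token ‘that’ followed somewhere later by the topic) are assumptions, and the topic “nuclear power” is taken from the example.

```python
import re


def is_claim_candidate(sentence: str, topic: str) -> bool:
    """Return True if the token 'that' appears and the topic is mentioned after it."""
    text = sentence.lower()
    match = re.search(r"\bthat\b", text)
    if match is None:
        return False
    # Only look at the part of the sentence that follows 'that'.
    return topic.lower() in text[match.end():]


sentences = {
    "S1": "He believed that nuclear power would become obsolete, to be replaced by clean energy sources.",
    "S2": "The author concludes that wind energy has the greatest potential for near-term expansion.",
    "S3": "As Buckley writes, \"If atheism was unacceptable, superstition and fanaticism were even more so\".",
    "S4": "Any form of corporal punishment is barbaric and has no place in a civilized polity.",
}

candidates = [sid for sid, s in sentences.items() if is_claim_candidate(s, "nuclear power")]
print(candidates)
```

With “nuclear power” as the topic, only S1 passes the filter: S2 contains ‘that’ but talks about wind energy, and S3 and S4 do not contain the token at all. A rule this simple will miss many claims and admit some non-claims, which is why it is better seen as a cheap way to gather candidate sentences than as a classifier.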

Listening Comprehension
In “Listening Comprehension over Argumentative Content”, IBM proposes a machine listening comprehension (MLC) technique for enabling question-answering over large volumes of argumentative speech transcripts.
The main difference of this technique is that it achieves comprehension in real time over audio streams without access to text transcripts.
For the experiments, IBM collected 200 spontaneous speeches arguing for or against 50 controversial topics.
For each speech, the researchers formulated a question, aimed at confirming or rejecting the occurrence of potential arguments in the speech.
Labels were collected by listening to the speech and marking which arguments were mentioned by the speaker.
The experiments evaluated the MLC capabilities of different models such as word2vec, skip-thought or all-yes.
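To make the evaluation setup concrete, here is a small Python sketch of how a trivial baseline could be scored. It assumes that all-yes denotes a baseline that answers “yes, this argument was mentioned” to every question, which is my reading of the name rather than something stated above. The data is a toy example invented for illustration, and the function names are hypothetical.

```python
def accuracy(predictions, gold):
    """Fraction of predictions that match the gold labels."""
    assert len(predictions) == len(gold)
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def all_yes(questions):
    """Assumed baseline: answer True ('mentioned') for every question."""
    return [True for _ in questions]


# Toy data, invented for illustration: (speech, argument) questions and gold labels.
questions = [
    ("speech-1", "argument-a"),
    ("speech-1", "argument-b"),
    ("speech-2", "argument-a"),
    ("speech-2", "argument-c"),
]
gold = [True, False, True, True]

baseline_accuracy = accuracy(all_yes(questions), gold)
print(baseline_accuracy)
```

The accuracy of such a baseline is simply the share of positive labels, so it gives a floor that any model based on word2vec or skip-thought representations would need to beat to show real comprehension.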
Speech by Crowd is the first practical implementation of Project Debater and certainly an important milestone for seeing how this new wave of conversational technologies is going to be adopted.
IBM has scheduled public debates for every day this week.