CN110781666B - Natural language processing text modeling based on generative adversarial network - Google Patents


Info

Publication number
CN110781666B
CN110781666B (application CN201910623780.8A)
Authority
CN
China
Prior art keywords
bon
neural network
output
input
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910623780.8A
Other languages
Chinese (zh)
Other versions
CN110781666A (en)
Inventor
D. Dua
C. N. dos Santos
Bowen Zhou
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US16/033,285 (US11281976B2)
Priority claimed from US16/033,313 (US11481416B2)
Application filed by International Business Machines Corp filed Critical International Business Machines Corp
Publication of CN110781666A
Application granted
Publication of CN110781666B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

Mechanisms are provided for implementing a Generative Adversarial Network (GAN) for natural language processing. With these mechanisms, the generator neural network of the GAN is configured to generate a bag-of-n-grams (BoN) output based on a noise vector input, and the discriminator neural network of the GAN is configured to receive a BoN input, wherein the BoN input is either the BoN output from the generator neural network or a BoN associated with an actual portion of natural language text. These mechanisms further configure the discriminator neural network of the GAN to output an indication of the probability that the input BoN is from an actual portion of natural language text rather than the BoN output of the generator neural network. Further, the mechanisms train the generator neural network and the discriminator neural network based on a feedback mechanism that compares the output indication from the discriminator neural network with an indication of whether the input BoN is from an actual portion of natural language text or is the BoN output of the generator neural network.

Description

Natural language processing text modeling based on generative adversarial network
Technical Field
The present application relates generally to an improved data processing apparatus and method, and more particularly to mechanisms for providing a generative-adversarial-network-based text model for use in performing natural language processing, and for providing question-answering capabilities using such a trained generative-adversarial-network-based text model.
Background
Natural language processing is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human (natural) languages, and in particular with programming computers to efficiently process large natural language corpora. Challenges in natural language processing typically involve natural language understanding, natural language generation (often from formal, machine-readable logical forms), connecting language and machine perception, dialog systems, or some combination thereof.
One model commonly used for natural language processing is the bag-of-words (BOW) model or the continuous bag-of-words (CBOW) model. The bag-of-words model is a simplifying representation used in natural language processing and information retrieval (IR), in which a text such as a sentence or document is represented as the bag (multiset) of its words, disregarding grammar and even word order but keeping multiplicity. The bag-of-words model is commonly used in document classification methods, where the frequency of occurrence of each word is used as a feature for training a classifier. The CBOW model works by predicting word probabilities given a context (e.g., where the context may be a single word or a group of words); for example, given a single context word, CBOW predicts a single target word.
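For concreteness, the bag-of-words counting described above can be sketched in a few lines. This is a minimal illustration with a hypothetical toy vocabulary, not a production tokenizer:

```python
from collections import Counter

def bag_of_words(text, vocabulary):
    """Count each vocabulary word in `text`, ignoring grammar and
    word order but keeping multiplicity (the 'bag')."""
    counts = Counter(tok for tok in text.lower().split() if tok in vocabulary)
    return [counts[w] for w in vocabulary]

vocab = ["the", "cat", "sat", "on", "mat"]
vec = bag_of_words("The cat sat on the mat", vocab)  # -> [2, 1, 1, 1, 1]
```

Note that "the" maps to a count of 2: the multiplicity survives even though word order is discarded.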
A generative model learns the joint probability distribution p(x, y) of the input variable x (observed data values) and the output variable y (determined values). Most unsupervised generative models, such as Boltzmann machines, deep belief networks, and the like, require complex samplers to train the generative model. However, the recently proposed generative adversarial network (GAN) technique reuses the min/max paradigm from game theory to generate images in an unsupervised manner. The GAN framework includes a generator and a discriminator, where the generator acts as an adversary and attempts to fool the discriminator by generating synthetic images from a noise input, and the discriminator attempts to distinguish the synthetic images from real images.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described herein in the detailed description. This summary is not intended to identify key factors or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In one illustrative embodiment, a method is provided in a data processing system comprising at least one processor and at least one memory, the memory comprising instructions executed by the at least one processor to configure the processor to implement a Generative Adversarial Network (GAN) for natural language processing. The method comprises configuring the generator neural network of the GAN to generate a bag-of-n-grams (BoN) output based on a noise vector input, and configuring the discriminator neural network of the GAN to receive a BoN input, wherein the BoN input is either the BoN output from the generator neural network or a BoN associated with an actual portion of natural language text. The method further comprises configuring the discriminator neural network of the GAN to output an indication of the probability that the input BoN is from an actual portion of natural language text rather than the BoN output of the generator neural network. Moreover, the method comprises training the generator neural network and the discriminator neural network based on a feedback mechanism that compares the output indication from the discriminator neural network with an indication of whether the input BoN is from an actual portion of natural language text or is the BoN output of the generator neural network.
In other illustrative embodiments, a computer program product comprising a computer usable or readable medium having a computer readable program is provided. The computer readable program, when executed on a computing device, causes the computing device to perform various ones of, and combinations of, the operations outlined above with regard to the method illustrative embodiment.
In yet another illustrative embodiment, a system/apparatus is provided. The system/apparatus may comprise one or more processors and a memory coupled to the one or more processors. The memory may comprise instructions which, when executed by the one or more processors, cause the one or more processors to perform various ones of, and combinations of, the operations outlined above with regard to the method illustrative embodiment.
These and other features and advantages of the present invention will be described in, or will become apparent to those of ordinary skill in the art in view of, the following detailed description of the exemplary embodiments of the present invention.
Drawings
The invention, as well as a preferred mode of use and further objectives and advantages thereof, will best be understood by reference to the following detailed description of illustrative embodiments when read in conjunction with the accompanying drawings, wherein:
FIGS. 1A and 1B depict examples of GAN-based mechanisms for generating an n-gram bag (BoN) model in accordance with one illustrative embodiment;
figures 2A and 2B depict exemplary diagrams of GAN architecture configured for answer selection, according to one illustrative embodiment;
FIG. 3 depicts a schematic diagram of one illustrative embodiment of a cognitive system in a computer network;
FIG. 4 is a block diagram of an example data processing system in which aspects of the illustrative embodiments may be implemented;
FIG. 5 shows a cognitive system processing pipeline for processing natural language input to generate a response or result in accordance with one illustrative embodiment;
FIG. 6 is a flowchart outlining an exemplary operation for training a GAN to generate an n-gram bag for use in performing natural language processing operations in accordance with one illustrative embodiment; and
fig. 7 is a flowchart outlining an exemplary operation for performing answer selection using a trained GAN in accordance with one illustrative embodiment.
Detailed Description
The illustrative embodiments provide mechanisms for learning bag-of-words/phrases models for performing natural language processing by utilizing a generative adversarial network (GAN) approach. Much effort has been devoted to learning statistical language models that estimate word and phrase distributions in natural language. However, most of these statistical language models are, at some point, trained with incorrect independence assumptions (between different segments of a document) in order to make learning tractable. The earliest attempts at statistical language modeling were made in the area of speech recognition in the 1980s. Since then, these techniques have been applied to numerous areas, such as machine translation, speech synthesis, and the like. At their core, these works involve learning the transition and emission probabilities of words or phrases in a vocabulary, conditioned on previously seen words or phrases.
Early efforts in adversarial model training included spam filtering, intrusion detection, terrorism detection, and computer security. For example, U.S. Patent No. 8,989,297 proposes a routing protocol for transmitting messages over a network of nodes in which an adversary controls links between nodes; the sender requests status reports from intermediate nodes to determine which nodes are malicious. In another work, U.S. Patent No. 8,984,297, a method is used to identify inappropriate text in an image in order to remove spam messages.
More recently, there has been increased interest in adversarially training neural networks for unsupervised generative processes. These generative adversarial networks (GANs) are a class of artificial intelligence algorithms for unsupervised machine learning, implemented as two neural network systems competing in a zero-sum game framework. GANs were introduced by Goodfellow et al., "Generative Adversarial Networks", Advances in Neural Information Processing Systems, pages 2672-2680, 2014. As described therein, a GAN is used to generate synthetic photographs or images by training the GAN's generator to produce, from a noise input, photographs or images that appear as if they were actual photographs or images, thereby fooling the GAN's discriminator. In attempting to fool the discriminator, the generator learns the distribution of the actual photographs or images, i.e., the generator learns how to generate photographs or images that will fool the discriminator.
Since this work by Goodfellow et al., improvements have been made to generate better quality images, such as the DCGAN model described by Radford et al., "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks", arXiv preprint arXiv:1511.06434 (2015), and the LAPGAN model described by Denton et al., "Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks", Advances in Neural Information Processing Systems, pages 1486-1495, 2015. However, these generative adversarial network (GAN) models have been limited to learning from images.
The illustrative embodiments provide mechanisms for adapting GAN model (e.g., GAN, DCGAN, LAPGAN, etc.) techniques to facilitate natural language processing of natural language content. In particular, the illustrative embodiments implement a generative adversarial network (GAN) to jointly learn the co-occurrences of words and their locations in a portion of natural language content (e.g., sentences, paragraphs, documents, etc.). The GAN model is used to generate a bag of n-grams, where the n-grams may be characters, words, phrases, or any other portions of natural language content. The bag of n-grams is encoded as a probability distribution over the entire vocabulary V. The probability of an n-gram that does not belong to the bag is zero, while the probability of an n-gram in the bag is greater than zero.
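The encoding just described can be illustrated with a short sketch. The vocabulary and counts here are hypothetical; the point is only that absent n-grams get probability zero and present ones get a value greater than zero, summing to one:

```python
import numpy as np

def encode_bon(bag, vocabulary):
    """Encode a bag of n-grams as a probability distribution over the
    whole vocabulary V: slots for n-grams not in the bag stay zero,
    slots for n-grams in the bag are greater than zero."""
    index = {ng: i for i, ng in enumerate(vocabulary)}
    vec = np.zeros(len(vocabulary))
    for ngram, count in bag.items():
        vec[index[ngram]] = count
    return vec / vec.sum()   # normalize so values read as probabilities

vocab = ["neural network", "deep learning", "game theory", "noise vector"]
probs = encode_bon({"neural network": 3, "game theory": 1}, vocab)
# probs -> [0.75, 0.0, 0.25, 0.0]
```

Here "deep learning" and "noise vector" are not in the bag, so their slots remain exactly zero.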
Specifically, given a noise vector z, the generator G of the GAN is trained to produce a vector G(z) of size |V|, where each value in vector G(z) (i.e., each vector slot) indicates whether the corresponding n-gram is in the bag of n-grams, e.g., an n-gram whose slot in vector G(z) has a value greater than 0 is encoded as being in the bag. The discriminator D of the GAN receives as input an encoded bag of n-grams, i.e., an encoding vector such as G(z) representing the bag, and outputs a score between 0 and 1. The discriminator D may perform various evaluations on the bag of n-grams to calculate the output score, such as determining various statistics or performing feature extraction, including but not limited to term frequency, inverse document frequency, and the like. The higher the score, the more confident the discriminator D is that the encoded bag of n-grams was generated from a real (actual) sentence/document, rather than synthesized by the generator G from the noise input. A feedback mechanism may be provided to enable training of the generator G based on the output of the discriminator D, so that the generator G learns how to fool the discriminator D and thus generates a bag-of-n-grams output that closely approximates the bag-of-n-grams representation of an actual portion of natural language content.
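The generator/discriminator interplay can be sketched as a toy, untrained forward pass. The layer sizes and single-matrix networks below are made-up illustrations, not the patent's actual architecture; the softmax generator output and sigmoid discriminator score merely mirror the shapes described above:

```python
import numpy as np

rng = np.random.default_rng(0)
V, m = 8, 4                      # toy vocabulary size |V| and noise size m

# Generator G: noise z -> vector G(z) of size |V| (softmax => a BoN distribution)
Wg = rng.normal(scale=0.1, size=(m, V))
def G(z):
    logits = z @ Wg
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Discriminator D: encoded BoN -> score in (0, 1); higher = "looks real"
Wd = rng.normal(scale=0.1, size=V)
def D(bon):
    return 1.0 / (1.0 + np.exp(-float(bon @ Wd)))

z = rng.uniform(0.0, 1.0, size=m)               # noise vector input
fake_bon = G(z)                                  # synthetic bag of n-grams
real_bon = np.zeros(V); real_bon[[1, 3]] = 0.5   # BoN of a "real" sentence

score_fake, score_real = D(fake_bon), D(real_bon)
# Training (omitted here) would push D(real_bon) toward 1 and D(fake_bon)
# toward 0, while G adjusts Wg to raise D(G(z)) -- the min/max feedback loop.
```

The comment at the end marks where the feedback mechanism of the text would plug in; actual training would update Wg and Wd by gradient descent on an adversarial loss.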
The result is that the bag-of-n-grams representations that may be generated by the trained generator G of the GAN provide a large unlabeled dataset that may be used to perform natural language processing operations. For example, if conditional generation is performed, the bag-of-n-grams model (hereinafter, the BoN model) may be used to classify a given text, select candidate answers to the input questions of a question answering (QA) system, perform textual entailment operations, and so on. In one illustrative embodiment, the BoN representation generated by the trained conditional generator G, based on a given input question and a noise input vector, may be used to select an answer from a set of candidate answers generated by the QA system by comparing the BoN model to the n-grams in the candidate answers (e.g., the BoN representations of the candidate answers). The candidate answer with the highest degree of match to the BoN model generated by the conditional GAN may be selected as the correct answer for the QA system.
The GAN-based mechanism for generating a bag-of-n-grams (BoN) model for use in performing natural language processing (NLP) operations has a number of advantages. First, it is an end-to-end differentiable solution and can be trained with stochastic gradient descent to quickly arrive at a trained generator. Second, for each n-gram, the generator simply outputs a value indicating whether the n-gram is in the generated BoN. This is simpler and easier to optimize than trying to generate the sentences themselves by directly generating word embedding (WE) sequences.
Furthermore, since the generator G is generating a BoN model, the generation process can be performed in parallel, i.e. the probability of each word in BoN is calculated in parallel given the input noise vector. Also, most of the operations in the discriminator (up to the last layer) can be performed in parallel.
In addition to these benefits, the use of BoN models is sufficient in many NLP tasks to obtain accurate and reliable results, and thus, mechanisms for generating BoN models from noisy inputs without having to deal with large amounts of actual natural language content are beneficial tools that greatly reduce the time and resources required to generate accurate models. Moreover, the framework of the GAN-based mechanisms of the illustrative embodiments may be readily adapted to perform different NLP tasks.
Before beginning to discuss various aspects of the illustrative embodiments in greater detail, it should be understood first that throughout the description, the term "mechanism" will be used to refer to elements of the invention that perform various operations, functions, etc. The term "mechanism" as used herein may be an implementation of the functions or aspects of the illustrative embodiments in the form of an apparatus, process or computer program product. In the case of a process, the process is implemented by one or more devices, apparatuses, computers, data processing systems, or the like. In the case of a computer program product, logic represented by computer code or instructions embodied in or on the computer program product is executed by one or more hardware devices to perform the functionality or perform the operations associated with certain "mechanisms". Thus, the mechanisms described herein may be implemented as dedicated hardware, software executing on general purpose hardware, software instructions stored on a medium such that the instructions can be readily executed by dedicated or general purpose hardware, a process or method for performing functions, or a combination of any of the foregoing.
With respect to specific features and elements of the illustrative embodiments, the description and claims may use the terms "a," an, "" at least one of, "and" one or more of. It should be understood that these terms and phrases are intended to indicate that at least one of the particular features or elements is present in certain illustrative embodiments, but more than one may be present. That is, these terms/phrases are not intended to limit the description or claims to the presence of a single feature/element or require the presence of a plurality of such features/elements. Rather, these terms/phrases require only at least a single feature/element, while the possibility of a plurality of such features/elements is within the scope of the description and claims.
Moreover, it should be appreciated that the use of the term "engine" (if used herein with respect to describing embodiments and features of the present invention) is not intended to be limited to any particular implementation for accomplishing and/or performing actions, steps, procedures, etc. attributed to and/or performed by the engine. An engine may be, but is not limited to, software, hardware, and/or firmware, or any combination thereof, for any purpose including general-purpose and/or special-purpose processors in combination with suitable software loaded or stored in a machine-readable memory and executed by the processor. Further, unless otherwise indicated, any names associated with a particular engine are for ease of reference purposes and are not intended to be limited to a particular embodiment. Additionally, any functionality attributed to an engine may be performed equally by multiple engines, incorporated into and/or combined with the functionality of another engine of the same or different type, or distributed across one or more engines of different configurations.
Moreover, it is to be understood that the following description uses a number of various examples for the various elements of the illustrative embodiments to further illustrate example implementations of the illustrative embodiments and to aid in understanding the mechanisms of the illustrative embodiments. These examples are intended to be non-limiting and are not exhaustive of the various possibilities for implementing the mechanisms of the illustrative embodiments. It will be apparent to those of ordinary skill in the art from this disclosure that many more alternative embodiments exist for these different elements that can be utilized in addition to or instead of the examples provided herein without departing from the spirit and scope of the present invention.
The present invention may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present invention.
The computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to implement aspects of the present invention.
Various aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As described above, the present invention provides methods, apparatus, systems, and computer program products for configuring and implementing a generative adversarial network (GAN) model to generate a bag-of-n-grams (BoN) model for performing natural language processing (NLP) operations, as well as NLP methods, apparatus, systems, and computer program products implementing a trained GAN to perform NLP operations. With the mechanisms of the illustrative embodiments, a bag of n-grams is encoded as a vector of size |V| (i.e., the n-gram vocabulary size). N-grams that do not belong to the encoded bag of n-grams (BoN) have zero values at their corresponding locations, while n-grams in the BoN have values greater than zero. If the vector is normalized, these values can be interpreted as the probabilities that each n-gram (e.g., word) belongs to the BoN.
The actual non-normalized values in the vector slots of the BoN vector (each slot corresponding to an n-gram) may be calculated in a variety of different ways. For example, the value may be the frequency of the corresponding n-gram appearing in the input. The value may be a co-occurrence metric (metric) of the n-gram. In some demonstrative embodiments, one informative way of encoding an n-gram is to use the term frequency-inverse document frequency (TF-IDF) of the n-gram as a value in the encoding vector. Non-normalized values of the BoN vector may be generated using any metric for representing the representation and/or degree of influence of the n-gram in the input and corresponding known or later developed methods without departing from the spirit and scope of the present invention. For purposes of illustration herein, the illustrative embodiments will assume that the values of vector slots in a BoN vector are calculated as TF-IDFs for the corresponding n-gram, which are then normalized to represent the probability that the corresponding n-gram belongs to BoN.
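As one concrete illustration of TF-IDF-valued BoN slots, the following sketch uses a smoothed IDF variant (an assumption for illustration; the text does not fix a particular IDF formula) and then normalizes the vector so the values read as probabilities:

```python
import math
from collections import Counter

def tfidf_bon(doc, corpus, vocabulary):
    """Fill each BoN slot with the TF-IDF of its n-gram (term frequency
    in `doc` times inverse document frequency over `corpus`), then
    normalize; slots for absent n-grams stay zero."""
    tf = Counter(doc)
    n_docs = len(corpus)
    vec = []
    for ngram in vocabulary:
        df = sum(1 for d in corpus if ngram in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0   # smoothed IDF variant
        vec.append(tf[ngram] * idf)
    total = sum(vec)
    return [v / total for v in vec] if total else vec

corpus = [["neural", "network"], ["neural", "noise"]]
bon = tfidf_bon(["neural", "neural", "network"], corpus,
                ["neural", "network", "noise"])
```

In this toy example "noise" never occurs in the document, so its slot stays zero, while "neural" occurs twice but is down-weighted by its higher document frequency.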
Fig. 1A and 1B depict examples of GAN-based mechanisms for generating an n-gram bag (BoN) model in accordance with one illustrative embodiment. Fig. 1A is a representation of a generator of GAN, and fig. 1B is a representation of a discriminator of GAN.
As shown in fig. 1A and 1B, given a noise vector z 102 as input (e.g., a vector of numerical values in the interval [0,1] sampled from a uniform distribution), a generator neural network (generator) G 100 is trained to produce an n-gram bag G(z) 120 that is encoded using the method described previously. The discriminator neural network (discriminator) D 130 receives as input an encoded n-gram bag 132, which may be, for example, the n-gram bag G(z) 120 or an n-gram bag obtained from an actual (real) portion of natural language text, and outputs a value D(G(z)) 150 as a score between 0 and 1. Generator G 100 and discriminator D 130 may be implemented as software and/or hardware logic of one or more computing devices, where the logic may be trained, for example, using a stochastic gradient descent algorithm or another training process suitable for the desired implementation.
In generator G 100, the noise vector z (of size m) 102 is typically created by uniformly sampling m values in the interval [0,1], where m may be any suitable number of values, such as 50 or 100. Each value in the uniform distribution over the interval [0,1] has an equal chance of being selected for inclusion in the noise vector z by sampling. Another way to generate the noise vector z is to sample from a normal distribution with a mean of 0 and a standard deviation of 1.
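The two sampling strategies just described can be sketched as follows (a minimal NumPy illustration; m = 50 is just one of the suitable sizes mentioned):

```python
import numpy as np

rng = np.random.default_rng(42)
m = 50  # size of the noise vector z; e.g., 50 or 100

# Uniform sampling: every value in [0, 1] has an equal chance of selection
z_uniform = rng.uniform(0.0, 1.0, size=m)

# Alternative: sample from a normal distribution with mean 0, std dev 1
z_normal = rng.normal(loc=0.0, scale=1.0, size=m)
```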
The noise vector z is then projected, and copied |V| times by projection and copy logic 104, to form the matrix Z (of size |V| × m) data structure 106. The projection is a fully connected neural network layer followed by a non-linearity (such as ReLU). Each row in matrix Z 106 is identical, i.e., each row corresponds to the same projected vector z, which is replicated to generate matrix Z. Projection and replication are well-known mathematical operations, and thus the details of how they are implemented by projection and copy logic 104 will not be set forth herein.
The generator then retrieves the embedding 110 of each n-gram in the vocabulary. The vocabulary 108, which includes a set of n-grams, is a data structure of all identified n-grams used in a particular natural language processing system (not shown) implementing the GAN. Each of the n-grams in the vocabulary 108 may be represented as a vector representation (i.e., an embedding), such as a one-hot vector representation of a word, where the vector includes a slot for each word in the vocabulary 108, a value of "1" in the slot corresponding to the word, and all other slots set to a value of "0". In general, the embedding of an n-gram consists of the concatenation of the word embeddings of the different words in the n-gram; e.g., the embedding of a single-word n-gram is simply that word's embedding. Consider an example in which bigrams (i.e., combinations of two words) are used and the word embedding size is d (where the embedding size is an empirically defined hyper-parameter, such as 300): the process creates a matrix A 112 of size |V| × 2d. Each of the embeddings is represented as a row in the resulting matrix A 112.
Generator G 100, using concatenation logic 114, then concatenates matrices A and Z to generate concatenation matrix 116. As shown in fig. 1A, the concatenation matrix 116 includes rows having a first portion (unshaded) corresponding to a row of the projection and replication matrix Z 106 and a second portion (shaded) corresponding to a row of the n-gram embedding matrix A.
Each row of the concatenation matrix 116, of the form [projected z; n-gram embedding], is input to a Neural Network (NN) 118, which in some illustrative embodiments may be a multi-layer perceptron (MLP) that uses rectified linear units (ReLU) as the activation function of the output layer, although other activation functions, such as the sigmoid activation function or the hyperbolic tangent function, may be utilized in other embodiments. The NN's function is to output a number that determines the likelihood that the n-gram exists in the BoN denoted by z. Because the neural network is applied to each of the rows of the concatenation matrix, the neural network outputs |V| numbers, which can be interpreted as a vector G(z) representing the output BoN. In other words, the noise vector input z is mapped from z-space to BoN model space. The output vector G(z) (of size |V|) is normalized so that the sum of its values is 1. As a result, the output vector G(z) represents a probability distribution over the n-grams of the vocabulary 108. In one illustrative embodiment, this is a probability distribution based on the TF-IDF values of the n-grams.
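The generator's forward pass described above (project and replicate z, concatenate with the n-gram embeddings, score each row, normalize) can be sketched as follows. This is a minimal NumPy illustration with toy sizes, unigram embeddings, and randomly initialized parameters standing in for weights that would be learned by stochastic gradient descent; a single dense layer plays the role of the MLP 118.

```python
import numpy as np

rng = np.random.default_rng(0)
V, m, d = 6, 8, 4   # toy sizes: vocabulary, noise vector, embedding dimension

def relu(x):
    return np.maximum(x, 0.0)

# Stand-in parameters; in a real implementation these are learned by SGD.
W_proj = rng.normal(size=(m, d))     # projection layer for the noise vector z
A = rng.normal(size=(V, d))          # n-gram embedding matrix A (unigrams here)
W_mlp = rng.normal(size=(2 * d, 1))  # single dense layer standing in for the MLP

def generator(z):
    # Project z through a fully connected layer + ReLU, replicate |V| times
    Z = np.tile(relu(z @ W_proj), (V, 1))   # shape (V, d)
    # Concatenate [projected z ; n-gram embedding] for every vocabulary row
    C = np.concatenate([Z, A], axis=1)      # shape (V, 2d)
    # Score each row; ReLU output layer, per the embodiment described above
    scores = relu(C @ W_mlp).ravel()        # shape (V,)
    # Normalize so G(z) is a probability distribution over the vocabulary
    total = scores.sum()
    return scores / total if total > 0 else np.full(V, 1.0 / V)

z = rng.uniform(0.0, 1.0, size=m)  # noise vector sampled uniformly from [0, 1]
g_z = generator(z)
```

Because the same NN is applied independently to every row, the per-row scoring is trivially parallelizable, which is the property the later summary paragraph relies on.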
As shown in fig. 1B, in discriminator D 130, an input BoN 132 is received, which may be the BoN model G(z) 120 from generator G 100 or a BoN generated by computing TF-IDF vectors from real (actual) natural language text. Furthermore, the embeddings 110 of the n-grams of the vocabulary 108 (e.g., Word Embeddings (WE) in the depicted example) are retrieved to form a matrix of n-gram embeddings B 112, similar to the corresponding operations performed in generator G 100.
After retrieving the n-gram embeddings 110 and generating the matrix B 112, the logic of the discriminator 130 multiplies the encoded n-grams in the input BoN 132 by the n-gram embedding matrix B 112. It should be noted that after this multiplication operation, the embeddings of n-grams that are not in the input encoded BoN 132 will have zero values. The result of this multiplication is projected by discriminator projection logic 140 to produce an output matrix 142. The projection may be considered a fully connected neural network layer followed by a non-linearity (such as ReLU), e.g., X = ReLU(B·W), where W is a parameter matrix learned during training and each row of X is the projection of a row of the input matrix.
Next, discriminator logic 144 performs sum pooling, which creates a fixed-size vector r 146. Neural networks (such as MLP 148) typically operate on a single vector as input. Sum pooling or max pooling is a mechanism by which all the information used to determine whether an input BoN is from a real document is compressed into a single vector that can be processed by the NN of the discriminator. Thus, the fixed-size vector r 146 may be considered a summary of the BoN information, e.g., a feature vector, in the form of a single vector that the MLP 148 can process. The vector r 146 is given as input to a neural network 148, which may also be an MLP, whose output layer uses a sigmoid activation function. The neural network 148 generates an output between 0 and 1, indicating whether the discriminator 130 believes that the input BoN represents an actual portion of natural language content or is synthetic.
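The discriminator's forward pass (weight the embeddings by the BoN values, project, sum-pool, classify) can be sketched in the same toy NumPy setting; as before, the randomly initialized matrices stand in for parameters learned during training, and a single sigmoid unit stands in for the final MLP.

```python
import numpy as np

rng = np.random.default_rng(1)
V, d, h = 6, 4, 5   # toy sizes: vocabulary, embedding dim, projection dim

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

B = rng.normal(size=(V, d))   # n-gram embedding matrix B
W = rng.normal(size=(d, h))   # projection parameters, learned in training
w_out = rng.normal(size=h)    # final layer: single sigmoid unit (stand-in MLP)

def discriminator(bon):
    # Weight each embedding row by its n-gram's BoN value; rows for n-grams
    # absent from the BoN (value 0) are zeroed out by the multiplication.
    weighted = bon[:, None] * B            # shape (V, d)
    # Project: X = ReLU(weighted · W)
    X = relu(weighted @ W)                 # shape (V, h)
    # Sum-pool the rows into the fixed-size summary vector r
    r = X.sum(axis=0)                      # shape (h,)
    # Sigmoid output in (0, 1): belief that the BoN came from real text
    return sigmoid(r @ w_out)

bon = np.array([0.0, 0.5, 0.0, 0.2, 0.0, 0.3])   # a normalized BoN vector
score = discriminator(bon)
```

Sum pooling makes the summary vector r independent of which particular rows carried mass, which is why the computation over n-grams carries no cross-row dependencies and parallelizes easily.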
The proposed GAN-based architecture shown in figs. 1A and 1B can be readily adapted to generate n-gram bags conditioned on other text (e.g., other questions) or on categories (e.g., positive/negative sentiment categories). For example, one may wish to generate a BoN with positive sentiment. To this end, a conditional GAN may be trained, wherein the generator also receives the "category" of the BoN as input.
In one illustrative embodiment, a GAN-based architecture may be used to implement an n-gram bag model to assist in the answer selection task of a question answering (QA) system. Figs. 2A and 2B depict exemplary diagrams of a GAN architecture configured for answer selection in accordance with one illustrative embodiment. In such an illustrative embodiment, the GAN mechanism 200 may be trained to generate an n-gram bag of the answer conditioned on both z 202 and the Word Embeddings (WE) 206 of the input question q 204. This conditioning may be accomplished by concatenation logic 216 concatenating the vector representation 210 of q, generated using a recurrent neural network (RNN) 208, to the matrix Z 214 generated by projection and replication logic 212 and to the word embedding matrix 218 of the vocabulary 220.
The resulting concatenation matrix 222 includes rows having a first portion corresponding to the corresponding row in matrix Z 214, a second portion corresponding to the embedding vector 210 of question q 204, and a third portion corresponding to the n-gram embedding matrix 218 of vocabulary 220. The neural network 224 processes the concatenation matrix 222 in a manner similar to that described above to generate an output BoN model G(z, q) 226 that represents a probability distribution over the n-grams of the vocabulary 220, conditioned on the noise input z 202 and the embedding vector 210 of the input question. Thus, whereas in fig. 1A the neural network receives feature vectors of the form [z; n-gram embedding], neural network 224 receives feature vectors of the form [z; question embedding; n-gram embedding]. This means that the score generated by the neural network 224 for a given n-gram now also depends on the question (via its embedding).
As shown in fig. 2B, in discriminator D 230, an input BoN model G(z, q) 232 is received, which may be the BoN model G(z, q) 226 from generator G 200. In addition, the Word Embeddings (WE) 218 of the n-grams of the vocabulary 220 are retrieved, similar to the corresponding operations performed in generator G 200, and the n-gram embedding matrix 218 is again generated. After retrieving the n-gram embeddings 218 and generating the n-gram embedding matrix 218, the logic of discriminator 230 multiplies the n-grams encoded in BoN model G(z, q) by the n-gram embeddings in the embedding matrix 218. It should be noted that after this multiplication operation, the embeddings of n-grams that are not in the input encoded BoN model G(z, q) will have zero values. The result of this multiplication is projected by discriminator projection logic 234 to generate an output matrix 236.
Next, discriminator logic 238 performs sum pooling, which creates a fixed-size vector r 240. The fixed-size vector r 240 includes a first portion from the pooling logic 238 and a second portion corresponding to the word-embedding vector representation 210 of the input question 204. The vector r 240 is given as input to a neural network 242, which may also be an MLP, whose output layer uses a sigmoid activation function. The neural network 242 generates an output between 0 and 1, indicating whether the discriminator 230 believes that the input BoN model G(z, q) represents an actual answer to the input question.
The BoN model G(z, q) 232 is used to select among candidate answers, generated by the QA system, to the input question q 204. That is, the operations outlined in figs. 2A and 2B essentially have the GAN mechanism learn an n-gram bag (e.g., word bag) corresponding to an answer to the input question q 204. As a result, the BoN model 232 for the answer to input question q 204 may be used to compare the n-grams in the BoN model 232 with the n-grams in a candidate answer. If there is a sufficient match (e.g., using cosine similarity) between the BoN model 232 n-grams and the n-grams of the candidate answer, the candidate answer may be selected as the answer to the input question q 204. Alternatively, the comparison may be used as a metric for ranking the candidate answers relative to one another, such that candidate answers having a higher degree of match with the BoN model 232 are ranked higher than candidate answers having a lower degree of match. Such ranking may be performed alone or in combination with the evaluation of other criteria, such as supporting evidence analysis, for ranking candidate answers relative to one another to select a final answer to the input question.
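The matching step just described can be sketched as follows: a minimal, dependency-free illustration in which each candidate answer has already been encoded as a BoN vector over the same vocabulary as G(z, q) (all numeric values are invented for the example):

```python
import math

def cosine(u, v):
    """Cosine similarity between two BoN vectors over a shared vocabulary."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_answers(bon_model, candidate_bons):
    """Rank candidate answers by similarity of their BoN vectors to the
    generated BoN model G(z, q); best-matching candidate first."""
    scored = [(cosine(bon_model, c), i) for i, c in enumerate(candidate_bons)]
    return [i for _, i in sorted(scored, reverse=True)]

# Toy 4-slot BoN vectors over a shared vocabulary (illustrative values)
g_zq = [0.5, 0.3, 0.2, 0.0]          # generated answer BoN, G(z, q)
candidates = [
    [0.0, 0.0, 0.1, 0.9],            # poor n-gram overlap
    [0.6, 0.2, 0.2, 0.0],            # strong n-gram overlap
]
order = rank_answers(g_zq, candidates)
```

In a fuller pipeline this similarity score would be just one signal, combined with other ranking criteria such as supporting evidence analysis.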
It should be appreciated that the GAN mechanism of the illustrative embodiments may be adapted for implementation with various other types of semantic matching tasks. For example, the GAN mechanism may be adapted to perform text modeling for tasks such as textual entailment and paraphrase detection. Both cases can be formulated similarly to the answer selection case, where the generator is conditioned on an input sentence and the generated BoN is compared with a target sentence.
Thus, as described above, the illustrative embodiments provide a GAN-based mechanism that offers an end-to-end differentiable solution that can be trained with stochastic gradient descent. The GAN-based mechanism is easy to optimize because the generator only needs to indicate whether an n-gram is in the n-gram bag. The generation and discrimination processes of the GAN-based mechanism can be easily parallelized because there are no dependencies in the computation of each n-gram in the n-gram bag. In addition, the GAN-based mechanism is readily adaptable to a variety of natural language processing tasks, such as text classification, answer selection, and textual entailment.
As is apparent from the foregoing, the present invention is particularly directed to implementing a computer-based tool (i.e., a Generative Adversarial Network (GAN) mechanism) to perform computer-based natural language processing on a portion of natural language content. Thus, the illustrative embodiments may be used in many different types of data processing environments. In order to provide a context for the description of the specific elements and functionality of the illustrative embodiments, figs. 3-5 are provided below as example environments in which aspects of the illustrative embodiments may be implemented. It should be appreciated that figs. 3-5 are only examples and are not intended to assert or imply any limitation with regard to the environments in which aspects or embodiments of the present invention may be implemented. Many modifications to the depicted environments may be made without departing from the spirit and scope of the present invention.
Figs. 3-5 describe example cognitive systems for processing natural language input (such as requests, questions, portions of natural language text, etc.), request processing pipelines such as question answering (QA) pipelines (also referred to as question/answer pipelines or question-and-answer pipelines), request processing methods, and request processing computer program products with which the mechanisms of the illustrative embodiments are implemented. These requests may be provided as structured or unstructured request messages, natural language questions, or in any other suitable format for requesting an operation to be performed by the cognitive system. As described in more detail below, the particular application implemented in the cognitive system of the present invention is a question answering application, and thus the request processing pipeline is assumed to be a QA pipeline that operates on input questions and generates answers to the input questions.
It should be appreciated that a cognitive system (although shown in the examples below as having a single request processing pipeline) may actually have multiple request processing pipelines. Each request processing pipeline may be separately trained and/or configured to process requests associated with different domains, or to perform the same or different analysis on incoming requests (or questions, in embodiments using QA pipelines), depending on the desired implementation. For example, in some cases, a first request processing pipeline may be trained to operate on input requests (or questions) related to the medical domain, while a second request processing pipeline may be trained to operate on input requests (e.g., input questions) related to the financial domain. In other cases, the request processing pipelines may be configured to provide different types of cognitive functions or to support different types of applications, such as one request processing pipeline for requests/questions related to treatment recommendations for a patient, while another request processing pipeline processes requests/questions related to drug interaction information, etc.
Further, in the above examples, each request processing pipeline may have its own associated corpus or corpora that it ingests and operates on, e.g., one corpus for medical documents and another corpus for financial-domain documents. In some cases, the request processing pipelines may each operate on input questions in the same domain, but may have different configurations (e.g., different annotators or differently trained annotators) such that different analyses and potential responses are generated. The cognitive system may provide additional logic for routing input questions to the appropriate request processing pipeline (such as based on a determined domain of the input request), for combining and evaluating final results generated by the processing performed by the multiple request processing pipelines, and other control and interaction logic that facilitates the utilization of multiple request processing pipelines.
As described above, one type of request processing pipeline with which the mechanisms of the illustrative embodiments may be utilized is a question answering (QA) pipeline. The following description of example embodiments of the invention will utilize a QA pipeline as an example of a request processing pipeline that may be augmented to include mechanisms in accordance with one or more illustrative embodiments. It should be appreciated that while the present invention will be described in the context of a cognitive system implementing one or more QA pipelines operating on input questions, the illustrative embodiments are not so limited. Rather, the mechanisms of the illustrative embodiments may operate on requests that are not posed as "questions" but are formatted to be used by the cognitive system to perform cognitive operations on a specified input data set using an associated corpus or corpora and specific configuration information for configuring the cognitive system. For example, rather than asking the natural language question "What diagnosis applies to patient P?", the cognitive system may instead receive a request to "generate a diagnosis for patient P" or the like. It should be appreciated that the mechanisms of the QA system pipeline may operate on requests, with minor modifications, in a manner similar to the way input natural language questions are processed. Indeed, in some cases, a request may be converted into a natural language question for processing by the QA system pipeline, if desired for the particular implementation.
As described above, the example embodiments set forth herein relate to a QA system and QA pipeline that operate on input natural language questions to generate answers by performing natural language processing and cognitive analysis of the input question and one or more corpora of available information. Thus, before describing how the mechanisms of the illustrative embodiments integrate with and augment such cognitive system and request processing pipeline (or QA pipeline) mechanisms, it is important to first understand how question-and-answer creation is implemented in a cognitive system implementing a QA pipeline. It should be appreciated that the mechanisms described in figs. 3-5 are merely examples and are not intended to state or imply any limitation as to the types of cognitive system mechanisms with which the illustrative embodiments are implemented. Many modifications to the example cognitive systems shown in figs. 3-5 may be implemented in various embodiments of the present invention without departing from the spirit and scope of the present invention.
In general, a cognitive system is a specialized computer system, or collection of computer systems, configured with hardware and/or software logic (in combination with the hardware logic on which the software executes) to emulate human cognitive functions. These cognitive systems apply human-like characteristics to convey and manipulate ideas which, when combined with the inherent strengths of digital computing, can solve problems at large scale with high accuracy and high resilience. A cognitive system performs one or more computer-implemented cognitive operations that approximate human thought processes, and enables people and machines to interact in a more natural manner so as to extend and magnify human expertise and cognition. A cognitive system comprises artificial intelligence logic, such as Natural Language Processing (NLP) based logic, and machine learning logic, which may be provided as specialized hardware, software executing on hardware, or any combination of specialized hardware and software executing on hardware. The logic of the cognitive system implements the cognitive operation(s), examples of which include, but are not limited to, question answering, identification of related concepts within different portions of content in a corpus, intelligent search algorithms (such as internet web page searches), medical diagnosis and treatment recommendations, and other types of recommendation generation (e.g., items of interest to a particular user, potential new contact recommendations, etc.).
IBM Watson™ is an example of one such cognitive system, which can process human-readable language and identify inferences between text passages with human-like high accuracy, at speeds far faster than human beings and on a much larger scale. In general, such cognitive systems are able to perform the following functions:
Navigate the complexities of human language and understanding
Ingest and process vast amounts of structured and unstructured data
Generate and evaluate hypotheses
Weigh and evaluate responses that are based only on relevant evidence
Provide situation-specific advice, insights, and guidance
Improve knowledge and learn with each iteration and interaction through machine learning processes
Enable decision making at the point of impact (contextual guidance)
Scale in proportion to the task
Extend and magnify human expertise and cognition
Identify resonating, human-like attributes and traits from natural language
Deduce various language-specific or agnostic attributes from natural language
Recall highly relevant information (memorization and recall) from data points (images, text, voice)
Predict and sense with situational awareness that mimics human cognition based on experiences
Answer questions based on natural language and specific evidence
In one aspect, the cognitive system provides mechanisms for answering questions posed to it using a question answering pipeline or system (QA system) and/or for processing requests, which may or may not be posed as natural language questions. The QA pipeline or system is an artificial intelligence application, executing on data processing hardware, that answers questions pertaining to a given subject-matter domain presented in natural language. The QA pipeline receives inputs from various sources, including input over a network, a corpus of electronic documents or other data, data from a content creator, information from one or more content users, and other such inputs from other possible sources of input. Data storage devices store the corpus of data. A content creator creates content in a document for use as part of the corpus of data with the QA pipeline. A document may include any file, text, article, or source of data for use in the QA system. For example, the QA pipeline accesses a body of knowledge about a domain (e.g., the financial domain, medical domain, legal domain, etc.) or subject area, where the body of knowledge (knowledge base) may be organized in various configurations, such as a structured repository of domain-specific information (such as an ontology), or unstructured data related to the domain, or a collection of natural language documents about the domain.
Content users input questions to the cognitive system implementing the QA pipeline. The QA pipeline then answers the input questions using the content in the corpus of data by evaluating documents, sections of documents, portions of data in the corpus, and the like. When a process evaluates a given section of a document for semantic content, the process can use a variety of conventions to query such a document from the QA pipeline, e.g., sending the query to the QA pipeline as a well-formed question, which is then interpreted by the QA pipeline, and a response is provided containing one or more answers to the question. Semantic content is content based on the relation between signifiers (signs), such as words, phrases, symbols, and tokens, and what they stand for, their denotation, or connotation. In other words, semantic content is content that interprets an expression, such as by using natural language processing.
As will be described in greater detail below, the QA pipeline receives an input question, parses the question to extract the major features of the question, uses the extracted features to formulate queries, and then applies those queries to the corpus of data. Based on the application of the queries to the corpus of data, the QA pipeline generates a set of hypotheses, or candidate answers to the input question, by looking across the corpus of data for portions of the corpus that have some potential for containing a valuable response to the input question. The QA pipeline then performs a deep analysis, using a variety of reasoning algorithms, of the language of the input question and of the language used in each of the portions of the corpus found during the application of the queries. There may be hundreds or even thousands of reasoning algorithms applied, each of which performs a different analysis, e.g., comparisons, natural language analysis, lexical analysis, or the like, and generates a score. For example, some reasoning algorithms may look at the matching of terms and synonyms within the language of the input question and the found portions of the corpus of data. Other reasoning algorithms may look at temporal or spatial features in the language, while still others may evaluate the source of the portion of the corpus of data and evaluate its veracity.
The scores obtained from the various reasoning algorithms indicate the extent to which the potential response is inferred by the input question based on the specific area of focus of that reasoning algorithm. Each resulting score is then weighted against a statistical model. The statistical model captures how well the reasoning algorithm performed at establishing the inference between two similar passages for a particular domain during the training period of the QA pipeline. The statistical model is used to summarize the level of confidence that the QA pipeline has regarding the evidence that the potential response, i.e., the candidate answer, is inferred by the question. This process is repeated for each of the candidate answers until the QA pipeline identifies candidate answers that surface as being significantly stronger than the others, and thus generates a final answer, or ranked set of answers, for the input question.
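As a rough illustration of this weighting step (the scores and per-algorithm weights below are invented for the example; in an actual QA pipeline the weights come from the statistical model learned during training):

```python
def combine_scores(algorithm_scores, weights):
    """Weight each reasoning-algorithm score by its trained reliability
    and combine them into a single confidence for a candidate answer."""
    total_weight = sum(weights)
    return sum(s * w for s, w in zip(algorithm_scores, weights)) / total_weight

# Three hypothetical reasoning algorithms; the first is trusted most.
weights = [2.0, 1.0, 1.0]
conf_a = combine_scores([0.9, 0.4, 0.7], weights)  # candidate answer A
conf_b = combine_scores([0.3, 0.8, 0.2], weights)  # candidate answer B
# Candidate A surfaces as significantly stronger under these weights.
```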
Fig. 3 depicts a schematic diagram of one illustrative embodiment of a cognitive system 300 implementing a request processing pipeline 308, which in some embodiments may be a question answering (QA) pipeline, in a computer network 302. For purposes of this description, it will be assumed that the request processing pipeline 308 is implemented as a QA pipeline that operates on structured and/or unstructured requests in the form of input questions. One example of a question processing operation that may be used in conjunction with the principles described herein is described in U.S. Patent Application Publication No. 2011/0125734, which is herein incorporated by reference in its entirety. The cognitive system 300 is implemented on one or more computing devices 304A-D (comprising one or more processors and one or more memories, and potentially any other computing device elements generally known in the art, including buses, storage devices, communication interfaces, and the like) connected to the computer network 302. For purposes of illustration only, fig. 3 depicts the cognitive system 300 as being implemented on computing device 304A only, but, as noted above, the cognitive system 300 may be distributed across multiple computing devices, such as the plurality of computing devices 304A-D. The network 302 includes multiple computing devices 304A-D, which may operate as server computing devices, and 310-312, which may operate as client computing devices, in communication with each other and with other devices or components via one or more wired and/or wireless data communication links, where each communication link comprises one or more of wires, routers, switches, transmitters, receivers, or the like. In some illustrative embodiments, the cognitive system 300 and network 302 enable question processing and answer generation (QA) functionality for one or more cognitive system users via their respective computing devices 310-312.
In other embodiments, the cognitive system 300 and network 302 may provide other types of cognitive operations, including but not limited to request processing and cognitive response generation, which may take many different forms depending on the desired implementation (e.g., cognitive information retrieval, training/guidance of the user, cognitive assessment of the data, etc.). Other embodiments of the cognitive system 300 may be used with components, systems, subsystems, and/or devices other than those described herein.
The cognitive system 300 is configured to implement a request processing pipeline 308 that receives inputs from various sources. The requests may be posed in the form of a natural language question, a natural language request for information, a natural language request for the performance of a cognitive operation, or the like. For example, the cognitive system 300 receives input from the network 302, a corpus or corpora of electronic documents 306, cognitive system users, and/or other data and other possible sources of input. In one embodiment, some or all of the inputs to the cognitive system 300 are routed through the network 302. The various computing devices 304A-D on the network 302 include access points for content creators and cognitive system users. Some of the computing devices 304A-D include devices for a database storing the corpus or corpora of data 306 (which is shown as a separate entity in fig. 3 for illustrative purposes only). Portions of the corpus or corpora of data 306 may also be provided on one or more other network attached storage devices, in one or more databases, or on other computing devices not explicitly shown in fig. 3. In various embodiments, the network 302 includes local network connections and remote connections, such that the cognitive system 300 may operate in environments of any size, including local and global environments, e.g., the internet.
In one embodiment, the content creator creates content in documents of one or more data corpora 306 for use as part of the data corpora using the cognitive system 300. The document includes any file, text, article, or data source for use in the cognitive system 300. A cognitive system user accesses the cognitive system 300 via a network connection or internet connection to the network 302 and inputs questions/requests to the cognitive system 300 that answer/process based on content in one or more data corpora 306. In one embodiment, the questions/requests are formed using natural language. The cognitive system 300 parses and interprets questions/requests via the pipeline 308 and provides responses to cognitive system users (e.g., the cognitive system user 310) containing one or more answers to the posed questions, responses to requests, results of processing the requests, and so forth. In some embodiments, the cognitive system 300 provides responses to the user in an ordered list of candidate answers/responses, while in other illustrative embodiments, the cognitive system 300 provides a single final answer/response or a combination of the final answer/response and an ordered list of other candidate answers/responses.
The cognitive system 300 implements a pipeline 308 that includes a plurality of stages for processing input questions/requests based on information obtained from one or more data corpora 306. Pipeline 308 generates answers/responses to input questions or requests based on the input questions/requests and the processing of one or more data corpora 306. Pipeline 308 will be described in more detail below with reference to FIG. 5.
In some illustrative embodiments, the cognitive system 300 may be an IBM Watson™ cognitive system, available from International Business Machines Corporation of Armonk, New York, which is augmented with the mechanisms of the illustrative embodiments described below. As previously mentioned, the pipeline of the IBM Watson™ cognitive system receives an input question or request, which it then parses to extract the major features of the question/request, which in turn are used to formulate queries that are applied to the one or more data corpora 306. Based on the application of the queries to the one or more data corpora 306, a set of hypotheses, or candidate answers/responses to the input question/request, is generated by looking across the one or more data corpora 306 (hereinafter referred to simply as the corpus 306) for portions of the corpus 306 that have some potential for containing a valuable response to the input question/request (hereinafter assumed to be an input question). The pipeline 308 of the IBM Watson™ cognitive system then performs deep analysis of the language of the input question and the language used in each of the portions of the corpus 306 found during the application of the queries, using a variety of reasoning algorithms.
The scores obtained from the various reasoning algorithms are then weighted against a statistical model that summarizes the level of confidence that the pipeline 308 of the IBM Watson™ cognitive system 300 has, in this example, regarding the evidence that the potential candidate answer is inferred by the question. This process is repeated for each of the candidate answers to generate a ranked listing of candidate answers, which may then be presented to the user that submitted the input question (e.g., a user of client computing device 310), or a final answer may be selected from the listing and presented to the user. More information about the pipeline 308 of the IBM Watson™ cognitive system 300 may be obtained, for example, from the IBM Corporation website, IBM Redbooks, and the like. For example, information about the IBM Watson™ cognitive system pipeline can be found in Yuan et al., "Watson and Healthcare," IBM developerWorks, 2011, and in Rob High, "The Era of Cognitive Systems: An Inside Look at IBM Watson and How it Works," IBM Redbooks, 2012.
As shown in fig. 3, in accordance with the mechanisms of the illustrative embodiments, the cognitive system 300 is further augmented to include logic, implemented in dedicated hardware, software executing on hardware, or any combination of dedicated hardware and software executing on hardware, for implementing a generative adversarial network (GAN) response generation engine 320. The GAN response generation engine 320 may operate in conjunction with the cognitive system 300 to select/generate responses to requests processed by the request processing pipeline 308 (e.g., answers to input questions submitted by the client computing devices 310, 312). For example, a user of a client computing device 310 may submit a natural language question via the network 302 to the cognitive system 300 executing on the server 304A. The input question is processed via the pipeline 308, which operates in conjunction with the GAN response generation engine 320 to generate candidate answers to the input question and to select a candidate answer based on a ranking of the candidate answers. According to an illustrative embodiment, the ranking may be based at least in part on the bags of n-grams generated by the GAN response generation engine 320.
As shown in FIG. 3, the GAN response generation engine 320 includes a noise input generator 322, GAN logic 324, a bag-of-n-grams (BoN) output data structure 326, and a BoN response comparator 328. The noise input generator 322 generates a noise vector z for training/runtime operation of the GAN 324. In the training mode of operation, the noise input vector z generated by the noise input generator 322 is operated on in conjunction with the word embedding matrix of the vocabulary V to train the generator of the GAN 324 to generate a BoN vector output G(z) that fools the discriminator of the GAN 324, as previously described above with reference to FIGS. 1A and 1B. During runtime operation, such as shown in figs. 2A and 2B above, the GAN 324 utilizes the noise input vector z from the noise input generator 322, along with a word embedding vector representation of an input request (e.g., a natural language question) and a word embedding matrix based on the vocabulary V of n-grams in the vocabulary database 321, to generate a bag of n-grams representing the n-grams of an actual response to the input question, for comparison against the n-grams of the candidate responses generated by the pipeline 308.
During operation, the GAN 324 generates the BoN output data structure 326, such as G(z, q) in figs. 2A and 2B, based on an input request to the pipeline 308. The BoN output data structure 326 is input to the BoN response comparator 328, which compares the BoN output data structure 326 to the n-grams associated with the candidate responses (e.g., candidate answers) generated by the pipeline 308. A degree of matching between the n-grams in the BoN output data structure 326 and the n-grams of each of the candidate responses is calculated and associated with each of the candidate responses. The BoN relevance score calculated for each candidate response may be used alone, or in combination with other candidate response ranking criteria, to generate a relative ranking of the candidate responses. One or more final responses may be selected based on the relative ranking, e.g., the highest-ranked candidate response may be selected as the final response. The selected response(s) may then be returned to the requestor, e.g., the client computing device 310 from which the request (e.g., the input question) was received.
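The comparator step just described can be sketched in Python; this is a minimal illustration under assumptions, with hypothetical names (`bon_relevance_score`, `rank_candidates`, the toy vocabulary) and a simple probability-sum metric standing in for whatever matching measure an actual implementation of the BoN response comparator 328 would use.

```python
import numpy as np

def bon_relevance_score(bon_probs, vocab_index, candidate_ngrams):
    """Score a candidate response against a BoN output vector.

    bon_probs: vector in which slot i holds the generator's probability
    that vocabulary n-gram i appears in the correct response.
    vocab_index: maps each n-gram string to its slot in bon_probs.
    candidate_ngrams: the n-grams extracted from one candidate response.
    """
    # Sum the predicted probabilities of the candidate's n-grams;
    # n-grams outside the vocabulary contribute nothing.
    return sum(bon_probs[vocab_index[g]]
               for g in candidate_ngrams if g in vocab_index)

def rank_candidates(bon_probs, vocab_index, candidates):
    """candidates: {answer text: list of n-grams}; best match first."""
    scored = {answer: bon_relevance_score(bon_probs, vocab_index, grams)
              for answer, grams in candidates.items()}
    return sorted(scored, key=scored.get, reverse=True)
```

A count of matching n-grams, or a statistical correlation measure, could be substituted for the probability sum without changing the structure of the comparison.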
As described above, the mechanisms of the illustrative embodiments stem from the computer technology field and are implemented using logic residing in such computing or data processing systems. These computing or data processing systems are specifically configured to perform the various operations described above by hardware, software, or a combination of hardware and software. FIG. 4 is thus provided as an example of one type of data processing system in which aspects of the present invention may be implemented. Many other types of data processing systems likewise may be configured to specifically implement the mechanisms of the illustrative embodiments.
FIG. 4 is a block diagram of an example data processing system in which aspects of the illustrative embodiments may be implemented. Data processing system 400 is an example of a computer, such as server 304A or client 310 in FIG. 3, in which computer usable code or instructions implementing the processes for illustrative embodiments of the present invention may be located. In one illustrative embodiment, FIG. 4 shows a server computing device (such as server 304A) implementing the cognitive system 300 and a QA system pipeline 308 augmented to include the additional mechanisms of the illustrative embodiments described below.
In the depicted example, data processing system 400 employs a hub architecture including a north bridge and memory controller hub (NB/MCH) 402 and a south bridge and input/output (I/O) controller hub (SB/ICH) 404. Processing unit 406, main memory 408, and graphics processor 410 are connected to NB/MCH 402. Graphics processor 410 is connected to NB/MCH 402 through an accelerated graphics port (AGP).
In the depicted example, local area network (LAN) adapter 412 connects to SB/ICH 404. Audio adapter 416, keyboard and mouse adapter 420, modem 422, read only memory (ROM) 424, hard disk drive (HDD) 426, CD-ROM drive 430, universal serial bus (USB) ports and other communication ports 432, and PCI/PCIe devices 434 connect to SB/ICH 404 through bus 438 and bus 440. PCI/PCIe devices may include, for example, Ethernet adapters, add-in cards, and PC cards for notebook computers. PCI uses a card bus controller, while PCIe does not. ROM 424 may be, for example, a flash basic input/output system (BIOS).
HDD 426 and CD-ROM drive 430 connect to SB/ICH 404 through bus 440. HDD 426 and CD-ROM drive 430 may use, for example, an integrated drive electronics (IDE) or serial advanced technology attachment (SATA) interface. A Super I/O (SIO) device 436 is connected to SB/ICH 404.
An operating system runs on processing unit 406. The operating system coordinates and provides control of various components within data processing system 400 in FIG. 4. As a client, the operating system is a commercially available operating system. An object-oriented programming system, such as the Java™ programming system, may run in conjunction with the operating system and provides calls to the operating system from Java™ programs or applications executing on data processing system 400.
As a server, data processing system 400 may be, for example, an IBM eServer™ System p® computer system, running the Advanced Interactive Executive (AIX®) operating system or the LINUX® operating system. Data processing system 400 may be a symmetric multiprocessor (SMP) system including a plurality of processors in processing unit 406. Alternatively, a single processor system may be employed.
Instructions for the operating system, the object-oriented programming system, and applications or programs are located on storage devices, such as HDD 426, and are loaded into main memory 408 for execution by processing unit 406. The processes for illustrative embodiments of the present invention are performed by processing unit 406 using computer usable program code, which may be located in a memory such as, for example, main memory 408, ROM 424, or in one or more peripheral devices 426 and 430.
A bus system, such as bus 438 or bus 440 shown in fig. 4, includes one or more buses. Of course, the bus system may be implemented using any type of communication fabric or architecture that provides for a transfer of data between different components or devices attached to the fabric or architecture. A communication unit, such as modem 422 or network adapter 412 of fig. 4, includes one or more devices for sending and receiving data. A memory may be, for example, main memory 408, ROM 424, or a cache such as found in NB/MCH 402 in FIG. 4.
Those of ordinary skill in the art will appreciate that the hardware depicted in fig. 3 and 4 may vary depending on the implementation. Other internal hardware or peripheral devices, such as flash memory, equivalent nonvolatile memory, or optical disk drives and the like, may be used in addition to or in place of the hardware depicted in figures 3 and 4. Also, the processes of the illustrative embodiments may be applied to a multiprocessor data processing system, other than the SMP system mentioned previously, without departing from the spirit and scope of the present invention.
Furthermore, data processing system 400 may take the form of any of a number of different data processing systems including client computing devices, server computing devices, tablet computers, laptop computers, telephone or other communication devices, personal digital assistants (personal digital assistant, PDAs), and the like. In some illustrative examples, data processing system 400 may be, for example, a portable computing device configured with flash memory to provide non-volatile memory for storing operating system files and/or user-generated data. Essentially, data processing system 400 may be any known or later developed data processing system without architectural limitation.
Fig. 5 shows an example of a cognitive system processing pipeline, which in the depicted example is a question answering (QA) system pipeline, for processing an input question in accordance with one illustrative embodiment. As noted above, the cognitive systems with which the illustrative embodiments may be used are not limited to QA systems and, thus, are not limited to the use of QA system pipelines. Fig. 5 is provided as only one example of a processing structure that may be implemented to process a natural language input requesting the operation of a cognitive system, so as to present a response or result in response to the natural language input.
For example, the QA system pipeline of FIG. 5 may be implemented as QA pipeline 308 of cognitive system 300 in FIG. 3. It should be appreciated that the stages of the QA pipeline shown in FIG. 5 are implemented as one or more software engines, components, or the like, which are configured with logic for implementing the functionality attributed to the particular stage. Each stage is implemented using one or more of such software engines, components, or the like. The software engines, components, etc. execute on one or more processors of one or more data processing systems or devices and utilize or operate on data stored in one or more data storage devices, memories, or the like, on one or more of the data processing systems. For example, the QA pipeline of FIG. 5 may be augmented in one or more of its stages to implement the improved mechanisms of the illustrative embodiments described below; additional stages may be provided to implement the improved mechanisms; or logic separate from the pipeline 500 may be provided for interfacing with the pipeline 500 and implementing the improved functionality and operations of the illustrative embodiments.
As shown in FIG. 5, the QA pipeline 500 includes a plurality of stages 510-580 through which the cognitive system operates to analyze an input question and generate a final response. At an initial question input stage 510, the QA pipeline 500 receives an input question presented in a natural language format. That is, a user inputs, via a user interface, an input question for which the user wishes to obtain an answer, e.g., "Who was Washington's closest advisor?" In response to receiving the input question, the next stage of the QA pipeline 500 (i.e., the question and topic analysis stage 520) parses the input question using natural language processing (NLP) techniques to extract major features from the input question and classify the major features according to type (e.g., names, dates, or any of a plethora of other defined topics). For example, in the example question above, the term "who" may be associated with a topic of "persons," indicating that the identity of a person is being sought; "Washington" may be identified as a proper name of a person with which the question is associated; "closest" may be identified as a word indicative of proximity or relationship; and "advisor" may be indicative of a noun or other language topic.
In addition, the extracted major features include key words and phrases, classified into question characteristics such as the focus of the question, the lexical answer type (LAT) of the question, and the like. As described herein, a lexical answer type (LAT) is a word in, or a word inferred from, the input question that indicates the type of the answer, independent of assigning semantics to that word. For example, in the question "What maneuver was invented in the 16th century to speed up the game and involves two pieces of the same color?" the LAT is the string "maneuver." The focus of a question is the part of the question that, if replaced by the answer, makes the question a standalone statement. For example, in the question "What drug has been shown to relieve the symptoms of ADD with relatively few side effects?" the focus is "drug," since if this word were replaced with the answer (e.g., the answer "Adderall" can be used to replace the term "drug"), it generates the sentence "Adderall has been shown to relieve the symptoms of ADD with relatively few side effects." The focus often, but not always, contains the LAT. On the other hand, in many cases it is not possible to infer a meaningful LAT from the focus.
Referring again to FIG. 5, the identified major features are then used during a question decomposition stage 530 to decompose the question into one or more queries that are applied to the corpora of data/information 545 in order to generate one or more hypotheses. The queries are generated in any known or later developed query language, such as the Structured Query Language (SQL), or the like. The queries are applied to one or more databases storing information about the electronic texts, documents, articles, websites, and the like, that make up the corpora of data/information 545. That is, these various sources themselves, different collections of sources, and the like, represent different corpora 547 within the corpora 545. There may be different corpora 547 defined for different collections of documents based on various criteria, depending upon the particular implementation. For example, different corpora may be established for different topics, subject matter categories, sources of information, or the like. As one example, a first corpus may be associated with healthcare documents while a second corpus may be associated with financial documents. Alternatively, one corpus may be documents published by the U.S. Department of Energy while another corpus may be IBM Redbooks documents. Any collection of content having some similar attribute may be considered to be a corpus 547 within the corpora 545.
The query is applied to one or more databases storing information about electronic text, documents, articles, websites, and the like, which constitute a data/information corpus, such as data corpus 306 in FIG. 3. The query is applied to the data/information corpus at hypothesis generation stage 540 to generate results identifying potential hypotheses for replying to the input problem, which may then be evaluated. That is, application of a query results in extraction of portions of the data/information corpus that match the criteria of the particular query. These portions of the corpus are then analyzed and used during the hypothesis generation stage 540 to generate hypotheses for answering the input questions. These hypotheses are also referred to herein as "candidate answers" to the input problem. For any input problem, at this stage 540, there may be hundreds of hypotheses or candidate responses that may need to be evaluated.
Then, at stage 550, the QA pipeline 500 performs a deep analysis and comparison of the language of the input question and the language of each hypothesis or "candidate answer," as well as performs evidence scoring to evaluate the likelihood that the particular hypothesis is a correct answer for the input question. As described above, this involves the use of a plurality of reasoning algorithms, each performing a separate type of analysis of the language of the input question and/or the content of the corpus, where that analysis provides evidence in support of, or not in support of, the hypothesis. Each reasoning algorithm generates a score based on the analysis it performs, which indicates a measure of relevance of the individual portions of the corpus of data/information extracted by application of the queries, as well as a measure of the correctness of the corresponding hypothesis, i.e., a measure of confidence in the hypothesis. There are various ways of generating such scores, depending upon the particular analysis being performed. In general, however, these algorithms look for particular terms, phrases, or patterns of text that are indicative of terms, phrases, or patterns of interest, and determine a degree of matching, with higher degrees of matching being given relatively higher scores than lower degrees of matching.
Thus, for example, an algorithm may be configured to find precise terms from an input question or synonyms for the term (e.g., precise terms or synonyms for the term "movie") in an input question, and generate a score based on the frequency of use of these precise terms or synonyms. In this case, an exact match will be given the highest score, while synonyms may be given a lower score based on the relative ordering of the synonyms (which may be specified by the subject matter expert (person with knowledge of the particular domain and terms used) or automatically determined from the frequency of use of the synonyms in the corpus corresponding to that domain). Thus, for example, an exact match of the term "movie" in the content of a corpus (also called evidence or evidence paragraph) is given the highest score. Synonyms for movies, such as "motion picture", may be given a lower score but still higher than synonyms of the "film" or "show of images (moving picture show)" type. Instances of exact matches and synonyms for each evidence paragraph can be compiled and used in a quantitative function to generate a score for how well the evidence paragraph matches an input question.
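A minimal sketch of this exact-term/synonym scoring follows; the synonym table and its weights are illustrative assumptions (exact match highest, weaker synonyms lower), not values from the system described, and `evidence_score` is a hypothetical name.

```python
# Illustrative synonym table for the focus term "movie"; the weights
# are assumed for the sketch, mirroring the relative ranking described
# in the text (exact match > "motion picture" > "film").
SYNONYM_WEIGHTS = {
    "movie": {"movie": 1.0, "motion picture": 0.8,
              "film": 0.5, "moving picture show": 0.5},
}

def evidence_score(focus_term, passage):
    """Score an evidence passage by weighted occurrences of the focus
    term and its synonyms, as in the quantitative function described."""
    text = passage.lower()
    weights = SYNONYM_WEIGHTS.get(focus_term, {focus_term: 1.0})
    score = 0.0
    for term, weight in weights.items():
        # Each occurrence contributes its weight; exact matches of the
        # focus term contribute the most.
        score += text.count(term) * weight
    return score
```

A real system would tokenize rather than count substrings and would derive synonym weights from corpus frequencies or subject matter experts, as the text notes.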
Thus, for example, for the input question "What was the first movie?" the hypothesis or candidate answer is "The Horse in Motion." If the evidence passage contains the statements "The first motion picture ever made was 'The Horse in Motion' in 1878 by Eadweard Muybridge. It was a movie of a horse running," and the algorithm is looking for exact matches or synonyms to the focus of the input question (i.e., "movie"), then an exact match of "movie" is found in the second sentence of the evidence passage and a highly scored synonym of "movie," i.e., "motion picture," is found in the first sentence of the evidence passage. This may be combined with further analysis of the evidence passage to identify that the text of the candidate answer is also present in the evidence passage, i.e., "The Horse in Motion." These factors may be combined to give this evidence passage a relatively high score as supporting evidence for the candidate answer "The Horse in Motion" being a correct answer.
It should be understood that this is just one simple example of how the scoring may be performed. Many other algorithms of various complexity may be used to generate scores for candidate responses and evidence without departing from the spirit and scope of the invention.
In the synthesis stage 560, the large number of scores generated by the various reasoning algorithms are synthesized into confidence scores or confidence measures for the various hypotheses. This process involves applying weights to the various scores, where the weights have been determined through training of the statistical model employed by the QA pipeline 500 and/or dynamically updated. For example, the weights for scores generated by an algorithm that identifies exactly matching terms and synonyms may be set relatively higher than for other algorithms that evaluate publication dates for evidence passages. The weights themselves may be specified by subject matter experts or learned through machine learning processes that evaluate the significance of features (e.g., evidence passages) and their relative importance to the overall candidate answer generation.
The weighted scores are processed according to a statistical model generated by training of the QA pipeline 500 that identifies the manner in which these scores can be combined to generate confidence scores or measures for individual hypotheses or candidate responses. This confidence score or measure summarizes the confidence level that the QA pipeline 500 has with respect to evidence of candidate answers inferred by the input question (i.e., the candidate answer is the correct answer to the input question).
The final confidence score or measure is processed by a final confidence merge and sort stage 570 that compares the confidence score and measure to each other, to a predetermined threshold, or performs any other analysis on the confidence score to determine which hypothesis/candidate responses are most likely to be correct responses to the input problem. The hypothesis/candidate responses are ranked according to these comparisons to generate a ranked list of hypothesis/candidate responses (hereinafter referred to as "candidate responses"). At stage 580, a final answer and confidence score, or final set of candidate answers and confidence scores, is generated from the ordered list of candidate answers and output to the presenter of the original input question via a graphical user interface or other mechanism for outputting information.
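The synthesis and ranking of stages 560-570 can be sketched as a weighted combination; the logistic squashing function and the hand-set weights below are assumptions standing in for the trained statistical model, and both function names are hypothetical.

```python
import numpy as np

def confidence(scores, weights):
    """Combine per-algorithm evidence scores into a confidence in (0, 1).

    A weighted sum squashed by a logistic function stands in for the
    trained statistical model described above; a real system learns the
    weights during training rather than hand-setting them."""
    z = float(np.dot(scores, weights))
    return 1.0 / (1.0 + np.exp(-z))

def rank_answers(candidate_scores, weights):
    """candidate_scores: {answer: per-algorithm score vector}.

    Returns (answer, confidence) pairs sorted best-first, mirroring the
    final confidence merging and ranking stage 570."""
    ranked = [(answer, confidence(scores, weights))
              for answer, scores in candidate_scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

The highest-confidence entry of the returned list corresponds to the final answer output at stage 580; thresholding against the confidences, as the text describes, fits naturally on top of this.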
As shown in FIG. 5, in accordance with one illustrative embodiment, the GAN response generation engine 320 is used to evaluate the candidate hypotheses (e.g., candidate answers) generated by the hypothesis generation stage 540 with regard to the bag-of-n-grams (BoN) output data structure 326. The GAN 324 is assumed to have already been trained in a manner similar to that described above with regard to figs. 1A and 1B. As previously described, the GAN 324 receives as inputs the noise vector z generated by the noise input generator 322, the word embedding matrix of the vocabulary V, and an encoded n-gram vector corresponding to the input question from stage 510, which is generated by the vector generator logic 323 of the GAN response generation engine 320 based on the input question from stage 510 that is provided to the GAN response generation engine 320. The GAN 324 generates the bag-of-n-grams (BoN) output data structure 326, such as G(z, q) in figs. 2A and 2B. The output data structure 326 is a bag-of-n-grams vector representation (e.g., a bag of words) in which each slot in the vector represents the probability that the corresponding n-gram will appear in the correct answer to the input question. The vector representation may include only those n-grams that are actually in the correct answer to the input question.
It should be appreciated that the BoN output data structure 326 may be stored in a database (not shown) in association with the question, for retrieval should the same or a similar question be received again from the same or a different client device. A lookup operation may be performed on such a database based on the n-grams of the input question, extracted features of the input question, or the like, and the corresponding BoN output data structure 326 may be retrieved. In this way, the need to generate the BoN output data structure 326 each time the same or a similar question is received for processing may be avoided.
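Such a lookup cache might be sketched as follows; the class name, the normalization used as the database key, and the `generate_fn` callback (standing in for an invocation of the GAN 324) are all hypothetical.

```python
class BoNCache:
    """Cache BoN output vectors keyed by a canonical form of the
    question, so a repeated question skips regeneration."""

    def __init__(self, generate_fn):
        self._generate = generate_fn  # e.g. a wrapper around G(z, q)
        self._store = {}

    @staticmethod
    def _key(question):
        # A simple whitespace/case normalization; a production system
        # might key on the question's n-grams or extracted features
        # instead, as the text suggests.
        return " ".join(question.lower().split())

    def get(self, question):
        key = self._key(question)
        if key not in self._store:
            # Cache miss: generate the BoN output structure once.
            self._store[key] = self._generate(question)
        return self._store[key]
```

The second request for an identically normalized question returns the stored vector without invoking the generator again.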
The hypotheses generated in the hypothesis generation stage 540 may be converted by the vector generator 323 into encoded n-gram vectors for comparison with the BoN output data structure 326. The n-grams of the hypotheses are compared to the n-grams in the BoN output data structure 326 to calculate, for each hypothesis (e.g., each candidate answer), a measure of the number of matching n-grams. This may simply be a count of the number of matching n-grams, or it may be a statistical measure generated using any suitable statistical measure of the relevance of the n-grams of the hypothesis to the n-grams of the BoN output data structure 326 for the input question. These measures are referred to herein as BoN relevance scores. The GAN response generation engine 320 then returns the BoN relevance scores for each of the hypotheses to the hypothesis and evidence scoring stage 550. The hypothesis and evidence scoring stage 550 may rank the hypotheses relative to one another using the BoN relevance scores alone, or in combination with the other criteria for scoring hypotheses described above. Thereafter, the pipeline 500 operates as discussed above to select a final answer that is output by the pipeline 500 and returned to the originator of the input question.
Thus, the illustrative embodiments provide a mechanism for improving answer selection in cognitive systems such as question and answer systems by providing an n-gram bag evaluation for selecting the correct answer to a question. GAN provides an accurate representation of the n-grams that may occur in the correct answer to an input question, and these n-grams may be used to evaluate candidate answers, thereby improving the accuracy of the final answer generated.
FIG. 6 is a flowchart outlining an exemplary operation for training a GAN to generate an n-gram bag for use in performing natural language processing operations in accordance with one illustrative embodiment. Steps 610 through 618 may be performed in the generator of the GAN, while steps 620 through 626 may be performed by the discriminator of the GAN. Steps 628 to 634 may be performed by training logic to modify the operation of the neural network of the generator as required to maximize errors in the discriminator, i.e. spoof the discriminator.
As shown in fig. 6, the operation starts by generating a noise input vector (step 610). The noise input vector is projected and replicated to generate a matrix (step 612). An n-gram embedding matrix is generated for the vocabulary (step 614) and concatenated with the matrix generated from the projection/replication of the noise input vector (step 616). Each row of the concatenated matrix is processed by a neural network to generate a bag-of-n-grams (BoN) vector data structure representing a probability distribution over the n-grams of the vocabulary (step 618).
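The generator steps above (612-618) can be sketched in miniature as follows; the toy dimensions, the random parameter matrices, the single-layer "row network," and the per-slot sigmoid output are illustrative assumptions, not the architecture of the system described.

```python
import numpy as np

rng = np.random.default_rng(0)

V, d_emb, d_z = 6, 4, 3           # toy vocabulary size, embedding dim, noise dim
E = rng.normal(size=(V, d_emb))    # n-gram embedding matrix for vocabulary (step 614)
W_proj = rng.normal(size=(d_z, d_emb))
w_out = rng.normal(size=(2 * d_emb,))  # stand-in one-layer row network (step 618)

def generator_bon(z):
    """Map a noise vector to a BoN probability vector over the vocabulary."""
    p = z @ W_proj                        # project the noise vector (step 612)
    P = np.tile(p, (V, 1))                # replicate to one row per n-gram (step 612)
    C = np.concatenate([E, P], axis=1)    # concatenate with the embeddings (step 616)
    logits = C @ w_out                    # process each row; a real model is deeper
    return 1.0 / (1.0 + np.exp(-logits))  # per-n-gram appearance probability (step 618)

bon = generator_bon(rng.normal(size=d_z))  # step 610: sample a noise input vector
```

Each slot of `bon` is an independent probability, matching the description of the BoN vector in which slot i gives the probability that vocabulary n-gram i appears in the response.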
The BoN vector data structure is input to a GAN discriminator (step 620), which multiplies the BoN vector data structure by the n-gram embedding matrix of the n-gram vocabulary (step 622). The results are projected and pooling is performed to generate a vector of maxima for each column in the resulting matrix (step 624). The vector is processed by the neural network to generate an indication as to whether the BoN vector data structure represents a true BoN or a false BoN (i.e., whether the discriminator considers the BoN vector data structure to be generated from actual natural language content or alternatively synthesized) (step 626).
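The discriminator steps (620-626) admit a similarly small sketch; again the dimensions, random parameters, and single-layer classifier are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

V, d_emb, d_h = 6, 4, 5
E = rng.normal(size=(V, d_emb))   # same n-gram embedding matrix as the generator
W = rng.normal(size=(d_emb, d_h))
w_cls = rng.normal(size=(d_h,))

def discriminator(bon):
    """Estimate the probability that a BoN vector came from real text."""
    M = bon[:, None] * E              # weight each embedding row by its BoN
                                      # probability (step 622)
    H = M @ W                         # project the result (step 624)
    pooled = H.max(axis=0)            # pooling: the maximum of each column
                                      # of the resulting matrix (step 624)
    score = pooled @ w_cls            # small classifier network (step 626)
    return 1.0 / (1.0 + np.exp(-score))
```

An output near 1 indicates the discriminator judges the BoN vector to come from actual natural language content, and near 0 that it judges the vector synthesized; at convergence, as the next paragraph notes, both real and generated inputs yield approximately 0.5.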
A determination is made as to whether the discriminator was correct in its determination (step 628). If so, the operation of the neural network in the generator is modified (e.g., the weights are modified) to attempt to improve the BoN output data structure so that it fools the discriminator (step 630). If the determination was incorrect, a determination is made as to whether training is complete, i.e., whether the GAN has converged (step 632). The GAN has converged when the probability given by the discriminator, for both generated instances and real instances, is at a predetermined value (e.g., 0.5). When this occurs, the discriminator cannot distinguish generated instances from real instances. If training is complete, the operation terminates. Otherwise, training continues by modifying the neural network operation (step 630) and returning to step 610. It should be noted that after training, the discriminator is no longer used; only the generator is utilized.
Fig. 7 is a flowchart outlining an example operation for performing answer selection using a trained GAN in accordance with one illustrative embodiment. As shown in FIG. 7, the operation starts by receiving an input question (step 710). A bag-of-n-grams (BoN) output data structure is generated based on the noise input, a word embedding of the input question, and the word embedding matrix of the vocabulary (step 712). The QA system generates candidate answers for the input question via the QA pipeline (step 714). Encoded n-gram vectors are generated for each of the candidate answers (step 716) and these encoded n-gram vectors are compared to the BoN output data structure to generate BoN relevance scores (step 718).
The candidate answers are ranked relative to each other based on the BoN relevance score (step 720). A final answer to the input question is selected based on the relative ordering of the candidate answers (step 722). The selected final answer is then output as a response to the input question (step 724). The operation then terminates.
As noted above, it should be understood that the illustrative embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements. In one example embodiment, the mechanisms of the illustrative embodiments are implemented in software or program code that includes, but is not limited to, firmware, resident software, microcode, etc.
A data processing system suitable for storing and/or executing program code will include, for example, at least one processor coupled directly or indirectly to memory elements through a communication bus (such as a system bus). The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution. The memory may be of various types including, but not limited to ROM, PROM, EPROM, EEPROM, DRAM, SRAM, flash memory, solid state memory, and the like.
Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening wired or wireless I/O interfaces and/or controllers, etc. In addition to conventional keyboards, displays, pointing devices, etc., the I/O devices may take many different forms, such as communication devices coupled by wired or wireless connections, including but not limited to smart phones, tablet computers, touch screen devices, voice recognition devices, etc. Any known or later developed I/O devices are intended to be within the scope of the illustrative embodiments.
Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters for wired communications. Wireless communication based network adapters may also be utilized including, but not limited to, 802.11a/b/g/n wireless communication adapters, Bluetooth wireless adapters, and the like. Any known or later developed network adapter is intended to be within the spirit and scope of the present invention.
The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention with respect to various embodiments with various modifications as are suited to the particular use contemplated. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application, or the technical improvement over commercially available technology, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (21)

1. A method in a data processing system comprising at least one processor and at least one memory including instructions executable by the at least one processor to configure the processor to implement a generative adversarial network (GAN) for natural language processing, the method comprising:
configuring a generator neural network of the GAN to generate a bag-of-n-grams (BoN) output based on a noise vector input;
configuring a discriminator neural network of the GAN to receive a BoN input, wherein the BoN input is either a BoN output from the generator neural network or a BoN input associated with an actual portion of natural language text;
configuring the discriminator neural network of the GAN to output an indication of a probability of whether the BoN input is from an actual portion of natural language text or is a BoN output of the generator neural network; and
training the generator neural network and the discriminator neural network based on a feedback mechanism that compares the output indication from the discriminator neural network with an indication of whether the BoN input is from an actual portion of natural language text or is the BoN output of the generator neural network.
2. The method of claim 1, wherein the generator neural network generates the BoN output as a vector output, and wherein each vector slot in the vector output of the BoN output is set to a value indicating a probability of whether the corresponding n-gram is in the BoN.
3. The method of claim 1, wherein the discriminator neural network performs one or more statistical analysis operations or feature extraction analysis operations on a BoN output of the generator neural network to score the BoN output and generate an indication of a probability of whether the BoN output is from an actual portion of natural language text.
4. The method of claim 3, wherein the one or more statistical analysis operations or feature extraction analysis operations performed on the BoN output comprise at least one of term frequency analysis or inverse document frequency analysis.
5. The method of claim 1, wherein during training of the GAN, the generator neural network:
receiving a noise vector input;
projecting and copying the noise vector input to form a first matrix data structure;
retrieving an embedding of each n-gram in a vocabulary, the vocabulary comprising a corpus of n-grams that can be represented in the BoN;
generating a second matrix based on the retrieved embeddings, wherein each embedding is represented as a row in the second matrix;
concatenating the first matrix and the second matrix to generate a concatenated matrix;
inputting each row of the concatenated matrix into a neural network; and
processing each row of the concatenated matrix through the neural network to generate the BoN output.
6. The method of claim 5, wherein each row of the concatenated matrix comprises a first portion corresponding to the first matrix and a second portion corresponding to the second matrix.
7. The method of claim 5, wherein the neural network is a multi-layer perceptron that uses rectified linear units as an activation function of an output layer of the neural network, and wherein the neural network outputs a value indicative of a probability that a corresponding n-gram is present in the BoN based on the noise vector input.
8. The method of claim 1, wherein during training of the GAN, the discriminator neural network:
receiving the BoN input;
retrieving an embedding of each n-gram in a vocabulary, the vocabulary comprising a corpus of n-grams that can be represented in the BoN;
generating a first matrix based on the retrieved embeddings, wherein each embedding is represented as a row in the first matrix;
multiplying the BoN input with the first matrix;
projecting the result of multiplying the BoN input with the first matrix to generate a second matrix;
performing sum pooling on the second matrix to generate a feature vector output; and
processing the feature vector output via a neural network to generate an output indicative of whether the BoN input is from an actual portion of natural language text or a BoN output of the generator neural network.
9. The method of claim 8, wherein the neural network is a multi-layer perceptron having a sigmoid activation function in an output layer of the multi-layer perceptron.
10. The method of claim 1, further comprising:
generating, from the trained GAN, a BoN output representing a bag of n-grams approximating real natural language text; and
performing natural language processing on a portion of natural language text based on the BoN output generated by the trained GAN.
11. A computer readable storage medium storing a computer readable program, wherein the computer readable program, when executed on a computing device, configures the computing device to implement a generative adversarial network (GAN) for natural language processing and causes the computing device to:
configure a generator neural network of the GAN to generate a bag-of-n-grams (BoN) output based on a noise vector input;
configure a discriminator neural network of the GAN to receive a BoN input, wherein the BoN input is either a BoN output from the generator neural network or a BoN input associated with an actual portion of natural language text;
configure the discriminator neural network of the GAN to output an indication of a probability of whether the BoN input is from an actual portion of natural language text or is a BoN output of the generator neural network; and
train the generator neural network and the discriminator neural network based on a feedback mechanism that compares the output indication from the discriminator neural network with an indication of whether the BoN input is from an actual portion of natural language text or is the BoN output of the generator neural network.
12. The computer-readable storage medium of claim 11, wherein the generator neural network generates the BoN output as a vector output, and wherein each vector slot in the vector output of the BoN output is set to a value indicating a probability of whether the corresponding n-gram is in the BoN.
13. The computer-readable storage medium of claim 11, wherein the discriminator neural network performs one or more statistical analysis operations or feature extraction analysis operations on a BoN output of the generator neural network to score the BoN output and generate an indication of a probability of whether the BoN output is from an actual portion of natural language text.
14. The computer-readable storage medium of claim 13, wherein the one or more statistical analysis operations or feature extraction analysis operations performed on the BoN output comprise at least one of term frequency analysis or inverse document frequency analysis.
15. The computer-readable storage medium of claim 11, wherein during training of the GAN, the generator neural network:
receiving a noise vector input;
projecting and copying the noise vector input to form a first matrix data structure;
retrieving an embedding of each n-gram in a vocabulary, the vocabulary comprising a corpus of n-grams that can be represented in the BoN;
generating a second matrix based on the retrieved embeddings, wherein each embedding is represented as a row in the second matrix;
concatenating the first matrix and the second matrix to generate a concatenated matrix;
inputting each row of the concatenated matrix into a neural network; and
processing each row of the concatenated matrix through the neural network to generate the BoN output.
16. The computer-readable storage medium of claim 15, wherein each row of the concatenated matrix comprises a first portion corresponding to the first matrix and a second portion corresponding to the second matrix.
17. The computer-readable storage medium of claim 15, wherein the neural network is a multi-layer perceptron that uses rectified linear units as an activation function of an output layer of the neural network, and wherein the neural network outputs a value indicative of a probability that a corresponding n-gram is present in the BoN based on the noise vector input.
18. The computer-readable storage medium of claim 11, wherein during training of the GAN, the discriminator neural network:
receiving the BoN input;
retrieving an embedding of each n-gram in a vocabulary, the vocabulary comprising a corpus of n-grams that can be represented in the BoN;
generating a first matrix based on the retrieved embeddings, wherein each embedding is represented as a row in the first matrix;
multiplying the BoN input with the first matrix;
projecting the result of multiplying the BoN input with the first matrix to generate a second matrix;
performing sum pooling on the second matrix to generate a feature vector output; and
processing the feature vector output via a neural network to generate an output indicative of whether the BoN input is from an actual portion of natural language text or a BoN output of the generator neural network.
19. The computer-readable storage medium of claim 18, wherein the neural network is a multi-layer perceptron having a sigmoid activation function in an output layer of the multi-layer perceptron.
20. An apparatus, comprising:
at least one processor; and
at least one memory coupled to the at least one processor, wherein the at least one memory includes instructions that, when executed by the at least one processor, configure the at least one processor to implement a generative adversarial network (GAN) for natural language processing and cause the at least one processor to:
configure a generator neural network of the GAN to generate a bag-of-n-grams (BoN) output based on a noise vector input;
configure a discriminator neural network of the GAN to receive a BoN input, wherein the BoN input is either a BoN output from the generator neural network or a BoN input associated with an actual portion of natural language text;
configure the discriminator neural network of the GAN to output an indication of a probability of whether the BoN input is from an actual portion of natural language text or is a BoN output of the generator neural network; and
train the generator neural network and the discriminator neural network based on a feedback mechanism that compares the output indication from the discriminator neural network with an indication of whether the BoN input is from an actual portion of natural language text or is the BoN output of the generator neural network.
21. A computer system comprising modules for performing the steps of the method of any one of claims 1-10, respectively.
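The generator and discriminator forward passes recited in claims 5 through 9 (and mirrored in claims 15 through 19) can be sketched as a minimal NumPy illustration. All dimensions, the random projection matrices, and the untrained weights below are assumptions for demonstration only; in particular, this sketch substitutes a sigmoid at the generator's output for a bounded probability, whereas claim 7 recites rectified linear units in the output layer.

```python
import numpy as np

rng = np.random.default_rng(1)
V, E, Z, H = 6, 5, 3, 4              # vocab size, embedding dim, noise dim, hidden dim (assumed)
embeddings = rng.normal(size=(V, E)) # one embedding row per n-gram in the vocabulary

def generator_forward(z):
    """Claim 5 sketch: project and replicate the noise (first matrix),
    concatenate with the embedding matrix (second matrix), and process
    each row through an MLP to get a per-n-gram presence probability."""
    P = rng.normal(size=(Z, E))                        # noise projection (assumed shape)
    first = np.tile(z @ P, (V, 1))                     # projected noise, replicated per row
    cat = np.concatenate([first, embeddings], axis=1)  # concatenated matrix
    W1, W2 = rng.normal(size=(2 * E, H)), rng.normal(size=(H, 1))
    hidden = np.maximum(0.0, cat @ W1)                 # ReLU hidden layer
    probs = 1.0 / (1.0 + np.exp(-(hidden @ W2)))       # sigmoid output (substitution, see above)
    return probs.ravel()                               # one probability per n-gram

def discriminator_forward(bon):
    """Claim 8 sketch: weight the embedding rows by the BoN, project,
    sum-pool into a feature vector, then an MLP with a sigmoid output
    layer (claim 9) giving P(real)."""
    weighted = bon[:, None] * embeddings                        # BoN times embedding matrix
    pooled = (weighted @ rng.normal(size=(E, H))).sum(axis=0)   # projection + sum pooling
    return float(1.0 / (1.0 + np.exp(-(pooled @ rng.normal(size=H)))))

bon = generator_forward(rng.normal(size=Z))
p_real = discriminator_forward(bon)
print(bon.shape, p_real)
```

During training, the discriminator's output for the generated BoN would feed the loss that updates the generator, as described for steps 628 through 632 of the flowchart.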
CN201910623780.8A 2018-07-12 2019-07-11 Natural language processing text modeling based on generative adversarial network Active CN110781666B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US16/033,285 2018-07-12
US16/033,313 2018-07-12
US16/033,285 US11281976B2 (en) 2018-07-12 2018-07-12 Generative adversarial network based modeling of text for natural language processing
US16/033,313 US11481416B2 (en) 2018-07-12 2018-07-12 Question Answering using trained generative adversarial network based modeling of text

Publications (2)

Publication Number Publication Date
CN110781666A CN110781666A (en) 2020-02-11
CN110781666B true CN110781666B (en) 2023-09-05

Family

ID=69383924

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910623780.8A Active CN110781666B (en) 2018-07-12 2019-07-11 Natural language processing text modeling based on generative adversarial network

Country Status (1)

Country Link
CN (1) CN110781666B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401214B (en) * 2020-03-12 2023-04-18 四川大学华西医院 Multi-resolution integrated HER2 interpretation method based on deep learning
CN111583919B (en) * 2020-04-15 2023-10-13 北京小米松果电子有限公司 Information processing method, device and storage medium
US11294945B2 (en) 2020-05-19 2022-04-05 International Business Machines Corporation Unsupervised text summarization with reinforcement learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107112006A (en) * 2014-10-02 2017-08-29 Microsoft Technology Licensing, LLC Neural network based speech processing
CN107578106A (en) * 2017-09-18 2018-01-12 University of Science and Technology of China A neural network natural language inference method fusing word sense knowledge

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9342561B2 (en) * 2014-01-08 2016-05-17 International Business Machines Corporation Creating and using titles in untitled documents to answer questions
US10319076B2 (en) * 2016-06-16 2019-06-11 Facebook, Inc. Producing higher-quality samples of natural images

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107112006A (en) * 2014-10-02 2017-08-29 Microsoft Technology Licensing, LLC Neural network based speech processing
CN107578106A (en) * 2017-09-18 2018-01-12 University of Science and Technology of China A neural network natural language inference method fusing word sense knowledge

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Huang Liwei; Jiang Bitao; Lü Shouye; Liu Yanbo; Li Deyi. A survey of research on deep learning based recommender systems. Chinese Journal of Computers. 2018, (No. 07), full text. *

Also Published As

Publication number Publication date
CN110781666A (en) 2020-02-11

Similar Documents

Publication Publication Date Title
US11281976B2 (en) Generative adversarial network based modeling of text for natural language processing
US11481416B2 (en) Question Answering using trained generative adversarial network based modeling of text
CN111295674B (en) Protecting cognitive systems from gradient-based attacks by using spoof gradients
US20210303703A1 (en) Protecting Cognitive Systems from Model Stealing Attacks
US11651279B2 (en) LAT based answer generation using anchor entities and proximity
US10606893B2 (en) Expanding knowledge graphs based on candidate missing edges to optimize hypothesis set adjudication
US10606946B2 (en) Learning word embedding using morphological knowledge
EP3180742B1 (en) Generating and using a knowledge-enhanced model
CN107066464B (en) Semantic natural language vector space
US9792280B2 (en) Context based synonym filtering for natural language processing systems
US11783025B2 (en) Training diverse and robust ensembles of artificial intelligence computer models
US10585901B2 (en) Tailoring question answer results to personality traits
US20160379120A1 (en) Knowledge Canvassing Using a Knowledge Graph and a Question and Answer System
US10147047B2 (en) Augmenting answer keys with key characteristics for training question and answer systems
Viji et al. A hybrid approach of Weighted Fine-Tuned BERT extraction with deep Siamese Bi–LSTM model for semantic text similarity identification
US10956824B2 (en) Performance of time intensive question processing in a cognitive system
US9495648B1 (en) Training a similar passage cognitive system using ground truth from a question answering cognitive system
CN110781666B (en) Natural language processing text modeling based on generative adversarial network
US11663518B2 (en) Cognitive system virtual corpus training and utilization
US11182552B2 (en) Routine evaluation of accuracy of a factoid pipeline and staleness of associated training data
CN113435212A (en) Text inference method and device based on rule embedding
US20200356603A1 (en) Annotating Documents for Processing by Cognitive Systems
US20190056912A1 (en) Sorting of Numeric Values Using an Identification of Superlative Adjectives
WO2024076446A1 (en) Computerized question answering based on evidence chains

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant