CN111813928A - Evaluating text classification anomalies predicted by a text classification model - Google Patents


Info

Publication number
CN111813928A
CN111813928A CN202010273725.3A CN202010273725A
Authority
CN
China
Prior art keywords
word
computer system
heat map
level
test
Prior art date
Legal status
Pending
Application number
CN202010273725.3A
Other languages
Chinese (zh)
Inventor
谭铭
S·波达尔
L·克里希纳默西
Current Assignee
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date
Filing date
Publication date
Priority claimed from US16/380,981 (US11537821B2)
Priority claimed from US16/380,986 (US11068656B2)
Application filed by International Business Machines Corp
Publication of CN111813928A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G06F 16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/10 Text processing
    • G06F 40/103 Formatting, i.e. changing of presentation of documents
    • G06F 40/117 Tagging; Marking up; Designating a block; Setting of attributes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Abstract

In response to running at least one test phrase on a pre-trained text classifier and identifying a separate predicted classification label based on a score calculated for each respective test phrase, the text classifier decomposes the plurality of extracted features aggregated in the score into word-level scores for each word in the at least one test phrase. The text classifier assigns a separate heat map value to each word-level score, with each respective separate heat map value reflecting a weight of that word-level score. The text classifier outputs the separate predicted classification label and each separate heat map value reflecting the weight of each word-level score, for use in defining a heat map identifying the contribution of each word in the at least one test phrase to the separate predicted classification label, to facilitate evaluation of text classification anomalies by the client.

Description

Evaluating text classification anomalies predicted by a text classification model
Technical Field
One or more embodiments of the invention relate generally to data processing and, in particular, to evaluating text classification anomalies predicted by a text classification model.
Description of the Related Art
Machine learning plays an important role in many Artificial Intelligence (AI) applications. One of the outputs of the process of training a machine learning application is a data object, called a model, used in text classification, which is a parametric representation of the patterns inferred from training data. After the model is created, the model is deployed into one or more environments for use in text classification. At runtime, the model is the core of the machine learning system, representing the product of many hours of development and the structuring of large volumes of production data.
Disclosure of Invention
In one embodiment, a method involves: in response to running at least one test phrase on a pre-trained text classifier and identifying a separate predicted classification label based on a score calculated for each respective at least one test phrase, decomposing, by a computer system, a plurality of extracted features aggregated in the score into a plurality of word-level scores for each word in the at least one test phrase. The method involves assigning, by the computer system, a separate heat map value to each of the plurality of word-level scores, each respective separate heat map value reflecting a weight of each of the plurality of word-level scores. The method involves outputting, by the computer system, the separate predicted classification label and each separate heat map value reflecting the weight of each word-level score of the plurality of word-level scores, for defining a heat map identifying a contribution of each word of the at least one test phrase to the separate predicted classification label.
In another embodiment, a computer system comprises one or more processors, one or more computer-readable memories, one or more computer-readable storage devices, and program instructions stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories. The stored program instructions include program instructions that, in response to running at least one test phrase on a pre-trained text classifier and identifying a separate predicted classification label based on a score calculated for each respective at least one test phrase, decompose a plurality of extracted features aggregated in the score into a plurality of word-level scores for each word in the at least one test phrase. The stored program instructions include program instructions that assign a separate heat map value to each of the plurality of word-level scores, each respective separate heat map value reflecting a weight of each of the plurality of word-level scores. The stored program instructions include program instructions that output the separate predicted classification label and each separate heat map value reflecting the weight of each word-level score of the plurality of word-level scores, for defining a heat map identifying a contribution of each word of the at least one test phrase to the separate predicted classification label.
In another embodiment, a computer program product comprises a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se. The program instructions are executable by a computer to cause the computer to, in response to running at least one test phrase on a pre-trained text classifier and identifying a separate predicted classification label based on a score calculated for each respective at least one test phrase, decompose a plurality of extracted features aggregated in the score into a plurality of word-level scores for each word in the at least one test phrase. The program instructions are executable by the computer to cause the computer to assign a separate heat map value to each of the plurality of word-level scores, each respective separate heat map value reflecting a weight of each of the plurality of word-level scores. The program instructions are executable by the computer to cause the computer to output the separate predicted classification label and each separate heat map value reflecting the weight of each word-level score of the plurality of word-level scores, for defining a heat map identifying a contribution of each word of the at least one test phrase to the separate predicted classification label.
Drawings
The novel features believed characteristic of one or more embodiments of the invention are set forth in the appended claims. One or more embodiments of the invention itself, however, will be best understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:
FIG. 1 illustrates one example of a block diagram of a text classifier service for facilitating the creation and training of text classifiers that classify text by labels;
FIG. 2 illustrates one example of a block diagram of a text classifier service for providing information about text classification anomalies predicted by a text classifier during a text classifier test;
FIG. 3 illustrates one example of a word-level analysis element evaluated by the word-level analysis component at the text classifier level;
FIG. 4 shows one example of a table illustrating examples of types of extracted features that are decomposed for determining per-word feature scores;
FIG. 5 illustrates one example of a word-level heat map reflecting ground truth heat maps compared to test heat maps based on test phrases tested on a trained model;
FIG. 6 illustrates one example of a block diagram of a word-level heat map reflecting a heat map of the k preferred significant words for labels of test phrases tested on a trained model;
FIG. 7 illustrates one example of a computer system in which one embodiment of the invention may be implemented;
FIG. 8 depicts a high level logic flowchart of a process and computer program for creating and training a classifier model;
FIG. 9 illustrates a high level logic flowchart of a process and computer program for updating a trained classifier model;
FIG. 10 depicts a high level logic flowchart of a process and computer program for analyzing a predicted classification to determine word-level heat map values indicating word-level contributions to the predicted classification of a test phrase and to the classification labels of a trained model;
FIG. 11 depicts a high level logic flowchart of a process and computer program for outputting a predicted classification with visual indicators of the words that most influence the predicted classification, based on the corresponding word-level heat map values for the classification label;
FIG. 12 depicts a high level logic flowchart of a process and computer program for outputting a predicted classification with visual indicators, based on the respective k preferred heat map values, of the k preferred list of words trained to most influence the classification labels; and
FIG. 13 illustrates a high level logic flowchart of a process and computer program for supporting updated training of a text classifier that highlights classification label training for identified anomalies.
Detailed Description
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.
Additionally, in the following description, for purposes of illustration, a number of systems are described. It should be noted and will be apparent to those skilled in the art that the present invention may be implemented in a variety of systems, including a variety of computer systems and electronic devices running any number of different types of operating systems.
FIG. 1 illustrates a block diagram of a text classifier service for facilitating the creation and training of text classifiers that classify text by labels.
In one example, machine learning plays an important role in artificial intelligence based applications that interact with one or more Natural Language Processing (NLP) systems. For example, AI-based applications can include, but are not limited to: speech recognition, natural language processing, audio recognition, visual scene analysis, email filtering, social network filtering, machine translation, data leakage, optical character recognition, collation learning, and bioinformatics. In one example, selection of an AI-based application may involve a computer system, which may be running in one or more types of computing environments, performing tasks that require one or more types of text classification analysis. In one example, machine learning may represent one or more types of AIs that train a machine based on data and algorithms that learn from and predict the data. One of the main achievements of the process of creating and training a machine learning environment is the data objects (called models) built from sample inputs. In one example, the model 112 represents a data object of a machine learning environment.
In one example, to create and train the model 112, a user (such as client 120) submits an initial training set, such as the ground truth training set 108, to the text classifier service 110. In one example, the ground truth training set 108 includes one or more words and multi-word phrases, each identified with one label of a plurality of classification labels selected by the user for training the model 112. For example, the user may select labels identifying a type of action, such as "on" or "off," and assign a label of "on" or "off" to each word or multi-word phrase that a customer might enter when requesting that a service be turned on or off, such as the phrase "add service" with the "on" label and the word "disconnect" with the "off" label. In one example, the ground truth training set 108 may include one or more commercially available training sets. In another example, the ground truth training set 108 may include one or more user-generated training sets, such as training sets of words or phrases collected from conversational dialog records that have been labeled by users. In another example, the ground truth training set 108 may include one or more purpose-specific automatic training sets collected and labeled by an automatic training set generation service.
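Purely for illustration, a minimal sketch of what such a labeled ground truth training set might look like, assuming a simple list-of-pairs representation; the phrases and the "on"/"off" labels follow the example above, and the label-count check is only a convenience:

```python
from collections import Counter

# Hypothetical ground truth training set: each entry pairs a word or
# multi-word phrase with one of the classification labels ("on"/"off")
# chosen by the user for the service on/off example above.
ground_truth_training_set = [
    ("add service", "on"),
    ("please switch my plan on", "on"),
    ("disconnect", "off"),
    ("cancel my service", "off"),
]

# The distribution of labeled training data is later noted to affect
# prediction accuracy, so a quick check of the label distribution is useful.
label_counts = Counter(label for _, label in ground_truth_training_set)
print(label_counts)  # Counter({'on': 2, 'off': 2})
```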
In this example, the text classifier service 110 creates an instance of the model 112 in the text classifier 102 and trains the model 112 by applying the ground truth training set 108. The text classifier 102 represents an instance of the model 112 combined with the scorer 104 and trained by the ground truth training set 108. In one example, the model 112 represents a parametric representation of the patterns inferred from the ground truth training set 108 during the training process. In one example, the text classifier service 110 represents an entity that provides a service used by clients (such as client 120) to create and train instances of the model 112 in the text classifier 102. For example, the text classifier service 110 represents a cloud service provider providing the text classifier 102 as a service through one or more applications selected by the client 120. In another example, the text classifier service 110 represents one or more programmatic interfaces through which the client 120 invokes particular functions to create an instance of the model 112 in the text classifier 102 and invokes particular functions to train the text classifier 102 based on the ground truth training set 108. In additional or alternative embodiments, client 120 may interact with text classifier 102 through additional or alternative interfaces and connections.
In one example, after training, the client 120 may test the text classifier 102 before deploying the text classifier 102 for access by one or more client applications that provide text classification services (such as intent classification, semantic analysis, or document classification for dialog systems). During training and after deployment, a user may submit text to the text classifier 102. In response to a text submission, text classifier 102 predicts a classification label for the text and returns the predicted classification label indicating the type of text that has been received.
In this example, the text classifier 102 may respond to test submissions with classification labels that are not correctly predicted, as trained by the ground truth training set 108. In one example, when text classifier 102 makes an incorrect prediction, the incorrect prediction is referred to as an anomaly in the classification of text performed by text classifier 102.
The text classifier service 110 enables the client 120 to test the text classifier 102 and update the ground truth training set 108 for additional training of the text classifier 102 to adjust the prediction accuracy of the text classifier 102 for use by the client. In particular, the client 120 relies on the text classifier 102 to provide accurate classification; however, the accuracy of the classification predictions made by a particular instance of the model in the text classifier 102 may be significantly affected by the distribution of training data in the ground truth training set 108 originally used to train the model 112 and by additional training data submitted by the client 120 in response to anomalies detected when testing the text classifier 102. Therefore, it is desirable that the text classifier service 110 also provide the user with information about text classification anomalies beyond the incorrectly predicted classification labels, to enable the user to efficiently and effectively evaluate corrections to the ground truth training set 108 by which the user can adjust the trained data patterns in the model 112 of the text classifier 102 to improve prediction accuracy.
FIG. 2 illustrates a block diagram of a text classifier service for providing information about text classification anomalies predicted by a text classifier during a text classifier test.
In one example, in the testing phase, a testing interface (e.g., testing controller 208) of client 120 submits text 222 to text classifier 102, for example, through an application programming interface call to text classifier service 110 or a function call directly to text classifier 102. In this example, text 222 may represent a test sample of one or more words or phrases from test set 220. In this example, text classifier 102 receives text 222 and classifies text 222, predicting the label classification of text 222.
In this example, the text classification service provided by the text classifier 102 refers to the following linear classifier process: segmenting text comprising one or more words or other combinations of characters, extracting features from the word or words, assigning a per-label weight to each extracted feature, and combining the weights for a given label over the text to identify a score for that label. In one example, the text classifier 102 may determine a separate score for each label in a selection of labels and identify the predicted label as the highest-scoring label. In one example, a label may identify one or more types of classifications, such as, but not limited to, an intent of the content of the text. For example, the text classified by the text classifier 102 may represent an utterance of a conversation converted into text, and the intent label predicted by the text classifier 102 may represent the predicted intent of the utterance, based on the highest-scoring label among a plurality of scored labels for the utterance.
For example, to classify text, the text classifier 102 implements a scorer 104 that extracts features from the words in the submitted text. Scorer 104 calls functions of model 112 to identify, for each label, a weight for each feature extracted from text 222. Based on the individually assigned weights of the extracted features, the scorer 104 may invoke functions of the model 112 to evaluate the classification of the entire text, e.g., as a particular intent with a percentage probability.
For example, the features extracted by the text classifier 102 include, but are not limited to: unigram-based features, bigram-based features, part-of-speech-based features, term-based features such as entity-based features or concept-based features, average pooling of word-embedding features, and maximum pooling of word-embedding features. For example, in the text phrase "I am a student at a university", unigram features may include "I", "am", "a", "student", "at", and "university", and bigram features may include "a university".
For example, the text classifier 102 may perform a linear classification of text, where the ranking score S_I for each label I is a weighted sum over all extracted features, e.g., based on the following equation:
S_I(U) = f_1(U)·w_I1 + f_2(U)·w_I2 + … + f_k(U)·w_Ik + b_I
where U = u_1 u_2 u_3 … u_N is a test example, u_n is a word in the test example, and f_k(U) is an extracted feature. In one example, the extracted features may be one or more types of features extracted from particular words or terms inside the text. In this example, w_Ik is the model parameter for the kth feature for a given label I. In this example, b_I reflects the contribution from filler words, e.g., a "bias" that reflects the inherent preference for the intent without considering any input word.
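The scoring step can be sketched as follows; this is an illustrative assumption rather than the classifier's actual implementation, with feature extraction reduced to unigram presence and all labels, weights, and biases invented for the example:

```python
# Sketch of the linear scoring step S_I(U) = sum_k f_k(U) * w_Ik + b_I.
# Feature extraction is reduced to unigram presence purely for illustration;
# the labels, weights, and biases below are invented for the example.
def extract_features(text):
    """Return a feature dict; here, lowercase unigrams with value 1.0."""
    return {f"unigram:{word}": 1.0 for word in text.lower().split()}

def score_label(features, weights, bias):
    """Weighted sum of the extracted feature values plus the label bias b_I."""
    return sum(value * weights.get(name, 0.0)
               for name, value in features.items()) + bias

def classify(text, model):
    """Score every label and return the highest-scoring label with all scores."""
    features = extract_features(text)
    scores = {label: score_label(features, params["weights"], params["bias"])
              for label, params in model.items()}
    return max(scores, key=scores.get), scores

# Toy model parameters (hypothetical).
model = {
    "greeting": {"weights": {"unigram:how": 0.6, "unigram:are": 0.8,
                             "unigram:you": 0.8}, "bias": 0.1},
    "capabilities": {"weights": {"unigram:help": 0.9, "unigram:you": 0.4},
                     "bias": 0.0},
}

print(classify("How are you going to help me", model))
```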
In one example, the text classifier 102 represents a text classification model that may be viewed as a black box by the client 120, where text is applied to the text classifier 102 and classification predictions are output from the text classifier 102, but the trained data patterns and functions applied by the text classifier 102 are not visible to the client 120 that requests text classification services from the text classifier 102. To protect the underlying data objects created in model 112, the entity deploying model 112 may specify one or more protection layers that allow the functionality of model 112 to be used when deployed while protecting the trained data patterns of the data objects of model 112.
In this example, in response to submission of text 222 from client 120, the text classifier service 110 returns a classification 224 as determined by text classifier 102. In this example, the classification 224 may include a label 226. The label 226 may include a specific classification label and may include additional values, such as, but not limited to, the score probability calculated for the classification label. Additionally, the label 226 may include a plurality of labels.
In one example, in response to receiving classification 224, test controller 208 compares label 226 to an expected label for text 222. In one example, test set 220 includes the expected labels for text 222. In one example, if the label 226 does not match the expected label for the text 222 in the test set 220, the test controller 208 may trigger an output to the user through the user interface 240 to indicate the anomaly.
In one example, based on the detected anomaly, the user may choose, within the user interface 240, to adjust the selection of one or more words assigned one or more labels in the training data set 238. The user may select, within the user interface 240, to ask the training set controller 250 of the test controller 208 to send the training data set 238 to the text classifier 102 for additional training (as shown by training set 252), and to update the ground truth training set 108 with the training data set 238 to maintain a complete training set for training the text classifier 102. In this example, by having client 120 submit additional training data in training data set 238, client 120 may improve the accuracy of predictions performed by text classifier 102, provided that the user is supported, at user interface 240, in identifying which data to include in training data set 238 so that training text classifier 102 on that data is likely to resolve the anomaly and improve the accuracy of predictions made by text classifier 102.
The accuracy of the model 112 in performing text classification may be a function of the time and resources applied in creating, training, evaluating, and debugging the model to train the text classifier 102 to accurately classify text. In one example, the amount and distribution of the labeled training data used to train the model 112 may significantly affect the reliability with which the model 112 accurately classifies text. Although the client 120 relies on the accuracy of the text classifier 102 as a measure of the quality of the model 112, the quality and performance of text classifiers may vary widely between models, and there is no uniform measure for the quality of a text classifier model, nor a publicly available uniform training data set that yields the same accuracy measure when used to train a model.
Additionally, when text classifier 102 is implemented as a black box provided by text classifier service 110 and client 120 receives the classification label in classification 224, but the classification label is incorrect, information needs to be provided to client 120 about why text classifier 102 incorrectly classified the text, while the underlying data objects in model 112 are still not disclosed to client 120, to enable evaluation of what type of training data set is needed to potentially improve the classification accuracy of text classifier 102. For example, if client 120 submits the text 222 of the phrase "how are you going to help me?" to text classifier service 110 and the phrase is incorrectly classified as "greeting" instead of "capabilities", the incorrect classification label alone does not indicate why the misclassification occurred.
In particular, in addition to providing the classification labels themselves, the client 120 needs to be provided with information regarding why the text classifier 102 incorrectly classified the selection of text, so that a user monitoring the text classification service received by the client 120 can determine additional training data, in the training data set 238 sent to the text classifier 102, that is likely to train the text classifier 102 to correctly classify the selection of text. In particular, it may be difficult for a user to attempt to determine the cause of an anomaly based only on the labels 226 in the classifications 224 and the training data sets submitted by the user in the training data set 238, because multiple combined factors may cause a classification anomaly. A first factor is that small variations within the training data set 238 and feature debugging may substantially alter the classification predictions performed by the text classifier 102. In particular, the text classifier 102 may be trained to determine the class of a text string based on a large number of features (e.g., more than 1000 features) extracted internally from the training instances, where the features used to train the model 112 are transparent to the user and the weights of the features are inherently determined by the training process applied to train the text classifier 102. A second factor is that, based on the limited nature of the selection of labeled training data used to train the model 112 and the selection of labeled training data for a particular domain of a topic, the model 112 may be overfit to some undesired tokens or words for that domain. A third factor is that different classes of features may make the impact of a particular word on the final decision unclear. For example, some features are lexically based, such as unigrams and bigrams, and other features are not lexically based but are related to the lexicon, such as word embeddings and entity types.
For example, considering the second factor, because the words "want" and "need" have similar semantic meanings but different lexical forms, if the training strings for the classification intent label "order" include a large number of occurrences of the word "want" relative to other classification intent labels, the text classifier 102 may incorrectly predict that the text input "I need to delete an order" has the classification label "order" instead of the correct classification label "delete an order", when the system is based in part on word-level representation features. Considering the first and third factors, a classification issue may be based on a single word, but identifying the particular word in "I need to delete an order" that caused the misclassification may be challenging based only on test results, and the anomaly may disappear or reappear if additional training is performed that adjusts the total number and type of training utterances used to train model 112, without the user ever having created training data that identifies the particular word that generated the anomaly.
According to an advantage of the present invention, an anomaly visualization service is provided to facilitate user understanding of particular words that cause text classification anomalies for the text classifier 102 at the client application level. In particular, according to an advantage of the present invention, the anomaly visualization service performs error analysis of the test set at the text classifier level and provides visual analysis and prompting of information about errors at the application level, thereby assisting the user in refining the training data set 238 for further training of the text classifier 102. In one example, visual analysis and cues may be represented in one or more heat maps, where a heat map applies one or more colors at one or more intensities, and the intensity of a color applied to a word represents the relative weight of that word in its contribution to a particular classification label.
While embodiments described herein relate to a visual heat map output at a user interface as a graphical representation of data representing different values using a system of color coding and color weights, in additional or alternative embodiments, the visual heat map output may be represented in an output interface by other types of output detectable by a user, such as, but not limited to: tactile output of visual indicators in a visual heat map, auditory output of visual indicators in a visual heat map, and other outputs that enable a user to detect different scoring weights for a word. Additionally, in additional or alternative embodiments, the visual heat map output may be represented by a graphically represented numerical value in addition to or instead of color, where the numerical value represents a percentage or other weighted numerical value.
In one example, the anomaly visualization service includes a word-level analysis component 232 implemented at the classifier level with the text classifier 102, a word-level heat map controller 234 implemented at the client application level of the client 120, and a word-level heat map 236, k preferred word heat map 242, and training data set 238 implemented at the user interface level within a user interface 240. In additional or alternative embodiments, the anomaly visualization service may include additional or alternative functional and data components.
In one example, the word-level analysis component 232 is implemented at the same layer as the text classifier 102, or incorporated in the text classifier 102, for computing one or more heat map values for the text 222 and one or more heat map values for the classification labels included in the ground truth training set 108. In this example, classification 224 is updated by the word-level analysis component 232 to include, with the label 226, one or more heat map values determined by the word-level analysis component 232, shown as heat map values 228. In one example, each of the heat map values 228 may represent one or more weighted values (such as, but not limited to, percentages and colors), and may be identified with or correspond to one or more tokens (e.g., words), or may be ordered to correspond to particular words in a sequence.
For example, the word-level analysis component 232 can determine the heat map values 228 by decomposing the score calculated for each extracted feature into a score for each word or other token, and assigning each decomposed score as a heat map value that directly reflects the contribution of the word to the final score of the intent classification. For example, when model 112 is a trained model, all weights w_Ik are fixed. As previously described, the linear model applied by model 112 for text classification of text 222 (denoted as U) takes a weighted sum of the various extracted features f_k(U) and then obtains a ranking score S_I for each label I, for example, by:
S_I(U) = f_1(U)·w_I1 + f_2(U)·w_I2 + … + f_k(U)·w_Ik + b_I
For all types of features used in the text classifier 102, the word-level analysis component 232 traces back and determines which words contributed to each extracted feature. By accumulating all feature scores belonging to each token, the word-level analysis component 232 decomposes S_I(U) into per-word scores as follows:
S_I(U) = S′_I(u_1) + S′_I(u_2) + … + S′_I(u_N) + b_I
In this example, S′_I(u_N) is used as a heat map value that directly reflects the contribution of the word to the final score for intent I. In particular, in this example, given the test case text 222, the sum of the scores of all words on the heat map is exactly the score used to calculate the label confidence, so the word-level scores directly reflect the importance of each word in calculating the final intent label confidence.
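A minimal sketch of this decomposition, under the assumption that each extracted feature records the indices of the words it was traced back to and that a multi-word feature's contribution is split equally among those words (consistent with the treatment described for FIG. 4 below); the traced contributions shown are invented:

```python
# Sketch of decomposing the aggregated feature scores into per-word scores
# S_I(U) = S'_I(u_1) + ... + S'_I(u_N) + b_I.  Each traced feature is assumed
# to be a (word_indices, f_k(U) * w_Ik) pair; a multi-word feature's
# contribution is split equally among its contributing words.
def word_level_scores(words, traced_contributions):
    per_word = [0.0] * len(words)
    for word_indices, contribution in traced_contributions:
        share = contribution / len(word_indices)
        for i in word_indices:
            per_word[i] += share
    return per_word

words = ["how", "are", "you", "going", "to", "help", "me"]
# Hypothetical traced contributions toward the label "greeting".
contributions = [([0], 0.3), ([1, 2], 1.0), ([3], 0.1), ([5], -0.2)]
heat_map_values = word_level_scores(words, contributions)
print(dict(zip(words, heat_map_values)))
# e.g. {'how': 0.3, 'are': 0.5, 'you': 0.5, 'going': 0.1, 'to': 0.0, 'help': -0.2, 'me': 0.0}
```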
In one example, in response to the text 222, the word-level heat map controller 234 receives the classification 224 with the label 226 and the heat map values 228 and generates a visible graphical representation of the heat map values for the text 222 in a word-level heat map 236. In one example, the word-level heat map controller 234 sequentially applies each percentage or color value in the heat map values 228 to the words or other tokens identified in the text 222. In one example, the word-level heat map 236 may reflect different heat map values by different colors assigned to the different heat map values, by different shades assigned to different percentages in the heat map values, or by other visually discernible output indicators assigned to the different heat map values. In another example, the word-level heat map 236 may reflect different heat map values through other types of output interfaces, including but not limited to auditory and tactile interfaces, in which output levels or types are adjusted to identify different heat map values.
In one embodiment, in accordance with an advantage of the present invention, the word-level heat map 236 shows the relevance of each word or other token in a text sequence to the predicted label 226. In another embodiment, the word-level heat map 236 may include additional types of visual indicators of relevance, such as visualizing a comparison of the relevance of each word or token in the text sequence to the predicted label with the relevance of each word or other token in the text sequence to the ground truth label. In particular, in this example, the word-level heat map controller 234 may access a ground truth heat map for a sentence related to the text 222 and the expected label for the sentence, and output the word-level heat map 236 with a visually represented comparison of the ground truth heat map against a heat map generated based on the heat map values 228 for the text 222 and the predicted label 226. In one example, the text classifier 102 may provide ground truth heat map values in the classification 224. In another example, the test controller 208 may store heat maps generated from the classifications 224 returned in response to text 222 that includes phrases from the ground truth training set 108. Additionally, the test set 220 may include user-generated ground truth heat maps.
In one example, the word-level heat map controller 234 initially generates one or more labels and one or more words in the training data set 238 based on analyzing values in the word-level heat map 236. In one example, a user may manually adjust entries in the training data set 238 based on examining the word-level heat map 236 and ask the training set controller 250 to send the training data set 238 to train the text classifier 102. In one example, the training set controller 250 also updates the ground truth training set 108 with the training data set 238 to reflect the training data currently used to train the model 112 in the text classifier 102.
In one example, in addition to analyzing the words in text 222, the word-level analysis component 232 also analyzes the weight of each word under each label identified for the intents tested by test set 220. For example, the word-level analysis component 232 stores, in the word-level scores by intent 234, a sum of the word-level scores identified for each word under each of the intents predicted for the test set 220. Based on ranking the word scores for a particular intent in the word-level scores by intent 234, the word-level analysis component 232 identifies the k preferred important words for the particular intent label, where k can be set to any value, such as "10". The word-level analysis component 232 returns the ranked k preferred important words for a particular intent label 226 in the k preferred heat map list 229, as a sequentially ranked list of the k preferred scored words. Additionally, the k preferred heat map list 229 may include heat map values, such as percentages or colors, assigned to each word in the sequentially ordered list, representing the relative score of each word compared with the other words related to the predicted intent.
In this example, in response to receiving the label 226 with the k preferred heat map list 229, the word-level heat map controller 234 generates the k preferred word heat map 242, outputting the label 226 and the k preferred list and visually highlighting each word in the k preferred list with heat map attributes, such as color and percentile scale, to visually indicate the relative score of each word with respect to the predicted intent. In accordance with an advantage of the present invention, the k preferred word heat map 242 provides a visual representation of the weights of the words trained for the predicted intent, in order to assist the user in visually assessing whether there are words among the k preferred words for the predicted intent label that should be ranked higher or lower for the predicted intent label. Further, the k preferred word heat map 242 provides a visual representation of the weights of the words trained for the expected intent, to assist the user in visually assessing whether there are words that should be ranked more or less highly among the k preferred words for the expected intent label. Within the interface that provides a visual representation, in the k preferred word heat map 242, of the weights of the words trained for the incorrectly predicted intent and the expected intent, the user is also provided with an interface in which the training data set 238 can be selectively modified to increase or decrease the words assigned to the predicted intent label and the expected intent label.
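A sketch of how the per-intent word-level sums and the k preferred (top-k) word list might be accumulated over a test set; the data structures, the example scores, and the choice of k are assumptions for illustration only:

```python
from collections import defaultdict

# Sketch of accumulating word-level scores per intent label over a test set
# and ranking the k preferred (top-k) words for a label; all values invented.
word_scores_by_intent = defaultdict(lambda: defaultdict(float))

def accumulate(intent_label, words, per_word_scores):
    """Add a test phrase's word-level scores into the per-intent sums."""
    for word, score in zip(words, per_word_scores):
        word_scores_by_intent[intent_label][word] += score

def k_preferred_words(intent_label, k=3):
    """Return the k highest-scoring words for the label, with their sums."""
    ranked = sorted(word_scores_by_intent[intent_label].items(),
                    key=lambda item: item[1], reverse=True)
    return ranked[:k]

accumulate("greeting", ["how", "are", "you"], [0.3, 0.5, 0.5])
accumulate("greeting", ["hello", "there"], [0.9, 0.1])
print(k_preferred_words("greeting"))
# e.g. [('hello', 0.9), ('are', 0.5), ('you', 0.5)]
```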
In accordance with an advantage of the present invention, the word-level heat map 236 and the k preferred word heat map 242 collectively provide the user with a visual representation of the particular words, and their semantically corresponding words, that are most likely to cause an anomaly, to facilitate the user's selection, within the training data set 238, of the training data most likely to train text classifier 102 to improve prediction accuracy. For example, the word-level heat map 236 visually identifies one or more words that have the highest contribution to the predicted intent in the test string, to prompt the user with the problem words in the incorrectly predicted test string that require additional training, and the k preferred word heat map 242 visually identifies the semantically associated words related to the incorrectly predicted label and the expected label, to prompt the user with the trained weights of those problem words under both the incorrectly predicted label and the expected label.
In accordance with an advantage of the present invention, by providing word-level heat map visualizations, through the heat map values 228 and the k preferred heat map list 229 computed by the word-level analysis component 232 and through the word-level heat map 236 and the k preferred word heat map 242 rendered by the word-level heat map controller 234, the anomaly visualization service minimizes the time and effort required for a user of text classifier 102 to understand, at the word level, why text classifier 102 generated a particular label for a particular test phrase and which words of the test phrase contributed most to the text classification decision, without disclosing the underlying data objects of model 112. In this example, the user may review a visualization of the scores of particular words within the text 222 that contribute to the label classifications in the word-level heat map 236 and effectively determine which word or words are more relevant to each label of the test phrase, and whether that relationship is correct or reasonable, to determine which words require additional training. Additionally, in this example, the user may review a visualization of the score ranking of words related to a particular label in the k preferred word heat map 242 over multiple test phrases to determine whether there are words contributing to the score of the particular label that need to be adjusted.
In one embodiment, the text classifier 102 represents a linear classifier with arbitrary features, such as, but not limited to, a linear Support Vector Machine (SVM), logistic regression, and cognitive ability (concept). In another embodiment, the text classifier 102 may implement a more complex model, such as a deep learning model; however, in accordance with an advantage of the present invention, the functionality of the anomaly visualization service does not require the more complex model environment of a deep learning model, and is applicable wherever the multiple weights applied by a linear classifier to different tokens in a text string can be detected. Additionally, in one embodiment, the text classifier 102 represents a linear classifier that determines scores based on a sum of separately weighted scores of extracted features, and the word-level analysis component 232 is described with respect to directly decomposing the extracted feature scores that determine the final label prediction in order to describe how each word or phrase in the text affects the final label output; however, in additional or alternative embodiments, the model 112 may also learn additional attention variables that are generated as auxiliary data and that may or may not affect the final label prediction scores.
FIG. 3 illustrates a block diagram of one example of a word-level analysis element evaluated by a word-level analysis component at the text classifier level.
In this example, all weights are fixed for the trained text classifier model, as shown at reference numeral 302. In one example, in response to a text phrase M having three words u1, u2, and u3 (as indicated at reference numeral 304), the text classifier 102 classifies the text phrase M with a predicted label X (as indicated at reference numeral 322). In this example, the words u1, u2, and u3 may each represent a single word or a phrase having multiple words. In one example, each of the words u1, u2, and u3 may be referred to as a token.
In this example, to determine a label score 310 for the predicted label X, the text classifier 102 sums the weighted scores of each extracted feature. For example, the label score X 310 is the sum of the product of the extracted feature 312 and the weight 314, the product of the extracted feature 316 and the weight 318, and the bias 320. In one example, the text classifier 102 may extract the same number of features from the test phrase as the number of words, or may extract fewer or more features from the test phrase than the number of words.
In this example, the word-level analysis component 232 decomposes the extracted feature products used to calculate the label score X 310 to determine per-word feature scores, shown by per-word feature score 326 (u1), per-word feature score 327 (u2), and per-word feature score 328 (u3), which together with the bias 330 sum to the label score X 310. For example, in decomposing the extracted feature products, the word-level analysis component 232 can recover the original classification score S_I(U) by accumulating, for each word, the weighted scores of the features to which that word contributes:
S_I(U) = Σ_k f_k(U)·w_Ik + b_I = S′_I(u_1) + S′_I(u_2) + … + S′_I(u_N) + b_I
where I denotes an intent label, k denotes an extracted feature index, w denotes a feature weight, and u denotes a contributing token of a feature. For multi-token features, the score is divided equally among the contributing tokens.
In this example, the word-level analysis component 232 selects a heat map value for each per-word score, as shown at reference numeral 332. For example, the word-level analysis component 232 assigns a heat map value A 344 to the per-word feature score 326 (u1), a heat map value B 346 to the per-word feature score 327 (u2), and a heat map value C 348 to the per-word feature score 328 (u3). In this example, the word-level analysis component 232 outputs a classification (as shown at reference numeral 350) having the label X and heat map value A, heat map value B, and heat map value C, where the sequential order of the heat map values in the classification corresponds to the order of the words u1, u2, and u3 in the test phrase M.
In this example, for each test phrase in the test set 220, the word-level analysis component 232 updates the record for label X in the word-level scores by intent 234, as shown by a label X sum 360. In this example, label X sum 360 includes each word contributing to label X and an aggregate score over all scores for the intents predicted for test set 220, including aggregate score 364 for word U1 362, aggregate score 368 for word U2 366, and aggregate score 372 for word U3 370. In this example, the word-level scores by intent 234 include a record for each intent label detected for the test phrases in the test set 220.
In this example, based on the label X sum 360 over the plurality of test phrases in the test set 220, the word-level analysis component 232 orders the k preferred words of label X by total score from the one or more test phrases, as shown at reference numeral 380. Next, as indicated at reference numeral 382, the word-level analysis component 232 assigns a heat map value to each of the k preferred words according to its summed score, and as indicated at reference numeral 384, the word-level analysis component 232 outputs the list of the k preferred words with the heat map values.
FIG. 4 shows one example of a table illustrating examples of types of extracted features that are decomposed for determining per-word feature scores.
In one example, the text classifier 102 may support multiple types of feature extraction from among any type of features that can be decomposed into words. In one example, the text classifier 102 supports word-level features such as unigram and part-of-speech (POS) features. In another example, the text classifier 102 supports term features, such as entity-based features, concept-based and term-based features, bigram features, and trigram features. In another example, the text classifier 102 supports letter-level n-gram features. In addition, the text classifier 102 supports maximum (or average) pooling of word-embedding features or pre-trained CNN or biLSTM features.
In this example, table 402 shows examples of feature types extracted from a text string, the token to which each feature type is attributed, and an example of the score determined for the token. For example, table 402 includes: a column identifying a feature type 410, a column identifying a contributing token 412, and a column identifying a score S(u) 414.
In the first example of table 402, for the feature type of the unigram 420, the identified contributing token is "I" 422, and the assigned score is "0.4" 424. In one example, for a feature f_k(U) of the unigram 420 feature type, the per-word feature score S′_I(u_N) can be decomposed according to the following formula:
S′_I(u_N) = S′_I(u_N) + f_k(U)·w_Ik
in the second example of table 402, for the feature type of bigram 430, the contributing symbol identifier is "My Yes" and a score is assigned as "0.4" 434, which is the same as the score assigned to the symbol "I" 422. In one example, the features f from feature types for bigram 430, and for any word-based multi-word featuresk(U) by-word feature score S'I(uN) Can be calculated according to the following equation:
S′I(uN)=S′I(uN)+fk(U)wIK/|L|。
in one example, L is the length of the word, so the score of a feature is divided equally to each of the words. For example, the length of "my is" 2 ", so the feature product score for the extracted feature" my is bisected in "me" and "yes".
In the third example of table 402, for the feature type of the part-of-speech POS-PP 440, the identified contributing token is "from", and the assigned score is "0.5" 444, which is a higher score than the scores assigned to the token "I" 422 and the token "I am". In one example, for a feature f_k(U) of the part-of-speech prepositional phrase (POS-PP) 440 feature type, the per-word feature score S′_I(u_N) may be determined by tagging each word with a POS tag using a POS tagger and then treating the particular POS tag as a feature contributed by the particular word.
In the fourth example of table 402, for the feature type of the entity 450, the identified contributing token is "city name a" 452, where "city name a" may identify a particular city name, and the assigned score is "0.7" 454, which is a higher score than the scores assigned to the previous tokens. In one example, for a feature f_k(U) of the entity 450 feature type, and for any other entity-based or concept-based multi-word feature, the per-word feature score S′_I(u_N) can be calculated according to the following equation:
S′_I(u_N) = S′_I(u_N) + f_k(U)·w_Ik / |L|
in the fifth example of table 402, for the feature type or dimension of the average word vector 460, the contributing symbol identifier is "avg-w 2 v-I" 462, which represents the average vector of all word vectors for words in the sentence, where the average vector has a numerical value. For example, for deep learning, a set of word vectors for a vocabulary word (vocabulary word) may be pre-trained with a large corpus (e.g., a wiki corpus) and used as a fixed input vector for each vocabulary word. In this example, the score is assigned to "-0.27" 464, which is a lower score than the score assigned to the previous symbol. In one example, the features f from the feature type for the average word vector 460k(U) by-word feature score S'I(uN) Can be based on all uNIs calculated, wherein the score for each word in the sequence is proportionally assigned back to each word in the sequence based on the value of that word in the embedding dimension. The average of the word vectors for each word in the sentence is then used to obtain the type of sentence-level feature.
In the sixth example of table 402, for the maximum word vector 470 feature type, the contributing token identifier is "max-w2v-I" 472, and the assigned score is "0.45" 474. In one example, for a feature f_k(U) of the maximum word vector 470 feature type, the per-word feature score S′_I(u_N) can be calculated based on the maximum of the word-embedding values of all u_N, where the feature score is assigned back to only the one word u_N that has the maximum value in the embedding dimension.
In the seventh example of table 402, for a feature type at the character or letter level, such as the letter-trigram 480, the contributing token identifier is "from" 482, and the assigned score is "0.4" 484. In this example, the word u_N "this" has two letter-trigram features, "thi" and "his", where each feature comprises three sequential characters from the word u_N. In one example, for a feature f_k(U) of the letter-trigram 480 feature type, the per-word feature score S′_I(u_N) can be calculated according to the following equation:
S′_I(u_N) = S′_I(u_N) + f_k(tri_1)·w_Ik + f_k′(tri_2)·w_Ik′
In this example, k and k′ may represent the features for the first and second letter-trigrams, respectively.
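The decomposition rules illustrated in Table 402 could be sketched roughly as follows, treating each extracted feature as a record of its type, its contributing word indices, and its weighted score; the handling of the embedding-pooling feature types is simplified and every number is invented:

```python
# Rough sketch of the per-word decomposition rules illustrated in Table 402.
# Each extracted feature is modeled as a (type, contributing_word_indices,
# weighted_score) triple; the embedding-pooling rows are simplified and every
# number below is invented for illustration.
def decompose(words, features, embedding_values=None):
    per_word = [0.0] * len(words)
    for ftype, indices, score in features:
        if ftype in ("unigram", "pos", "letter_trigram"):
            # Single-word features: the full score goes to the one word.
            per_word[indices[0]] += score
        elif ftype in ("bigram", "entity", "concept"):
            # Multi-word features: the score is divided equally (|L| words).
            for i in indices:
                per_word[i] += score / len(indices)
        elif ftype == "avg_word_vector":
            # Proportional split by each word's value in the embedding dimension.
            total = sum(embedding_values[i] for i in indices) or 1.0
            for i in indices:
                per_word[i] += score * embedding_values[i] / total
        elif ftype == "max_word_vector":
            # The whole score goes to the word with the maximum embedding value.
            winner = max(indices, key=lambda i: embedding_values[i])
            per_word[winner] += score
    return per_word

words = ["dial", "the", "home", "number"]
features = [
    ("unigram", [0], 0.4),
    ("bigram", [2, 3], 0.4),
    ("entity", [2], 0.7),
    ("max_word_vector", [0, 1, 2, 3], 0.45),
]
print(decompose(words, features, embedding_values=[0.2, 0.1, 0.9, 0.3]))
# e.g. [0.4, 0.0, 1.35, 0.2]
```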
FIG. 5 illustrates a block diagram of one example of a word-level heat map reflecting ground truth heat maps compared to test heat maps based on test phrases tested on a trained model.
In one example, a word-level heat map 236 is shown for selected test phrases from test set 220 classified by model 112. In this example, for purposes of illustration, FIG. 5 reflects the results of testing three test phrases included in test set 220. In additional or alternative examples, test set 220 may include an additional number of test phrases.
In the first example of FIG. 5, the same test phrase "how are you going to help me?", shown under text 516, is visualized in the word-level heat map 236 for both the training ground truth 504 and the test set prediction 506. In this example, for the same test phrase, the intent label 510 is identified as "capabilities" 512 under the training ground truth 504 and as "greeting" 514 under the test set prediction 506. In this example, the training ground truth 504 indicates the expected label for "how are you going to help me?" from test set 220, and the test set prediction 506 indicates the label currently predicted by text classifier 102. For example, the word-level analysis component 232 determines the classification labels, including "capabilities", and the heat map values for the words in the text 516, and outputs the labels and heat map values in the classification 224.
The word-level heat map controller 234 visually identifies (e.g., by a percentage color level) the percentage probability for each token identified in text 516, based on the heat map values returned in classification 224. For purposes of illustration, in this example, the color percentages shown for color 518 are illustrated by color intensity numbers (a scale of 0 to 5), with each number in the scale reflecting a different shade or different color that may be applied to each tokenized portion of the text phrase. In this example, a "0" in the color scale 518 may reflect no shading, and a "5" in the color scale 518 may reflect 100% shading.
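A minimal sketch of mapping word-level heat map values onto the 0 to 5 intensity scale used in this illustration; normalizing against the largest word score in the phrase is an assumption made only for this example:

```python
# Sketch of mapping word-level heat map values onto the 0 to 5 color intensity
# scale shown in FIG. 5.  Normalizing against the largest score in the phrase
# is an assumption made only for this illustration.
def to_intensity(per_word_scores, levels=5):
    top = max(per_word_scores) or 1.0
    return [max(0, round(levels * score / top)) for score in per_word_scores]

words = ["how", "are", "you", "going", "to", "help", "me"]
scores = [0.3, 0.5, 0.5, 0.1, 0.0, 0.2, 0.1]   # hypothetical heat map values
print(list(zip(words, to_intensity(scores))))
# e.g. [('how', 3), ('are', 5), ('you', 5), ('going', 1), ('to', 0), ('help', 2), ('me', 1)]
```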
For example, for text 516, the ground truth intent label "capabilities" 512 is shown to be visually affected most by the words "you" and "help", reflecting the highest intensity "4", where the words "you" and "help" are more indicative of capabilities than of a greeting. In contrast, the predicted intent "greeting" 514 is shown as being visually affected most by the words "are you", reflecting the highest intensity "5", and the preceding word "how", reflecting the next highest intensity "3", where the words "how are you" are more indicative of a greeting than of capabilities. In this example, by visually displaying the token scores as heat maps for both the training ground truth and the test set prediction, the user is able to visually understand that the current system gives more preference to the words "how are you" than to "help me". In this example, the token "help me" is intuitively related to how the customer service solves the requestor's problem, and therefore to the intent "capabilities", rather than to the customer greeting the customer service system. In this example, by visually displaying the token scores as a heat map for the prediction, the user may choose to adjust the training data set 238 to include additional training for the phrase "how are you going to help me?" to improve the token score of "help" and other semantically corresponding words (e.g., "helps" and "assistance") when they appear in the same text phrase as "how are you" for the intent "capabilities". Additionally, in this example, for the anomaly in the test of "how are you going to help me?", the user may selectively adjust the training data set 238 to reduce the occurrence of the phrase "how are you" when it occurs with "help" in the mispredicted intent "greeting" and to increase its occurrence in the training ground truth intent "capabilities".
In the second example of FIG. 5, the same test phrase "I am feeling good thanks" shown under text 546 is visualized in the word-level heat map 236 for training ground truth 504 and test set prediction 506. In this example, for the same test phrase, intent label 540 is identified as "greeting" 542 under training ground truth 504 and as "thank you" 544 under test set prediction 506. In this example, word-level heat map controller 234 visually identifies (e.g., by a percentage color level) a percentage probability for each word token identified in text 546 based on the heat map values returned in classifications 224. For example, for text 546, the ground-truth intent label "greeting" 542 is shown as visually influenced by the words "feeling" and "good," which reflect the intensities "3" and "4," because the words "feeling" and "good" are more indicative of a greeting than of thanks. In contrast, the predicted intent "thank you" 544 is shown as visually influenced by the word "thanks," which reflects the highest intensity "5," because the word "thanks" is more indicative of thanks than of a greeting. In this example, by visually displaying the token scores as heat maps for both the training ground truth and the test set prediction, the user is able to see that the current system gives more weight to "thanks" than to "feeling good." In this example, the tokens "feeling good" intuitively relate to how the customer greets the customer service system, and therefore to the intent "greeting," rather than to the customer choosing to thank the customer service system. In this example, by viewing the token scores displayed as a heat map for the prediction, the user may choose to adjust training data set 238 to include additional training for the phrase "I am feeling good thanks" in order to increase the token scores for "feeling" and "good" and other semantically related words (e.g., "feel" and "well") that appear in the same text phrase as "thanks," for the intent "greeting." Additionally, in this example, for the anomaly in the test of "I am feeling good thanks," the user may selectively adjust training data set 238 to reduce the occurrence of "thanks" when it occurs with "feeling" and "good" under the mispredicted intent "thank you" and to increase its occurrence under the training ground-truth intent "greeting."
In the third example of FIG. 5, the same test phrase "dial the home number" shown under text 576 is visualized in the word-level heat map 236 for training ground truth 504 and test set prediction 506. In this example, for the same test phrase, intent label 570 is identified as "phone" 572 under training ground truth 504 and as "location" 574 under test set prediction 506. In this example, word-level heat map controller 234 visually identifies (e.g., by a percentage color level) a percentage probability for each word token identified in text 576 based on the heat map values returned in classifications 224. For example, for text 576, the ground-truth intent label "phone" 572 is shown as visually influenced by the words "dial" and "number," which reflect the intensities "4" and "3," because the words "dial" and "number" are more indicative of a phone command than of a location command. In contrast, the predicted intent "location" 574 is shown as visually influenced by the word "home," which reflects the highest intensity "5," because the word "home" is more indicative of a location command than of a phone command. In this example, by visually displaying the token scores as heat maps for both the training ground truth and the test set prediction, the user is able to see that the current system gives more weight to the word "home" than to "dial" and "number." In this example, the tokens "dial" and "number" intuitively relate to how the customer requests phone-related services, and therefore to the intent "phone," rather than to the customer selecting a location. In this example, by viewing the token scores displayed as a heat map for the prediction, the user may choose to adjust training data set 238 to include additional training for the phrase "dial the home number" in order to increase the token scores for "dial" and "number" in the same text as "home," for the intent "phone." Additionally, in this example, for the anomaly in the test of "dial the home number," the user may selectively adjust training data set 238 to reduce the occurrence of the word "home" when it occurs with "dial" and "number" under the mispredicted intent "location" and to increase its occurrence under the training ground-truth intent "phone."
FIG. 6 illustrates a block diagram of one example of a word-level heat map reflecting the k top important words based on the tokens of the test phrases tested on the trained model.
In one example, training set 602 reflects the current training data for training model 112 for the intent "turn on." For example, training set 602 includes the phrases "i need more headlights," "can you turn on the radio," "click my car lock," "turn on headlights," "turn on my wiper," "turn on lights," "lock my car door," "close my car door," "play music," "play some music," "turn on the radio now," "turn on my backup camera," "turn on my car lights," "turn on my windshield wiper," and "turn on A/C." In this example, the k top important words 610 illustrate a list of words reflected in training set 602, ordered by significance in predicting the intent "turn on." In this example, the k top important words 610 are shown in order of importance, with the word "turn on" listed first as most important and the word "camera" listed last as least important. In this example, the ranking of the k top important words 610 is determined by word-level analysis component 232 detecting word-level scores by intent 234 while testing text classifier 102 against test set 220. In particular, word-level analysis component 232 may aggregate the scores computed for the heat map values in word-level scores by intent 234 for each word under each intent and then determine the k top aggregate heat map values. In another example, words of the k top important words 610 may be colored to visually reflect the importance or aggregate heat map values, with the most important words having the highest percentage of shading and the least important words having the lowest percentage of shading.
In this example, the word "door" 612 may reflect an anomalous word, with "door" ranked higher than expected for the intent "turn on," because training set 602 includes the phrases "lock my car door" and "close my car door" as training data for the intent classification "turn on," as shown at reference numeral 604. In this example, a user viewing the k top important words 610 may see that the word "door" is reflected as more important than expected and adjust training set 602 by reducing the occurrence of the anomalous word "door." By reducing the occurrence of words in training data set 238 that are flagged as anomalous in the k top important words 610 and retraining text classifier 102 with the updated training data set 238, the user can mitigate potential prediction errors prior to deploying the trained classifier model.
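The per-intent aggregation behind the k top important words 610 can be sketched as follows. This is a minimal, hypothetical illustration: it assumes the word-level heat map values for each test phrase and its predicted intent are already available, and it simply sums them per (intent, word) pair before taking the top k.

```python
# Minimal sketch (assumed data layout, not the patented implementation):
# aggregate word-level heat map scores per intent and list the k most
# important words for each intent, as in FIG. 6.
from collections import defaultdict

def top_k_words(phrase_word_scores, k=10):
    """phrase_word_scores: iterable of (predicted_intent, {word: score}) pairs."""
    totals = defaultdict(lambda: defaultdict(float))
    for intent, word_scores in phrase_word_scores:
        for word, score in word_scores.items():
            totals[intent][word] += score               # aggregate by word, per intent
    return {intent: sorted(words.items(), key=lambda kv: kv[1], reverse=True)[:k]
            for intent, words in totals.items()}

# A word such as "door" ranking unexpectedly high under "turn on" would flag
# training phrases like "lock my car door" as candidates to move or remove.
```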
In particular, in the example of the word-level heat map 236 shown in FIG. 5, the user receives a visual evaluation of the words of a test phrase that contribute most and least to the label prediction, in order to quickly identify the problem words in a particular test phrase that cause an anomalous label prediction. In contrast, in the example of the k top words heat map 242 shown in FIG. 6, the user receives a visual evaluation of the semantically related words in the training corpus that are likely to cause a particular label prediction for the test set, in order to quickly identify the problem words trained for a particular label.
FIG. 7 illustrates a block diagram of one example of a computer system in which one embodiment of the invention may be implemented. The invention can be implemented in various systems and combinations of systems that are made up of functional components, such as the functional components described with reference to computer system 700, and that are communicatively coupled to a network, such as network 702.
Computer system 700 includes a bus 722 or other communication device for communicating information within computer system 700, and at least one hardware processing device, such as processor 712, coupled to bus 722 for processing information. Bus 722 preferably includes low-latency and high-latency paths connected by bridges and adapters and controlled within computer system 700 by multiple bus controllers. When implemented as a server or a node, computer system 700 may include multiple processors designed to improve network servicing capacity.
Processor 712 may be at least one general-purpose processor that processes data under the control of software 750 during normal operation, which may include at least one of: application software, an operating system, middleware, and other code and computer-executable programs accessible from dynamic storage devices (e.g., Random Access Memory (RAM) 714), static storage devices (e.g., Read Only Memory (ROM) 716), data storage devices (e.g., mass storage device 718), or other data storage media. Software 750 may include, but is not limited to: code, applications, protocols, interfaces, and processes for controlling one or more systems within a network, including but not limited to: adapters, switches, servers, cluster systems, and grid environments.
Computer system 700 may communicate with a remote computer (e.g., server 740) or a remote client. In one example, server 740 may be connected to computer system 700 via any type of network (e.g., network 702) through a communication interface (e.g., network interface 732) or over a network link that may be connected to, for example, network 702.
In this example, a plurality of systems within a network environment may be communicatively connected via a network 702, the network 702 being the medium used to provide communications links between the various devices and computer systems that are communicatively connected. For example, network 702 may include permanent connections, such as wire or fiber optic cables, as well as temporary connections made through telephone connections and wireless transmission connections, and may include routers, switches, gateways, and other hardware to enable communication channels between systems connected via network 702. Network 702 may represent one or more of the following: packet-switched networks, telephony-based networks, broadcast television networks, local area and wired networks, public networks, and restricted networks.
The network 702 and the systems communicatively connected to computer system 700 via the network 702 may implement one or more layers of one or more types of network protocol stacks, which may include one or more of a physical layer, a link layer, a network layer, a transport layer, a presentation layer, and an application layer. For example, the network 702 may implement one or more of the following: a Transmission Control Protocol/Internet Protocol (TCP/IP) protocol stack, or an Open Systems Interconnection (OSI) protocol stack. Additionally, for example, network 702 may represent a worldwide collection of networks and gateways that use the TCP/IP suite of protocols to communicate with one another. The network 702 may implement a secure HTTP protocol layer or other secure protocol for secure communications between systems.
In this example, network interface 732 includes an adapter 734 for connecting computer system 700 to network 702 over a link and for communicatively connecting computer system 700 to server 740 or other computing systems via network 702. Although not depicted, the network interface 732 may include additional software (e.g., device drivers), additional hardware, and other controllers that enable communication. When implemented as a server, computer system 700 may include multiple communication interfaces accessible, for example, via multiple Peripheral Component Interconnect (PCI) bus bridges connected to an input/output controller. In this manner, computer system 700 allows connections to multiple clients via multiple separate ports, and each port may also support multiple connections to multiple clients.
In one embodiment, the operations performed by the processor 712 may control the operations of the flow diagrams of FIGS. 8-13 and other operations described herein. The operations performed by the processor 712 may be requested by the software 750 or other code, or the steps of an embodiment of the invention may be performed by specific hardware components that contain hardwired logic for performing the steps, or by a combination of programmed computer components and custom hardware components. In one embodiment, one or more components of computer system 700 or other components that may be integrated into one or more components of computer system 700 may include hardwired logic for performing the operations of the flow diagrams in fig. 8-13.
Additionally, computer system 700 may include a number of peripheral components that facilitate input and output. These peripheral components are connected to a plurality of controllers, adapters, and expansion slots, such as input/output (I/O) interface 726, coupled to one of the multiple levels of bus 722. For example, input devices 724 may include a microphone, a camera device, an image scanning system, a keyboard, a mouse, or other input peripheral communicatively activated over bus 722 for controlling inputs, e.g., via I/O interface 726. Additionally, for example, output devices 720 communicatively enabled for control output over the bus 722 via the I/O interface 726 may include, for example, one or more graphical display devices, audio speakers, and tactile detectable output interfaces, but may also include other output interfaces. In alternative embodiments of the present invention, additional or alternative input and output peripheral components may be added.
With respect to FIG. 7, the present invention may be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therewith for causing a processor to implement various aspects of the present invention.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present invention may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++, or the like, and procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present invention are implemented by personalizing an electronic circuit, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), with state information of computer-readable program instructions, which can execute the computer-readable program instructions.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those of ordinary skill in the art will appreciate that the hardware depicted in FIG. 7 may vary. Furthermore, those skilled in the art will appreciate that the depicted example is not meant to imply architectural limitations with respect to the present invention.
FIG. 8 illustrates a high level logic flowchart of a process and computer program for creating and training a classifier model. In one example, the process and computer program begin at block 800 and then proceed to block 802. Block 802 shows a determination whether a request to create a trained model is received from a client. At block 802, if a request to create a trained model is received from a client, the process passes to block 804. Block 804 depicts a determination whether a user selected data set is received. At block 804, if a user selected data set is received, the process passes to block 808. At block 804, if a user selected data set is not received, the process passes to block 806. Block 806 illustrates selecting a default training set of data, and the process passes to block 808.
Block 808 illustrates applying the selected training set of data as a training set of true conditions to the model to create a trained model. Next, block 810 illustrates returning the trained model indicator to the client, and the process ends.
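For readers who want a concrete picture of blocks 802-810, the sketch below trains a simple text classifier from a selected training set using scikit-learn as a stand-in; the patent does not prescribe any particular library or model, so the pipeline, function name, and sample phrases here are illustrative assumptions only.

```python
# Illustrative stand-in for the create-and-train flow of FIG. 8 (assumed
# implementation choices: TF-IDF features plus logistic regression).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def create_trained_model(training_phrases, intent_labels):
    """Apply the selected training set as ground truth and return a trained model."""
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),    # unigram and bigram features
        LogisticRegression(max_iter=1000),
    )
    model.fit(training_phrases, intent_labels)
    return model

phrases = ["turn on the radio", "lock my car door", "how are you going to help me"]
labels = ["turn on", "lock", "capabilities"]
trained_model = create_trained_model(phrases, labels)
```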
FIG. 9 illustrates a high level logic flowchart of a process and computer program for updating a trained classifier model. In one example, the process and computer program begin at block 900 and then proceed to block 902. Block 902 illustrates a determination whether an updated training set of data is received from a client for a trained model. At block 902, if an updated training set of data is received from the client for the trained model, the process proceeds to block 904. Block 904 illustrates updating the training of the classifier model with the updated training set of data. Next, block 906 depicts returning the trained model indicator to the client, and the process ends.
FIG. 10 illustrates a high level logic flowchart of a process and computer program for analyzing predicted classifications to determine word-level heat map values that indicate word-level contributions to the predicted classification labels of test phrases run on a trained model.
In one example, the process and computer program begin at block 1000 and then proceed to block 1002. Block 1002 depicts a determination whether a test set is received from a client for testing a trained model. At block 1002, if a test set is received from the client for testing the trained model, the process proceeds to block 1004. Block 1004 shows running the test set on the trained model. Next, block 1006 illustrates identifying the predicted classification label and score for each test set phrase in the test set. Thereafter, block 1008 illustrates decomposing the extracted features aggregated in the token scores into word-level scores for each word in each test set phrase. Next, block 1010 illustrates assigning a heat map value to each word-level score for each word in each test set phrase. Thereafter, block 1012 shows storing the assigned heat map values by test set phrase and token. Next, block 1014 depicts aggregating the word-level scores word-wise for each label predicted for the test set. Thereafter, block 1016 depicts identifying the k top words for each label in decreasing order based on the aggregated word-level scores for each label. Next, block 1018 illustrates assigning heat map values based on the word-level scores to the k top words in each label's list. Thereafter, block 1020 illustrates returning the predicted classification labels to the client along with the corresponding heat map values ordered by test set phrase, and the k top words with heat map values for the predicted classification labels.
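One way to realize blocks 1008-1010 for a linear classifier over bag-of-words features is to treat each word's contribution to the predicted label's score as its feature weight times its feature value, and then rescale those contributions onto the heat map scale. The sketch below shows this decomposition for a scikit-learn vectorizer and linear model; it is one plausible realization under those assumptions, not the patent's exact method.

```python
# Hedged sketch (assumed setup: a bag-of-words vectorizer and a multiclass
# linear classifier with three or more intent classes, e.g. the pipeline from
# the FIG. 8 sketch): decompose the predicted label's score into per-word
# contributions and map them onto 0-5 heat map values.
import numpy as np

def word_level_heat_map(vectorizer, linear_clf, phrase, levels=5):
    features = vectorizer.transform([phrase])
    class_idx = int(np.argmax(linear_clf.decision_function(features)))
    label = linear_clf.classes_[class_idx]
    names = vectorizer.get_feature_names_out()
    x = features.toarray().ravel()
    contrib = x * linear_clf.coef_[class_idx]                   # weight x feature value
    word_scores = {names[i]: contrib[i] for i in np.nonzero(x)[0]}
    if not word_scores:                                         # no known tokens in phrase
        return label, {}, {}
    top = max(abs(v) for v in word_scores.values()) or 1.0
    heat_map = {w: round(levels * abs(v) / top) for w, v in word_scores.items()}
    return label, word_scores, heat_map

# Usage with the FIG. 8 sketch's pipeline (step names come from make_pipeline):
# vec = trained_model.named_steps["tfidfvectorizer"]
# clf = trained_model.named_steps["logisticregression"]
# label, word_scores, heat_map = word_level_heat_map(vec, clf, "dial the home number")
```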
FIG. 11 depicts a high level logic flowchart of a process and computer program for outputting a predicted classification with visual indicators that identify, based on the corresponding word-level heat map values, the words that most impact the predicted classification label.
In one example, the process and computer program begin at block 1100 and then proceed to block 1102. Block 1102 illustrates a determination whether predicted classification labels and word-level heat map values per test set phrase are received from a text classifier. At block 1102, if predicted classification labels and word-level heat map values per test set phrase are received from the text classifier, the process passes to block 1104. Block 1104 illustrates aligning the classification labels and the heat map values sorted by test set phrase with the corresponding test set phrases in the submitted test set. Next, block 1106 depicts accessing (if available) the ground-truth heat map value evaluations and expected classification labels associated with each test set phrase in the submitted test set. Thereafter, block 1108 depicts identifying the selection of submitted test set phrases whose returned classification labels do not match the expected labels of the test set phrases, indicating anomalies. Next, block 1110 illustrates outputting, in the user interface, a graphical representation of the selection of submitted test phrases with the returned classification labels and word-level visual indicators based on any corresponding word-level heat map values, as compared with word-level visual indicators based on the corresponding ground-truth heat map values and ground-truth classification labels, and the process ends.
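A compact, hypothetical sketch of blocks 1104-1110 follows: the submitted test set is aligned with the returned labels, and any phrase whose returned label differs from its expected (ground-truth) label is collected as an anomaly, together with its word-level heat map values for display. The record layout is assumed for illustration only.

```python
# Minimal sketch (assumed data shapes, not the patented implementation):
# flag test phrases whose returned classification label does not match the
# expected label, keeping the heat map values for the visual indicators.
def find_anomalies(test_set, predictions):
    """test_set: list of (phrase, expected_label);
    predictions: {phrase: (returned_label, {word: heat_map_value})}."""
    anomalies = []
    for phrase, expected in test_set:
        returned, heat_map = predictions[phrase]
        if returned != expected:
            anomalies.append({"phrase": phrase,
                              "expected": expected,    # ground-truth label
                              "returned": returned,    # label predicted by the classifier
                              "heat_map": heat_map})   # word-level values for display
    return anomalies
```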
FIG. 12 illustrates a high level logic flowchart of a process and computer program for outputting a predicted classification with visual indicators based on a k-top word list that identifies, according to the respective k-top heat map values, the trained words that most influence a classification label.
In one example, the process and computer program begin at block 1200 and then proceed to block 1202. Block 1202 depicts a determination whether one or more predicted k-top word lists with k-top heat map values by classification label are received from a text classifier. At block 1202, if one or more predicted k-top word lists with k-top heat map values by classification label are received from the text classifier, the process passes to block 1204. Block 1204 illustrates identifying the training set classification label corresponding to each k-top word list and the heat map values for the classification label. Next, block 1206 depicts a determination whether a list of k top words having word-level heat map values per test set phrase is received.
At block 1206, if a list of k top words with word-level heat map values per test set phrase is received, the process passes to block 1208. Block 1208 illustrates identifying the selection of submitted test set phrases whose returned classification labels do not match the ground-truth classification labels, together with the corresponding selections of ground-truth and returned classification labels. Next, block 1210 depicts outputting, in the user interface, a graphical representation of the selection of k-top word lists with visual indicators based on each respective heat map value for the selected ground-truth classification label and returned classification label, and the process ends.
Returning to block 1206, if a list of k top words with word-level heat map values per test set phrase is not received, the process passes to block 1212. Block 1212 illustrates outputting a graphical representation of the one or more k-top word lists with visual indicators in the user interface based on each respective heat map value, and the process ends.
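The comparison rendered at blocks 1208-1210 can be sketched by pairing, for each anomaly, the k-top word list of the ground-truth label with that of the returned label, so the two lists and their heat map values can be displayed side by side. The data shapes below are assumptions carried over from the earlier sketches, not the patented implementation.

```python
# Hypothetical sketch: for each anomaly, gather the k-top word lists (with heat
# map values) for both the ground-truth and the returned classification label.
def k_top_comparison(anomalies, k_top_by_label):
    """k_top_by_label: {label: [(word, heat_map_value), ...]} in decreasing order."""
    views = []
    for a in anomalies:
        views.append({"phrase": a["phrase"],
                      "ground_truth_label": a["expected"],
                      "ground_truth_top_words": k_top_by_label.get(a["expected"], []),
                      "returned_label": a["returned"],
                      "returned_top_words": k_top_by_label.get(a["returned"], [])})
    return views
```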
FIG. 13 illustrates a high level logic flowchart of a process and computer program for supporting updated training of a text classifier that highlights class label training for identified anomalies.
In one example, the process and computer program begin at block 1300 and then proceed to block 1302. Block 1302 illustrates displaying an editable training set in a user interface for additional training of a trained model. Next, block 1304 illustrates visually highlighting within the editable training set one or more classification label pairs identified as the ground-truth classification label and the predicted label for an identified anomaly. Thereafter, block 1306 illustrates a determination whether the user has selected to edit the training set and send the training set to the text classifier. At block 1306, if the user chooses to edit the training set and send the training set to the text classifier, the process passes to block 1308. Block 1308 depicts sending a request to the text classifier with the training set for training to update the text classifier with the training set, and the process ends.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of one or more embodiments of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
While the invention has been particularly shown and described with reference to one or more embodiments, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.

Claims (20)

1. A method comprising the steps of:
in response to running at least one test phrase on a pre-trained text classifier and identifying individual predictive classification labels based on a score calculated for each respective at least one test phrase, decomposing, by a computer system, a plurality of extracted features aggregated in the score into a plurality of word-level scores for each word in the at least one test phrase;
assigning, by the computer system, a separate heatmap value to each of the plurality of word-level scores, each respective separate heatmap value reflecting a weight of each of the plurality of word-level scores; and
outputting, by the computer system, the individual predictive classification label and each individual heat map value reflecting the weight for each word-level score of the plurality of word-level scores for use in defining a heat map identifying contributions of each word of the at least one test phrase to the individual predictive classification label.
2. The method of claim 1, further comprising the steps of:
aggregating, by the computer system, the plurality of word-level scores word-wise for each individual predicted category label of a plurality of category labels in response to running the at least one test phrase;
identifying, by the computer system, for each individual predictive classification label, a preferred word list of the plurality of words in decreasing order from a highest aggregated by word score; and
outputting, by the computer system, the individual predictive classification label, each individual heat map value, and the top word list for each respective individual predictive classification label.
3. The method of claim 1, further comprising the steps of:
calculating, by the computer system, a score for individual predictive classification labels based on weighted sums of multiple combinations of individual extracted features of the plurality of extracted features and weighted model parameters fixed in the pre-trained text classifier.
4. The method of claim 1, wherein the step of decomposing, by the computer system, the plurality of extracted features aggregated in the score into a plurality of word-level scores for each word in the at least one test phrase further comprises the steps of:
decomposing, by the computer system, the plurality of extracted features comprising one or more of: unigram-based features, term-based features, average pooling of word embedding features, maximum pooling of word embedding features, and character-level features.
5. The method of claim 1, further comprising the steps of:
initiating, by the computer system, a text classifier model;
training, by the computer system, the text classifier model by applying a training set having a plurality of training phrases;
deploying, by the computer system, the text classifier model as the pre-trained text classifier for client testing; and
in response to receiving the at least one test phrase from the client, running, by the computer system, the at least one test phrase on the pre-trained text classifier.
6. The method of claim 1, wherein the step of outputting, by the computer system, the individual predictive classification label and each individual heat map value reflecting the weight for each word-level score of the plurality of word-level scores for providing a heat map identifying contributions of each word of the at least one test phrase to the individual predictive classification label further comprises the steps of:
outputting, by the computer system to a client, the individual predictive classification label and each individual heat map value reflecting the weight for each word-level score of the plurality of word-level scores, wherein the client outputs each individual heat map value in a user interface for graphically representing the weight for each word-level score to identify a contribution of each word of the at least one test phrase to the individual predictive classification label.
7. The method of claim 1, wherein the step of outputting, by the computer system, the individual predictive classification label and each individual heat map value reflecting the weight for each word-level score of the plurality of word-level scores for providing a heat map identifying contributions of each word of the at least one test phrase to the individual predictive classification label further comprises the steps of:
outputting, by the computer system to a client, the individual predictive classification label and each individual heatmap value reflecting the weight for each of the plurality of word-level scores, wherein the client determines whether each individual predictive classification label matches an expected classification label for the client to assess text classification anomalies.
8. A computer system comprising one or more processors, one or more computer-readable memories, one or more computer-readable storage devices, and program instructions stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, the stored program instructions comprising:
program code operable to perform the steps of the method according to any one of claims 1 to 7.
9. A computer program product comprising a computer readable storage medium having program instructions embodied therein, the program instructions being executable by a computer to cause the computer to perform the steps in the method according to any one of claims 1 to 7.
10. A computer system comprising means for performing the steps of the method according to any one of claims 1 to 7.
11. A method comprising the steps of:
submitting, by the computer system, a plurality of test phrases to a text classifier;
receiving, by the computer system, a plurality of classification labels from the text classifier, each classification label comprising one or more respective heat map values, each heat map value associated with a separate word;
aligning, by the computer system, each of the plurality of classification labels with a respective test phrase of the plurality of test phrases;
identifying, by the computer system, one or more anomalies for a selection of one or more of the plurality of classification labels that differ from an expected classification label of a respective test phrase of the plurality of test phrases; and
outputting, by the computer system, the selection of one or more classification labels and a graphical representation of one or more respective test phrases in a user interface with a visual indicator based on one or more respective heat map values.
12. The method of claim 11, wherein the step of submitting, by the computer system, the plurality of test phrases to the text classifier further comprises the steps of:
submitting, by the computer system, a plurality of test phrases to the text classifier trained by applying a training set having the plurality of test phrases.
13. The method of claim 11, wherein the step of receiving, by the computer system, a plurality of classification labels from the text classifier, each classification label comprising one or more heat map values, each heat map value associated with an individual word, further comprises the steps of:
receiving, by the computer system, the plurality of classification labels from the text classifier, each classification label including the one or more heat map values, each heat map value associated with a separate word from the text classifier, wherein, in response to running at least one test phrase on a pre-trained text classifier and identifying a separate predicted classification label based on a score calculated for each respective at least one test phrase, the text classifier decomposes the plurality of extracted features aggregated in the score into a plurality of word-level scores for each word in the at least one test phrase and assigns a separate heat map value to each word-level score in the plurality of word-level scores, each respective separate heat map value reflecting a weight of each word-level score in the plurality of word-level scores.
14. The method of claim 11, wherein the step of outputting, by the computer system, the selection of one or more classification labels and a graphical representation of one or more respective test phrases in a user interface with a visual indicator based on one or more respective heat map values further comprises the steps of:
outputting, by the computer system, the selection of one or more classification labels and one or more respective test phrases with a visual indicator for identifying a contribution of each word in the respective selection of one or more test phrases to the respective classification label based on the one or more respective heat map values, wherein each respective heat map value reflects a weight of each word-level score in a plurality of word-level scores.
15. The method of claim 11, wherein the step of outputting, by the computer system, the selection of one or more classification labels and a graphical representation of one or more respective test phrases in a user interface with a visual indicator based on one or more respective heat map values further comprises the steps of:
accessing, by the computer system, a separate ground-truth heat map value evaluation and an expected classification label associated with each respective test phrase in the plurality of test phrases.
16. The method of claim 11, further comprising the steps of:
displaying, by the computer system, an editable training set having one or more training phrases within the user interface; and
visually highlighting, by the computer system, the selection of one or more classification labels identified as the one or more anomalies within the editable training set.
17. The method of claim 11, further comprising the steps of:
receiving, by the computer system, a preferred word list of a plurality of words for each individual predictive classification label from the text classifier, wherein the preferred word list is in decreasing order from a highest aggregated-by-word score based on the aggregated word-level scores; and
outputting, by the computer system, each individual predictive classification label, the one or more respective heat map values, and the list of preferred words for each respective individual predictive classification label in the user interface.
18. A computer system comprising one or more processors, one or more computer-readable memories, one or more computer-readable storage devices, and program instructions stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, the stored program instructions comprising:
program code operable to perform the steps of the method of any of claims 11 to 17.
19. A computer program product comprising a computer readable storage medium having program instructions embodied therein, the program instructions being executable by a computer to cause the computer to perform the steps in the method according to any of claims 11 to 17.
20. A computer system comprising means for performing the steps of the method according to any one of claims 11 to 17.
CN202010273725.3A 2019-04-10 2020-04-09 Evaluating text classification anomalies predicted by a text classification model Pending CN111813928A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US16/380981 2019-04-10
US16/380,981 US11537821B2 (en) 2019-04-10 2019-04-10 Evaluating text classification anomalies predicted by a text classification model
US16/380,986 US11068656B2 (en) 2019-04-10 2019-04-10 Displaying text classification anomalies predicted by a text classification model
US16/380986 2019-04-10

Publications (1)

Publication Number Publication Date
CN111813928A true CN111813928A (en) 2020-10-23

Family

ID=72848463

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010273725.3A Pending CN111813928A (en) 2019-04-10 2020-04-09 Evaluating text classification anomalies predicted by a text classification model

Country Status (1)

Country Link
CN (1) CN111813928A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140229164A1 (en) * 2011-02-23 2014-08-14 New York University Apparatus, method and computer-accessible medium for explaining classifications of documents
US20150242762A1 (en) * 2012-09-21 2015-08-27 Sas Institute Inc. Generating and displaying canonical rule sets with dimensional targets
US20170140240A1 (en) * 2015-07-27 2017-05-18 Salesforce.Com, Inc. Neural network combined image and text evaluator and classifier
US20170110144A1 (en) * 2015-10-16 2017-04-20 Google Inc. Hotword recognition
CN107688870A (en) * 2017-08-15 2018-02-13 中国科学院软件研究所 A kind of the classification factor visual analysis method and device of the deep neural network based on text flow input
CN109558487A (en) * 2018-11-06 2019-04-02 华南师范大学 Document Classification Method based on the more attention networks of hierarchy

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LEILA ARRAS et al.: ""What is relevant in a text document?": An interpretable machine learning approach", pages 1-3, Retrieved from the Internet <URL:https://arxiv.org/abs/1612.07843> *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11068656B2 (en) 2019-04-10 2021-07-20 International Business Machines Corporation Displaying text classification anomalies predicted by a text classification model
US11074414B2 (en) 2019-04-10 2021-07-27 International Business Machines Corporation Displaying text classification anomalies predicted by a text classification model
US11537821B2 (en) 2019-04-10 2022-12-27 International Business Machines Corporation Evaluating text classification anomalies predicted by a text classification model
CN112817497A (en) * 2021-02-09 2021-05-18 北京字节跳动网络技术有限公司 Test question display method and device, terminal equipment and storage medium
CN113673338A (en) * 2021-07-16 2021-11-19 华南理工大学 Natural scene text image character pixel weak supervision automatic labeling method, system and medium
CN113673338B (en) * 2021-07-16 2023-09-26 华南理工大学 Automatic labeling method, system and medium for weak supervision of natural scene text image character pixels
CN114281987A (en) * 2021-11-26 2022-04-05 重庆邮电大学 Dialogue short text statement matching method for intelligent voice assistant

Similar Documents

Publication Publication Date Title
US11537821B2 (en) Evaluating text classification anomalies predicted by a text classification model
US11068656B2 (en) Displaying text classification anomalies predicted by a text classification model
US11868732B2 (en) System for minimizing repetition in intelligent virtual assistant conversations
CN111813928A (en) Evaluating text classification anomalies predicted by a text classification model
US11645470B2 (en) Automated testing of dialog systems
US11842410B2 (en) Automated conversation review to surface virtual assistant misunderstandings
US11580299B2 (en) Corpus cleaning method and corpus entry system
WO2021073390A1 (en) Data screening method and apparatus, device and computer-readable storage medium
US11182447B2 (en) Customized display of emotionally filtered social media content
US20230023789A1 (en) Method for identifying noise samples, electronic device, and storage medium
CN113642316B (en) Chinese text error correction method and device, electronic equipment and storage medium
US10607601B2 (en) Speech recognition by selecting and refining hot words
US20220092262A1 (en) Text classification using models with complementary granularity and accuracy
KR102308062B1 (en) Electronic device for providing information for founding and method for operating thereof
CA3191100A1 (en) Automatically identifying multi-word expressions
CN113392218A (en) Training method of text quality evaluation model and method for determining text quality
US11176321B2 (en) Automated feedback in online language exercises
JP2023027748A (en) Speech synthesis method, device, apparatus, and computer storage medium
US11682318B2 (en) Methods and systems for assisting pronunciation correction
CN110827986A (en) Method, device and equipment for screening developmental reading disorder and storage medium
US10305765B2 (en) Adaptive selection of message data properties for improving communication throughput and reliability
CN116304014A (en) Method for training entity type recognition model, entity type recognition method and device
CN113990351A (en) Sound correction method, sound correction device and non-transient storage medium
WO2022028689A1 (en) Method for a language modeling and device supporting the same
CN113705206B (en) Emotion prediction model training method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination