CN111680512B - Named entity recognition model, telephone exchange extension switching method and system - Google Patents

Named entity recognition model, telephone exchange extension switching method and system

Info

Publication number: CN111680512B
Application number: CN202010392261.8A
Authority: CN (China)
Prior art keywords: similarity, layer, information, extension, word
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Other versions: CN111680512A
Inventors: 沈燕 (Shen Yan), 陈屹峰 (Chen Yifeng), 戴蓓蓉 (Dai Beirong), 陆炜 (Lu Wei), 王一腾 (Wang Yiteng), 孙璐 (Sun Lu)
Current assignee: Nokia Shanghai Bell Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Original assignee: Alcatel Lucent Shanghai Bell Co Ltd
Events: application filed by Alcatel Lucent Shanghai Bell Co Ltd; priority to CN202010392261.8A; publication of CN111680512A; application granted; publication of CN111680512B

Classifications

    • G06F40/295 Named entity recognition (natural language analysis; recognition of textual entities)
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/242 Dictionaries (lexical tools)
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G10L15/26 Speech to text systems
    • H04M3/4936 Speech interaction details (interactive voice response systems or voice portals)
    • H04M3/54 Arrangements for diverting calls for one subscriber to another predetermined subscriber


Abstract

The invention discloses a named entity recognition model based on an attention-based bidirectional long short-term memory (BiLSTM)-conditional random field (CRF), comprising: an embedding layer holding the pre-trained word vectors used by the model; a bidirectional LSTM layer that performs feature extraction so that each word obtains a representation containing both forward and backward information; a self-attention layer that captures word dependencies within a sentence; a fully connected layer that maps the outputs of the bidirectional LSTM layer and the self-attention layer into a vector whose dimension is the number of output labels; and a CRF layer that learns the dependencies between labels. The invention also discloses a telephone exchange extension switching method and system. The named entity recognition model can recognize entity information quickly and accurately. The switching method/system can accurately and rapidly retrieve the extension number a customer needs and transfer the call, supports providing extension transfer services to multiple customers simultaneously, and offers a high-quality, efficient switchboard transfer experience.

Description

Named entity recognition model, telephone exchange extension switching method and system
Technical Field
The invention relates to the field of communication, in particular to a named entity recognition model based on an attention-based bidirectional long short-term memory (BiLSTM)-conditional random field (CRF). The invention also relates to a telephone exchange extension switching method and system that use the named entity recognition model.
Background
An enterprise typically has a switchboard number and extensions. A switchboard system lets the enterprise publish a single telephone number; after a call comes in, each service is transferred to a different extension according to the voice navigation configured by the enterprise, or, when someone dials the switchboard to look for an extension, an operator transfers the call directly to the corresponding extension. A caller who does not know an extension number can ask the operator, who may either transfer the call or tell the caller the number to dial again. Because the same service may correspond to multiple extensions (operators), a problem arises: when a customer calls several times about the same issue, they may fail to reach the person they previously contacted and have to repeat the same matter over and over, which greatly harms the customer experience, wastes enterprise resources, and reduces working efficiency.
Disclosure of Invention
This summary introduces, in simplified form, a selection of concepts that are described in further detail in the detailed description. It is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
One object of the invention is to provide a novel named entity recognition model, based on an attention-based bidirectional long short-term memory (BiLSTM)-conditional random field (CRF), that can identify entities quickly and accurately.
The invention also provides a telephone exchange extension switching method that uses the named entity recognition model to quickly and accurately retrieve an extension and complete the transfer.
A further object of the invention is to provide a telephone exchange extension switching system that uses the named entity recognition model to quickly and accurately retrieve extensions and complete transfers.
Named entity recognition (NER), also known as proper name recognition, refers to identifying entities with specific meaning in text, mainly including person names, place names, organization names, proper nouns, and the like. Simply stated, it identifies the boundaries and categories of entity mentions in natural text. Named entity recognition is essentially a sequence labeling problem: assigning a tag to each token in a given text.
To solve the above technical problems, the invention provides a named entity recognition model based on an attention-based bidirectional long short-term memory-conditional random field (Attention-Based BiLSTM-CRF), comprising:
an embedding layer, which holds the pre-trained word vectors used by the model; these vectors are updated continuously as the model iterates;
a bidirectional LSTM layer adapted to perform feature extraction, obtaining for each word a representation that contains both forward and backward information. The bidirectional LSTM can be seen as a two-layer neural network: the first layer takes the sequence from the right as input (in text processing, starting from the last word of the sentence) and outputs the backward hidden state bh_i at each time step i, while the second layer takes the sequence from the left as input (starting from the beginning of the sentence) and outputs the forward hidden state fh_i at each time step i. The final output is the concatenation of the hidden states of the two LSTM layers:

h_i = [fh_i, bh_i];
a self-attention layer adapted to capture word dependencies within a sentence;
although a bidirectional LSTM can capture forward and backward information and handles longer-range dependencies better than a plain RNN, when the sentence is long, the LSTM cannot retain distant information well across many steps. The invention therefore introduces a self-attention mechanism to capture word dependencies within the sentence. At each time step i, the similarity between the current hidden state h_i and all hidden states H = [h_1, h_2, ..., h_T] (T being the sequence length) is computed and normalized to obtain similarity scores α, which are then used to compute the context vector c_i as a weighted sum of the hidden states:

α_ij = exp(score(h_i, h_j)) / Σ_k exp(score(h_i, h_k)),    c_i = Σ_j α_ij · h_j;
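As an illustration, here is a minimal sketch of this attention step over the BiLSTM hidden states, assuming a dot-product score function and NumPy (both are assumptions; the patent fixes neither):

```python
import numpy as np

def self_attention(H):
    """H: (T, d) array of BiLSTM hidden states h_1..h_T.
    Returns the (T, d) context vectors c_1..c_T."""
    scores = H @ H.T                              # score(h_i, h_j) for all i, j
    scores -= scores.max(axis=1, keepdims=True)   # stabilize the softmax
    alpha = np.exp(scores)
    alpha /= alpha.sum(axis=1, keepdims=True)     # normalized similarity scores
    return alpha @ H                              # c_i = sum_j alpha_ij * h_j
```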
a fully connected layer, which maps the outputs of the bidirectional LSTM layer and the self-attention layer into a vector whose dimension is the number of output labels; this vector is the prediction score of the current time step i over all labels:

p_i = W_i([h_i, c_i]) + b_i

where W_i and b_i are parameters to be learned by the model, initialized from a standard normal distribution, and p_i, the vector output by the fully connected layer, is the prediction score of the current time step i over all labels;
a CRF layer, which uses two types of scores: emission scores and transition scores. The emission score is the probability of mapping each word to a tag, i.e., the output of the fully connected layer. Let P be this output matrix, where P_ij is the non-normalized probability that word x_i maps to tag_j, tag_j being the j-th of all tags (j ranging from 0 to the number of tags minus 1); this is analogous to the emission probability matrix of a classical CRF. The transition score is the probability of transitioning from tag_i to tag_j; let A be the transition matrix, where A_ij is the transition probability from tag_i to tag_j. For a possible output tag sequence y corresponding to the input sequence X, the score is defined as:

s(X, y) = Σ_i A_{y_i, y_{i+1}} + Σ_i P_{i, y_i}

The goal is to learn a conditional probability distribution, i.e., to find a set of parameters θ that maximizes the probability of the true tag sequences in the training data:

θ* = argmax_θ Σ log p(y | X; θ),    with p(y | X) = exp(s(X, y)) / S,    S = Σ_{y'} exp(s(X, y'))

where S is the normalization over the scores of all possible output tag sequences, y' is each possible tag sequence, and θ* is the parameter set that maximizes the probability of the true tag sequence.

At prediction time, the tag sequence y* with the highest score is computed:

y* = argmax_{y'} s(X, y')

where y' ranges over all possible tag sequences.
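A minimal sketch of the score s(X, y) and of decoding under these definitions (start/end transitions are omitted and the exhaustive search is for illustration only; practical implementations decode with the Viterbi algorithm):

```python
import itertools

def sequence_score(P, A, y):
    """P: (n, k) emission scores from the fully connected layer,
    A: (k, k) tag transition scores, y: tag index sequence of length n."""
    emission = sum(P[i][y[i]] for i in range(len(y)))
    transition = sum(A[y[i]][y[i + 1]] for i in range(len(y) - 1))
    return emission + transition

def decode(P, A):
    """Return the tag sequence y* with the highest score (brute force)."""
    n, k = len(P), len(P[0])
    return max(itertools.product(range(k), repeat=n),
               key=lambda y: sequence_score(P, A, y))
```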
Optionally, the named entity recognition model is trained using the following steps:
S1, data preprocessing, including removing specified useless symbols, text word segmentation, removing specified stop words, and constructing a feature dictionary;
optionally, removing specified useless symbols: redundant spaces and other meaningless symbols in the input text are useless to the model and are removed in advance using regular expressions;
optionally, text segmentation: jieba word segmentation, i.e., the jieba segmentation library is used to split the text, turning the input text into a word sequence. During segmentation, a custom dictionary is built for domain-specific terms that may appear or for words that should not be split by jieba; the fixed words in this dictionary are preserved when jieba segments the text;
optionally, removing specified stop words: the word sequences produced by segmentation contain many meaningless words (such as Chinese particles), called stop words; words that are meaningless to the model can also be custom-defined as stop words. A stop-word dictionary is built, and stop words are removed after segmentation;
optionally, constructing a dictionary: the word segmentation results of the training data are counted and a feature dictionary is constructed;
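A minimal sketch of this S1 pipeline; the file names, the exact regular expression, and the special vocabulary entries are illustrative assumptions:

```python
import re
from collections import Counter

import jieba

jieba.load_userdict("custom_dict.txt")     # domain terms jieba must not split
with open("stopwords.txt", encoding="utf-8") as f:
    STOPWORDS = set(f.read().split())

def preprocess(text):
    """Remove useless symbols, segment with jieba, drop stop words."""
    text = re.sub(r"\s+", "", text)        # strip redundant whitespace
    return [w for w in jieba.lcut(text) if w not in STOPWORDS]

def build_feature_dictionary(corpus):
    """Count the segmented words of the training data and build a dictionary."""
    counts = Counter(w for text in corpus for w in preprocess(text))
    vocab = {"<PAD>": 0, "<UNK>": 1}
    for word in counts:
        vocab[word] = len(vocab)
    return vocab
```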
S2, input data construction, i.e., the segmented text sequences are converted with the generated feature dictionary, turning word sequences into index sequences; the data are split proportionally into a training set and a validation set and saved as input files;
S3, model training, including setting parameters, reading the training and validation sets to train and validate the model, saving the model training results, and returning the training and validation results;
optionally, set the model parameters: word embedding dimension: 300; LSTM parameters: 128 hidden states (i.e., the dimension of each word's output from the LSTM layer), 1 layer; fully connected layer output dimension: text sequence length × number of labels;
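A sketch of these hyperparameters as a PyTorch module (the framework choice, the per-direction hidden size, and the dot-product attention are assumptions; the CRF layer that consumes the emitted scores is omitted):

```python
import torch
import torch.nn as nn

class AttBiLSTM(nn.Module):
    """Embedding -> bidirectional LSTM -> self-attention -> fully connected layer."""
    def __init__(self, vocab_size, num_tags, emb_dim=300, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)   # load pre-trained vectors here
        self.lstm = nn.LSTM(emb_dim, hidden, num_layers=1,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(4 * hidden, num_tags)      # maps [h_i, c_i] to tag scores

    def forward(self, x):                              # x: (batch, T) word indices
        h, _ = self.lstm(self.emb(x))                  # (batch, T, 2*hidden)
        alpha = torch.softmax(h @ h.transpose(1, 2), dim=-1)
        c = alpha @ h                                  # context vectors c_i
        return self.fc(torch.cat([h, c], dim=-1))      # (batch, T, num_tags)
```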
Optionally, the entity information of the model includes department names and person names, with 5 classes of tags;
the tags are: beginning of a person name, middle of a person name, beginning of a department name, middle of a department name, and non-entity information.
Optionally, the tags are as follows:
B-Person: beginning of a person name
I-Person: middle of a person name
B-Depart: beginning of a department name
I-Depart: middle of a department name
O: non-entity information.
For example, "help me transfer information part" is "help me transfer information part" after word segmentation, and the output after labeling by the named entity recognition model is "O O O B-Depart B-Person". The department names and person names required to be extracted by the model are respectively 'information part', 'Lihong'.
Optionally, the CRF layer can add constraints to improve the accuracy of the prediction results; these constraints are learned automatically by the CRF layer during training. Possible constraints (an explicit check is sketched after this list) are:
1) the first tag of a sentence should be "B-" or "O", not "I-";
2) in the pattern "B-label1 I-label2 I-label3 ...", labels 1, 2, 3 should belong to the same entity category; for example, "B-Person I-Person" is valid, while "B-Person I-Hospital" is invalid;
3) "O I-label" is invalid; a named entity should begin with "B-", not "I-".
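In the model these constraints emerge through the learned transition matrix A rather than hard-coded rules, but a sketch of an explicit check makes them concrete (purely illustrative):

```python
def is_valid(tags):
    """Return True if a tag sequence satisfies constraints 1)-3) above."""
    prev = "O"                         # sentence start behaves like an 'O' boundary
    for tag in tags:
        if tag.startswith("I-"):
            # an I- tag must continue a B-/I- tag of the same entity category
            if not (prev[:2] in ("B-", "I-") and prev[2:] == tag[2:]):
                return False
        prev = tag
    return True

# is_valid(["B-Person", "I-Person"])  -> True
# is_valid(["B-Person", "I-Depart"])  -> False  (rule 2)
# is_valid(["O", "I-Person"])         -> False  (rules 1 and 3)
```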
The invention provides a telephone exchange extension switching method using the above named entity recognition model, comprising the following steps:
S4, converting voice information into text;
optionally, the conversion of voice information into text is implemented with an existing intelligent voice interaction platform, such as Alibaba Cloud's intelligent speech interaction platform;
S5, extracting entity information from the text based on the named entity recognition model;
optionally, entity information (department name and person name) is extracted with the trained named entity recognition model using the following substeps:
S5.1, loading the model file generated by training, which includes the dictionary, the tags, and the trained model;
S5.2, performing data processing on the customer's text to generate a word index sequence; the data-processing steps are the same as in the model training part, except that dictionary construction is replaced by loading the feature dictionary file;
S5.3, inputting the generated word index sequence into the trained named entity recognition model and returning the extracted entity information (department name and person name).
S6, retrieving the extension number based on similarity analysis;
optionally, the following substeps are used to retrieve an extension number based on similarity analysis:
S6.1, reading all department names in a database;
S6.2, calculating the similarity between the extracted department name and every department name in the database, where the department-name similarity is a weighted sum of three parts: text semantic similarity, Chinese-character similarity, and pinyin similarity;
the similarity between the extracted department name pred and the i-th department name all_depart_i in the database is

sim(pred, all_depart_i) = α · sim_sem(pred, all_depart_i) + β · sim_char(pred, all_depart_i) + γ · sim_pinyin(pred, all_depart_i)

where sim_sem is the semantic similarity of the two, sim_char their Chinese-character similarity, sim_pinyin their pinyin similarity, and the weights α, β, γ are set empirically from multiple experiments. Edit distance is an algorithm that reduces the similarity of two strings to the cost of converting one string into the other: the higher the conversion cost, the lower the similarity of the two strings. When computing the Chinese-character and pinyin similarities, a trigger-word mechanism is used: if the extracted department name can be matched directly in the database, its Chinese-character and pinyin similarities are set directly to the maximum value of 1.
Optionally, the similarities sim(pred, all_depart_i) between the extracted department name and all department names in the database are sorted, and the 3 real department names (those present in the database) with the highest similarity are selected.
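A sketch of this weighted department-name similarity; the concrete weights, the pypinyin dependency, the edit-distance-based string similarity scaled to [0, 1], and the externally supplied semantic similarity are assumptions layered on the structure the text fixes:

```python
from pypinyin import lazy_pinyin      # assumed pinyin library

def edit_distance(a, b):
    """Levenshtein distance: the cost of converting one string into the other."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def string_sim(a, b):
    """Higher conversion cost -> lower similarity, scaled to [0, 1]."""
    return 1.0 - edit_distance(a, b) / max(len(a), len(b), 1)

def depart_sim(pred, cand, sem_sim, db, alpha=0.4, beta=0.3, gamma=0.3):
    """sem_sim: semantic similarity of pred and cand, computed elsewhere;
    db: the set of department names read from the database."""
    if pred in db:                    # trigger word: direct match in the database
        char_sim = pinyin_sim = 1.0
    else:
        char_sim = string_sim(pred, cand)
        pinyin_sim = string_sim("".join(lazy_pinyin(pred)),
                                "".join(lazy_pinyin(cand)))
    return alpha * sem_sim + beta * char_sim + gamma * pinyin_sim
```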
S6.3, calculating the similarity between the extracted person name and all person names under the selected departments;
person-name similarity does not include semantic similarity; it consists only of Chinese-character similarity and pinyin similarity. A trigger-word mechanism is again used: if the extracted name can be matched directly in the database, its Chinese-character and pinyin similarities are set directly to the maximum value of 1.
The similarities of all names under each department name selected in step S6.2 are sorted, and the 3 names with the highest similarity are selected.
S6.4, calculating the overall similarity of department name and person name, and selecting the pair with the highest overall similarity;
overall similarity of department name and person name = department-name similarity + person-name similarity;
in total 3 × 3 = 9 department-name/person-name pairs are considered, and the overall similarity is computed as:

sim_i = sim(depart, all_depart_i) + sim(name, all_depart_i_name_j);

i.e., for each department name all_depart_i selected in step S6.2, the department-name similarity from step S6.2 and the person-name similarity sim(name, all_depart_i_name_j) from step S6.3 are summed to give sim_i. Finally, the department name and person name with the highest overall similarity are selected (see the sketch below).
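A sketch of steps S6.2-S6.4 wired together: select the top-3 departments, the top-3 names under each, and keep the best of the 3 × 3 = 9 combinations (the data layout and the injected similarity functions are assumptions):

```python
def best_match(depart, name, db, depart_sim, name_sim):
    """db: {department_name: [person_name, ...]} read from the database."""
    top_departs = sorted(db, key=lambda d: depart_sim(depart, d), reverse=True)[:3]
    best, best_score = None, float("-inf")
    for d in top_departs:
        d_score = depart_sim(depart, d)
        top_names = sorted(db[d], key=lambda n: name_sim(name, n), reverse=True)[:3]
        for n in top_names:
            score = d_score + name_sim(name, n)   # overall similarity
            if score > best_score:
                best, best_score = (d, n), score
    return best, best_score
```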
S6.5, return the extension number or go to a preset prompt script.
S7, selecting the highest similarity and executing the transfer.
S7.1, setting an overall similarity threshold; if the computed overall similarity is greater than or equal to the threshold, the person's extension number is returned to the system;
if the computed overall similarity is below the threshold, a preset prompt guides the customer to restate the contact's information, and execution jumps back to the speech-to-text step;
optionally, S7.2, if the number of speech-to-text attempts exceeds the transfer threshold, the call is transferred to a human operator. The transfer threshold can be set according to actual conditions, for example 3 or more.
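A sketch of this S7 control flow with the S4-S6 stages injected as callables (the threshold values and all function names are assumptions):

```python
SIM_THRESHOLD = 1.2   # overall-similarity threshold, set per deployment
MAX_ATTEMPTS = 3      # transfer threshold before falling back to a human operator

def handle_call(recognize, extract, match, prompt_retry, transfer, to_human):
    for _ in range(MAX_ATTEMPTS):
        text = recognize()                      # S4: speech -> text
        depart, name = extract(text)            # S5: NER extraction
        (d, n), score = match(depart, name)     # S6: similarity-based retrieval
        if score >= SIM_THRESHOLD:
            return transfer(d, n)               # S7.1: return the extension, switch
        prompt_retry()                          # guide the customer to restate
    return to_human()                           # S7.2: manual transfer
```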
The invention provides a telephone exchange extension switching system using the above named entity recognition model, comprising:
a speech recognition module, which converts the user's voice information into text;
an information extraction module, which extracts entity information from the text based on the named entity recognition model;
an extension retrieval module, which retrieves extension numbers based on similarity analysis;
and an extension switching module, which selects the highest similarity and executes the transfer.
Optionally, the information extraction module obtains, from the bidirectional LSTM layer of the named entity recognition model, a representation of each word containing both forward and backward information, as follows:
the bidirectional LSTM is a two-layer neural network; the first layer takes the sequence from the right as input, i.e., starting from the last word of the sentence, and outputs the backward hidden state bh_i at each time step i;
the second layer takes the sequence from the left as input, i.e., starting from the beginning of the sentence, and outputs the forward hidden state fh_i at each time step i; the concatenation h_i of the hidden states of the two LSTM layers is the final output:

h_i = [fh_i, bh_i].
Optionally, the information extraction module captures word dependencies within sentences at the self-attention layer of the named entity recognition model as follows:
at each time step i, the similarity between the current hidden state h_i and all hidden states H = [h_1, h_2, ..., h_T] (T being the sequence length) is computed and normalized to obtain similarity scores α, which are used to compute the context vector c_i as a weighted sum of the hidden states.
Optionally, the information extraction module defines the output vector of the fully connected layer of the named entity recognition model as the prediction scores of the current time step i over all tags:

p_i = W_i([h_i, c_i]) + b_i

where W_i and b_i are parameters to be learned by the model, initialized from a standard normal distribution, and p_i, the vector output by the fully connected layer, is the prediction score of the current time step i over all tags.
Optionally, the information extraction module can add constraints to the CRF layer of the entity recognition model to improve the accuracy of the prediction results; these constraints can be learned automatically by the CRF layer during training.
Optionally, the information extraction module can train the entity recognition model as follows:
data preprocessing, including removing specified useless symbols, text word segmentation, removing specified stop words, and constructing a feature dictionary;
input data construction, including converting the segmented text sequences with the generated feature dictionary, turning word sequences into index sequences, splitting the data proportionally into a training set and a validation set, and saving them as input files;
model training, including setting parameters, reading the training and validation sets to train and validate the model, saving the model training results, and returning the training and validation results.
Optionally, the entity information extracted by the information extraction module's entity recognition model includes department names and person names, with 5 classes of tags;
the tags are: beginning of a person name, middle of a person name, beginning of a department name, middle of a department name, and non-entity information.
Optionally, the information extraction module extracts department-name and person-name information with the trained named entity recognition model as follows:
Loading a model file generated by training;
carrying out data processing on the text information of the client to generate a word index sequence;
the generated word index sequence is input into a trained named entity recognition model, and the extracted department name and person name information is returned.
Optionally, the extension retrieval module completes the extension retrieval as follows:
reading all department names in the database;
calculating the similarity between the extracted department name and all department names in the database, where the department-name similarity is a weighted sum of three parts: text semantic similarity, Chinese-character similarity, and pinyin similarity;
calculating the similarity between the extracted person name and all person names under the selected departments;
calculating the overall similarity of department name and person name, and selecting the pair with the highest overall similarity;
overall similarity of department name and person name = department-name similarity + person-name similarity;
return the extension number or go to a preset prompt script.
Optionally, the extension switching module performs the extension transfer as follows:
an overall similarity threshold is set; if the computed overall similarity is greater than or equal to the threshold, the person's extension number is returned to the system;
if the computed overall similarity is below the threshold, a preset prompt guides the customer to restate the contact's information, and execution jumps back to the speech-to-text step;
if the number of speech-to-text attempts exceeds the transfer threshold, the call is transferred to a human operator.
The telephone exchange switching method/system converts the customer's voice into text in real time and uses named entity recognition to extract entity information such as person names and department names from the text. The invention provides a named entity recognition model based on an attention-based bidirectional long short-term memory-conditional random field (Attention-Based BiLSTM-CRF), which assigns each word in a continuous sequence a corresponding semantic category label in order to recognize entity information. Because of dialects, poor call audio quality, speech-to-text errors, and similar problems, the extracted information may be inaccurate: for example, the extracted name may be a similar-sounding but wrong rendering of the actual name in the database (e.g., "Hu Jian" extracted where the database holds a similarly pronounced name). If the database is searched directly with the wrong string, nothing is found, and no matter how many times the customer calls, the transfer cannot be completed as long as they say the same wrong name; the system is inflexible and the customer experience suffers. The invention therefore retrieves extension numbers via similarity analysis: the similarity between the extracted information and the corresponding information in the database is computed, the best match is selected, the extension number is looked up, and the call is switched automatically.
With the technical scheme of the invention, received customer voice can be converted into text in real time, department names and person names extracted from the text, and the extension number retrieved from the extracted department name and person name and the call switched for the customer, avoiding the problems of customers failing to reach the person they need, repeating the same issue, and cumbersome, inefficient processes. The invention can accurately and rapidly retrieve and transfer the extension number a customer needs, supports providing extension transfer services to multiple customers simultaneously, remains flexible when a customer's wording is imprecise, and, combined with the assistance of human switchboard operators, provides customers with a high-quality and efficient switchboard transfer experience.
The invention adopts a named entity recognition model based on an attention-based bidirectional long short-term memory unit-conditional random field: the bidirectional long short-term memory unit (BiLSTM) extracts forward and backward information, the self-attention mechanism captures long-distance word dependencies so that the model's semantic understanding is stronger, and the conditional random field (CRF) layer learns the dependencies between labels and constrains the label sequence, making the model's entity recognition more accurate. In addition, the invention retrieves extension numbers based on similarity analysis, which tolerates dialects, poor call audio quality, and speech-to-text errors, giving strong flexibility and high accuracy.
Drawings
The accompanying drawings illustrate the general features of the methods, structures and/or materials used in certain exemplary embodiments of the invention and supplement the description in this specification. The drawings are schematic, not to scale, and may not accurately reflect the precise structural or performance characteristics of any given embodiment; they should not be construed as limiting the scope of the values or attributes encompassed by the exemplary embodiments. The invention is described in further detail below with reference to the drawings and the detailed description:
FIG. 1 is a schematic diagram of a named entity recognition model of the present invention.
FIG. 2 is a schematic diagram of a training process of a named entity recognition model according to the present invention.
Fig. 3 is a flow chart of a method for switching extension telephone exchange according to the invention.
Detailed Description
Other advantages and technical effects of the invention will become apparent to those skilled in the art from the following disclosure, which describes the invention by way of specific examples. The invention may be practiced or carried out in different embodiments, and details in this description may be applied from different points of view without departing from the general inventive concept. The following embodiments and the features within them may be combined with one another provided they do not conflict. These exemplary embodiments may be embodied in many different forms and should not be construed as limited to the specific embodiments set forth here; they are provided so that this disclosure is thorough and complete and fully conveys the technical solution to those skilled in the art.
The first embodiment of the invention provides a named entity recognition model based on an attention-based bidirectional long short-term memory-conditional random field (Attention-Based BiLSTM-CRF). The entity information of the model includes department names and person names, and the model has 5 classes of tags;
the tags are: beginning of a person name, middle of a person name, beginning of a department name, middle of a department name, and non-entity information, labeled as follows:
B-Person: beginning of a person name
I-Person: middle of a person name
B-Depart: beginning of a department name
I-Depart: middle of a department name
O: non-entity information.
For example, "help me transfer information part" is "help me transfer information part" after word segmentation, and the output after labeling by the named entity recognition model is "O O O B-Depart B-Person". The department names and person names required to be extracted by the model are respectively 'information part', 'Lihong'.
As shown in fig. 1, the named entity recognition model includes:
an embedding layer, which holds the pre-trained word vectors used by the model; these vectors are updated continuously as the model iterates;
a bidirectional LSTM layer adapted to perform feature extraction, obtaining for each word a representation that contains both forward and backward information. The bidirectional LSTM can be seen as a two-layer neural network: the first layer takes the sequence from the right as input (in text processing, starting from the last word of the sentence) and outputs the backward hidden state bh_i at each time step i, while the second layer takes the sequence from the left as input (starting from the beginning of the sentence) and outputs the forward hidden state fh_i at each time step i. The final output is the concatenation of the hidden states of the two LSTM layers:

h_i = [fh_i, bh_i];

a self-attention layer adapted to capture word dependencies within a sentence;
although a bidirectional LSTM can capture forward and backward information and handles longer-range dependencies better than a plain RNN, when the sentence is long, the LSTM cannot retain distant information well. The invention therefore introduces a self-attention mechanism to capture word dependencies within the sentence. At each time step i, the similarity between the current hidden state h_i and all hidden states H = [h_1, h_2, ..., h_T] (T being the sequence length) is computed and normalized to obtain similarity scores α, which are used to compute the context vector c_i as a weighted sum of the hidden states:

α_ij = exp(score(h_i, h_j)) / Σ_k exp(score(h_i, h_k)),    c_i = Σ_j α_ij · h_j;

a fully connected layer, which maps the outputs of the bidirectional LSTM layer and the self-attention layer into a vector whose dimension is the number of output labels; this vector is the prediction score of the current time step i over all labels:

p_i = W_i([h_i, c_i]) + b_i

where W_i and b_i are parameters to be learned by the model, initialized from a standard normal distribution, and p_i, the vector output by the fully connected layer, is the prediction score of the current time step i over all labels;

a CRF layer, which uses two types of scores: emission scores and transition scores. The emission score is the probability of mapping each word to a tag, i.e., the output of the fully connected layer. Let P be this output matrix, where P_ij is the non-normalized probability that word x_i maps to tag_j, tag_j being the j-th of all tags (j ranging from 0 to the number of tags minus 1); this is analogous to the emission probability matrix of a classical CRF. The transition score is the probability of transitioning from tag_i to tag_j; let A be the transition matrix, where A_ij is the transition probability from tag_i to tag_j. For an output tag sequence y corresponding to the input sequence X, the score is defined as:

s(X, y) = Σ_i A_{y_i, y_{i+1}} + Σ_i P_{i, y_i}

The goal is to learn a conditional probability distribution, i.e., to find a set of parameters θ that maximizes the probability of the true tag sequences in the training data:

θ* = argmax_θ Σ log p(y | X; θ),    with p(y | X) = exp(s(X, y)) / S,    S = Σ_{y'} exp(s(X, y'))

where S is the normalization over the scores of all possible output tag sequences, y' is each possible tag sequence, and θ* is the parameter set that maximizes the probability of the true tag sequence.

At prediction time, the tag sequence y* with the highest score is computed:

y* = argmax_{y'} s(X, y')

where y' ranges over all possible tag sequences.
The second embodiment further improves the first embodiment by adding a step of training the named entity recognition model; parts identical to the first embodiment are not repeated. As shown in fig. 2, the named entity recognition model is trained using the following steps:
S1, data preprocessing, including removing specified useless symbols, text word segmentation, removing specified stop words, and constructing a feature dictionary; removing specified useless symbols: redundant spaces and other meaningless symbols in the input text are useless to the model and are removed in advance using regular expressions;
text segmentation: jieba word segmentation, i.e., the jieba segmentation library is used to split the text, turning the input text into a word sequence. During segmentation, a custom dictionary is built for domain-specific terms that may appear or for words that should not be split by jieba; the fixed words in this dictionary are preserved when jieba segments the text;
removing specified stop words: the word sequences produced by segmentation contain many meaningless words (such as Chinese particles), called stop words; words that are meaningless to the model can also be custom-defined as stop words. A stop-word dictionary is built, and stop words are removed after segmentation;
Constructing a dictionary: counting word segmentation results of training data, and constructing a dictionary;
S2, input data construction, i.e., the segmented text sequences are converted with the generated feature dictionary, turning word sequences into index sequences; the data are split proportionally into a training set and a validation set and saved as input files;
S3, model training, including setting parameters, reading the training and validation sets to train and validate the model, saving the model training results, and returning the training and validation results;
setting the model parameters: word embedding dimension: 300; LSTM parameters: 128 hidden states, 1 layer; fully connected layer output dimension: text sequence length × number of labels.
Further improving the second embodiment, the CRF layer can add constraints to improve the accuracy of the prediction results; these constraints are learned automatically by the CRF layer during training. Possible constraints are:
1) the first tag of a sentence should be "B-" or "O", not "I-";
2) in the pattern "B-label1 I-label2 I-label3 ...", labels 1, 2, 3 should belong to the same entity category; for example, "B-Person I-Person" is valid, while "B-Person I-Hospital" is invalid;
3) "O I-label" is invalid; a named entity should begin with "B-", not "I-".
In a third embodiment, the invention provides a telephone exchange extension switching method using the named entity recognition model of the first or second embodiment, comprising the following steps:
S4, converting the voice information into text using an existing intelligent voice interaction platform, for example Alibaba Cloud's intelligent speech interaction platform;
S5, extracting entity information from the text based on the named entity recognition model, comprising the following substeps:
S5.1, loading the model file generated by training, which includes the dictionary, the tags, and the trained model;
S5.2, performing data processing on the customer's text to generate a word index sequence; the data-processing steps are the same as in the model training part, except that dictionary construction is replaced by loading the feature dictionary file;
S5.3, inputting the generated word index sequence into the trained named entity recognition model and returning the extracted entity information (department name and person name).
S6, retrieving the extension number based on similarity analysis, comprising the following substeps:
S6.1, reading all department names in the database;
S6.2, calculating the similarity between the extracted department name and every department name in the database, where the department-name similarity is a weighted sum of three parts: text semantic similarity, Chinese-character similarity, and pinyin similarity;
the similarity between the extracted department name pred and the i-th department name all_depart_i in the database is

sim(pred, all_depart_i) = α · sim_sem(pred, all_depart_i) + β · sim_char(pred, all_depart_i) + γ · sim_pinyin(pred, all_depart_i)

where sim_sem is the semantic similarity of the two, sim_char their Chinese-character similarity, sim_pinyin their pinyin similarity, and the weights α, β, γ are set empirically from multiple experiments. Edit distance is an algorithm that reduces the similarity of two strings to the cost of converting one string into the other: the higher the conversion cost, the lower the similarity of the two strings. When computing the Chinese-character and pinyin similarities, a trigger-word mechanism is used: if the extracted department name can be matched directly in the database, its Chinese-character and pinyin similarities are set directly to the maximum value of 1.
The similarities sim(pred, all_depart_i) of the extracted department name to all department names in the database are sorted, and the 3 real department names (those present in the database) with the highest similarity are selected.
S6.3, calculating the similarity between the extracted person name and all person names under the selected departments;
person-name similarity does not include semantic similarity; it consists only of Chinese-character similarity and pinyin similarity. A trigger-word mechanism is again used: if the extracted name can be matched directly in the database, its Chinese-character and pinyin similarities are set directly to the maximum value of 1.
The similarities of all names under each department name selected in step S6.2 are sorted, and the 3 names with the highest similarity are selected.
S6.4, calculating the overall similarity of department name and person name, and selecting the pair with the highest overall similarity;
overall similarity of department name and person name = department-name similarity + person-name similarity;
in total 3 × 3 = 9 department-name/person-name pairs are considered, and the overall similarity is computed as:

sim_i = sim(depart, all_depart_i) + sim(name, all_depart_i_name_j);

i.e., for each department name all_depart_i selected in step S6.2, the department-name similarity from step S6.2 and the person-name similarity sim(name, all_depart_i_name_j) from step S6.3 are summed to give sim_i. Finally, the department name and person name with the highest overall similarity are selected.
S6.5, return the extension number or go to a preset prompt script.
S7, selecting the highest similarity and executing the transfer.
S7.1, setting an overall similarity threshold; if the computed overall similarity is greater than or equal to the threshold, the person's extension number is returned to the system;
if the computed overall similarity is below the threshold, a preset prompt guides the customer to restate the contact's information, and execution jumps back to the speech-to-text step;
S7.2, if the number of speech-to-text attempts exceeds the transfer threshold, the call is transferred to a human operator.
The fourth embodiment of the invention provides a telephone exchange extension switching system using the named entity recognition model, comprising:
a speech recognition module, which converts the user's voice information into text;
an information extraction module, which obtains from the bidirectional LSTM layer of the named entity recognition model a representation of each word containing both forward and backward information, as follows:
the bidirectional LSTM is a two-layer neural network; the first layer takes the sequence from the right as input, i.e., starting from the last word of the sentence, and outputs the backward hidden state bh_i at each time step i;
the second layer takes the sequence from the left as input, i.e., starting from the beginning of the sentence, and outputs the forward hidden state fh_i at each time step i; the concatenation h_i of the hidden states of the two LSTM layers is the final output:

h_i = [fh_i, bh_i].
The information extraction module captures word dependencies within sentences at the self-attention layer of the named entity recognition model as follows:
at each time step i, the similarity between the current hidden state h_i and all hidden states H = [h_1, h_2, ..., h_T] (T being the sequence length) is computed and normalized to obtain similarity scores α, which are used to compute the context vector c_i as a weighted sum of the hidden states.
The information extraction module defines the output vector of the fully connected layer of the named entity recognition model as the prediction scores of the current time step i over all tags:

p_i = W_i([h_i, c_i]) + b_i

where W_i and b_i are parameters to be learned by the model, initialized from a standard normal distribution, and p_i, the vector output by the fully connected layer, is the prediction score of the current time step i over all tags.
The extension retrieval module completes the extension retrieval as follows:
reading all department names in the database;
calculating the similarity between the extracted department name and all department names in the database, where the department-name similarity is a weighted sum of three parts: text semantic similarity, Chinese-character similarity, and pinyin similarity;
calculating the similarity between the extracted person name and all person names under the selected departments;
calculating the overall similarity of department name and person name, and selecting the pair with the highest overall similarity;
overall similarity of department name and person name = department-name similarity + person-name similarity;
return the extension number or go to a preset prompt script.
The extension switching module performs the extension transfer as follows:
an overall similarity threshold is set; if the computed overall similarity is greater than or equal to the threshold, the person's extension number is returned to the system;
if the computed overall similarity is below the threshold, a preset prompt guides the customer to restate the contact's information, and execution jumps back to the speech-to-text step;
if the number of speech-to-text attempts exceeds the transfer threshold, the call is transferred to a human operator.
The fifth embodiment further improves the fourth embodiment; parts identical to the fourth embodiment are not repeated. The information extraction module can add constraints to the CRF layer of the entity recognition model to improve the accuracy of the prediction results; these constraints are learned automatically by the CRF layer during training. Possible constraints are:
1) the first tag of a sentence should be "B-" or "O", not "I-";
2) in the pattern "B-label1 I-label2 I-label3 ...", labels 1, 2, 3 should belong to the same entity category; for example, "B-Person I-Person" is valid, while "B-Person I-Hospital" is invalid;
3) "O I-label" is invalid; a named entity should begin with "B-", not "I-".
The information extraction module can train the entity recognition model as follows:
data preprocessing, including removing specified useless symbols, text word segmentation, removing specified stop words, and constructing a feature dictionary;
input data construction, including converting the segmented text sequences with the generated feature dictionary, turning word sequences into index sequences, splitting the data proportionally into a training set and a validation set, and saving them as input files;
model training, including setting parameters, reading the training and validation sets to train and validate the model, saving the model training results, and returning the training and validation results.
The entity information extracted by the information extraction module's entity recognition model includes department names and person names, with 5 classes of tags;
the tags are: beginning of a person name, middle of a person name, beginning of a department name, middle of a department name, and non-entity information.
The information extraction module extracts department-name and person-name information with the trained named entity recognition model as follows:
loading a model file generated by training;
carrying out data processing on the text information of the client to generate a word index sequence;
the generated word index sequence is input into a trained named entity recognition model, and the extracted department name and person name information is returned.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The present invention has been described in detail by way of specific embodiments and examples, but these should not be construed as limiting the invention. Many variations and modifications may be made by one skilled in the art without departing from the principles of the invention, which is also considered to be within the scope of the invention.

Claims (16)

1. A named entity recognition method based on an attention-mechanism bidirectional long short-term memory unit-conditional random field, comprising:
an embedding layer, which is a pre-trained word vector used by the model, the vector being updated continuously as the model iterates;
a bidirectional LSTM layer adapted to perform feature extraction, obtaining for each word a representation containing both forward and backward information; the bidirectional LSTM layer obtains the representation of each word containing both forward and backward information as follows:
the bidirectional LSTM is a two-layer neural network; the first layer takes the sequence from the right as input, i.e., starting from the last word of the sentence, and outputs the backward hidden state bh_i at each time step i;
the second layer takes the sequence from the left as input, i.e., starting from the beginning of the sentence, and outputs the forward hidden state fh_i at each time step i; the concatenation h_i of the hidden states of the two LSTM layers is the final output:

h_i = [fh_i, bh_i];
a self-attention layer adapted to capture word dependencies within a sentence; the self-attention layer captures word dependencies within sentences as follows:
at each time step i, the similarity between the current hidden state h_i and all hidden states H = [h_1, h_2, ..., h_T] (T being the sequence length) is computed and normalized to obtain similarity scores α, which are used to compute the context vector c_i as a weighted sum of the hidden states;
a fully connected layer adapted to map the outputs of the bidirectional LSTM layer and the self-attention layer into a vector whose dimension is the number of output labels; the output vector of the fully connected layer is the prediction score of the current time step i over all tags:

p_i = W_i([h_i, c_i]) + b_i

where W_i and b_i are parameters to be learned by the model, initialized from a standard normal distribution, and p_i, the vector output by the fully connected layer, is the prediction score of the current time step i over all tags;
a CRF layer adapted to learn the dependencies between labels, having two types of scores, an emission score and a transition score;
the emission score is the probability of mapping each word to a label, i.e. the output of the fully connected layer;
the transition score is the probability of transitioning from one label to the next.
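For illustration only, a minimal PyTorch sketch of the claim-1 architecture follows; the framework choice, dimensions and class names are assumptions, and a CRF layer (e.g. the third-party pytorch-crf package) would consume the emission scores p_i together with learned transition scores:

```python
# Minimal sketch of the claim-1 layers under assumed dimensions; it emits
# the per-step label scores p_i that a CRF layer would decode.
import torch
import torch.nn as nn


class BiLSTMSelfAttn(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden, n_tags, pretrained=None):
        super().__init__()
        # Embedding layer: pre-trained word vectors, fine-tuned as training iterates.
        self.emb = nn.Embedding(vocab_size, emb_dim)
        if pretrained is not None:
            self.emb.weight.data.copy_(pretrained)
        # Bi-directional LSTM: each step outputs h_i = [fh_i, bh_i] of size 2*hidden.
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        # Fully connected layer maps [h_i, c_i] to one score per output label.
        self.fc = nn.Linear(4 * hidden, n_tags)

    def forward(self, x):                          # x: (batch, T) word indices
        h, _ = self.lstm(self.emb(x))              # (batch, T, 2*hidden)
        # Self-attention: similarity of h_i with every h_j, normalized to
        # weights alpha, then a weighted sum of H gives the context c_i.
        scores = torch.bmm(h, h.transpose(1, 2))   # (batch, T, T)
        alpha = torch.softmax(scores, dim=-1)
        c = torch.bmm(alpha, h)                    # (batch, T, 2*hidden)
        # Emission scores p_i = W([h_i, c_i]) + b at every time step.
        return self.fc(torch.cat([h, c], dim=-1))  # (batch, T, n_tags)
```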
2. The named entity recognition method of claim 1, wherein: the CRF layer can add constraints to improve the accuracy of the prediction results, the constraints being learned automatically by the CRF layer during training.
3. The named entity recognition method of claim 2, wherein model training is performed by the following steps:
s1, data preprocessing, including removing specified useless symbols, segmenting the text, removing specified stop words, and constructing a feature dictionary;
s2, input data construction, including converting the segmented text sequences with the generated feature dictionary, converting the word sequences into index sequences, splitting training and validation sets by a given ratio, and saving them as input files;
s3, model training, including setting parameters, reading the training and validation sets to train and validate the model, saving the trained model, and returning the training and validation results.
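As a non-authoritative sketch of steps S1 and S2 (file formats, the symbol cleanup, stop-word handling and the 9:1 split are illustrative choices; label handling is omitted for brevity):

```python
# Sketch of S1/S2: preprocessing and input construction; S3 would then read
# the saved files, train with the chosen hyper-parameters, and save results.
import json
import random

import jieba


def build_inputs(raw_texts, stopwords, split=0.9):
    # S1: remove specified useless symbols, segment, drop stop words,
    # then build the feature dictionary over the remaining words.
    cleaned = [[w for w in jieba.lcut(t.replace("*", "").strip())
                if w not in stopwords] for t in raw_texts]
    word2idx = {"<PAD>": 0, "<UNK>": 1}
    for sent in cleaned:
        for w in sent:
            word2idx.setdefault(w, len(word2idx))
    # S2: convert word sequences to index sequences and split train/dev.
    seqs = [[word2idx[w] for w in sent] for sent in cleaned]
    random.shuffle(seqs)
    cut = int(len(seqs) * split)
    with open("train.json", "w") as f:
        json.dump(seqs[:cut], f)
    with open("dev.json", "w") as f:
        json.dump(seqs[cut:], f)
    return word2idx
```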
4. A telephone switchboard extension method using the named entity recognition method of claim 1, comprising the steps of:
s4, converting voice information into text;
s5, extracting entity information in the text based on the named entity recognition model;
s6, searching the extension numbers based on similarity analysis;
s7, selecting the result with the highest similarity and executing the transfer.
5. The telephone switchboard extension method according to claim 4, wherein the following steps are adopted to extract entity information with the trained named entity recognition model:
s5.1, loading a model file generated by training;
s5.2, performing data processing on the text information of the client to generate a word index sequence;
s5.3, inputting the generated word index sequence into a trained named entity recognition model, and returning the extracted entity information.
6. The telephone switchboard extension method according to claim 4, characterized in that step S6 comprises the following sub-steps:
s6.1, reading all department names in a database;
s6.2, calculating the similarity between the extracted department name and all department names in the database, the department name similarity being a weighted sum of three parts: text semantic similarity, Chinese character similarity, and pinyin similarity;
s6.3, calculating the similarity between the extracted person name and all person names under the selected department;
s6.4, calculating the overall similarity of department name and person name, and selecting the department name and person name with the highest overall similarity;
where overall similarity = department name similarity + person name similarity;
s6.5, returning the extension number or switching to the preset script.
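A sketch of the S6 retrieval under stated assumptions: the weights are illustrative, semantic similarity is left as a stub, character similarity uses an edit-distance ratio, pinyin similarity compares pypinyin transliterations the same way, and person-name similarity is approximated with the character measure:

```python
# Illustrative S6 similarity search; weights and similarity functions are
# assumptions, not the patent's concrete formulas.
from difflib import SequenceMatcher

from pypinyin import lazy_pinyin  # converts Chinese characters to pinyin


def char_sim(a, b):
    return SequenceMatcher(None, a, b).ratio()


def pinyin_sim(a, b):
    return SequenceMatcher(None, " ".join(lazy_pinyin(a)),
                           " ".join(lazy_pinyin(b))).ratio()


def semantic_sim(a, b):
    return 0.0  # stub: e.g. cosine similarity of sentence embeddings


def dept_sim(query, name, w=(0.4, 0.3, 0.3)):
    # Weighted sum of semantic, Chinese-character and pinyin similarity.
    return (w[0] * semantic_sim(query, name)
            + w[1] * char_sim(query, name)
            + w[2] * pinyin_sim(query, name))


def best_extension(dept_q, person_q, directory):
    """directory: {department: {person: extension}};
    overall similarity = department similarity + person similarity."""
    best_ext, best_score = None, -1.0
    for dept, people in directory.items():
        ds = dept_sim(dept_q, dept)
        for person, ext in people.items():
            overall = ds + char_sim(person_q, person)
            if overall > best_score:
                best_ext, best_score = ext, overall
    return best_ext, best_score
```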
7. The telephone switchboard extension method according to claim 4, characterized in that step S7 comprises the following sub-steps:
s7.1, setting an overall similarity threshold; if the calculated overall similarity is greater than or equal to the threshold, returning the person's extension number to the system;
if the calculated overall similarity is smaller than the threshold, using the preset script to guide the customer to restate the contact's information, and jumping back to the speech-to-text step;
and s7.2, if the number of speech-to-text conversions exceeds the conversion threshold, transferring the call to a human operator.
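A sketch of the S7 decision loop; the threshold value, retry limit, prompt text and callable names are illustrative assumptions:

```python
# Illustrative S7 loop: transfer when the overall similarity clears the
# threshold, re-prompt with the preset script otherwise, and hand the call
# to a human operator once the retry limit is exceeded.
def transfer(asr, retrieve, prompt, sim_threshold=1.2, max_attempts=3):
    for _ in range(max_attempts):
        text = asr()                      # S4: speech-to-text
        ext, overall = retrieve(text)     # S5 + S6: NER and similarity search
        if ext is not None and overall >= sim_threshold:
            return ext                    # S7.1: execute the transfer
        prompt("Please repeat the department and contact you are looking for.")
    return "HUMAN_OPERATOR"               # S7.2: hand off after the limit
```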
8. A telephone switchboard extension system utilizing the named entity recognition method of claim 1, comprising:
a voice recognition module for converting user voice information into text;
an information extraction module that extracts entity information in a text based on a named entity recognition model;
an extension retrieval module that retrieves extension numbers based on similarity analysis;
and an extension switching module that selects the result with the highest similarity and executes the transfer.
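For illustration, the four claim-8 modules could be wired together as follows; the class and method names are assumptions:

```python
# Illustrative wiring of the four modules; each argument is a callable
# standing in for the corresponding module of claim 8.
class SwitchboardSystem:
    def __init__(self, recognize, extract, retrieve, switch):
        self.recognize = recognize   # speech recognition module: audio -> text
        self.extract = extract       # information extraction module: text -> entities
        self.retrieve = retrieve     # extension retrieval module: entities -> candidates
        self.switch = switch         # extension switching module: pick best, transfer

    def handle_call(self, audio):
        text = self.recognize(audio)
        dept, person = self.extract(text)
        candidates = self.retrieve(dept, person)   # (extension, similarity) pairs
        return self.switch(candidates)             # transfer highest similarity
```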
9. The telephone switchboard extension system according to claim 8, wherein: the information extraction module obtains, from the bi-directional LSTM layer of the named entity recognition model, a representation of each word containing both forward and backward information, as follows:
the bi-directional LSTM is a two-layer neural network; the first layer takes the sequence as input starting from the right, i.e. from the last word of the sentence, and outputs a backward hidden state bh_i at each time step i;
the second layer takes the sequence as input starting from the left, i.e. from the first word of the sentence, and outputs a forward hidden state fh_i at each time step i; the final output of the LSTM layer is the concatenation of the two hidden states:
h_i = [fh_i, bh_i].
10. The telephone switchboard extension system according to claim 8, wherein: the self-attention layer of the named entity recognition model used by the information extraction module captures word dependencies within a sentence in the following way:
at each time step i, the similarity between the current hidden state h_i and all hidden states H = [h_1, h_2, ..., h_T], where T is the sequence length, is calculated and then normalized to obtain the similarity scores α; α is used to form a weighted sum of H, yielding the context vector c_i.
11. The telephone switchboard extension system according to claim 8, wherein: the information extraction module defines the output vector of the fully connected layer of the named entity recognition model as the prediction score of the current time step i over all labels:
p_i = W_i([h_i, c_i]) + b_i;
where W_i and b_i are parameters to be learned by the model, initialized from a standard normal distribution, and p_i, the vector output by the fully connected layer, is the prediction score of the current time step i over all labels.
12. The telephone switchboard extension system according to claim 8, wherein: the information extraction module can add constraints to the CRF layer of the entity recognition model to improve the accuracy of the prediction results, the constraints being learned automatically by the CRF layer during training.
13. The telephone switchboard extension system according to claim 8, wherein: the information extraction module can train the entity recognition model in the following manner:
data preprocessing, including removing specified useless symbols, segmenting the text, removing specified stop words, and constructing a feature dictionary;
input data construction, including converting the segmented text sequences with the generated feature dictionary, converting the word sequences into index sequences, splitting training and validation sets by a given ratio, and saving them as input files;
model training, including setting parameters, reading the training and validation sets to train and validate the model, saving the trained model, and returning the training and validation results.
14. The telephone switchboard extension system according to claim 13, wherein: the information extraction module extracts department name and person name information with the trained named entity recognition model in the following manner:
loading the model file generated by training;
performing data processing on the customer's text information to generate a word index sequence;
inputting the generated word index sequence into the trained named entity recognition model and returning the extracted department name and person name information.
15. The telephone switchboard extension system according to claim 8, wherein: the extension retrieval module completes the extension search in the following way:
reading all department names in the database;
calculating the similarity between the extracted department name and all department names in the database, the department name similarity being a weighted sum of three parts: text semantic similarity, Chinese character similarity, and pinyin similarity;
calculating the similarity between the extracted person name and all person names under the selected department;
calculating the overall similarity of department name and person name, and selecting the department name and person name with the highest overall similarity;
where overall similarity = department name similarity + person name similarity;
returning the extension number or switching to the preset script.
16. The telephone switchboard extension system according to claim 8, wherein: the extension switching module performs the extension transfer in the following manner:
setting an overall similarity threshold; if the calculated overall similarity is greater than or equal to the threshold, returning the person's extension number to the system;
if the calculated overall similarity is smaller than the threshold, using the preset script to guide the customer to restate the contact's information, and jumping back to the speech-to-text step;
and if the number of speech-to-text conversions exceeds the conversion threshold, transferring the call to a human operator.
CN202010392261.8A 2020-05-11 2020-05-11 Named entity recognition model, telephone exchange extension switching method and system Active CN111680512B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010392261.8A CN111680512B (en) 2020-05-11 2020-05-11 Named entity recognition model, telephone exchange extension switching method and system


Publications (2)

Publication Number Publication Date
CN111680512A CN111680512A (en) 2020-09-18
CN111680512B true CN111680512B (en) 2024-04-02

Family

ID=72452230

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010392261.8A Active CN111680512B (en) 2020-05-11 2020-05-11 Named entity recognition model, telephone exchange extension switching method and system

Country Status (1)

Country Link
CN (1) CN111680512B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112541075B (en) * 2020-10-30 2024-04-05 中科曙光南京研究院有限公司 Standard case sending time extraction method and system for alert text
CN112633002A (en) * 2020-12-29 2021-04-09 上海明略人工智能(集团)有限公司 Sample labeling method, model training method, named entity recognition method and device
CN112989838B (en) * 2021-05-17 2021-08-31 北京智慧易科技有限公司 Text contact entity extraction method, device and equipment and readable storage medium
CN115150196B (en) * 2022-09-01 2022-11-18 北京金睛云华科技有限公司 Ciphertext data-based anomaly detection method, device and equipment under normal distribution


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019229769A1 (en) * 2018-05-28 2019-12-05 Thottapilly Sanjeev An auto-disambiguation bot engine for dynamic corpus selection per query
CN109871538A (en) * 2019-02-18 2019-06-11 华南理工大学 A kind of Chinese electronic health record name entity recognition method
CN110334339A (en) * 2019-04-30 2019-10-15 华中科技大学 It is a kind of based on location aware from the sequence labelling model and mask method of attention mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xu Xiao; Zhu Yanhui; Ji Xiangbing. Research on microblog entity recognition based on self-attention deep learning. Journal of Hunan University of Technology, 2019, (02), full text. *

Also Published As

Publication number Publication date
CN111680512A (en) 2020-09-18

Similar Documents

Publication Publication Date Title
CN111680512B (en) Named entity recognition model, telephone exchange extension switching method and system
CN109885672B (en) Question-answering type intelligent retrieval system and method for online education
CN114020862B (en) Search type intelligent question-answering system and method for coal mine safety regulations
CN109032375B (en) Candidate text sorting method, device, equipment and storage medium
CN108304372B (en) Entity extraction method and device, computer equipment and storage medium
CN111339283B (en) Method and device for providing customer service answers aiming at user questions
CN111241237B (en) Intelligent question-answer data processing method and device based on operation and maintenance service
CN109871538A (en) A kind of Chinese electronic health record name entity recognition method
CN114580382A (en) Text error correction method and device
CN113377897B (en) Multi-language medical term standard standardization system and method based on deep confrontation learning
CN109857846B (en) Method and device for matching user question and knowledge point
CN111984766A (en) Missing semantic completion method and device
CN110263325A (en) Chinese automatic word-cut
CN113672708A (en) Language model training method, question and answer pair generation method, device and equipment
CN115048447A (en) Database natural language interface system based on intelligent semantic completion
CN117332789A (en) Semantic analysis method and system for dialogue scene
CN114330366A (en) Event extraction method and related device, electronic equipment and storage medium
Jang et al. Cross-language neural dialog state tracker for large ontologies using hierarchical attention
CN117010398A (en) Address entity identification method based on multi-layer knowledge perception
CN110287487B (en) Master predicate identification method, apparatus, device, and computer-readable storage medium
CN114238605B (en) Automatic conversation method and device for intelligent voice customer service robot
CN117493548A (en) Text classification method, training method and training device for model
CN112528003B (en) Multi-item selection question-answering method based on semantic sorting and knowledge correction
CN115238705A (en) Semantic analysis result reordering method and system
CN113012685B (en) Audio recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant