CN110188199A - A text classification method for intelligent voice interaction - Google Patents
A text classification method for intelligent voice interaction
- Publication number
- CN110188199A (application CN201910427808.0A)
- Authority
- CN
- China
- Prior art keywords: text, word, training, feature, label
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Abstract
The invention discloses a text classification method for intelligent voice interaction, comprising: obtaining a labeled training text set; performing word segmentation on the training texts under each label to obtain word sequences; computing term frequency and inverse document frequency statistics over the word sequences to obtain a TF-IDF value for each word; taking the words whose TF-IDF value exceeds a predetermined threshold as the feature words under that label, so as to generate a feature lexicon for each label; filtering the training text set against the feature lexicon to obtain the feature matrix corresponding to the training text set; and feeding the feature matrix into a pre-trained text classification model for training, so that speech text can be classified with the trained model. The scheme improves the efficiency and accuracy of text classification and raises the degree of automation and intelligence of voice interaction.
Description
Technical field
The present invention relates to the field of communication technology, and in particular to a text classification method for intelligent voice interaction, an intelligent voice interaction method, a computing device, and a storage medium.
Background technique
An existing call center is a fully integrated information service system linked to the enterprise; by using a unified, standardized service mode it can provide systematic, intelligent, and personalized service to users. In particular, interactive voice response can provide automatic voice service: after the user enters information through a dual-tone (DTMF) telephone, pre-recorded response voice can be played to the user.
During voice interaction, speech is converted to text, the text is analyzed, the corresponding instruction is executed, and the feedback content is finally converted to voice output through speech synthesis technology. However, an existing interactive voice response system requires the user to press telephone keys to complete the corresponding query or command, and this interaction mode is not intelligent enough.
Therefore, an intelligent voice interaction method is needed that can feed back the corresponding response voice directly according to the user's speech.
Summary of the invention
To this end, the present invention provides a text classification method for intelligent voice interaction and an intelligent voice interaction method, in an effort to solve, or at least alleviate, at least one of the problems above.
According to one aspect of the invention, there is provided a text classification method for intelligent voice interaction, suitable for execution in a computing device. First, a labeled training text set is obtained. Then, word segmentation is performed on the training texts under each label to obtain word sequences. Next, term frequency and inverse document frequency statistics are computed over the word sequences to obtain a TF-IDF value for each word. The words whose TF-IDF value exceeds a predetermined threshold are taken as the feature words under that label, so as to generate a feature lexicon for each label. The training text set is then filtered against the feature lexicon to obtain the corresponding feature matrix. Finally, the feature matrix is fed into a pre-trained text classification model for training, so that speech text can be classified with the trained model.
Optionally, in the above method, data augmentation is applied to the training text set so that the same number of training texts is generated under each label.
Optionally, in the above method, the words in the feature lexicon are matched against the words in the word sequence to obtain the feature word sequence of the training text; the feature word sequence is then converted into a feature vector of a predetermined format to obtain the feature matrix of the training text set.
Optionally, in the above method, a feature vector consists of a label value, feature indices, and the corresponding feature values, where a feature index is the word's dimension in the feature lexicon and a feature value is the number of times that word occurs in the training text.
Optionally, in the above method, the predicted score under each label can be mapped by a normalization function into a probability matrix, and the largest probability value in the probability matrix is taken as the predicted value. Then, a regularized loss value is computed from the label value and the predicted value. Finally, the parameters of the text classification model are updated by backward iteration based on the first and second derivatives of the regularized loss, and training ends when a predetermined condition is met.
Optionally, in the above method, the predetermined condition includes the number of iterations reaching a predetermined count, or the loss value falling below a predetermined threshold.
According to another aspect of the invention, an intelligent voice interaction method is provided, suitable for execution in a computing device. In the method, the received user speech is first converted to speech text. The speech text is then preprocessed to obtain a feature vector of a predetermined format. The feature vector is fed into the trained text classification model to obtain a text classification result. Finally, the voice information to feed back to the user is determined based on the classification result.
Optionally, in the above method, word segmentation is performed on the speech text to obtain the corresponding word sequence, and the constructed feature lexicon is fetched from a database. The word sequence of the speech text is then filtered against the feature lexicon to obtain the feature word sequence of the speech text. Finally, the feature word sequence is converted into a feature vector of a predetermined format.
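As an illustrative, non-limiting sketch, the preprocessing step above can be expressed as filtering a segmented utterance against the feature lexicon and counting occurrences per feature dimension. The function name and lexicon layout are invented for illustration:

```python
def to_feature_vector(word_seq, feature_words):
    """Map a word sequence to {feature_index: count}, keeping lexicon words only."""
    index = {w: i for i, w in enumerate(feature_words)}  # word -> dimension
    vec = {}
    for w in word_seq:
        if w in index:
            vec[index[w]] = vec.get(index[w], 0) + 1
    return vec

words = ["验证码", "您的", "验证码", "是", "123456"]
lexicon = ["验证码", "退订", "优惠"]
print(to_feature_vector(words, lexicon))  # {0: 2}
```

Words outside the lexicon are simply dropped, which is the filtering behavior the method describes.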
Optionally, in the above method, a preset matching rule is obtained first. Matching analysis is then performed on the speech text based on the rule, and the text classification result is output if the match succeeds.
Optionally, in the above method, the matching rule includes a regular expression list, a keyword list, and a filter word list. The speech text is matched against the items in the regular expression list, and the classification result is output if the match succeeds; otherwise the speech text is matched against the items in the keyword list, and the classification result is output if the match succeeds; otherwise the speech text is matched against the items in the filter word list, and the classification result is output if the match succeeds.
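The three-stage cascade above can be sketched as follows. The rule contents and function signature are invented examples, not the patent's actual rule format:

```python
import re

def match_text(text, regex_rules, keyword_rules, filter_rules):
    """Return the label of the first rule stage that matches, else None."""
    for label, pattern in regex_rules:          # stage 1: regular expressions
        if re.search(pattern, text):
            return label
    for label, keyword in keyword_rules:        # stage 2: keywords
        if keyword in text:
            return label
    for label, filter_words in filter_rules:    # stage 3: filter-word lists
        # every filter word must appear, in order, somewhere in the text
        pos = 0
        for w in filter_words:
            pos = text.find(w, pos)
            if pos < 0:
                break
            pos += len(w)
        else:
            return label
    return None

print(match_text("您的验证码是1234", [("code", r"\d{4}")], [], []))  # code
```

Each stage is cheaper and more precise than the one after it, so the cascade only falls through to the model-based classifiers when no rule fires.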
Optionally, in the above method, the position of the current filter word in the filter word list and the position of the current character string in the speech text are first initialized. The current character string is then matched against the filter words in the filter word list, and it is judged whether the currently matched filter word is the last word in the list: if it is, the match succeeds; if not, the string position and the filter word position are each advanced by one and matching continues. If the currently matched character string reaches the end of the speech text, the match fails and matching stops.
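A pointer-style sketch of the procedure just described, under the assumption (invented for illustration) that "matching the current string" means checking whether the filter word starts at the current text position:

```python
def filter_words_match(text, filter_words):
    """Walk the text and the filter-word list in parallel; succeed once the
    last filter word is found, fail if the end of the text is reached first."""
    wi = 0  # position of the current filter word in the list
    ti = 0  # position of the current character string in the speech text
    while wi < len(filter_words):
        if ti >= len(text):
            return False          # reached the end of the text: match fails
        word = filter_words[wi]
        if text.startswith(word, ti):
            ti += len(word)       # consume the matched word
            wi += 1               # move on to the next filter word
        else:
            ti += 1               # slide the text position forward by one
    return True                   # the last filter word matched

print(filter_words_match("请帮我查一下话费余额", ["查", "话费"]))  # True
```

In effect the filter words must occur in the text as an ordered, non-overlapping sequence.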
According to another aspect of the invention, a computing device is provided, including one or more processors, a memory, and one or more programs stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing the above methods.
According to another aspect of the invention, a computer-readable storage medium storing one or more programs is provided, the one or more programs including instructions which, when executed by a computing device, cause the computing device to perform the above methods.
According to the scheme of the present invention, the text to be classified is filtered by constructing a feature lexicon, and the generated feature vectors serve as the model's training data set, which reduces the complexity of training the text classification model. Further, combining multiple text matching and classification methods improves the accuracy and efficiency of text classification. In a voice interaction scenario, user speech can be converted to text in real time and classified, and voice content can be output to the user according to the classification result and preset business logic rules. In this way the automation and intelligence of voice interaction are improved and the user's call experience is enhanced.
Detailed description of the invention
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in conjunction with the following description and drawings. These aspects indicate various ways in which the principles disclosed herein may be practiced, and all aspects and their equivalents are intended to fall within the scope of the claimed subject matter. The above and other objects, features, and advantages of the disclosure will become apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout the disclosure, the same reference numerals generally refer to the same components or elements.
Fig. 1 shows a structural diagram of a computing device 100 according to an embodiment of the invention;
Fig. 2 shows a structural diagram of an intelligent voice interaction system 200 according to an embodiment of the invention;
Fig. 3 shows a schematic flow chart of a text classification method 300 for intelligent voice interaction according to an embodiment of the invention;
Fig. 4 shows a schematic flow chart of a text classification method 300 for intelligent voice interaction according to an embodiment of the invention;
Fig. 5 shows a schematic flow chart of an intelligent voice interaction method 500 according to an embodiment of the invention;
Fig. 6 shows a schematic flow chart of a text matching method 600 according to an embodiment of the invention.
Specific embodiment
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be embodied in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.
Fig. 1 shows a structural diagram of a computing device 100 according to an embodiment of the invention. In a basic configuration 102, the computing device 100 typically comprises a system memory 106 and one or more processors 104. A memory bus 108 may be used for communication between the processors 104 and the system memory 106.

Depending on the desired configuration, the processor 104 may be any type of processor, including but not limited to a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. The processor 104 may include one or more levels of cache, such as a level-1 cache 110 and a level-2 cache 112, a processor core 114, and registers 116. An exemplary processor core 114 may include an arithmetic logic unit (ALU), a floating-point unit (FPU), a digital signal processing core (DSP core), or any combination thereof. An exemplary memory controller 118 may be used with the processor 104, or in some implementations the memory controller 118 may be an internal part of the processor 104.
Depending on the desired configuration, the system memory 106 may be any type of memory, including but not limited to volatile memory (such as RAM), non-volatile memory (such as ROM or flash memory), or any combination thereof. The system memory 106 may include an operating system 120, one or more programs 122, and program data 124. In some embodiments, the programs 122 may be arranged to operate on the operating system using the program data 124.
The computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (for example, output devices 142, peripheral interfaces 144, and communication devices 146) to the basic configuration 102 via a bus/interface controller 130. Exemplary output devices 142 include a graphics processing unit 148 and an audio processing unit 150, which may be configured to communicate with various external devices, such as a display or speakers, via one or more A/V ports 152. Exemplary peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to communicate via one or more I/O ports 158 with external devices such as input devices (for example, a keyboard, mouse, pen, voice input device, or touch input device) or other peripherals (such as a printer or scanner). An exemplary communication device 146 may include a network controller 160, which may be arranged to communicate with one or more other computing devices 162 over a network communication link via one or more communication ports 164.
A network communication link may be one example of a communication medium. A communication medium may typically be embodied as computer-readable instructions, data structures, or program modules in a modulated data signal such as a carrier wave or other transmission mechanism, and may include any information delivery medium. A "modulated data signal" is a signal in which one or more of its characteristics are set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media, such as a wired network or a dedicated-line network, and various wireless media, such as acoustic, radio frequency (RF), microwave, infrared (IR), or other wireless media. The term computer-readable medium as used herein may include both storage media and communication media.
The computing device 100 may be implemented as a server, for example a file server, database server, application server, or web server, or as part of a small-sized portable (or mobile) electronic device, such as a cellular phone, a personal digital assistant (PDA), a personal media player device, a wireless web-browsing device, a personal wearable device, an application-specific device, or a hybrid device that includes any of the above functions. The computing device 100 may also be implemented as a personal computer, including desktop and notebook computer configurations. In some embodiments, the computing device 100 may be configured to perform the text classification method 300 for intelligent voice interaction, the intelligent voice interaction method 500, and the text matching method 600 of the present invention, with the one or more programs 122 of the computing device 100 including instructions for performing each of these methods.
Text classification means realizing the automatic classification of text through an effective algorithm or an effective model. Since intelligent voice interaction scenarios place a high real-time demand on semantic classification, this scheme provides a text classification method for intelligent voice interaction by optimizing the training of the text classification model and combining multiple text classification algorithms, thereby improving the real-time performance of text classification during voice interaction and raising the level of intelligent voice interaction.
Fig. 2 shows a structural diagram of an intelligent voice interaction system 200 according to an embodiment of the invention. As shown in Fig. 2, the intelligent voice interaction system 200 includes a speech recognition module 210, a business logic module 220, a text classification module 230, and a voice playing module 240. The speech recognition module 210 can convert the received user speech to text in real time; the business logic module 220 can be a database storing various model files, mapping tables, preset rules, and so on. The text classification module 230 can classify speech text, and its internal scheduling module can call different classification models through an HTTP interface according to different text labels to obtain classification results. The voice playing module 240 can feed the response voice corresponding to the text classification back to the user, realizing automatic voice response.
The key to real-time voice response in intelligent voice interaction is the real-time performance of text classification. The prior art usually classifies text with a single model or algorithm, but a single model applied to short-text classification often fails to achieve a good classification effect. This scheme combines multiple text analysis methods to perform matching and classification on the text.
Fig. 3 shows a schematic flow chart of a text classification method 300 for intelligent voice interaction according to an embodiment of the invention. The method can be executed in the computing device 100, which may reside in the intelligent voice interaction system 200. As shown in Fig. 3, the text to be predicted can first be matched against a preset matching rule. If the match succeeds, the classification result is output; if the match fails, the text to be predicted is fed into a general-class classification model for classification analysis and the result is output, and the text to be predicted is further fed into a business-class classification model for classification analysis and the result is output. The general-class model and business-class model can be obtained from pre-stored model files, in which labels, the feature lexicon, and model parameters are saved. As shown in Fig. 3, the training label data, training text data, and matching rules can be obtained from a redis database according to its IP address, port number, and database number.

In the following embodiments, the text classification method of the invention is illustrated by taking a multi-class text classification model and a rule-based text classification algorithm as examples. Before text classification is performed with the text classification model, the model needs to be trained. Fig. 4 shows a schematic flow chart of a text classification method 300 for intelligent voice interaction according to an embodiment of the invention.
In step S310, a labeled training text set is obtained.
The obtained training text set can consist of short texts that have been tagged with class labels and may include training texts of multiple classes. In one embodiment of the invention, the texts can be labeled with general-class and business-class labels; for example, the business-class labels can be a verification-code label, a marketing label, a notification label, an industry label, and so on. There may be multiple training texts under each label.
According to one embodiment of the invention, when the numbers of training texts under the labels differ greatly, data augmentation can be applied to the training text set so that essentially the same number of training texts is generated under each label, reducing the data imbalance problem. For example, the classes with fewer training texts can be resampled so that the number of training texts under every class label is filled up to the count of the largest class in the original training text set.
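The resampling step above can be sketched as oversampling each label's texts (with replacement) up to the size of the largest class; the function name and data layout are invented for illustration:

```python
import random

def balance_by_oversampling(texts_by_label, seed=0):
    """Pad every label's text list up to the size of the largest class
    by randomly drawing (with replacement) from that label's own texts."""
    rng = random.Random(seed)
    target = max(len(v) for v in texts_by_label.values())
    balanced = {}
    for label, texts in texts_by_label.items():
        extra = [rng.choice(texts) for _ in range(target - len(texts))]
        balanced[label] = list(texts) + extra
    return balanced

data = {"verification": ["t1", "t2", "t3", "t4"], "marketing": ["t5"]}
balanced = balance_by_oversampling(data)
print({k: len(v) for k, v in balanced.items()})  # {'verification': 4, 'marketing': 4}
```

After balancing, every label contributes equally to training, which is the stated goal of the augmentation.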
Then in step S320, word segmentation is performed on the training texts under each label to obtain word sequences.
Word segmentation can be statistics-based, segmenting the text against a corpus, or dictionary-based, matching substrings of the character string against an established dictionary according to a certain strategy, the match succeeding when a dictionary word is found. In one embodiment of the invention, jieba segmentation can be used to segment the training texts. Jieba performs efficient word-graph scanning based on a prefix dictionary and generates a directed acyclic graph of all the possible word formations of the Chinese characters in a sentence. Dynamic programming is then used to search for the maximum-probability path and find the best segmentation based on word frequency; for unknown words, a character-based hidden Markov model with the Viterbi algorithm is used. Jieba supports three segmentation modes: the accurate mode, which tries to cut the sentence most accurately and suits text analysis; the full mode, which scans out all character sequences that can form words, which is fast but cannot resolve ambiguity; and the search-engine mode, which, on top of the accurate mode, further cuts long words to improve recall and suits search-engine segmentation. In addition, jieba supports traditional-Chinese segmentation and custom dictionaries, and segmentation accuracy can be improved by adding a custom dictionary. For example, in the short-message domain, common expressions collected from the SMS field, such as "verification code" and "warm tip", can be added, as can industry or mobile-application marks such as Alipay, ICBC, and China Unicom.
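Jieba itself builds a word graph over a prefix dictionary and selects the most probable path; as a much simpler stand-in, the sketch below shows the dictionary-based strategy mentioned above via forward maximum matching. The dictionary entries are invented examples, and this is not jieba's actual algorithm:

```python
def fmm_segment(text, dictionary, max_len=4):
    """Greedy forward maximum matching: at each position take the longest
    dictionary word, falling back to a single character."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words

vocab = {"验证码", "支付宝", "您的"}
print(fmm_segment("您的支付宝验证码", vocab))  # ['您的', '支付宝', '验证码']
```

Adding a domain word such as "验证码" to the dictionary is exactly the custom-dictionary improvement the paragraph describes: without it, the string would fall apart into single characters.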
Then in step S330, term frequency and inverse document frequency statistics are computed over the word sequences to obtain the TF-IDF value of each word.
TF-IDF (term frequency-inverse document frequency) can be used to assess the importance of a word to a document in a document set or corpus. The term frequency (TF) is the number of times a given word occurs in the document. Since the same word may have a higher count in a long document than in a short one regardless of its importance, the term frequency is usually normalized, for example by dividing the word count by the total word count of the article, to prevent bias toward long documents. The individually most frequent words, function words such as "的" and "是", cannot reflect the meaning of the text, so a weight is added to each word: the most common words are given the smallest weight, while rarer words that do reflect the meaning of the text are given a larger weight. This weight is the inverse document frequency (IDF): the fewer the documents containing an entry, the larger its IDF, indicating that the entry has good class discrimination power. The IDF of a particular word can be obtained by dividing the total number of documents by the number of documents containing the word and taking the logarithm of the quotient.
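Written out, the statistics just described are (with $n_{t,d}$ the count of word $t$ in document $d$ and $N$ the total number of documents):

```latex
\mathrm{tf}(t,d) = \frac{n_{t,d}}{\sum_{k} n_{k,d}}, \qquad
\mathrm{idf}(t) = \log \frac{N}{\lvert \{\, d : t \in d \,\} \rvert}, \qquad
\text{TF-IDF}(t,d) = \mathrm{tf}(t,d)\cdot \mathrm{idf}(t)
```

A word that appears in every document gets $\mathrm{idf}(t)=\log 1=0$, which is exactly the "smallest weight for the most common words" behavior described above.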
In an embodiment of the invention, all training texts under the same label can be organized into one document. Assuming there are n labels in total, m non-stop-words can be filtered out across all the documents using the TF-IDF module in Python, giving an n*m TF-IDF matrix.
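A minimal sketch of the n*m matrix construction, with one merged document per label and one column per vocabulary word; stop-word filtering is omitted and all names are invented for illustration:

```python
import math
from collections import Counter

def tfidf_matrix(docs_by_label):
    """Merge each label's segmented texts into one document and return
    (labels, vocab, matrix) where matrix[i][j] is TF-IDF of vocab[j] under labels[i]."""
    labels = sorted(docs_by_label)
    merged = {lb: [w for doc in docs_by_label[lb] for w in doc] for lb in labels}
    vocab = sorted({w for words in merged.values() for w in words})
    n = len(labels)
    # document frequency: in how many label-documents does the word occur
    df = {w: sum(1 for lb in labels if w in merged[lb]) for w in vocab}
    matrix = []
    for lb in labels:
        counts = Counter(merged[lb])
        total = sum(counts.values())
        matrix.append([(counts[w] / total) * math.log(n / df[w]) for w in vocab])
    return labels, vocab, matrix

labels, vocab, M = tfidf_matrix({"a": [["x", "y"]], "b": [["y", "z"]]})
print(len(M), len(M[0]))  # 2 3
```

A word shared by all labels gets weight 0 everywhere, while a word unique to one label gets a positive weight only in that label's row.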
Then in step S340, the words whose TF-IDF value exceeds a predetermined threshold are taken as the feature words under that label, so as to generate the feature lexicon for each label.
Each value in the TF-IDF matrix corresponds to the representative power of a word under a label. Under a given label, the words whose TF-IDF value exceeds the predetermined threshold are taken as the feature words under that label, and the label and its corresponding feature words can be saved into the feature lexicon.
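The thresholding step can be sketched directly on such a matrix; the labels, vocabulary, weights, and threshold below are invented example values:

```python
def build_feature_lexicon(labels, vocab, matrix, threshold):
    """For each label, keep the words whose TF-IDF weight exceeds the threshold."""
    lexicon = {}
    for i, label in enumerate(labels):
        lexicon[label] = [w for j, w in enumerate(vocab) if matrix[i][j] > threshold]
    return lexicon

labels = ["marketing", "verification"]
vocab = ["优惠", "验证码"]
matrix = [[0.42, 0.0], [0.0, 0.35]]
print(build_feature_lexicon(labels, vocab, matrix, 0.1))
# {'marketing': ['优惠'], 'verification': ['验证码']}
```

The resulting {label: feature words} mapping is what later filters both the training texts and the live speech text.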
In step S350, the training text set is filtered against the feature lexicon to obtain the feature matrix of the training text set.
According to one embodiment of the invention, the words in the feature lexicon can be matched against the words in the word sequence to obtain the feature word sequence of the training text. To ease model training, the feature word sequence can be converted into a feature vector of a predetermined format. According to one embodiment of the invention, the feature word sequence can be converted into a feature vector in libsvm format, which reduces memory use and speeds up the model's computation. The macro FormatDataLibsvm.xls can be used to format the feature word sequence, or a custom program can be used; this scheme does not limit this.
A feature vector can consist of a label (label value), feature indices, and the corresponding feature values, in the following format: <label> <index1>:<value1> <index2>:<value2> ... Here, label denotes the label of the training text and can be customized: for a classification task it can be a predefined label value, and for a regression task it can be the target value. Index denotes the feature index, which corresponds to the dimension subscript of the word in the feature lexicon. Value denotes the feature value, i.e. the number of times the word occurs in the training text under that label.
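The libsvm-style line just described can be sketched as follows; whether indices start at 0 or 1 is the patent's choice, so the 1-based numbering here is only an assumption following libsvm file convention:

```python
def to_libsvm_line(label_value, counts_by_index):
    """Serialize one sample as '<label> <index1>:<value1> <index2>:<value2> ...',
    with feature indices sorted ascending as libsvm expects."""
    pairs = " ".join(f"{i}:{v}" for i, v in sorted(counts_by_index.items()))
    return f"{label_value} {pairs}".strip()

print(to_libsvm_line(2, {3: 1, 1: 4}))  # 2 1:4 3:1
```

Because absent features are simply omitted, the sparse representation stays compact, which is the memory saving the text mentions.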
Finally, in step S360, the feature matrix is fed into the pre-trained text classification model for training, so that speech text can be classified with the trained model.
For training texts with multiple labels, a gradient boosting decision tree algorithm can be used. The algorithm consists of multiple decision trees, and the conclusions of all the trees are accumulated to obtain the classification result. Decision trees divide into regression trees, which predict real values, and classification trees, which predict class label values; the trees used in the text classification model are regression trees.
According to one embodiment of the present invention, the extreme gradient boosting model XGBoost can be used, which treats the output of each weak classifier as a continuous value so that the loss function is continuous, and optimizes the model through iterations of the weak learners. The idea of XGBoost is to keep adding trees, repeatedly performing feature splits to grow each tree; adding a tree is in fact learning a new function that fits the residual of the previous round's prediction. Predicting the score of a sample means that, according to the sample's features, the sample falls into a corresponding leaf node in each tree, each leaf node carries a score, and the predicted value of the sample is the sum of the scores of all the trees.
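The additive scheme described above can be sketched in a few lines of Python. Here each "tree" is abstracted as a function returning a leaf score, and the 0.5 damping factor is an illustrative stand-in for a learning rate; this is a toy illustration, not the patented implementation:

```python
def boosted_prediction(trees, sample):
    """Sum the leaf score that each tree assigns to the sample."""
    return sum(tree(sample) for tree in trees)

# Toy boosting loop: each new "tree" fits the residual of the
# prediction so far (here a constant stump scaled by a 0.5 rate).
target = 10.0
trees, pred = [], 0.0
for _ in range(3):
    residual = target - pred           # what the previous rounds missed
    score = 0.5 * residual             # the new tree's leaf score
    trees.append(lambda s, sc=score: sc)
    pred += score                      # prediction moves toward target
```

After three rounds the summed prediction is 8.75, illustrating how each added tree shrinks the remaining residual toward the target.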
In one embodiment of the invention, the acquired training text set can be divided into a training set and a validation set according to a certain proportion. The XGBoost model trains M classifiers for each category; assuming there are k categories, the trained model has M*k trees. During training, XGBoost uses a one-vs-rest strategy, each time taking one category as the positive sample and the remaining categories as negative samples. The model parameters can first be initialized, and the feature matrix corresponding to the training set is then used as the input of the model. For the feature vector corresponding to each label, a predicted score is obtained. The predicted scores can be mapped into a probability matrix by a normalization function, and the maximum probability value in the probability matrix is taken as the predicted value. Finally, based on the first and second derivatives of the regularized loss between the label value and the predicted value, the parameters of the text classification model are updated by back-iteration, until a predetermined condition is met and training ends. The predetermined condition may include the number of trees reaching a preset maximum, the loss value falling below a predetermined threshold, the number of iterations reaching a preset count, and so on. The regularized loss consists of two parts: the first part measures the error between the predicted value and the label value, e.g. a mean squared error loss; the other part is a regularization term, which imposes a constraint on the parameters to prevent overfitting. A common regularization term is the L2 penalty, i.e. the sum of squares of all parameters. The text classification model used above is merely exemplary; a text classification model based on a convolutional neural network can also be used, and this scheme does not limit this.
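As a concrete sketch of the two-part loss described above (a minimal illustration; the function name, the `lam` weight, and the use of plain squared error are assumptions, not the patented implementation):

```python
def regularized_loss(pred, label, params, lam=1.0):
    """Squared-error loss plus an L2 penalty on the model parameters.

    Returns (loss, grad, hess): grad and hess are the first and second
    derivatives of the error term w.r.t. the prediction, which gradient
    boosting uses to fit the next tree.
    """
    err = (pred - label) ** 2                 # error between prediction and label
    l2 = lam * sum(p * p for p in params)     # L2 regularization term
    grad = 2.0 * (pred - label)               # first derivative of the error term
    hess = 2.0                                # second derivative (constant for MSE)
    return err + l2, grad, hess
```

For example, with prediction 3.0, label 1.0, parameters [1.0, 2.0] and lam=0.5, the loss is 4.0 + 0.5*(1+4) = 6.5, with gradient 4.0 and Hessian 2.0.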
After the training of the above model is completed, the trained text classification model can be saved as a model file, in which the corresponding labels, feature lexicon, and model parameters are stored. The trained text classification model can then be used to perform classification prediction on the speech text to be classified. Fig. 5 shows a schematic flow chart of an intelligent voice interaction method 500 according to an embodiment of the invention. As shown in Fig. 5, in step S510 the received user speech is converted into speech text. A speech recognition algorithm can be used to convert the speech into text in real time for classification analysis.
Then, in step S520, the speech text is preprocessed to obtain a feature vector of the predetermined format. For example, word segmentation can be performed on the text to obtain the corresponding word sequence. The word sequence is then filtered using the pre-built feature lexicon to obtain the feature word sequence. The feature word sequence is then encoded, e.g. with one-hot or TF-IDF encoding, and the encoded vector is converted into the predetermined format. For example, the vector can be compressed into libsvm format to obtain a feature vector convenient for model processing. The data format is label: feature index, feature value, where the label position is automatically zero-padded.
Then, in step S530, the feature vector corresponding to the speech text is input into the trained text classification model to obtain the text classification result.
The feature vector corresponding to the text can be saved in a feature matrix format or as a file. For example, the feature matrix can be input into the XGBoost text classification model for classification prediction. The feature matrix passes through the k trees of the text classification model (corresponding to the k labels) to obtain the score of the text under each label, and the resulting scores are mapped by the softmax function into a k-dimensional probability matrix. The softmax function normalizes the multiple scores produced by the model so that the values lie between 0 and 1, making the result interpretable: the result can be regarded as a probability, and the larger the probability of a category, the more likely the text belongs to that category. Finally, the label corresponding to the maximum probability value is selected from the probability matrix as the text classification result. If the maximum probability value in the probability matrix is less than a predetermined threshold and the current text classification model is the last text classification module, "cannot understand" is returned; otherwise, the text is passed to the next classification module for further analysis.
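The softmax mapping and the threshold fallback just described can be sketched as follows (the function name and the 0.5 threshold are illustrative assumptions):

```python
import math

def classify_scores(scores, labels, threshold=0.5):
    """Map per-label scores to probabilities via softmax and return
    (top_label, top_probability); return (None, top_probability) when
    the best probability falls below the threshold, modelling the
    "cannot understand" fallback."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]          # normalized, sums to 1
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None, probs[best]               # low confidence: defer
    return labels[best], probs[best]
```

For instance, scores [2.0, 0.0] over labels ["weather", "music"] yield "weather" with probability about 0.88, while three equal scores give a best probability of 1/3, which falls below the threshold and triggers the fallback.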
According to one embodiment of the present invention, before the text is classified with the text classification model, match analysis can first be performed on the speech text based on preset rules to obtain the text classification result. Fig. 6 shows a schematic flow chart of a text matching method 600 according to an embodiment of the present invention. As shown in Fig. 6, a preset matching rule is obtained first; then, based on the preset matching rule, match analysis is performed on the text to be predicted, and if the match succeeds, the text match result is output. For example, the preset matching rules can be obtained from a redis database, and may include a regular expression list, a keyword list, and a filter word list for dynamic fuzzy matching. A regular expression is a logical formula operating on strings and special characters: predefined specific characters or character combinations form a "rule string" that expresses a filtering logic over strings. Keywords are identifiers with specific functions, such as "false", "true", "None", "and", "if", "or", etc. The re module of Python provides regular expression matching operations: regular expressions are compiled into a series of bytecodes, which are executed by a matching engine written in C. For example, the compile function can be used to compile a regular expression pattern, the match function to match from the initial position, the findall function to traverse all matches, the group function for grouped matching, and so on. The matching rules and methods of regular expressions are common knowledge in this field and are not repeated here.
In an embodiment of the invention, the process of matching the speech text using the filter word list is as follows. First, the position of the current filter word in the filter word list and the position of the current string in the speech text are initialized, e.g. the position of the filter word in the filter word list is set to i=0 and the string position in the speech text to endpoint=0. The current string is then matched against the i-th word in the filter word list, and it is determined whether the i-th word is the last word in the filter word list; if so, the match succeeds. If not, endpoint is set to the end position of the currently matched string plus 1; if the currently matched string has already reached the end of the speech text, the match fails. Otherwise, the current string is set to the substring from the current endpoint position to the end of the speech text, and i is incremented by 1. The current substring continues to be matched against the i-th word in the filter word list. If the match between the current string and the i-th word in the filter word list fails, matching stops and the final match result is output.
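One reading of the procedure above, sketched in Python. The in-order-substring interpretation, the use of `str.find`, and the function name are assumptions about the intended behavior, not the patented code:

```python
def match_filter_words(text, filter_words):
    """Succeed only if every filter word occurs in the text, in list
    order, with the scan position advancing past each match."""
    endpoint = 0
    for word in filter_words:
        pos = text.find(word, endpoint)
        if pos < 0:
            return False            # word not found after current position
        endpoint = pos + len(word)  # continue scanning after this match
    return True
```

For example, the filter list ["turn", "light"] matches "please turn off the light" but not "light on", where the words do not appear in order.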
The re module of Python can be used directly to match the speech text one by one against each item in the regular expression list; if a match succeeds, the text classification result is returned. If the match fails, the speech text continues to be matched one by one against the items in the keyword list, and the text classification result is returned if a match succeeds. Otherwise, the speech text continues to be matched against the filter word list, and the text classification result is returned on a successful match. If that match also fails, the text classification model is used to perform classification prediction on the speech text. The text classification model may include a general-category model and a business-category model: the general-category classification model can be called first to classify the text to be predicted and obtain a general classification result, and the business-category classification model can then be called to classify the text to be predicted and obtain a business classification result.
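The cascade just described (regular expressions first, then keywords, then filter words, finally the trained model) can be sketched as follows; the rule representations, names, and the use of `re.search` are illustrative assumptions:

```python
import re

def classify_text(text, regex_rules, keyword_rules, filter_rules, model_predict):
    """Try each rule tier in order; fall back to the trained classifier.

    regex_rules:   list of (pattern, label) pairs
    keyword_rules: list of (keyword, label) pairs
    filter_rules:  list of (word_list, label) pairs
    model_predict: callable used when no rule matches
    """
    for pattern, label in regex_rules:
        if re.search(pattern, text):
            return label
    for word, label in keyword_rules:
        if word in text:
            return label
    for words, label in filter_rules:
        if all(w in text for w in words):
            return label
    return model_predict(text)       # no rule fired: use the model
```

A call such as `classify_text("hello", [(r"hel+o", "greet")], [], [], model)` short-circuits at the regex tier and never invokes the model.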
Finally, in step S540, the voice information to be fed back to the user is determined based on the text classification result. Since the business logic module 220 shown in Fig. 2 stores in advance the correspondence between text categories and voice response information, the voice response information can be determined by looking up the mapping table and fed back to the user.
According to the solution of the present invention, the text to be classified is filtered by constructing a feature lexicon, and the generated feature vectors are used as the training data set of the model, which can reduce the complexity of text classification model training. Further, combining multiple text matching or classification methods can improve the accuracy and efficiency of text classification. In a voice interaction scenario, user speech can be converted into text in real time, the text can be classified, and the voice content output to the user can be determined according to the classification result and preset business logic rules. This improves the automation and intelligence of intelligent voice interaction and improves the user's call experience.
A6. The method as described in A5, wherein the predetermined condition includes the number of iterations reaching a preset count or the loss value being less than a predetermined threshold.
B10. The method as described in B9, wherein the matching rule includes a regular expression list, a keyword list, and a filter word list, and the step of performing match analysis on the speech text based on the matching rule includes:
matching the speech text against the items in the regular expression list, and outputting the text classification result if the match succeeds; otherwise
matching the speech text against the items in the keyword list, and outputting the text classification result if the match succeeds; otherwise
matching the speech text against the items in the filter word list, and outputting the text classification result if the match succeeds.
B11. The method as described in B10, wherein the step of matching the speech text against the items in the filter word list includes:
initializing the position of the current filter word in the filter word list and the position of the current string in the speech text;
matching the current string against the filter word in the filter word list, and determining whether the currently matched filter word is the last word in the filter word list; if so, the match succeeds; if not, incrementing the position of the current string by 1 and the position of the current filter word by 1, and continuing to match; and
if the currently matched string has reached the end of the speech text, the match fails and matching stops.
It should be appreciated that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art should understand that the modules, units, or components of the devices in the examples disclosed herein may be arranged in a device as described in the embodiments, or alternatively may be located in one or more devices different from the devices in the examples. The modules in the foregoing examples may be combined into one module or may furthermore be divided into multiple submodules.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components in an embodiment may be combined into one module, unit, or component, and furthermore they may be divided into multiple submodules, subunits, or subcomponents. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or apparatus so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
Furthermore, those skilled in the art will appreciate that, although some embodiments described herein include certain features included in other embodiments but not other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various techniques described herein may be implemented in connection with hardware or software, or a combination thereof. Thus, the methods and apparatus of the present invention, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media such as floppy diskettes, CD-ROMs, hard disk drives, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine such as a computer, the machine becomes an apparatus for practicing the invention.
In the case of program code execution on programmable computers, the computing device generally includes a processor, a processor-readable storage medium (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The memory is configured to store the program code; the processor is configured to execute the method of the present invention according to the instructions in the program code stored in the memory.
By way of example, and not limitation, computer-readable media comprise computer storage media and communication media. Computer storage media store information such as computer-readable instructions, data structures, program modules, or other data. Communication media generally embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media. Combinations of any of the above are also included within the scope of computer-readable media.
In addition, some of the embodiments are described herein as methods, or combinations of method elements, that can be implemented by a processor of a computer system or by other devices carrying out the described functions. Thus, a processor with the instructions necessary for implementing such a method or method element forms a device for implementing the method or method element. Furthermore, an element of an apparatus embodiment described herein is an example of a device for carrying out the function performed by that element for the purpose of carrying out the invention.
As used herein, unless otherwise specified, the use of the ordinals "first", "second", "third", etc. to describe a common object merely indicates that different instances of like objects are being referred to, and is not intended to imply that the objects so described must be in a given sequence, whether temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having the benefit of the above description, will appreciate that other embodiments can be devised within the scope of the invention thus described. Additionally, it should be noted that the language used in this specification has been principally selected for readability and instructional purposes, not to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. With respect to the scope of the invention, the disclosure made herein is illustrative and not restrictive, the scope of the invention being defined by the appended claims.
Claims (10)
1. A text classification method for intelligent voice interaction, suitable for execution in a computing device, the method comprising:
obtaining a training text set with labels;
performing word segmentation on the training text under each label to obtain a word sequence;
performing term frequency and inverse document frequency statistics on the word sequence to obtain the TF-IDF term frequency-inverse document frequency statistic corresponding to each word;
taking words whose TF-IDF value is greater than a predetermined threshold as the feature words under the label, to generate the feature lexicon corresponding to each label;
filtering the training text set based on the feature lexicon to obtain the feature matrix corresponding to the training text set; and
inputting the feature matrix into a pre-trained text classification model for training, so as to classify speech text based on the trained text classification model.
2. The method of claim 1, wherein before the step of performing word segmentation on the training text under each label, the method further comprises:
performing data augmentation on the training text set to generate the same number of training texts under each label.
3. The method of claim 1, wherein the step of filtering the training text set based on the feature lexicon comprises:
matching the words in the feature lexicon against the words in the word sequence to obtain the feature word sequence of the training text; and
converting the feature word sequence into a feature vector of a predetermined format to obtain the feature matrix corresponding to the training text set.
4. The method of claim 3, wherein the feature vector consists of a label value, feature indices, and corresponding feature values, the feature index being the dimension of the feature word in the feature lexicon, and the feature value being the number of times the word occurs in the training text.
5. The method of claim 4, wherein the step of inputting the feature matrix into the pre-trained text classification model for training comprises:
mapping the predicted score under each label into a probability matrix by a normalization function;
taking the maximum probability value in the probability matrix as the predicted value;
calculating the regularized loss based on the label value and the predicted value; and
updating the parameters of the text classification model by back-iteration based on the first and second derivatives of the regularized loss, until a predetermined condition is met and training ends.
6. An intelligent voice interaction method, suitable for execution in a computing device, the method comprising:
converting received user speech into speech text;
preprocessing the speech text to obtain a feature vector of a predetermined format;
inputting the feature vector into a trained text classification model to obtain a text classification result, wherein the text classification model is generated based on the method of any one of claims 1-5; and
determining the voice information to be fed back to the user based on the text classification result.
7. The method of claim 6, wherein the step of preprocessing the speech text to obtain the feature vector of the predetermined format comprises:
performing word segmentation on the speech text to obtain the corresponding word sequence;
obtaining the constructed feature lexicon from a database;
filtering the word sequence based on the feature lexicon to obtain the feature word sequence corresponding to the speech text; and
converting the feature word sequence into the feature vector of the predetermined format.
8. The method of claim 6, wherein before preprocessing the speech text, the method further comprises:
obtaining a preset matching rule; and
performing match analysis on the speech text based on the matching rule, and outputting the text match result if the match succeeds.
9. A computing device, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any one of the methods according to claims 1-8.
10. A computer-readable storage medium storing one or more programs, the one or more programs comprising instructions which, when executed by a computing device, cause the computing device to perform any one of the methods according to claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910427808.0A CN110188199A (en) | 2019-05-21 | 2019-05-21 | A kind of file classification method for intelligent sound interaction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110188199A true CN110188199A (en) | 2019-08-30 |
Family
ID=67717204
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106095996A (en) * | 2016-06-22 | 2016-11-09 | 量子云未来(北京)信息科技有限公司 | Method for text classification |
CN106649694A (en) * | 2016-12-19 | 2017-05-10 | 北京云知声信息技术有限公司 | Method and device for identifying user's intention in voice interaction |
CN107316643A (en) * | 2017-07-04 | 2017-11-03 | 科大讯飞股份有限公司 | Voice interactive method and device |
AU2018101524A4 (en) * | 2018-10-14 | 2018-11-15 | Chai, Xiayun MISS | Stock prediction research based on finiancial news by svm |
US20190066675A1 (en) * | 2017-08-23 | 2019-02-28 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Artificial intelligence based method and apparatus for classifying voice-recognized text |
CN109657064A (en) * | 2019-02-28 | 2019-04-19 | 广东电网有限责任公司 | A kind of file classification method and device |
CN109710758A (en) * | 2018-12-11 | 2019-05-03 | 浙江工业大学 | A kind of user's music preferences classification method based on Labeled-LDA model |
CN109726387A (en) * | 2017-10-31 | 2019-05-07 | 科沃斯商用机器人有限公司 | Man-machine interaction method and system |
Cited By (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110634050A (en) * | 2019-09-06 | 2019-12-31 | 北京无限光场科技有限公司 | Method, device, electronic equipment and storage medium for identifying house source type |
CN110634050B (en) * | 2019-09-06 | 2023-04-07 | 北京无限光场科技有限公司 | Method, device, electronic equipment and storage medium for identifying house source type |
WO2021051560A1 (en) * | 2019-09-17 | 2021-03-25 | 平安科技(深圳)有限公司 | Text classification method and apparatus, electronic device, and computer non-volatile readable storage medium |
CN110689878A (en) * | 2019-10-11 | 2020-01-14 | 浙江百应科技有限公司 | XLNET-based intelligent voice conversation intention recognition method |
CN110929513A (en) * | 2019-10-31 | 2020-03-27 | 北京三快在线科技有限公司 | Text-based label system construction method and device |
CN112908319B (en) * | 2019-12-04 | 2022-10-25 | 海信视像科技股份有限公司 | Method and equipment for processing information interaction |
CN112908319A (en) * | 2019-12-04 | 2021-06-04 | 海信视像科技股份有限公司 | Method and equipment for processing information interaction |
CN111177084A (en) * | 2019-12-20 | 2020-05-19 | 平安信托有限责任公司 | File classification method and device, computer equipment and storage medium |
CN113010667A (en) * | 2019-12-20 | 2021-06-22 | 王道维 | Training method for machine learning decision model by using natural language corpus |
CN111357015B (en) * | 2019-12-31 | 2023-05-02 | 深圳市优必选科技股份有限公司 | Text conversion method, apparatus, computer device, and computer-readable storage medium |
CN111357015A (en) * | 2019-12-31 | 2020-06-30 | 深圳市优必选科技股份有限公司 | Speech synthesis method, apparatus, computer device and computer-readable storage medium |
CN111506702A (en) * | 2020-03-25 | 2020-08-07 | 北京万里红科技股份有限公司 | Knowledge distillation-based language model training method, text classification method and device |
CN111506757A (en) * | 2020-04-10 | 2020-08-07 | 复旦大学 | Voice marking device and method based on incremental iteration |
CN111753525A (en) * | 2020-05-21 | 2020-10-09 | 浙江口碑网络技术有限公司 | Text classification method, device and equipment |
CN111753525B (en) * | 2020-05-21 | 2023-11-10 | 浙江口碑网络技术有限公司 | Text classification method, device and equipment |
CN111753086A (en) * | 2020-06-11 | 2020-10-09 | 北京天空卫士网络安全技术有限公司 | Junk mail identification method and device |
CN111930944A (en) * | 2020-08-12 | 2020-11-13 | 中国银行股份有限公司 | File label classification method and device |
CN111930944B (en) * | 2020-08-12 | 2023-08-22 | 中国银行股份有限公司 | File label classification method and device |
CN112133308B (en) * | 2020-09-17 | 2024-07-09 | 中国建设银行股份有限公司 | Method and device for classifying multiple tags of speech recognition text |
CN112133308A (en) * | 2020-09-17 | 2020-12-25 | 中国建设银行股份有限公司 | Method and device for multi-label classification of voice recognition text |
CN112181490A (en) * | 2020-09-22 | 2021-01-05 | 中国建设银行股份有限公司 | Method, device, equipment and medium for identifying function category in function point evaluation method |
CN112181490B (en) * | 2020-09-22 | 2024-05-24 | 中国建设银行股份有限公司 | Method, device, equipment and medium for identifying function category in function point evaluation method |
CN112507082A (en) * | 2020-12-16 | 2021-03-16 | 作业帮教育科技(北京)有限公司 | Method and device for intelligently identifying improper text interaction and electronic equipment |
CN112699944A (en) * | 2020-12-31 | 2021-04-23 | ***股份有限公司 | Order-returning processing model training method, processing method, device, equipment and medium |
CN112699944B (en) * | 2020-12-31 | 2024-04-23 | ***股份有限公司 | Training method, processing method, device, equipment and medium for returning list processing model |
CN113821631B (en) * | 2021-01-20 | 2022-04-22 | 广东省信息网络有限公司 | Commodity matching method based on big data |
CN113821631A (en) * | 2021-01-20 | 2021-12-21 | 广东省信息网络有限公司 | Commodity matching method based on big data |
CN113449103B (en) * | 2021-01-28 | 2024-05-10 | 民生科技有限责任公司 | Bank transaction running water classification method and system integrating label and text interaction mechanism |
CN113449103A (en) * | 2021-01-28 | 2021-09-28 | 民生科技有限责任公司 | Bank transaction flow classification method and system integrating label and text interaction mechanism |
CN113268579A (en) * | 2021-06-24 | 2021-08-17 | 中国平安人寿保险股份有限公司 | Dialog content type identification method and device, computer equipment and storage medium |
CN113268579B (en) * | 2021-06-24 | 2023-12-08 | 中国平安人寿保险股份有限公司 | Dialogue content category identification method, device, computer equipment and storage medium |
CN113806492A (en) * | 2021-09-30 | 2021-12-17 | 中国平安人寿保险股份有限公司 | Record generation method, device and equipment based on semantic recognition and storage medium |
CN113806492B (en) * | 2021-09-30 | 2024-02-06 | 中国平安人寿保险股份有限公司 | Record generation method, device, equipment and storage medium based on semantic recognition |
CN117524196A (en) * | 2023-11-07 | 2024-02-06 | 北京鸿途信达科技股份有限公司 | Advertisement generation system based on voice interaction |
Similar Documents
Publication | Title
---|---
CN110188199A (en) | A kind of file classification method for intelligent sound interaction
US11663409B2 (en) | Systems and methods for training machine learning models using active learning
US11425064B2 (en) | Customized message suggestion with user embedding vectors
US20200019609A1 (en) | Suggesting a response to a message by selecting a template using a neural network
US11704500B2 (en) | Techniques to add smart device information to machine learning for increased context
WO2019153613A1 (en) | Chat response method, electronic device and storage medium
CN110032632A (en) | Intelligent customer service answering method, device and storage medium based on text similarity
US11610064B2 (en) | Clarification of natural language requests using neural networks
CN108416375B (en) | Work order classification method and device
CN108038492A (en) | A kind of perceptual term vector and sensibility classification method based on deep learning
US20210303636A1 (en) | Systems and methods for automatically determining utterances, entities, and intents based on natural language inputs
US11599666B2 (en) | Smart document migration and entity detection
CN112163419A (en) | Text emotion recognition method and device, computer equipment and storage medium
US20190228297A1 (en) | Artificial Intelligence Modelling Engine
CN110019790A (en) | Text identification, text monitoring, data object identification, data processing method
US20220383157A1 (en) | Interpretable machine learning for data at scale
CN112016313A (en) | Spoken language element identification method and device and alarm situation analysis system
CN114492669B (en) | Keyword recommendation model training method, recommendation device, equipment and medium
Schultz et al. | Distance based source domain selection for sentiment classification
CN110489730A (en) | Text handling method, device, terminal and storage medium
CN117216227B (en) | Tobacco enterprise intelligent information question-answering method based on knowledge graph and large language model
CN111581377B (en) | Text classification method and device, storage medium and computer equipment
US20230419968A1 (en) | Method of generating summary based on main speaker
US11893352B2 (en) | Dependency path reasoning for measurement extraction
US11874880B2 (en) | Apparatuses and methods for classifying a user to a posting
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
RJ01 | Rejection of invention patent application after publication
Application publication date: 2019-08-30