CN110347840A - Method, system, device and storage medium for predicting complaint text categories - Google Patents
- Publication number
- CN110347840A (CN201910650261.0A)
- Authority
- CN
- China
- Prior art keywords
- history
- text data
- data
- classification
- complaint
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0202—Market predictions or forecasting for commercial activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/12—Hotels or restaurants
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/14—Travel agencies
Abstract
The invention discloses a method, system, device and storage medium for predicting the category of complaint text on an OTA platform. The prediction method comprises: obtaining historical complaint text data of the OTA platform; clustering and labeling the historical complaint text data to obtain the complaint category of each piece of historical complaint text data; obtaining historical dimension data and historical entity data; building a prediction model for predicting the complaint category to which complaint text data belongs; obtaining target complaint text data; inputting the target complaint text data into the prediction model to obtain the probability that the target complaint text data belongs to each complaint category; and determining, according to the probability values, the target complaint category to which the target complaint text data belongs. The invention improves the accuracy of text classification and automatically categorizes customer complaints, so that the responsible staff can promptly handle the complaint categories assigned to them, which improves the user experience while saving considerable manpower.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a method, system, device and storage medium for predicting the category of complaint text on an OTA platform.
Background technique
On an OTA (Online Travel Agency) platform, complaint text needs to be classified to determine its complaint category, so that different remedies can be applied to different complaint categories to improve the user experience.
At present, text classification mostly uses RNN (recurrent neural network) algorithms or word-embedding-based CNN (convolutional neural network) algorithms. Although RNN-based text classification can effectively model textual context and capture contextual semantics, each time step depends on the result computed at the previous time step, so the computation cannot be parallelized and training usually takes a very long time. Word-embedding-based CNN algorithms often overfit because of OOV (out-of-vocabulary) words and sparse features. CNN-based text classification does solve the parallelization problem, but it can only recognize local text information, so its accuracy suffers to some extent.
Summary of the invention
The technical problem to be solved by the present invention is to overcome the defects of prior-art algorithms for classifying complaint text, which cannot be processed in parallel, take a long time to train, or do not meet accuracy requirements, by providing a method, system, device and storage medium for predicting the category of complaint text on an OTA platform.
The present invention solves the above technical problem through the following technical solutions:
The present invention provides a method for predicting the category of complaint text on an OTA platform, the prediction method comprising:
obtaining historical complaint text data of the OTA platform within a set historical time period;
labeling the historical complaint text data to obtain the complaint category corresponding to each piece of historical complaint text data;
obtaining historical dimension data and historical entity data in the OTA platform that correspond to the historical complaint text data;
wherein the historical dimension data is multi-dimensional data characterizing users, orders and/or hotels;
the historical entity data is data characterizing proper nouns in the hotel field;
building a prediction model for predicting the complaint category to which complaint text data belongs, with the historical complaint text data, the historical dimension data and the historical entity data as input and the historical complaint category corresponding to the historical complaint text data as output;
obtaining target complaint text data;
inputting the target complaint text data into the prediction model to obtain the probability that the target complaint text data belongs to each complaint category;
determining, according to the probability values, the target complaint category to which the target complaint text data belongs.
Preferably, after the step of obtaining the historical complaint text data of the OTA platform within the set historical time period and before the step of labeling the historical complaint text data, the method further comprises:
clustering the historical complaint text data using a clustering algorithm;
the step of labeling the historical complaint text data to obtain the complaint category corresponding to each piece of historical complaint text data comprises:
labeling historical complaint text data that belongs to the same cluster with the same complaint category.
Preferably, before the step of building the prediction model with the historical complaint text data, the historical dimension data and the historical entity data as input and the corresponding historical complaint category as output, the method further comprises:
preprocessing the labeled historical complaint text data.
Preferably, the step of determining, according to the probability values, the target complaint category to which the target complaint text data belongs comprises:
determining the complaint category with the largest probability value as the target complaint category to which the target complaint text data belongs.
Preferably, before the step of obtaining the historical complaint text data of the OTA platform within the set historical time period, the method further comprises:
pre-training on the historical complaint text data using the BERT algorithm (a natural language processing algorithm) to obtain a language model;
the step of building the prediction model with the historical complaint text data, the historical dimension data and the historical entity data as input and the corresponding historical complaint category as output comprises:
using the BERT algorithm, with the historical complaint text data, the historical dimension data and the historical entity data as input and the corresponding historical complaint category as output, and based on the language model, building the prediction model for predicting the complaint category to which complaint text data belongs by randomly masking part of the entity data during training.
The present invention also provides a system for predicting the category of complaint text on an OTA platform, the prediction system comprising a historical text data acquisition module, a labeling module, a dimension and entity data acquisition module, a prediction model building module, a target text data acquisition module, a probability value acquisition module and a target complaint category acquisition module;
the historical text data acquisition module is configured to obtain historical complaint text data of the OTA platform within a set historical time period;
the labeling module is configured to label the historical complaint text data to obtain the complaint category corresponding to each piece of historical complaint text data;
the dimension and entity data acquisition module is configured to obtain historical dimension data and historical entity data in the OTA platform that correspond to the historical complaint text data;
wherein the historical dimension data is multi-dimensional data characterizing users, orders and/or hotels;
the historical entity data is data characterizing proper nouns in the hotel field;
the prediction model building module is configured to build a prediction model for predicting the complaint category to which complaint text data belongs, with the historical complaint text data, the historical dimension data and the historical entity data as input and the historical complaint category corresponding to the historical complaint text data as output;
the target text data acquisition module is configured to obtain target complaint text data;
the probability value acquisition module is configured to input the target complaint text data into the prediction model to obtain the probability that the target complaint text data belongs to each complaint category;
the target complaint category acquisition module is configured to determine, according to the probability values, the target complaint category to which the target complaint text data belongs.
Preferably, the prediction system further comprises a clustering module;
the clustering module is configured to cluster the historical complaint text data using a clustering algorithm;
the labeling module is configured to label historical complaint text data that belongs to the same cluster with the same complaint category.
Preferably, the prediction system further comprises a preprocessing module;
the preprocessing module is configured to preprocess the labeled historical complaint text data.
Preferably, the target complaint category acquisition module is configured to determine the complaint category with the largest probability value as the target complaint category to which the target complaint text data belongs.
Preferably, the prediction system further comprises a language model acquisition module;
the language model acquisition module is configured to pre-train on the historical complaint text data using the BERT algorithm to obtain a language model;
the prediction model building module is configured to use the BERT algorithm, with the historical complaint text data, the historical dimension data and the historical entity data as input and the corresponding historical complaint category as output, and based on the language model, to build the prediction model for predicting the complaint category to which complaint text data belongs by randomly masking part of the entity data during training.
The present invention also provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the above method for predicting the category of complaint text on an OTA platform.
The present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method for predicting the category of complaint text on an OTA platform.
The positive effects of the present invention are as follows:
In the present invention, a language model is obtained through pre-training; an improved BERT algorithm is then used, with the historical complaint text data, historical dimension data and historical entity data as input and the historical complaint category as output, to build the prediction model based on that language model. The prediction model yields the probability that the target complaint text data belongs to each complaint category, and the complaint category with the highest probability is selected as the target complaint category to which the target complaint text data belongs. This improves the accuracy of the prediction model and of text classification and automatically categorizes customer complaints, so that the responsible staff can handle the complaint categories assigned to them at the earliest opportunity, improving the user experience while saving considerable manpower and raising overall work efficiency.
Brief description of the drawings
Fig. 1 is a flowchart of the method for predicting the category of complaint text on an OTA platform according to Embodiment 1 of the present invention.
Fig. 2 is a flowchart of the method for predicting the category of complaint text on an OTA platform according to Embodiment 2 of the present invention.
Fig. 3 is a module diagram of the system for predicting the category of complaint text on an OTA platform according to Embodiment 3 of the present invention.
Fig. 4 is a module diagram of the system for predicting the category of complaint text on an OTA platform according to Embodiment 4 of the present invention.
Fig. 5 is a schematic structural diagram of an electronic device implementing the method for predicting the category of complaint text on an OTA platform according to Embodiment 5 of the present invention.
Detailed description of the embodiments
The present invention is further illustrated below by way of embodiments, which do not limit the present invention to the scope of those embodiments.
Embodiment 1
As shown in Fig. 1, the method for predicting the category of complaint text on an OTA platform in this embodiment comprises:
S101, obtaining historical complaint text data of the OTA platform within a set historical time period;
S102, labeling the historical complaint text data to obtain the complaint category corresponding to each piece of historical complaint text data;
S103, obtaining historical dimension data and historical entity data in the OTA platform that correspond to the historical complaint text data;
wherein the historical dimension data is multi-dimensional data characterizing users, orders and/or hotels;
the historical entity data is data characterizing proper nouns in the hotel field;
Specifically, the historical dimension data includes order information, hotel information, user information and the like. Order information includes, but is not limited to, the payment method, transaction status and order type of the order; hotel information includes, but is not limited to, the names of hotel facilities and equipment, property information and the like; user information includes, but is not limited to, user name, gender and the like.
The historical entity data includes proper nouns in the hotel field, such as advance payment, big double bed and flash stay. The historical entity data is stored in dictionary format.
S104, building a prediction model for predicting the complaint category to which complaint text data belongs, with the historical complaint text data, historical dimension data and historical entity data as input and the historical complaint category corresponding to the historical complaint text data as output;
S105, obtaining target complaint text data;
S106, inputting the target complaint text data into the prediction model to obtain the probability that the target complaint text data belongs to each complaint category;
S107, determining, according to the probability values, the target complaint category to which the target complaint text data belongs.
In this embodiment, all data is selected by random sampling to ensure that the data is identically distributed.
In this embodiment, the prediction model is built with the historical complaint text data, historical dimension data and historical entity data as input and the historical complaint category as output. The prediction model yields the probability that the target complaint text data belongs to each complaint category, and the complaint category with the highest probability is selected as the target complaint category to which the target complaint text data belongs. This improves the accuracy of text classification and automatically categorizes customer complaints, so that the responsible staff can handle the complaint categories assigned to them at the earliest opportunity, improving the user experience while saving considerable manpower.
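The three inputs named in S103 and S104 can be pictured as one record per complaint. The following is a minimal sketch under stated assumptions: the class `ComplaintSample`, the helper `attach_entities`, the field names and the substring matching are all hypothetical and are not taken from the patent; the two dictionary entries are the examples given in the text.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical entity dictionary; a real one would hold many hotel-domain terms.
ENTITY_DICT = {"advance payment", "big double bed"}


@dataclass
class ComplaintSample:
    text: str                                       # complaint text
    dimensions: dict = field(default_factory=dict)  # order / hotel / user fields
    entities: list = field(default_factory=list)    # matched hotel-domain terms
    category: Optional[str] = None                  # label; None for a target complaint


def attach_entities(sample, entity_dict=ENTITY_DICT):
    """Fill sample.entities with every dictionary term that occurs in the text."""
    lowered = sample.text.lower()
    sample.entities = sorted(term for term in entity_dict if term in lowered)
    return sample


# Illustrative record; the dimension keys and the category name are made up.
sample = attach_entities(ComplaintSample(
    text="Advance payment was charged twice",
    dimensions={"payment_method": "credit card", "order_type": "hotel"},
    category="billing",
))
```

A historical record carries a `category`, while a target complaint would leave it as `None` until the model predicts it.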
Embodiment 2
As shown in Fig. 2, the method for predicting the category of complaint text on an OTA platform in this embodiment is a further improvement on Embodiment 1, specifically:
After step S101 and before step S102, the method further comprises:
S1020, clustering the historical complaint text data using a clustering algorithm;
wherein the clustering algorithm includes, but is not limited to, the K-MEANS clustering algorithm (k-means clustering), the DBSCAN clustering algorithm (a density-based clustering algorithm), the mean shift clustering algorithm, hierarchical clustering algorithms and composite clustering.
Step S102 comprises:
S1021, labeling historical complaint text data that belongs to the same cluster with the same complaint category.
Specifically, during labeling, the relevant staff label the historical complaint text data by combining business requirements with the clustering results.
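Steps S1020 and S1021 can be sketched as follows. This is an illustration only, assuming bag-of-words vectors, whitespace tokenization and a plain k-means with the first k vectors as initial centroids; none of these choices comes from the patent, and the cluster-to-category mapping stands in for the decision that staff make after inspecting each cluster.

```python
from collections import Counter


def vectorize(texts):
    """Turn each text into a word-count vector over a shared sorted vocabulary."""
    vocab = sorted({w for t in texts for w in t.split()})
    rows = []
    for t in texts:
        counts = Counter(t.split())
        rows.append([counts[w] for w in vocab])
    return rows


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=10):
    """Plain k-means; deterministic here because the first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        assignment = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                      for v in vectors]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


texts = [
    "refund not received",
    "room dirty smell",
    "refund delayed not received",
    "room dirty noisy",
]
assignment = kmeans(vectorize(texts), k=2)

# Hypothetical mapping chosen by staff after reviewing the clusters.
cluster_to_category = {0: "refund", 1: "hygiene"}
labels = [cluster_to_category[a] for a in assignment]
```

Every text in a cluster receives the same category, so staff review one cluster at a time instead of one complaint at a time.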
After step S103 and before step S104, the method further comprises:
S1040, preprocessing the labeled historical complaint text data.
Specifically, the preprocessing includes, but is not limited to, converting full-width characters to half-width, converting traditional Chinese characters to simplified, converting uppercase to lowercase, removing stop words and low-frequency words, filtering out null values, and filtering out sensitive words.
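Most of the preprocessing in S1040 can be sketched with the Python standard library. The word lists, the `min_count` threshold and the whitespace tokenization below are assumptions made for illustration; Chinese text would need a word segmenter, and traditional-to-simplified conversion is omitted because it requires an external conversion table.

```python
import unicodedata
from collections import Counter

# Hypothetical word lists; a real deployment would load its own.
STOP_WORDS = {"the", "a", "is"}
SENSITIVE_WORDS = {"badword"}


def normalize(text):
    """Full-width to half-width via NFKC normalization, then lowercase and trim."""
    return unicodedata.normalize("NFKC", text).lower().strip()


def preprocess(texts, min_count=2):
    """Return one cleaned token list per non-empty input text."""
    # Filter null values first, then normalize and split on whitespace.
    tokenized = [normalize(t).split() for t in texts if t and t.strip()]
    counts = Counter(tok for toks in tokenized for tok in toks)
    cleaned = []
    for toks in tokenized:
        cleaned.append([
            t for t in toks
            if t not in STOP_WORDS           # remove stop words
            and t not in SENSITIVE_WORDS     # filter sensitive words
            and counts[t] >= min_count       # remove low-frequency words
        ])
    return cleaned


raw = ["The ＲＯＯＭ is dirty", None, "   ", "room dirty badword"]
cleaned = preprocess(raw)
```

Frequencies are counted over the whole batch before filtering, so a word is only "low-frequency" relative to the corpus being prepared.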
In addition, before step S101 the method further comprises:
S1010, pre-training on the historical complaint text data using the BERT algorithm to obtain a language model.
Step S104 comprises:
S1041, using the BERT algorithm, with the historical complaint text data, historical dimension data and historical entity data as input and the historical complaint category corresponding to the historical complaint text data as output, and based on the language model, building the prediction model for predicting the complaint category to which complaint text data belongs by randomly masking part of the entity data during training.
That is, fine-tuning on top of the pre-trained language model converges faster than training from scratch, and the classification layer can reach better classification accuracy and effect with a smaller amount of labeled data, dimension data and entity data. Specifically, during masking (random covering), words are replaced by the corresponding entity data through matching against the entity dictionary, which prevents label leakage and allows a more accurate prediction model to be built.
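One possible reading of the entity-level masking in S1041 is that a dictionary entity is masked as a whole rather than word by word. The sketch below follows that reading and is not the patent's implementation: the function names, the `[MASK]` token string, the greedy longest-match lookup and the `mask_prob` parameter are all assumptions.

```python
import random

# Hypothetical entity dictionary built from the examples in the text.
ENTITY_DICT = {"big double bed", "advance payment"}


def find_entity_spans(tokens, entity_dict, max_len=4):
    """Greedy longest-match lookup; returns (start, end) token spans of entities."""
    spans, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if " ".join(tokens[i:i + n]) in entity_dict:
                spans.append((i, i + n))
                i += n
                break
        else:
            i += 1  # no entity starts at this position
    return spans


def mask_entities(tokens, entity_dict, mask_prob=0.5, rng=None):
    """Mask each matched entity as a unit with probability mask_prob."""
    rng = rng or random.Random()
    masked = list(tokens)
    for start, end in find_entity_spans(tokens, entity_dict):
        if rng.random() < mask_prob:
            masked[start:end] = ["[MASK]"] * (end - start)
    return masked


tokens = "the big double bed was missing despite advance payment".split()
spans = find_entity_spans(tokens, ENTITY_DICT)
fully_masked = mask_entities(tokens, ENTITY_DICT, mask_prob=1.0)
```

Masking the whole span keeps the model from recovering an entity from its remaining fragments, which is one way to understand the label-leakage remark above.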
Because manually labeled data inevitably contains some errors, the prediction model can be used to predict the historical complaint text data; the complaint categories of the historical complaint text data whose maximum probability value lies in the interval 0.5-0.7 are then manually re-labeled and the model is retrained. Iterative training stops once the maximum probability value exceeds 0.7, which safeguards the accuracy of the prediction model.
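The selection of samples for manual re-labeling can be sketched as below. The interval 0.5-0.7 and the 0.7 stopping threshold come from the text; the function names, the inclusive interval bounds, and the reading of the stopping rule as "every sample exceeds 0.7" are assumptions, since the text does not spell them out.

```python
def select_for_relabel(predictions, low=0.5, high=0.7):
    """Return ids of samples whose highest class probability lies in [low, high].

    predictions: list of (sample_id, probabilities) pairs.
    """
    return [sid for sid, probs in predictions if low <= max(probs) <= high]


def all_confident(predictions, threshold=0.7):
    """Assumed stopping rule: every sample's highest probability exceeds the threshold."""
    return all(max(probs) > threshold for _, probs in predictions)


# Illustrative model outputs over three complaint categories.
predictions = [
    ("c1", [0.90, 0.05, 0.05]),
    ("c2", [0.60, 0.30, 0.10]),
    ("c3", [0.95, 0.03, 0.02]),
]
to_relabel = select_for_relabel(predictions)
done = all_confident(predictions)
```

In a training loop, the selected ids would go back to annotators, the model would be retrained on the corrected labels, and the loop would end once `all_confident` returns true.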
Step S107 specifically comprises:
S1071, determining the complaint category with the largest probability value as the target complaint category to which the target complaint text data belongs.
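Steps S106 and S1071 reduce to turning the model's scores into probabilities and taking the largest one. The sketch below assumes the final layer emits raw scores that are normalized with softmax, which the text does not specify; the category names and score values are made up.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1 (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_category(logits, categories):
    """Return the category with the largest probability, plus all probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return categories[best], probs


categories = ["refund", "hygiene", "service"]  # hypothetical complaint categories
target_category, probs = predict_category([2.0, 0.5, 0.1], categories)
```

Returning the full probability list alongside the winning category keeps the values available for checks such as the re-labeling interval described earlier.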
In this embodiment, a language model is obtained through pre-training; an improved BERT algorithm is then used, with the historical complaint text data, historical dimension data and historical entity data as input and the historical complaint category as output, to build the prediction model based on that language model. The prediction model yields the probability that the target complaint text data belongs to each complaint category, and the complaint category with the highest probability is selected as the target complaint category to which the target complaint text data belongs. This improves the accuracy of text classification and automatically categorizes customer complaints, so that the responsible staff can handle the complaint categories assigned to them at the earliest opportunity, improving the user experience while saving considerable manpower and raising processing efficiency.
Embodiment 3
As shown in Fig. 3, the system for predicting the category of complaint text on an OTA platform in this embodiment comprises a historical text data acquisition module 1, a labeling module 2, a dimension and entity data acquisition module 3, a prediction model building module 4, a target text data acquisition module 5, a probability value acquisition module 6 and a target complaint category acquisition module 7.
The historical text data acquisition module 1 is configured to obtain historical complaint text data of the OTA platform within a set historical time period;
The labeling module 2 is configured to label the historical complaint text data to obtain the complaint category corresponding to each piece of historical complaint text data;
The dimension and entity data acquisition module 3 is configured to obtain historical dimension data and historical entity data in the OTA platform that correspond to the historical complaint text data;
wherein the historical dimension data is multi-dimensional data characterizing users, orders and/or hotels;
the historical entity data is data characterizing proper nouns in the hotel field;
Specifically, the historical dimension data includes order information, hotel information, user information and the like. Order information includes, but is not limited to, the payment method, transaction status and order type of the order; hotel information includes, but is not limited to, the names of hotel facilities and equipment, property information and the like; user information includes, but is not limited to, user name, gender and the like.
The historical entity data includes proper nouns in the hotel field, such as advance payment, big double bed and flash stay. The historical entity data is stored in dictionary format.
The prediction model building module 4 is configured to build a prediction model for predicting the complaint category to which complaint text data belongs, with the historical complaint text data, historical dimension data and historical entity data as input and the historical complaint category corresponding to the historical complaint text data as output;
The target text data acquisition module 5 is configured to obtain target complaint text data;
The probability value acquisition module 6 is configured to input the target complaint text data into the prediction model to obtain the probability that the target complaint text data belongs to each complaint category;
The target complaint category acquisition module 7 is configured to determine, according to the probability values, the target complaint category to which the target complaint text data belongs.
In this embodiment, all data is selected by random sampling to ensure that the data is identically distributed.
In this embodiment, the prediction model is built with the historical complaint text data, historical dimension data and historical entity data as input and the historical complaint category as output. The prediction model yields the probability that the target complaint text data belongs to each complaint category, and the complaint category with the highest probability is selected as the target complaint category to which the target complaint text data belongs. This improves the accuracy of text classification and automatically categorizes customer complaints, so that the responsible staff can handle the complaint categories assigned to them at the earliest opportunity, improving the user experience while saving considerable manpower.
Embodiment 4
As shown in Fig. 4, the system for predicting the category of complaint text on an OTA platform in this embodiment is a further improvement on Embodiment 3, specifically:
The prediction system further comprises a clustering module 8;
The clustering module 8 is configured to cluster the historical complaint text data using a clustering algorithm;
wherein the clustering algorithm includes, but is not limited to, the K-MEANS clustering algorithm, the DBSCAN clustering algorithm, the mean shift clustering algorithm, hierarchical clustering algorithms and composite clustering.
The labeling module 2 is configured to label historical complaint text data that belongs to the same cluster with the same complaint category.
Specifically, during labeling, the relevant staff label the historical complaint text data by combining business requirements with the clustering results.
The prediction system further comprises a preprocessing module 9;
The preprocessing module 9 is configured to preprocess the labeled historical complaint text data.
Specifically, the preprocessing includes, but is not limited to, converting full-width characters to half-width, converting traditional Chinese characters to simplified, converting uppercase to lowercase, removing stop words and low-frequency words, filtering out null values, and filtering out sensitive words.
Specifically, forecasting system further includes that language model obtains module 10;
Language model obtains module 10 and is used to complain text data progress pre-training to obtain language history using BERT algorithm
Say model;
Prediction model establishes module 4 for using BERT algorithm to complain text data, history dimension data with history and go through
Historical facts volume data complains the corresponding history of text data to complain classification as output as input, using history, is based on language model,
It is established by way of covering part solid data at random when training for predicting to complain complaint classification belonging to text data
Prediction model.
That is, fine-tuning on the basis of the pre-trained language model converges faster than training from scratch, and a relatively better classification accuracy and effect can be reached at the classification layer with a smaller amount of labeled data, dimension data and entity data. Specifically, during masking, words are replaced with the corresponding entity data by matching against an entity dictionary; this prevents label leakage and allows a higher-precision prediction model to be established.
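One possible reading of the entity masking described above is sketched here. The text does not give the masking procedure in detail, so everything below is an assumption made for illustration: the entity dictionary maps a surface word to an entity type, matched words are replaced by that type, and during training each matched word is instead hidden behind a mask token with probability `mask_prob`. The dictionary contents, the function names and the default probability are hypothetical.

```python
import random

MASK_TOKEN = "[MASK]"  # the mask token conventionally used by BERT vocabularies


def replace_with_entities(tokens, entity_dict):
    """Replace every word found in the entity dictionary with its entity type."""
    return [entity_dict.get(tok, tok) for tok in tokens]


def randomly_mask_entities(tokens, entity_dict, mask_prob=0.15, rng=None):
    """Mask a random subset of the dictionary-matched words; the remaining
    matches are replaced by their entity type, and other words are kept."""
    rng = rng or random.Random()
    masked = []
    for tok in tokens:
        if tok in entity_dict and rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
        else:
            masked.append(entity_dict.get(tok, tok))
    return masked


# Hypothetical hotel-domain entity dictionary.
entities = {"king-room": "ROOM_TYPE", "breakfast": "SERVICE"}
sentence = ["the", "king-room", "had", "no", "breakfast"]

typed = replace_with_entities(sentence, entities)
all_masked = randomly_mask_entities(sentence, entities, mask_prob=1.0)
none_masked = randomly_mask_entities(sentence, entities, mask_prob=0.0)
```

Under this reading, hiding hotel-specific terms that correlate strongly with a category keeps the classifier from relying on those words alone, which is one way to understand the label-leakage remark.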
Because manually labeled data may contain some errors, the prediction model can be used to predict the historical complaint text data. The complaint categories of the historical complaint text data whose maximum probability value falls in the interval 0.5-0.7 are then manually re-labeled and the model is retrained; the iterative training stops once the maximum probability value is greater than 0.7, which ensures the precision of the prediction model.
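The re-labeling criterion can be expressed as a short selection routine. The sketch assumes the model returns one probability list per historical text, and it reads the stopping rule as "every text has a maximum probability above 0.7", which is an interpretation rather than something the text states. The function names are hypothetical; the 0.5 and 0.7 thresholds are the ones given above.

```python
def select_for_relabeling(probabilities, low=0.5, high=0.7):
    """Indices of texts whose top predicted probability lies in [low, high]."""
    return [
        i for i, probs in enumerate(probabilities)
        if low <= max(probs) <= high
    ]


def should_stop(probabilities, threshold=0.7):
    """True when every text is predicted with confidence above the threshold."""
    return all(max(probs) > threshold for probs in probabilities)


# Per-text class probabilities from a hypothetical round of prediction.
round_one = [[0.9, 0.1], [0.6, 0.4], [0.55, 0.45], [0.8, 0.2]]
to_review = select_for_relabeling(round_one)   # texts sent back to annotators
finished = should_stop(round_one)              # retrain again if False
```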
The target complaint category obtaining module 7 is configured to determine the complaint category corresponding to the maximum probability value as the target complaint category to which the target complaint text data belongs.
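Selecting the category with the maximum probability amounts to an argmax over the model output. The sketch below assumes the classification layer produces raw scores that are turned into probabilities with a softmax; the text does not say how the probabilities are produced, and the category names are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pick_target_category(categories, probabilities):
    """Return the category with the highest probability and that probability."""
    best = max(range(len(categories)), key=lambda i: probabilities[i])
    return categories[best], probabilities[best]


categories = ["refund", "room hygiene", "booking"]
probs = softmax([1.0, 3.0, 0.5])
target, confidence = pick_target_category(categories, probs)
```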
In this embodiment, a language model is obtained by pre-training, and an improved BERT algorithm is then used, with the historical complaint text data, the historical dimension data and the historical entity data as input and the historical complaint category as output, to establish the prediction model on the basis of that language model. The prediction model gives the probability value that the target complaint text data belongs to each complaint category, and the complaint category with the highest probability value is selected as the target complaint category to which the target complaint text data belongs. This improves the precision of text classification and sorts customer complaint content automatically, so that the persons in charge can handle the complaint categories they are responsible for at the first opportunity, which improves user experience while saving a large amount of manpower.
Embodiment 5
Fig. 5 is a schematic structural diagram of an electronic device provided by Embodiment 5 of the present invention. The electronic device includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the program, the processor implements the method for predicting complaint text categories of an OTA platform according to either of Embodiments 1 and 2. The electronic device 30 shown in Fig. 5 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in Fig. 5, the electronic device 30 may take the form of a general-purpose computing device, for example a server device. The components of the electronic device 30 may include, but are not limited to: the at least one processor 31, the at least one memory 32, and a bus 33 connecting the different system components (including the memory 32 and the processor 31).
The bus 33 includes a data bus, an address bus and a control bus.
The memory 32 may include volatile memory, such as random access memory (RAM) 321 and/or cache memory 322, and may further include read-only memory (ROM) 323.
The memory 32 may also include a program/utility 325 having a set of (at least one) program modules 324. Such program modules 324 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
The processor 31 runs the computer program stored in the memory 32 to execute various functional applications and data processing, such as the method for predicting complaint text categories of an OTA platform according to either of Embodiments 1 and 2 of the present invention.
The electronic device 30 may also communicate with one or more external devices 34 (such as a keyboard, a pointing device, etc.). Such communication may take place through an input/output (I/O) interface 35. In addition, the model-generating device 30 may communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through a network adapter 36. As shown in Fig. 5, the network adapter 36 communicates with the other modules of the model-generating device 30 through the bus 33. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the model-generating device 30, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, and data backup storage systems.
It should be noted that although several units/modules or sub-units/modules of the electronic device are mentioned in the detailed description above, this division is merely exemplary and not mandatory. In fact, according to embodiments of the present invention, the features and functions of two or more units/modules described above may be embodied in a single unit/module. Conversely, the features and functions of one unit/module described above may be further divided and embodied by multiple units/modules.
Embodiment 6
This embodiment provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the method for predicting complaint text categories of an OTA platform according to either of Embodiments 1 and 2 are implemented.
More specifically, the readable storage medium may include, but is not limited to: a portable disk, a hard disk, random access memory, read-only memory, erasable programmable read-only memory, an optical storage device, a magnetic storage device, or any suitable combination of the above.
In a possible implementation, the present invention may also be implemented in the form of a program product comprising program code; when the program product runs on a terminal device, the program code causes the terminal device to execute the steps of the method for predicting complaint text categories of an OTA platform according to either of Embodiments 1 and 2.
The program code for carrying out the present invention may be written in any combination of one or more programming languages. The program code may execute entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on a remote device, or entirely on the remote device.
Although specific embodiments of the present invention have been described above, those skilled in the art will understand that these are merely examples, and that the protection scope of the present invention is defined by the appended claims. Those skilled in the art may make various changes or modifications to these embodiments without departing from the principle and substance of the present invention, and all such changes and modifications fall within the protection scope of the present invention.
Claims (12)
1. A method for predicting complaint text categories of an OTA platform, characterized in that the prediction method comprises:
obtaining historical complaint text data of the OTA platform within a set historical time period;
labeling the historical complaint text data to obtain the complaint category corresponding to each piece of historical complaint text data;
obtaining historical dimension data and historical entity data in the OTA platform that correspond to the historical complaint text data;
wherein the historical dimension data is multi-dimensional data characterizing users, orders and/or hotels;
the historical entity data is data characterizing proper nouns in the hotel domain;
taking the historical complaint text data, the historical dimension data and the historical entity data as input, and the historical complaint category corresponding to the historical complaint text data as output, establishing a prediction model for predicting the complaint category to which complaint text data belongs;
obtaining target complaint text data;
inputting the target complaint text data into the prediction model to obtain the probability value that the target complaint text data belongs to each complaint category;
determining, according to the probability values, the target complaint category to which the target complaint text data belongs.
2. The method for predicting complaint text categories of an OTA platform according to claim 1, characterized in that, after the step of obtaining historical complaint text data of the OTA platform within a set historical time period and before the step of labeling the historical complaint text data, the method further comprises:
clustering the historical complaint text data using a clustering algorithm;
the step of labeling the historical complaint text data to obtain the complaint category corresponding to each piece of historical complaint text data comprises:
labeling the historical complaint text data that belong to the same clustering result with the same complaint category.
3. The method for predicting complaint text categories of an OTA platform according to claim 1, characterized in that, before the step of taking the historical complaint text data, the historical dimension data and the historical entity data as input, and the historical complaint category corresponding to the historical complaint text data as output, establishing a prediction model for predicting the complaint category to which complaint text data belongs, the method further comprises:
preprocessing the labeled historical complaint text data.
4. The method for predicting complaint text categories of an OTA platform according to claim 1, characterized in that the step of determining, according to the probability values, the target complaint category to which the target complaint text data belongs comprises:
determining the complaint category corresponding to the maximum probability value as the target complaint category to which the target complaint text data belongs.
5. The method for predicting complaint text categories of an OTA platform according to claim 1, characterized in that, before the step of obtaining historical complaint text data of the OTA platform within a set historical time period, the method further comprises:
pre-training on the historical complaint text data using the BERT algorithm to obtain a language model;
the step of taking the historical complaint text data, the historical dimension data and the historical entity data as input, and the historical complaint category corresponding to the historical complaint text data as output, establishing a prediction model for predicting the complaint category to which complaint text data belongs comprises:
using the BERT algorithm, with the historical complaint text data, the historical dimension data and the historical entity data as input and the historical complaint category corresponding to the historical complaint text data as output, establishing, on the basis of the language model and by randomly masking part of the entity data during training, the prediction model for predicting the complaint category to which complaint text data belongs.
6. A system for predicting complaint text categories of an OTA platform, characterized in that the prediction system comprises a historical text data obtaining module, a labeling module, a dimension and entity data obtaining module, a prediction model establishing module, a target text data obtaining module, a probability value obtaining module and a target complaint category obtaining module;
the historical text data obtaining module is configured to obtain historical complaint text data of the OTA platform within a set historical time period;
the labeling module is configured to label the historical complaint text data to obtain the complaint category corresponding to each piece of historical complaint text data;
the dimension and entity data obtaining module is configured to obtain historical dimension data and historical entity data in the OTA platform that correspond to the historical complaint text data;
wherein the historical dimension data is multi-dimensional data characterizing users, orders and/or hotels;
the historical entity data is data characterizing proper nouns in the hotel domain;
the prediction model establishing module is configured to take the historical complaint text data, the historical dimension data and the historical entity data as input, and the historical complaint category corresponding to the historical complaint text data as output, and to establish a prediction model for predicting the complaint category to which complaint text data belongs;
the target text data obtaining module is configured to obtain target complaint text data;
the probability value obtaining module is configured to input the target complaint text data into the prediction model to obtain the probability value that the target complaint text data belongs to each complaint category;
the target complaint category obtaining module is configured to determine, according to the probability values, the target complaint category to which the target complaint text data belongs.
7. The system for predicting complaint text categories of an OTA platform according to claim 6, characterized in that the prediction system further comprises a clustering module;
the clustering module is configured to cluster the historical complaint text data using a clustering algorithm;
the labeling module is configured to label the historical complaint text data that belong to the same clustering result with the same complaint category.
8. The system for predicting complaint text categories of an OTA platform according to claim 6, characterized in that the prediction system further comprises a preprocessing module;
the preprocessing module is configured to preprocess the labeled historical complaint text data.
9. The system for predicting complaint text categories of an OTA platform according to claim 6, characterized in that the target complaint category obtaining module is configured to determine the complaint category corresponding to the maximum probability value as the target complaint category to which the target complaint text data belongs.
10. The system for predicting complaint text categories of an OTA platform according to claim 6, characterized in that the prediction system further comprises a language model obtaining module;
the language model obtaining module is configured to pre-train on the historical complaint text data using the BERT algorithm to obtain a language model;
the prediction model establishing module is configured to use the BERT algorithm, with the historical complaint text data, the historical dimension data and the historical entity data as input and the historical complaint category corresponding to the historical complaint text data as output, to establish, on the basis of the language model and by randomly masking part of the entity data during training, the prediction model for predicting the complaint category to which complaint text data belongs.
11. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the method for predicting complaint text categories of an OTA platform according to any one of claims 1-5.
12. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method for predicting complaint text categories of an OTA platform according to any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910650261.0A CN110347840B (en) | 2019-07-18 | 2019-07-18 | Prediction method, system, equipment and storage medium for complaint text category |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910650261.0A CN110347840B (en) | 2019-07-18 | 2019-07-18 | Prediction method, system, equipment and storage medium for complaint text category |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110347840A (en) | 2019-10-18 |
CN110347840B (en) | 2023-06-13 |
Family
ID=68178920
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910650261.0A Active CN110347840B (en) | 2019-07-18 | 2019-07-18 | Prediction method, system, equipment and storage medium for complaint text category |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110347840B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103309948A (en) * | 2013-05-20 | 2013-09-18 | 携程计算机技术(上海)有限公司 | System and method for public opinion monitoring analysis and intelligent distribution processing of coordination center |
CN107844559A (en) * | 2017-10-31 | 2018-03-27 | 国信优易数据有限公司 | A kind of file classifying method, device and electronic equipment |
US20180203754A1 (en) * | 2017-01-17 | 2018-07-19 | Bank Of America Corporation | Individualized Channel Error Detection and Resolution |
CN108573031A (en) * | 2018-03-26 | 2018-09-25 | 上海万行信息科技有限公司 | A kind of complaint sorting technique and system based on content |
CN109492091A (en) * | 2018-09-28 | 2019-03-19 | 科大国创软件股份有限公司 | A kind of complaint work order intelligent method for classifying based on convolutional neural networks |
CN109670843A (en) * | 2018-11-12 | 2019-04-23 | 平安科技(深圳)有限公司 | Data processing method, device, computer equipment and the storage medium of complaint business |
CN109684475A (en) * | 2018-11-21 | 2019-04-26 | 斑马网络技术有限公司 | Processing method, device, equipment and the storage medium of complaint |
CN109726290A (en) * | 2018-12-29 | 2019-05-07 | 咪咕数字传媒有限公司 | Complaint classification model determination method and device and computer-readable storage medium |
CN109816399A (en) * | 2019-01-07 | 2019-05-28 | 平安科技(深圳)有限公司 | Complain management method, device, computer equipment and the storage medium of part |
CN109858702A (en) * | 2019-02-14 | 2019-06-07 | 中国联合网络通信集团有限公司 | Client upgrades prediction technique, device, equipment and the readable storage medium storing program for executing complained |
CN109918501A (en) * | 2019-01-18 | 2019-06-21 | 平安科技(深圳)有限公司 | Method, apparatus, equipment and the storage medium of news article classification |
CN109982367A (en) * | 2017-12-28 | 2019-07-05 | ***通信集团四川有限公司 | Mobile terminal Internet access customer complaint prediction technique, device, equipment and storage medium |
Non-Patent Citations (2)
Title |
---|
WENJING DUAN ET AL.: "Mining Online User-Generated Content: Using Sentiment Analysis Technique to Study Hotel Service Quality" * |
唐雪薇: "旅游网络口碑信息特征对出游意向的影响" * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110930022A (en) * | 2019-11-20 | 2020-03-27 | 携程计算机技术(上海)有限公司 | Hotel static information detection method and system, electronic equipment and storage medium |
CN111192160A (en) * | 2019-12-17 | 2020-05-22 | 山大地纬软件股份有限公司 | Power public opinion monitoring method and system based on multi-fractal optimization |
CN111553817A (en) * | 2020-04-24 | 2020-08-18 | 北京北大软件工程股份有限公司 | Analysis method and system for goodness of fit of complaint reporting case and treatment department |
CN113810212A (en) * | 2020-06-15 | 2021-12-17 | ***通信集团浙江有限公司 | Root cause positioning method and device for 5G slice user complaints |
CN112052994A (en) * | 2020-08-28 | 2020-12-08 | 中信银行股份有限公司 | Customer complaint upgrade prediction method and device and electronic equipment |
CN112288446A (en) * | 2020-10-28 | 2021-01-29 | 中国联合网络通信集团有限公司 | Method and device for calculating complaint and claim |
CN112288446B (en) * | 2020-10-28 | 2023-06-06 | 中国联合网络通信集团有限公司 | Calculation method and device for complaint and claim payment |
CN112925911A (en) * | 2021-02-25 | 2021-06-08 | 平安普惠企业管理有限公司 | Complaint classification method based on multi-modal data and related equipment thereof |
CN112925911B (en) * | 2021-02-25 | 2022-08-12 | 平安普惠企业管理有限公司 | Complaint classification method based on multi-modal data and related equipment thereof |
CN113704407A (en) * | 2021-08-30 | 2021-11-26 | 平安银行股份有限公司 | Complaint amount analysis method, device, equipment and storage medium based on category analysis |
CN113704407B (en) * | 2021-08-30 | 2023-08-25 | 平安银行股份有限公司 | Complaint volume analysis method, device, equipment and storage medium based on category analysis |
Also Published As
Publication number | Publication date |
---|---|
CN110347840B (en) | 2023-06-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||