CN107688604A - Data answering processing method, device and server - Google Patents


Info

Publication number
CN107688604A
Authority
CN
China
Prior art keywords
data
term vector
index
vector
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710616525.1A
Other languages
Chinese (zh)
Inventor
陈召群
崔恒斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201710616525.1A
Publication of CN107688604A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of this specification provide a data answering processing method, device and server. The method includes: obtaining a user's question data; determining the term vector of the question data based on a preset term vector set; calculating the matching degree between the term vector and a preset number of index vectors; and feeding back to the target user the response data corresponding to the index vector with the smallest matching degree to the term vector.

Description

Data answering processing method, device and server
Technical field
The embodiments of this specification relate to the field of Internet technology, and in particular to a data answering processing method, device and server.
Background
In the Internet era, many everyday affairs are handled online. While using products or services in an Internet business system, users often need to obtain data, for example to ask for advice or to query business information. Business systems often meet this need through automated responses.
In the prior art, automated data response processing is generally implemented by keyword identification and matching. Specifically, this may include: presetting keyword recognition and combination rules to identify and combine the keywords in question data; establishing mapping relations between one or more keywords in the question data and response data; matching, based on the established mapping relations, the keywords in the question data entered by the user against the corresponding response data; and finally feeding the matched response data back to the user, so as to provide the user with the data they need. For example, if the established mapping relations associate the keyword "thanks" with the response data "you're welcome", then when the user's question data contains the keyword "thanks", the response data "you're welcome" is matched and fed back to the user. In practice, however, Chinese has many synonyms, and a sentence composed of several keywords can express different meanings. Consequently, prior-art methods that perform keyword identification and matching based on established mappings between keywords and response data suffer from a low matching success rate, and a more reliable scheme is needed.
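The limitation described above can be illustrated with a minimal sketch of keyword-based matching. The mapping entry, the response text and the function name `keyword_reply` are hypothetical and are not taken from the patent; the sketch only assumes that a response is returned when a mapped keyword appears literally in the question.

```python
# Hypothetical keyword-to-response mapping (illustrative content only).
KEYWORD_RESPONSES = {
    "buy": "Open the fund page and choose Buy.",
}

def keyword_reply(question):
    """Return the response of the first mapped keyword found in the question, else None."""
    text = question.lower()
    for keyword, response in KEYWORD_RESPONSES.items():
        if keyword in text:
            return response
    return None

# The literal keyword is found, so a response is returned.
print(keyword_reply("How do I buy a fund"))
# The synonym "purchase" is not in the mapping, so nothing matches.
print(keyword_reply("How do I purchase a fund"))
```

Covering the second question under this approach would require adding every synonym to the mapping by hand, which is the coverage problem the background section points to.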
Summary of the invention
The purpose of the embodiments of this specification is to provide a data answering processing method, device and server that can improve the recognition success rate for question data and provide users with the data they need quickly and accurately.
The embodiments of this specification are implemented as follows:
A data answering processing method, including:
Obtaining target question data;
Determining the target term vector of the target question data based on a preset term vector set;
Calculating the matching degree between the target term vector and a preset number of index vectors, and feeding back to the target user the response data corresponding to the index vector with the smallest matching degree to the target term vector.
A data answering processing device, including:
A target question data acquisition module, for obtaining target question data;
A target term vector determining module, for determining the target term vector of the target question data based on a preset term vector set;
A matching degree computing module, for calculating the matching degree between the target term vector and a preset number of index vectors;
A response data feedback module, for feeding back to the target user the response data corresponding to the index vector with the smallest matching degree to the target term vector.
A data answering processing server, including a processor and a memory, the memory storing computer program instructions executed by the processor, the computer program instructions including:
Obtaining target question data;
Determining the target term vector of the target question data based on a preset term vector set;
Calculating the matching degree between the target term vector and a preset number of index vectors, and feeding back to the target user the response data corresponding to the index vector with the smallest matching degree to the target term vector.
As can be seen from the above, one or more embodiments of this specification convert the target question data and the index data into corresponding term vectors that carry word semantics and then match them. Because the term vectors of synonyms lie close to one another, synonyms can substitute for one another during matching with little error. In addition, the embodiments match term vectors computed for the whole of the target question data and the whole of the index data, so the vectors incorporate the semantics of every word in both. This ensures that the term vectors represent the semantics of the target question data and the index data accurately, which in turn improves the recognition success rate for question data and allows the required data to be provided to the user quickly and accurately.
Brief description of the drawings
To explain the technical solutions of the embodiments of this specification or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some of the embodiments described in this specification; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of an embodiment of the data answering processing method provided by this specification;
Fig. 2 is a schematic diagram of an embodiment of term vector model training and application provided by this specification;
Fig. 3 is a schematic flow chart of an embodiment, provided by this specification, of determining the target term vector of the target question data based on the preset term vector set;
Fig. 4 is a schematic flow chart of an embodiment, provided by this specification, of feeding back to the target user the response data corresponding to the index vector with the smallest matching degree to the target term vector;
Fig. 5 is a schematic structural diagram of an embodiment of the data answering processing device provided by this specification.
Detailed description
The embodiments of this specification provide a data answering processing method, device and server.
To help those skilled in the art better understand the technical solutions in this specification, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this specification. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in this specification without creative effort shall fall within the scope of protection of this specification.
In practice, Chinese has many synonyms, and within a specific domain the set of synonyms that can be added is larger still. Take "buy" in the financial-domain question "how to buy a fund": in that context its synonyms include, but are not limited to, "subscribe", "get in", "buy in" and "enter the market". At the same time, multiple keywords express different semantics in different contexts. For example, the keywords in the question data "what is a fund" are "fund" and "what". Processed as a combination, these two keywords can match the index data "what is a fund", but they can also match index data with more complex semantics, such as "what use is a fund" or "what fund is suitable for a novice investor to buy". The response data corresponding to "what is a fund", "what use is a fund" and "what fund is suitable for a novice investor to buy" differ markedly. Therefore, when data response processing is implemented by keyword identification and matching, the matching success rate is often low, because the synonym expansion of the keywords is insufficient, the keyword recognition and combination rules do not adequately cover the semantics of the keywords, and so on. In general, each word can be considered to have multiple meanings, where a word may consist of one or more characters. For example, "coffee" can denote a coffee color, coffee itself, or curry, among others.
Based on this, the embodiments of this specification introduce "term vectors", which directly reflect the semantic information of words. Specifically, a term vector may be a K-dimensional real-valued vector that expresses the degree of semantic association between words: the higher the semantic association between two words, the smaller the distance between their term vectors. For example, the words "mobile phone" and "computer" are both electronic products, so their semantic association is higher than that between "mobile phone" and "road". Accordingly, the distance between the term vectors of "mobile phone" and "computer" is smaller than the distance between the term vectors of "mobile phone" and "road". Question data and index data can therefore be converted into corresponding term vectors and matched by judging the distance between those vectors; finally, the matched response data is fed back to the user, so as to provide the user with the data they need.
A specific embodiment of the data answering processing method of this specification is described below. Fig. 1 is a schematic flow chart of an embodiment of the data answering processing method provided by this specification. This specification provides the method operation steps as described in the embodiments or flow charts, but more or fewer operation steps may be included on the basis of routine or non-creative work. The order of steps listed in the embodiments is only one of many possible execution orders and does not represent the only one. When an actual system or client product executes the method, the steps may be executed in the order shown in the embodiments or drawings, or in parallel (for example in a parallel-processor or multi-threaded environment). Specifically, as shown in Fig. 1, the method may include:
S102: Obtain target question data.
In the embodiments of this specification, the target question data may include identification data, entered by the user at a preset search interface, for the data to be obtained. For example, when a user needs to search the Internet for certain data, the user typically enters question data that can serve as an identifier of that data. Accordingly, an Internet business system can assign to each item of business data index data that serves as a key-information identifier of that business data, so that the index data can be matched against question data in order to provide the user with the required data. In general, index data may take the form of a title, or of question data in a question answering system.
S104: Determine the target term vector of the target question data based on a preset term vector set.
In the embodiments of this specification, a set of term vectors characterizing the semantic association of words (the preset term vector set) may be obtained in advance by training on preset corpus data. Specifically, the preset term vector set may be determined as follows:
First, a large amount of corpus data may be collected in advance as the preset corpus data to be trained on.
The corpus data may be collected from the domain in which the term vector set will be applied in practice.
Then, word segmentation is performed on the text in the collected corpus data to obtain the segmented words.
Specifically, a segmented word may consist of one or more characters. Word segmentation here may include cutting text into words; for example, segmenting "I like Shanghai" yields the three tokens "I", "like" and "Shanghai".
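The patent does not name a segmentation algorithm. As one possible illustration, the sketch below uses greedy forward maximum matching against a small vocabulary, which is a swapped-in technique rather than the patent's own. The function name `segment`, the vocabulary and the Chinese rendering "我喜欢上海" of the example sentence are assumptions.

```python
def segment(text, vocabulary, max_len=4):
    """Greedy forward maximum matching: at each position take the longest vocabulary word."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens

# Assumed vocabulary for the example sentence "I like Shanghai".
vocab = {"我", "喜欢", "上海"}
print(segment("我喜欢上海", vocab))
```

Characters that are not covered by any vocabulary word are emitted as single-character tokens, so the function always consumes the whole input.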
Then, the segmented words are fed into a Word2vector model for training. During training, each word can be mapped to a K-dimensional real-valued vector, and the semantic association between words is judged by the distance between their vectors (for example, the Euclidean distance). Training a Word2vector model on the words in the large collected corpus yields a term vector model for the corresponding domain and, at the same time, the preset term vector set that characterizes the semantic association of that domain's words.
Fig. 2 is a schematic diagram of an embodiment of term vector model training and application provided by this specification. As Fig. 2 shows, once a term vector model has been trained on the preset corpus, a new word can subsequently be input into the model, and the model outputs the term vector of that word based on the preset term vector set.
In a specific embodiment, word w and word v are converted into term vectors $\vec{w}$ and $\vec{v}$ respectively. The distance between $\vec{w}$ and $\vec{v}$ represents the semantic association between word w and word v: the smaller the distance, the higher the semantic association. For example, after the three words "like", "love" and "run" are converted into the term vectors $\vec{v}_{\text{like}}$, $\vec{v}_{\text{love}}$ and $\vec{v}_{\text{run}}$ respectively:
$$\mathrm{dist}(\vec{v}_{\text{like}}, \vec{v}_{\text{love}}) < \mathrm{dist}(\vec{v}_{\text{like}}, \vec{v}_{\text{run}})$$
where dist denotes a function that computes the distance between two vectors.
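The relation between the three example words can be checked numerically with a small sketch. The three-dimensional vectors below are made-up values chosen for illustration, not trained term vectors, and `dist` is assumed here to be the Euclidean distance.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy vectors (assumed values): "like" and "love" are placed close together.
vectors = {
    "like": [0.9, 0.8, 0.1],
    "love": [0.8, 0.9, 0.2],
    "run":  [0.1, 0.2, 0.9],
}

d_like_love = dist(vectors["like"], vectors["love"])
d_like_run = dist(vectors["like"], vectors["run"])
print(d_like_love < d_like_run)  # the synonym pair is the closer one
```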
It should also be noted that the method of determining the preset term vector set in the embodiments of this specification is not limited to the above; other methods may be used in practice.
In a specific embodiment, as shown in Fig. 3, which is a schematic flow chart of an embodiment of determining the target term vector of the target question data based on the preset term vector set, the determination may include:
S302: Perform word segmentation on the target question data to obtain multiple tokens.
In practice, a user's question data generally contains multiple words, so word segmentation can be performed on the target question data to obtain multiple tokens. For the word segmentation, see the description above; it is not repeated here.
S304: Look up the term vectors of the multiple tokens in the preset term vector set.
S306: Compute a weighted average of the term vectors found for the tokens, and use the resulting term vector as the target term vector of the target question data.
In practice, the term vector of some token may not be found in the preset term vector set. In that case the token can simply be ignored: the weighted average is computed over the term vectors of the tokens that were found, and the result is used as the target term vector. In addition, different tokens influence the semantics to different degrees; some tokens contribute little to the semantics. Accordingly, a weight coefficient can be determined in advance for each token; it can be preset according to the application, taking into account how strongly the token influences the semantics of the target question data. The term vector of each token is then multiplied by its weight coefficient, the mean of the resulting vectors is computed, and the result serves as the target term vector of the target question data. This reflects the semantics of the target question data more accurately. The weight coefficients of different tokens may be the same or different.
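Steps S304 and S306 can be sketched as follows. The function name `target_vector`, the toy vectors and the weights are hypothetical. The sketch divides by the sum of the weights, which is one common reading of "weighted average" and an assumption here, since the text does not spell out the normalization.

```python
def target_vector(tokens, term_vectors, weights=None, default_weight=1.0):
    """Weighted average of the term vectors found for the tokens; unknown tokens are skipped."""
    weights = weights or {}
    total = None
    weight_sum = 0.0
    for token in tokens:
        vec = term_vectors.get(token)
        if vec is None:
            continue  # token not in the preset term vector set: ignore it
        w = weights.get(token, default_weight)
        if total is None:
            total = [0.0] * len(vec)
        for k, value in enumerate(vec):
            total[k] += w * value
        weight_sum += w
    if total is None or weight_sum == 0:
        return None  # no token could be looked up
    return [value / weight_sum for value in total]

# Toy two-dimensional term vectors; "is" is deliberately missing from the set.
term_vectors = {"what": [1.0, 0.0], "fund": [0.0, 1.0]}
print(target_vector(["what", "is", "fund"], term_vectors, weights={"fund": 3.0}))
```

Giving every token the same weight reduces this to a plain mean, which matches the remark that the weight coefficients may be identical.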
S106: Calculate the matching degree between the target term vector and a preset number of index vectors, and feed back to the target user the response data corresponding to the index vector with the smallest matching degree to the target term vector.
In the embodiments of this specification, a knowledge base may be established in advance, and index data may be stored in it. Index data may have a one-to-one mapping relation with response data; preferably, one response data item has mapping relations with multiple index data items that share the same semantics. Accordingly, a preset mapping table may be set up to store the mapping relations between response data and index data, so that once the index data has been determined, the corresponding response data can be obtained directly by querying the preset mapping table. The preset mapping table may be stored in the knowledge base or elsewhere.
In a specific embodiment, the preset number of index vectors may be determined as follows:
Obtain the index data in the pre-established knowledge base and traverse each index data item in it; for each index data item, perform the following steps to determine its index vector:
Perform word segmentation on the index data item to obtain its multiple tokens;
Look up the term vectors of the multiple tokens in the preset term vector set;
Compute a weighted average of the term vectors found for the tokens, and use the resulting vector as the index vector of the index data item.
For the word segmentation and weighted averaging used in determining the index vector, see the descriptions above; they are not repeated here.
Specifically, in the embodiments of this specification, the matching degree between the target term vector and each of the preset number of index vectors can be calculated, and the index vector with the smallest matching degree to the target term vector is taken as the vector that matches the target term vector; the response data of the index data corresponding to that index vector is then the data the user needs.
In a specific embodiment, as shown in Fig. 4, which is a schematic flow chart of an embodiment of feeding back to the target user the response data corresponding to the index vector with the smallest matching degree to the target term vector, the process may include:
S402: Determine the index vector with the smallest matching degree to the target term vector.
Specifically, in the embodiments of this specification, the matching degree between the target term vector and the preset number of index vectors can be regarded as a "distance" in a one-dimensional space. All the term vectors are regarded as points in a "group": the target term vector is the center of the group, and the preset number of index vectors are the other points. The closer a point is to the center of the group, the smaller the difference between the corresponding index vector and the target term vector, that is, the better that index vector matches the target term vector.
In a specific embodiment, the matching degree between term vectors may be the Euclidean distance between them. The smaller the Euclidean distance calculated from two term vectors, the better they match; conversely, the larger the Euclidean distance, the worse they match.
The matching degree between term vectors in the embodiments of this specification is not limited to the Euclidean distance; it may also be the cosine distance, the Manhattan distance and so on.
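The three distance measures named above can be written out as a short sketch. The function names are illustrative, and cosine distance is taken here as one minus the cosine similarity, which is a common definition but an assumption with respect to the patent text. In all three cases a smaller value means a better match.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """One minus the cosine similarity; undefined for zero vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (norm_a * norm_b)

p, q = [1.0, 0.0], [0.0, 1.0]
print(euclidean(p, q), manhattan(p, q), cosine_distance(p, q))
```

Cosine distance ignores vector length, so it can rank candidates differently from the other two when averaged vectors differ mainly in magnitude.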
S404: Query the preset mapping table to determine the response data corresponding to the index vector with the smallest matching degree.
Specifically, after the index vector with the smallest matching degree to the target term vector has been determined, the preset mapping table can be queried to determine the response data of the index data corresponding to that index vector. Table 1 is an example of an embodiment of the preset mapping table provided by the embodiments of this specification.
Table 1
In Table 1, Q11, Q12 and Q13 are index data for response data A1, and they share the same semantics. Setting multiple index data items with the same semantics for one response data item ensures that the index data covers the question data of users in practice, and so improves the ability to recognize users' question data.
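Steps S402 and S404 can be sketched with a dictionary standing in for the preset mapping table. Only the names Q11, Q12, Q13 and A1 come from the text; the entry Q21, the response A2, the index vector values and the function name `find_response` are hypothetical, and Euclidean distance is assumed as the matching degree.

```python
import math

# Mapping table: several index entries with the same meaning point to one response.
MAPPING_TABLE = {"Q11": "A1", "Q12": "A1", "Q13": "A1", "Q21": "A2"}

# Toy index vectors, one per index entry (assumed values).
INDEX_VECTORS = {
    "Q11": [0.90, 0.10],
    "Q12": [0.80, 0.20],
    "Q13": [0.85, 0.15],
    "Q21": [0.10, 0.90],
}

def find_response(target, index_vectors, mapping_table):
    """Pick the index entry closest to the target vector and look up its response."""
    best_key = min(index_vectors, key=lambda k: math.dist(target, index_vectors[k]))
    return best_key, mapping_table[best_key]

print(find_response([0.82, 0.18], INDEX_VECTORS, MAPPING_TABLE))
```

Because Q11, Q12 and Q13 all map to A1, a question vector that lands near any of the three phrasings retrieves the same response.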
S406: Feed the response data back to the target user.
In the embodiments of this specification, once the response data has been determined, it can be fed back to the target user to meet the user's need to obtain data.
As can be seen, an embodiment of the data answering processing method of this specification converts the target question data and the index data into corresponding term vectors that carry word semantics and then matches them. Because the term vectors of synonyms lie close to one another, synonyms can substitute for one another during matching with little error. In addition, the embodiments match term vectors computed for the whole of the target question data and the whole of the index data, so the vectors incorporate the semantics of every word in both. This ensures that the term vectors represent the semantics of the target question data and the index data accurately, which in turn improves the recognition success rate for question data and allows the required data to be provided to the user quickly and accurately.
In another aspect, this specification also provides a data answering processing device. Fig. 5 is a schematic structural diagram of an embodiment of the data answering processing device provided by this specification. As shown in Fig. 5, the device 500 may include:
A target question data acquisition module 510, which may be used to obtain target question data.
A target term vector determining module 520, which may be used to determine the target term vector of the target question data based on a preset term vector set.
A matching degree computing module 530, which may be used to calculate the matching degree between the target term vector and a preset number of index vectors.
A response data feedback module 540, which may be used to feed back to the target user the response data corresponding to the index vector with the smallest matching degree to the target term vector.
In another embodiment, the target term vector determining module 520 may include:
A word segmentation unit, which may be used to perform word segmentation on the target question data to obtain multiple tokens;
A query unit, which may be used to look up the term vectors of the multiple tokens in the preset term vector set;
A computing unit, which may be used to compute a weighted average of the term vectors found for the tokens and use the resulting term vector as the target term vector of the target question data;
Wherein the preset term vector set includes a set of term vectors, obtained by training on preset corpus data, that characterize the semantic association of words.
In another embodiment, the preset number of index vectors may be determined as follows:
Obtain the index data in the pre-established knowledge base and traverse each index data item in it; for each index data item, perform the following steps to determine its index vector:
Perform word segmentation on the index data item to obtain its multiple tokens;
Look up the term vectors of the multiple tokens in the preset term vector set;
Compute a weighted average of the term vectors found for the tokens, and use the resulting term vector as the index vector of the index data item;
Wherein the preset term vector set includes a set of term vectors, obtained by training on preset corpus data, that characterize the semantic association of words.
In another embodiment, the response data feedback module 540 may include:
An index vector determining unit, which may be used to determine the index vector with the smallest matching degree to the target term vector;
A response data determining unit, which may be used to query the preset mapping table to determine the response data corresponding to the index vector with the smallest matching degree;
A response data feedback unit, which may be used to feed the response data back to the target user.
In another embodiment, the matching degree may include at least one of the following:
Euclidean distance, cosine distance, Manhattan distance.
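The module structure of device 500 can be mirrored in a compact sketch, with one method per module. The class name, method names and toy data are hypothetical. The sketch assumes whitespace splitting in place of real word segmentation, equal weights for all tokens, and Euclidean distance as the matching degree.

```python
import math

class DataAnsweringDevice:
    """Toy counterpart of device 500; comments give the module numbers from Fig. 5."""

    def __init__(self, term_vectors, index_tokens, mapping_table):
        self.term_vectors = term_vectors
        self.mapping_table = mapping_table
        # Index vectors are computed once from pre-segmented index data.
        self.index_vectors = {
            key: self._mean_vector(tokens) for key, tokens in index_tokens.items()
        }

    def _mean_vector(self, tokens):
        # Equal-weight average over the tokens that have a term vector.
        found = [self.term_vectors[t] for t in tokens if t in self.term_vectors]
        if not found:
            return None
        return [sum(column) / len(found) for column in zip(*found)]

    def acquire_question(self, raw_text):            # module 510
        return raw_text.strip().lower()

    def determine_target_vector(self, question):     # module 520
        return self._mean_vector(question.split())

    def compute_matching_degrees(self, target):      # module 530
        return {
            key: math.dist(target, vector)
            for key, vector in self.index_vectors.items()
            if vector is not None
        }

    def feed_back(self, degrees):                    # module 540
        best_key = min(degrees, key=degrees.get)
        return self.mapping_table[best_key]

    def answer(self, raw_text):
        target = self.determine_target_vector(self.acquire_question(raw_text))
        if target is None:
            return None  # no token of the question has a term vector
        degrees = self.compute_matching_degrees(target)
        return self.feed_back(degrees) if degrees else None

device = DataAnsweringDevice(
    term_vectors={"buy": [1.0, 0.0], "purchase": [0.9, 0.1],
                  "fund": [0.5, 0.5], "what": [0.0, 1.0]},
    index_tokens={"Q1": ["buy", "fund"], "Q2": ["what", "fund"]},
    mapping_table={"Q1": "A1", "Q2": "A2"},
)
print(device.answer("How to purchase fund"))
```

In this toy setup "purchase" never appears in the index data, yet the question still resolves to the entry built from "buy" because the two assumed vectors are close.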
The data answering processing method or device provided by the embodiments of this specification can be implemented in a computer by a processor executing the corresponding program instructions, for example implemented on a PC in C++ under a Windows operating system, implemented on a smart terminal in the programming languages of the Android or iOS systems, or implemented as processing logic based on a quantum computer. Accordingly, in another aspect this specification also provides a data answering processing server, including a processor and a memory, the memory storing computer program instructions executed by the processor, where the computer program instructions may include:
Obtaining target question data;
Determining the target term vector of the target question data based on a preset term vector set;
Calculating the matching degree between the target term vector and a preset number of index vectors, and feeding back to the target user the response data corresponding to the index vector with the smallest matching degree to the target term vector.
Specifically, in one or more embodiments of this specification, the processor may include a central processing unit (CPU), and may also include other microcontrollers with logic processing capability, logic gate circuits, integrated circuits and the like, or an appropriate combination of these. The memory may include non-volatile memory and the like.
In another embodiment, the target term vector that the object question data are determined based on default term vector set It can include:
Word segmentation processing is carried out to the object question data, obtains multiple participles;
The term vector of the multiple participle is inquired about from the default term vector set;
The term vector of participle to inquiring is weighted average computation, using the term vector being calculated as the target Put question to the target term vector of data;
Wherein, the default term vector set is included based on the sign word obtained to presetting corpus data to be trained The set of the term vector of semantic association degree.
In another embodiment, the predetermined number of index vectors can be determined as follows:
Obtain the index data in a pre-established knowledge base, traverse each piece of index data in the pre-established knowledge base, and perform the following steps of determining the index vector of the index data when traversing each piece of index data:
Perform word segmentation on the index data to obtain multiple word segments of the index data;
Query the term vectors of the multiple word segments from the default term vector set;
Compute a weighted average of the queried term vectors, and use the resulting term vector as the index vector of the index data;
Wherein the default term vector set includes a set of term vectors that characterize the semantic association degree of words and are obtained by training on preset corpus data.
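The traversal of the knowledge base can be sketched as a single pass that stores one index vector per piece of index data. The names `build_index_vectors` and `toy_to_vector` are hypothetical, the whitespace split stands in for real word segmentation, and skipping index data with no known words is an assumption.

```python
from typing import Callable, Dict, List, Optional

Vector = List[float]


def build_index_vectors(
    index_entries: List[str],
    to_vector: Callable[[str], Optional[Vector]],
) -> Dict[str, Vector]:
    """Traverse the index data once and store one vector per entry."""
    index_vectors: Dict[str, Vector] = {}
    for entry in index_entries:
        vec = to_vector(entry)
        if vec is not None:  # entries with no known words are skipped (assumption)
            index_vectors[entry] = vec
    return index_vectors


# Toy vectors and a stand-in for "segment, look up, average"; illustration only.
toy_vectors = {"refund": [1.0, 0.0], "order": [0.0, 1.0]}


def toy_to_vector(text: str) -> Optional[Vector]:
    found = [toy_vectors[t] for t in text.split() if t in toy_vectors]
    if not found:
        return None
    # Plain (uniformly weighted) average over the vectors that were found.
    return [sum(col) / len(found) for col in zip(*found)]


index = build_index_vectors(["refund order", "order", "hello"], toy_to_vector)
print(index)
```

Because the index vectors depend only on the knowledge base and the default term vector set, they can be computed once in advance and reused for every incoming question.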
In another embodiment, feeding back to the target user the reply data corresponding to the index vector with the minimum matching degree to the target term vector can include:
Determine the index vector with the minimum matching degree to the target term vector;
Query a preset mapping table to determine the reply data corresponding to the index vector with the minimum matching degree;
Feed back the reply data to the target user.
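The selection and lookup steps above can be sketched as follows, taking Euclidean distance as the matching degree. The index keys, vectors, reply texts, and the function name `find_reply` are hypothetical, and the mapping table is modelled here as a plain dictionary.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_reply(
    target: Vector,
    index_vectors: Dict[str, Vector],
    mapping_table: Dict[str, str],
) -> Tuple[str, str]:
    """Return (index key, reply) for the index vector closest to target."""
    # Step 1: the index vector with the minimum matching degree (distance).
    best_key = min(index_vectors, key=lambda k: euclidean(target, index_vectors[k]))
    # Step 2: look up the corresponding reply in the preset mapping table.
    return best_key, mapping_table[best_key]


# Hypothetical index keys, vectors, and replies, for illustration only.
index_vectors = {"refund": [1.0, 0.0], "login": [0.0, 1.0]}
mapping_table = {
    "refund": "Refunds are processed within a few days.",
    "login": "Use the password reset link.",
}

print(find_reply([0.9, 0.2], index_vectors, mapping_table))
```

The linear scan with `min` is adequate for a small predetermined number of index vectors; a larger knowledge base would likely call for an approximate nearest-neighbour index, which the text does not discuss.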
In another embodiment, the matching degree includes at least one of the following:
Euclidean distance, cosine distance, Manhattan distance.
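The three distances can be written out as below. For all three, a smaller value means a closer match, which is why the index vector with the minimum matching degree is selected. Cosine distance is taken here as one minus cosine similarity, a common convention that the text itself does not spell out, and raising an error for a zero vector is likewise an assumption.

```python
import math
from typing import List

Vector = List[float]


def euclidean_distance(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a: Vector, b: Vector) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity (assumed convention); undefined for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


a, b = [3.0, 4.0], [6.0, 8.0]
print(euclidean_distance(a, b))  # 5.0
print(manhattan_distance(a, b))  # 7.0
print(cosine_distance(a, b))     # 0.0, since the vectors point the same way
```

The example shows how the choices differ: the two vectors are five units apart by Euclidean distance yet identical in direction, so cosine distance treats them as a perfect match.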
As can be seen, in the embodiments of the data answering processing method, device, or server of this specification, the object question data and the index data are converted into corresponding term vectors that carry word semantics and are then matched. Because the vector distance between the term vectors of synonyms is small, synonyms can replace one another with little error during matching. Meanwhile, the embodiments of this specification match on the term vectors of the whole object question data and the whole index data, which contain the semantics of all words in the object question data and the index data. This ensures that the term vectors accurately represent the semantics of the object question data and the index data, which in turn improves the success rate of recognizing question data and provides users with the data they need quickly and accurately.
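The synonym argument can be illustrated with a toy calculation. The vectors below are invented so that "buy" and "purchase" lie close together; they do not come from any trained model, and a plain average stands in for the weighted average.

```python
import math
from typing import Dict, List

# Invented 2-dimensional vectors; "purchase" is placed near "buy" on purpose.
toy_vectors: Dict[str, List[float]] = {
    "buy": [1.0, 0.0],
    "purchase": [0.9, 0.1],
    "cancel": [-1.0, 0.0],
    "ticket": [0.0, 1.0],
}


def average_vector(tokens: List[str]) -> List[float]:
    vecs = [toy_vectors[t] for t in tokens]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def euclidean(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


base = average_vector(["buy", "ticket"])
synonym = average_vector(["purchase", "ticket"])
different = average_vector(["cancel", "ticket"])

d_syn = euclidean(base, synonym)     # small: only a synonym was swapped
d_diff = euclidean(base, different)  # large: the meaning changed
print(d_syn < d_diff)  # True
```

Swapping in the synonym moves the averaged vector only slightly, so the same index vector would still be the nearest one, whereas swapping in a word with a different meaning moves it much further.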
Specific embodiments of this specification have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired result. In addition, the processes depicted in the accompanying drawings do not necessarily require the particular order shown, or sequential order, to achieve the desired result. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In the 1990s, an improvement to a technology could be clearly distinguished as either an improvement in hardware (for example, an improvement to a circuit structure such as a diode, transistor, or switch) or an improvement in software (an improvement to a method flow). However, with the development of technology, many improvements to method flows today can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented with a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is such an integrated circuit, whose logic function is determined by the user programming the device. A designer programs a digital system onto a single PLD without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, this programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art should also understand that a hardware circuit implementing a logical method flow can easily be obtained simply by logically programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
A controller can be implemented in any suitable manner. For example, the controller can take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller can also be implemented as part of the control logic of a memory. Those skilled in the art also know that, in addition to implementing the controller purely as computer-readable program code, the method steps can be logically programmed so that the controller realizes the same functions in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the devices included in it for realizing various functions can also be regarded as structures within the hardware component. The devices for realizing various functions can even be regarded both as software modules implementing the method and as structures within the hardware component.
The devices, modules, or units illustrated in the above embodiments can be implemented by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer. Specifically, the computer can be, for example, a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, e-mail device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above device is described with its functions divided into various units. Of course, when implementing one or more embodiments of this specification, the functions of the units can be implemented in one or more pieces of software and/or hardware.
Those skilled in the art should understand that embodiments of the present invention can be provided as a method, a device, or a computer program product. Therefore, the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, equipment (device), and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data answering processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data answering processing device produce a device for realizing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions can also be stored in a computer-readable memory capable of directing a computer or other programmable data answering processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device realizes the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions can also be loaded onto a computer or other programmable data answering processing device, so that a series of operation steps is performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thus provide steps for realizing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.
Memory may include non-permanent memory in computer-readable media, random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette tape, magnetic tape or disk storage, graphene storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, commodity, or device that includes the element.
Those skilled in the art will understand that the embodiments of this specification can be provided as a method, a device, or a computer program product. Therefore, this specification can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of this specification can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The embodiments of this specification can be described in the general context of computer-executable instructions, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. One or more embodiments of this specification can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments can be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the device and server embodiments are described relatively simply because they are substantially similar to the method embodiment; for relevant parts, refer to the description of the method embodiment.
The above are merely embodiments of this specification and are not intended to limit this specification. For those skilled in the art, the embodiments of this specification can have various modifications and variations. Any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the embodiments of this specification should be included within the scope of the claims.

Claims (15)

1. A data answering processing method, including:
Obtain object question data;
Determine the target term vector of the object question data based on a default term vector set;
Calculate the matching degree between the target term vector and each of a predetermined number of index vectors, and feed back to the target user the reply data corresponding to the index vector with the minimum matching degree to the target term vector.
2. The method according to claim 1, wherein determining the target term vector of the object question data based on the default term vector set includes:
Perform word segmentation on the object question data to obtain multiple word segments;
Query the term vectors of the multiple word segments from the default term vector set;
Compute a weighted average of the queried term vectors, and use the resulting term vector as the target term vector of the object question data;
Wherein the default term vector set includes a set of term vectors that characterize the semantic association degree of words and are obtained by training on preset corpus data.
3. The method according to claim 1, wherein the predetermined number of index vectors are determined as follows:
Obtain the index data in a pre-established knowledge base, traverse each piece of index data in the pre-established knowledge base, and perform the following steps of determining the index vector of the index data when traversing each piece of index data:
Perform word segmentation on the index data to obtain multiple word segments of the index data;
Query the term vectors of the multiple word segments from the default term vector set;
Compute a weighted average of the queried term vectors, and use the resulting term vector as the index vector of the index data;
Wherein the default term vector set includes a set of term vectors that characterize the semantic association degree of words and are obtained by training on preset corpus data.
4. The method according to claim 1, wherein feeding back to the target user the reply data corresponding to the index vector with the minimum matching degree to the target term vector includes:
Determine the index vector with the minimum matching degree to the target term vector;
Query a preset mapping table to determine the reply data corresponding to the index vector with the minimum matching degree;
Feed back the reply data to the target user.
5. The method according to claim 1, wherein the matching degree includes at least one of the following:
Euclidean distance, cosine distance, Manhattan distance.
6. A data answering processing device, including:
An object question data acquisition module, for obtaining object question data;
A target term vector determining module, for determining the target term vector of the object question data based on a default term vector set;
A matching degree computing module, for calculating the matching degree between the target term vector and each of a predetermined number of index vectors;
A reply data feedback module, for feeding back to the target user the reply data corresponding to the index vector with the minimum matching degree to the target term vector.
7. The device according to claim 6, wherein the target term vector determining module includes:
A word segmentation processing unit, for performing word segmentation on the object question data to obtain multiple word segments;
A query unit, for querying the term vectors of the multiple word segments from the default term vector set;
A computing unit, for computing a weighted average of the queried term vectors and using the resulting term vector as the target term vector of the object question data;
Wherein the default term vector set includes a set of term vectors that characterize the semantic association degree of words and are obtained by training on preset corpus data.
8. The device according to claim 6, wherein the predetermined number of index vectors are determined as follows:
Obtain the index data in a pre-established knowledge base, traverse each piece of index data in the pre-established knowledge base, and perform the following steps of determining the index vector of the index data when traversing each piece of index data:
Perform word segmentation on the index data to obtain multiple word segments of the index data;
Query the term vectors of the multiple word segments from the default term vector set;
Compute a weighted average of the queried term vectors, and use the resulting term vector as the index vector of the index data;
Wherein the default term vector set includes a set of term vectors that characterize the semantic association degree of words and are obtained by training on preset corpus data.
9. The device according to claim 6, wherein the reply data feedback module includes:
An index vector determining unit, for determining the index vector with the minimum matching degree to the target term vector;
A reply data determining unit, for querying a preset mapping table to determine the reply data corresponding to the index vector with the minimum matching degree;
A reply data feedback unit, for feeding back the reply data to the target user.
10. The device according to claim 6, wherein the matching degree includes at least one of the following:
Euclidean distance, cosine distance, Manhattan distance.
11. A data answering processing server, including a processor and a memory, the memory storing computer program instructions executed by the processor, the computer program instructions including:
Obtain object question data;
Determine the target term vector of the object question data based on a default term vector set;
Calculate the matching degree between the target term vector and each of a predetermined number of index vectors, and feed back to the target user the reply data corresponding to the index vector with the minimum matching degree to the target term vector.
12. The server according to claim 11, wherein determining the target term vector of the object question data based on the default term vector set includes:
Perform word segmentation on the object question data to obtain multiple word segments;
Query the term vectors of the multiple word segments from the default term vector set;
Compute a weighted average of the queried term vectors, and use the resulting term vector as the target term vector of the object question data;
Wherein the default term vector set includes a set of term vectors that characterize the semantic association degree of words and are obtained by training on preset corpus data.
13. The server according to claim 11, wherein the predetermined number of index vectors are determined as follows:
Obtain the index data in a pre-established knowledge base, traverse each piece of index data in the pre-established knowledge base, and perform the following steps of determining the index vector of the index data when traversing each piece of index data:
Perform word segmentation on the index data to obtain multiple word segments of the index data;
Query the term vectors of the multiple word segments from the default term vector set;
Compute a weighted average of the queried term vectors, and use the resulting term vector as the index vector of the index data;
Wherein the default term vector set includes a set of term vectors that characterize the semantic association degree of words and are obtained by training on preset corpus data.
14. The server according to claim 11, wherein feeding back to the target user the reply data corresponding to the index vector with the minimum matching degree to the target term vector includes:
Determine the index vector with the minimum matching degree to the target term vector;
Query a preset mapping table to determine the reply data corresponding to the index vector with the minimum matching degree;
Feed back the reply data to the target user.
15. The server according to claim 11, wherein the matching degree includes at least one of the following:
Euclidean distance, cosine distance, Manhattan distance.
CN201710616525.1A 2017-07-26 2017-07-26 Data answering processing method, device and server Pending CN107688604A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710616525.1A CN107688604A (en) 2017-07-26 2017-07-26 Data answering processing method, device and server

Publications (1)

Publication Number Publication Date
CN107688604A true CN107688604A (en) 2018-02-13

Family

ID=61153005

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710616525.1A Pending CN107688604A (en) 2017-07-26 2017-07-26 Data answering processing method, device and server

Country Status (1)

Country Link
CN (1) CN107688604A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103412903A (en) * 2013-07-31 2013-11-27 无锡安拓思科技有限责任公司 Method and system for interested object prediction based real-time search of Internet of Things
CN104462357A (en) * 2014-12-08 2015-03-25 百度在线网络技术(北京)有限公司 Method and device for realizing personalized search
CN104657346A (en) * 2015-01-15 2015-05-27 深圳市前海安测信息技术有限公司 Question matching system and question matching system in intelligent interaction system
CN106407311A (en) * 2016-08-30 2017-02-15 北京百度网讯科技有限公司 Method and device for obtaining search result
CN106407280A (en) * 2016-08-26 2017-02-15 合网络技术(北京)有限公司 Query target matching method and device
CN106484664A (en) * 2016-10-21 2017-03-08 竹间智能科技(上海)有限公司 Similarity calculating method between a kind of short text
CN106951558A (en) * 2017-03-31 2017-07-14 广东睿盟计算机科技有限公司 A kind of data processing method of the tax intelligent consulting platform based on deep search

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108984666A (en) * 2018-06-29 2018-12-11 阿里巴巴集团控股有限公司 Data processing method, data processing equipment and server
CN109284279A (en) * 2018-09-06 2019-01-29 厦门市法度信息科技有限公司 A kind of hearing problem selection method, terminal device and storage medium
CN109284279B (en) * 2018-09-06 2021-02-05 厦门市法度信息科技有限公司 Interrogation problem selection method, terminal equipment and storage medium
WO2020244065A1 (en) * 2019-06-04 2020-12-10 平安科技(深圳)有限公司 Character vector definition method, apparatus and device based on artificial intelligence, and storage medium
CN113761184A (en) * 2020-09-29 2021-12-07 北京沃东天骏信息技术有限公司 Text data classification method, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code
Ref country code: HK
Ref legal event code: DE
Ref document number: 1248866
Country of ref document: HK
TA01 Transfer of patent application right
Effective date of registration: 20200925
Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands
Applicant after: Innovative advanced technology Co.,Ltd.
Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands
Applicant before: Advanced innovation technology Co.,Ltd.
Effective date of registration: 20200925
Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands
Applicant after: Advanced innovation technology Co.,Ltd.
Address before: Greater Cayman, British Cayman Islands
Applicant before: Alibaba Group Holding Ltd.
RJ01 Rejection of invention patent application after publication
Application publication date: 20180213