CN111859939A - Text matching method and system and computer equipment - Google Patents

Info

Publication number
CN111859939A
CN111859939A (application number CN202010743264.1A)
Authority
CN
China
Prior art keywords
candidate
target
cls
text
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010743264.1A
Other languages
Chinese (zh)
Other versions
CN111859939B (en)
Inventor
庞帅
张扬
马建强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd filed Critical Ping An Life Insurance Company of China Ltd
Priority to CN202010743264.1A priority Critical patent/CN111859939B/en
Publication of CN111859939A publication Critical patent/CN111859939A/en
Application granted granted Critical
Publication of CN111859939B publication Critical patent/CN111859939B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention provides a text matching method, which comprises the following steps: receiving a target text sent by a client terminal; performing an encoding operation on the target text to obtain a target code; acquiring a plurality of candidate codes according to the target code, wherein the candidate codes are obtained by pre-encoding a plurality of candidate texts; calculating, according to the target code and the candidate codes, the similarities between the target text and the plurality of candidate texts corresponding to the candidate codes, to obtain the candidate text with the highest similarity to the target text; and outputting the candidate text with the highest similarity to the target text. The embodiment of the invention improves the speed and the accuracy of text matching by encoding and interacting the target text and the candidate texts.

Description

Text matching method and system and computer equipment
Technical Field
The embodiments of the invention relate to the field of artificial intelligence, and in particular to a text matching method, a text matching system, a computer device and a computer-readable storage medium.
Background
Artificial intelligence technology is beginning to be used in many fields, such as finance. In the financial field, various financial businesses, such as enterprise risk business and advertisement business, can be analyzed with artificial intelligence. Taking the advertisement business as an example, unlike traditional literal text matching schemes, artificial intelligence technology can take semantic similarity into account and mine the deeper meaning in the information provided by a customer. Taking NLP (Natural Language Processing) within artificial intelligence as an example, as research in the NLP field deepens, new deep matching model methods keep emerging. Their advantage is that the semantic focus can be well grasped and the contextual importance of words reasonably modeled; early work is represented by network structures such as DSSM, CDSSM and ARC-I. However, the core problem of these models is that the obtained sentence representation loses the semantic focus, semantic shift easily occurs, and the contextual importance of a word is difficult to measure. Therefore, how to improve the capture of the semantic focus, so as to further improve both the accuracy and the speed of text matching, has become one of the technical problems to be solved urgently.
Disclosure of Invention
In view of the above, there is a need to provide a text matching method, system, computer device and computer-readable storage medium, so as to solve the technical problems that, when matching a text, the obtained sentence representation easily loses the semantic focus, semantic shift easily occurs, and the contextual importance of a word is difficult to measure.
In order to achieve the above object, an embodiment of the present invention provides a text matching method, where the method includes:
receiving a target text sent by a user through a client terminal;
performing coding operation on the target text to obtain a target code;
acquiring a plurality of candidate codes according to the target code, wherein the candidate codes are obtained by pre-coding a plurality of candidate texts;
calculating, according to the target code and the candidate codes, the similarities between the target text and the plurality of candidate texts corresponding to the candidate codes, to obtain the candidate text with the highest similarity to the target text;
and outputting the candidate text with the highest similarity to the target text.
Illustratively, calculating the similarities between the target text and the plurality of candidate texts corresponding to the plurality of candidate codes according to the target code and the plurality of candidate codes, to obtain the candidate text with the highest similarity to the target text, includes:
performing interactive encoding operation on the target codes and the candidate codes to obtain a plurality of target word vectors corresponding to the target text and a plurality of candidate word vectors corresponding to each candidate text;
determining a joint vector according to the plurality of target word vectors and the plurality of candidate word vectors; and
and calculating the similarities between the candidate texts and the target text according to the joint vector to obtain the candidate text with the highest similarity to the target text.
Illustratively, the determining a joint vector from the plurality of target word vectors and the plurality of candidate word vectors comprises:
prepending a CLS token before the first word of the target text and of each candidate text, so that the CLS serves as the first word of the target text and of each candidate text;
respectively carrying out coding operation on a target CLS corresponding to the target text and a candidate CLS corresponding to the candidate text to obtain a target CLS code and a candidate CLS code;
obtaining a target head vector A_CLS according to the target CLS code, and a candidate head vector B_CLS according to the candidate CLS code;
encoding each target word vector of the target text into the target CLS to obtain a target combination vector A'_CLS, and encoding each candidate word vector of the candidate text into the candidate CLS to obtain a candidate combination vector B'_CLS;
splicing the target CLS code with the target combination vector A'_CLS to obtain a target splicing vector NA_CLS, and splicing the candidate CLS code with the candidate combination vector B'_CLS to obtain a candidate splicing vector NB_CLS; and
performing joint processing on the target splicing vector NA_CLS and the candidate splicing vector NB_CLS to obtain a joint vector.
Illustratively, the joint processing of the target splicing vector NA_CLS and the candidate splicing vector NB_CLS to obtain a joint vector includes:
mapping the target splicing vector NA_CLS and the candidate splicing vector NB_CLS through a pre-prepared mapping function, respectively, to obtain a target interaction vector RA and a candidate interaction vector RB;
and interacting the target interaction vector RA with the candidate interaction vector RB to obtain a joint vector.
In order to achieve the above object, an embodiment of the present invention further provides a text matching system, including:
the receiving module is used for receiving a target text sent by a user through a client terminal;
the encoding module is used for performing encoding operation on the target text to obtain a target code;
the acquisition module is used for acquiring a plurality of candidate codes according to the target code, wherein the candidate codes are obtained by pre-coding a plurality of candidate texts;
the calculation module is used for calculating, according to the target code and the plurality of candidate codes, the similarities between the target text and the plurality of candidate texts corresponding to the candidate codes;
and the output module is used for outputting the candidate text with the highest similarity to the target text.
Illustratively, the computing module is further configured to:
performing interactive encoding operation on the target codes and the candidate codes to obtain a plurality of target word vectors corresponding to the target text and a plurality of candidate word vectors corresponding to each candidate text;
determining a joint vector according to the plurality of target word vectors and the plurality of candidate word vectors; and
and calculating the similarities between the candidate texts and the target text according to the joint vector to obtain the candidate text with the highest similarity to the target text.
Illustratively, the computing module is further configured to:
prepending a CLS token before the first word of the target text and of each candidate text, so that the CLS serves as the first word of the target text and of each candidate text;
respectively carrying out coding operation on a target CLS corresponding to the target text and a candidate CLS corresponding to the candidate text to obtain a target CLS code and a candidate CLS code;
obtaining a target head vector A_CLS according to the target CLS code, and a candidate head vector B_CLS according to the candidate CLS code;
encoding each target word vector of the target text into the target CLS to obtain a target combination vector A'_CLS, and encoding each candidate word vector of the candidate text into the candidate CLS to obtain a candidate combination vector B'_CLS;
splicing the target CLS code with the target combination vector A'_CLS to obtain a target splicing vector NA_CLS, and splicing the candidate CLS code with the candidate combination vector B'_CLS to obtain a candidate splicing vector NB_CLS; and
performing joint processing on the target splicing vector NA_CLS and the candidate splicing vector NB_CLS to obtain a joint vector.
Illustratively, the computing module is further configured to:
mapping the target splicing vector NA_CLS and the candidate splicing vector NB_CLS through a pre-prepared mapping function, respectively, to obtain a target interaction vector RA and a candidate interaction vector RB;
and carrying out interactive processing on the target interactive vector RA and the candidate interactive vector RB to obtain a joint vector.
To achieve the above object, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when executed by the processor, the computer program implements the steps of the text matching method as described above.
To achieve the above object, an embodiment of the present invention further provides a computer-readable storage medium, in which a computer program is stored, and the computer program is executable by at least one processor to cause the at least one processor to execute the steps of the text matching method as described above. Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
The text matching method, system, computer device and computer-readable storage medium provided by the embodiments of the invention provide a faster text matching method for language identification. By encoding and interacting the target text and the candidate texts, the contextual relations between words in a text are strengthened, so as to reduce semantic shift; and the vector dimensionality is reduced to avoid the problems of vector sparsity and overly large vectors, improving both the speed and the accuracy of text matching.
Drawings
Fig. 1 is a flowchart illustrating a text matching method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of program modules of a second embodiment of the text matching system of the present invention.
Fig. 3 is a schematic diagram of a hardware structure of a third embodiment of the computer device according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the description relating to "first", "second", etc. in the present invention is for descriptive purposes only and is not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.
In the following embodiments, the computer device 2 will be exemplarily described as an execution subject.
Example one
Referring to fig. 1, a flow chart of steps of a text matching method according to an embodiment of the invention is shown. It is to be understood that the flow charts in the embodiments of the present method are not intended to limit the order in which the steps are performed. The following description is made by way of example with the computer device 2 as the execution subject. The details are as follows.
Step S100, receiving a target text sent by a user through a client terminal.
The target text may be a question text sent by the user through the client terminal, for example, the question text may be "i want to consult insurance services", "view my insurance bills", and "my insurance remaining time", etc.
And step S102, carrying out coding operation on the target text to obtain a target code.
Illustratively, the step S102 may further include: encoding the target text TA with BERT to obtain a target code Va.
For example, after receiving the target text TA sent by the user through the client terminal, the text matching system may perform an encoding operation on the target text TA through an encoder, where the encoder may be BERT (Bidirectional Encoder Representations from Transformers) or RoBERTa (A Robustly Optimized BERT Pretraining Approach). For example, encoding the target text TA with BERT yields the target code Va:
Va = BERT_Encoder(TA)
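The shape of the encoding step Va = BERT_Encoder(TA) can be sketched as follows. This is a minimal illustration: `toy_encoder` is a hypothetical, deterministic stand-in for the BERT/RoBERTa encoder (a real encoder would return contextual token vectors), used here only to show that the target code is one vector per word of the target text.

```python
import hashlib
import numpy as np

def toy_encoder(tokens, dim=8):
    """Hypothetical stand-in for BERT_Encoder: maps each token to a
    deterministic pseudo-random vector of size `dim`. A real encoder
    (BERT/RoBERTa) would instead produce contextual token vectors."""
    rows = []
    for tok in tokens:
        # seed the generator from the token bytes so the mapping is repeatable
        seed = int.from_bytes(hashlib.md5(tok.encode("utf-8")).digest()[:4], "little")
        rng = np.random.default_rng(seed)
        rows.append(rng.standard_normal(dim))
    return np.stack(rows)  # shape: (number of tokens, dim)

# Va = BERT_Encoder(TA): one vector per word of the target text TA
TA = ["CLS", "i", "want", "to", "consult", "insurance", "services"]
Va = toy_encoder(TA)
```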
step S104, a plurality of candidate codes are obtained according to the target code, wherein the candidate codes are obtained by pre-coding a plurality of candidate texts.
The text matching system may obtain a plurality of candidate codes from a database according to the target code, the candidate codes are obtained by pre-coding a plurality of candidate texts, and the candidate texts and the target text may be coded by the same coder. For example, encoding the candidate text TB by the BERT may result in a candidate encoding Vb:
Vb=BERT_Encoder(TB)
The candidate texts are prepared in advance, that is, they are fixed; and because the candidate texts and the target text can use the same encoder, the candidate codes corresponding to the candidate texts can be computed offline in advance and stored in a database. During online operation, the text matching system can retrieve the candidate codes from the database according to the target code.
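The offline/online split described above can be sketched as follows. The in-memory dict stands in for the database, the candidate texts are illustrative data, and `toy_encoder` is a hypothetical deterministic stand-in for the shared BERT encoder:

```python
import hashlib
import numpy as np

def toy_encoder(tokens, dim=8):
    """Stand-in for the shared encoder used by both target and candidates."""
    rows = []
    for tok in tokens:
        seed = int.from_bytes(hashlib.md5(tok.encode("utf-8")).digest()[:4], "little")
        rows.append(np.random.default_rng(seed).standard_normal(dim))
    return np.stack(rows)

# --- offline: candidate texts are fixed, so encode them once and store ---
candidate_texts = {
    "c1": ["CLS", "consult", "insurance", "services"],
    "c2": ["CLS", "view", "insurance", "bills"],
}
candidate_db = {cid: toy_encoder(toks) for cid, toks in candidate_texts.items()}

# --- online: encode only the incoming target text, then look candidates up ---
Va = toy_encoder(["CLS", "i", "want", "to", "consult", "insurance"])
Vb = candidate_db["c1"]  # retrieved from the "database", not recomputed
```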
And step S106, calculating, according to the target code and the candidate codes, the similarities between the target text and the plurality of candidate texts corresponding to the candidate codes, to obtain the candidate text with the highest similarity to the target text.
Illustratively, the step S106 may further include steps 200 to 204:
step S200, interactive coding operation is carried out on the target codes and the candidate codes to obtain a plurality of target word vectors corresponding to the target text and a plurality of candidate word vectors corresponding to each candidate text.
The interactive encoding operation means that the target code Va and the candidate code Vb re-encode themselves through each other's vectors. The target code Va is the BERT encoding of the target text; the encoding is in vector form, and each word in the target text has a target word vector:
A~_i = Σ_{j=1..L_b} [ exp(e_ij) / Σ_{k=1..L_b} exp(e_ik) ] · B_j
the candidate code Vb is a code corresponding to a candidate text, and as with the target code Va, each word in the candidate text has a candidate word vector:
B~_j = Σ_{i=1..L_a} [ exp(e_ij) / Σ_{k=1..L_a} exp(e_kj) ] · A_i
wherein e_ij is the distance between the target code Va and the candidate code Vb:
e_ij = A_i · B_j
L_a and L_b represent the number of words contained in the target text and the candidate text, respectively.
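The interactive encoding step, where each text is re-encoded through the other's vectors with e_ij as the dot-product distance, can be sketched in NumPy. Random matrices stand in for Va and Vb, and the softmax-weighted sum is one common reading of this kind of cross-attention re-encoding:

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
La, Lb, dim = 5, 4, 8                    # word counts of the two texts
Va = rng.standard_normal((La, dim))      # target word vectors A_i
Vb = rng.standard_normal((Lb, dim))      # candidate word vectors B_j

e = Va @ Vb.T                            # e_ij = A_i . B_j, shape (La, Lb)
A_tilde = softmax(e, axis=1) @ Vb        # each A_i re-encoded from the B_j
B_tilde = softmax(e, axis=0).T @ Va      # each B_j re-encoded from the A_i
```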
Step S202, determining a joint vector according to the target word vectors and the candidate word vectors.
Illustratively, the step S202 may further include steps 300 to 310:
And step S300, prepending a CLS token before the first word of the target text and of each candidate text, so that the CLS serves as the first word of the target text and of each candidate text.
Illustratively, CLS is used as the first word of the target text and each candidate text.
Step S302, respectively carrying out coding operation on the target CLS corresponding to the target text and the candidate CLS corresponding to the candidate text so as to obtain a target CLS code and a candidate CLS code.
Illustratively, the target CLS corresponding to the target text is encoded to obtain a target CLS code Va_CLS, and the candidate CLS corresponding to the candidate text is encoded to obtain a candidate CLS code Vb_CLS.
Step S304, obtaining a target head vector A_CLS according to the target CLS code, and a candidate head vector B_CLS according to the candidate CLS code.
Illustratively, Va_CLS and Vb_CLS are each re-encoded using the other's vectors to obtain the target head vector A_CLS and the candidate head vector B_CLS.
Step S306, encoding each target word vector of the target text into the first word of the target text to obtain a target combination vector A'_CLS, and encoding each candidate word vector of the candidate text into the first word of the candidate text to obtain a candidate combination vector B'_CLS.
Illustratively, each target word vector of the target text is encoded into the first word of the target text to obtain the target combination vector A'_CLS:
A'_CLS = Σ_{i=1..L_a} [ exp(e_Ai) / Σ_{k=1..L_a} exp(e_Ak) ] · A_i
wherein e_Ai is the distance from the first word of the target text to the other words in the target text: e_Ai = A_CLS · A_i. Each candidate word vector of the candidate text is encoded into the first word of the candidate text to obtain the candidate combination vector B'_CLS:
B'_CLS = Σ_{j=1..L_b} [ exp(e_Bj) / Σ_{k=1..L_b} exp(e_Bk) ] · B_j
wherein e_Bj is the distance from the first word of the candidate text to the other words in the candidate text: e_Bj = B_CLS · B_j.
For example, the first word of each sentence is made to be CLS. That is, for the sentence "i want to consult insurance services", after word segmentation and insertion of the CLS token, the input becomes "CLS i want to consult insurance services", and CLS is now the first word of the sentence. The information of each word in the sentence is then encoded into the first word using self-attention. Continuing with "CLS i want to consult insurance services" as the example: after encoding, each word in the sentence has a vector, denoted V0, V1, V2, V3, V4 and V5. The dot products of V0 with each Vi are computed, giving six values D0, D1, D2, D3, D4, D5, which can be regarded as the distances between V0 and V0, V0 and V1, V0 and V2, V0 and V3, V0 and V4, and V0 and V5 respectively (the dot product can be understood here as a simplified Euclidean distance). Finally, the six distances are normalized into weights, and the new encoding of CLS is the weighted sum w0·V0 + w1·V1 + w2·V2 + w3·V3 + w4·V4 + w5·V5. Because each weight is determined by the distance between the first word and the corresponding word, the information of every word in the sentence can be considered to have been encoded into the first word.
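The worked example above, dot products of V0 with every word vector, normalized weights, then a weighted sum as the new CLS encoding, can be sketched directly (random vectors stand in for the encoded words):

```python
import numpy as np

rng = np.random.default_rng(1)
# V0..V5: vectors of "CLS i want to consult insurance" after encoding
V = rng.standard_normal((6, 8))

d = V @ V[0]                 # D0..D5: dot products of V0 with each Vi
w = np.exp(d - d.max())
w = w / w.sum()              # softmax-normalized weights, one per word
cls_encoding = w @ V         # weighted sum: the new encoding of CLS
```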
Step S308, splicing the target head vector and the target combination vector corresponding to the first word of the target text to obtain a target splicing vector NA_CLS, and splicing the candidate head vector and the candidate combination vector corresponding to the first word of the candidate text to obtain a candidate splicing vector NB_CLS.
Illustratively, Va_CLS and A'_CLS are spliced into the vector NA_CLS: NA_CLS = [Va_CLS, A'_CLS]; at the same time, Vb_CLS and B'_CLS are spliced into the vector NB_CLS: NB_CLS = [Vb_CLS, B'_CLS].
Step S310, performing joint processing on the target splicing vector NA_CLS and the candidate splicing vector NB_CLS to obtain a joint vector.
Illustratively, the step S310 may further include the steps 400 to 402:
step S400, splicing the target with the vector NA through a mapping function prepared in advanceCLSAnd the candidate stitching vector NBCLSRespectively carrying out mapping operation to obtain a target interaction vector RA and a candidate interaction vector RB; step S402, the target interactive vector RA and the candidate interactive vector RB are interactively processed to obtain a joint vector.
Exemplarily, the final interactive representations RA and RB of the target text and the candidate text are obtained by passing the spliced vectors through a mapping matrix: RA = f(NA_CLS × W + b), RB = f(NB_CLS × W + b). Further, RA and RB are interacted to obtain the joint vector REP:
REP = Σ_{i=1..5} [ exp(e_i) / Σ_{k=1..5} exp(e_k) ] · P_i
wherein:
P1=RA
P2=RB
P3=element_wise_abs(RA-RB)
P4=element_wise_max(RA,RB)
P5=element_wise_dot(RA,RB)
e_i = P_i · Kc^T (Kc is obtained from training)
The joint vector REP can be used for the final similarity calculation as the final joint representation of the target text and the candidate text. The attention-pooling strategy on the one hand further interacts the target text with the candidate text, and on the other hand reduces the vector dimensionality, avoiding sparse and overly large vectors, thereby guaranteeing both the effect and the computation speed.
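The mapping and attention-pooling steps, the mapped vectors RA and RB, the five interaction features P1..P5, and the Kc-weighted pooling into REP, can be sketched as follows. W, b and Kc would be learned during training; here they are random placeholders, and tanh is an illustrative choice for the mapping function f:

```python
import numpy as np

rng = np.random.default_rng(2)
d_in, d_out = 16, 8
NA_cls = rng.standard_normal(d_in)      # target splicing vector NA_CLS
NB_cls = rng.standard_normal(d_in)      # candidate splicing vector NB_CLS
W = rng.standard_normal((d_in, d_out))  # mapping matrix (learned in practice)
b = rng.standard_normal(d_out)
Kc = rng.standard_normal(d_out)         # pooling key (learned in practice)

f = np.tanh                             # illustrative activation for f(.)
RA = f(NA_cls @ W + b)                  # target interaction vector
RB = f(NB_cls @ W + b)                  # candidate interaction vector

P = np.stack([
    RA,                                 # P1
    RB,                                 # P2
    np.abs(RA - RB),                    # P3: element-wise absolute difference
    np.maximum(RA, RB),                 # P4: element-wise max
    RA * RB,                            # P5: element-wise product
])
e = P @ Kc                              # e_i = P_i . Kc
alpha = np.exp(e - e.max())
alpha = alpha / alpha.sum()             # attention weights over P1..P5
REP = alpha @ P                         # attention-pooled joint vector
```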
Step S204, calculating the similarities between the candidate texts and the target text according to the joint vector to obtain the candidate text with the highest similarity to the target text.
And step S108, outputting the candidate text with the highest similarity to the target text.
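Putting these last two steps together, the final selection can be sketched as scoring each candidate's joint vector REP and returning the best one. The scoring head (a sigmoid over a dot product with a learned vector v) is an assumption for illustration, since the text does not specify the final similarity function; the joint vectors here are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(3)
v = rng.standard_normal(8)  # hypothetical learned scoring vector

# one joint vector REP per (target, candidate) pair, here random placeholders
candidate_reps = {
    "consult insurance services": rng.standard_normal(8),
    "view my insurance bills": rng.standard_normal(8),
    "my insurance remaining time": rng.standard_normal(8),
}

def score(rep):
    """Similarity of one candidate: sigmoid of an assumed learned projection."""
    return 1.0 / (1.0 + np.exp(-(rep @ v)))

# output the candidate text with the highest similarity to the target text
best = max(candidate_reps, key=lambda c: score(candidate_reps[c]))
```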
In some embodiments, the text matching method can also be applied to various fields, such as finance. In the financial field, various financial businesses, such as enterprise risk business and advertisement business, can be analyzed. Taking the advertisement business as an example, the text matching method can take the semantic similarity of advertisements into account and mine the deeper meaning in the information provided by a customer, so that more accurate advertisement push services can be provided for users.
Example two
Fig. 2 is a schematic diagram of program modules of a second embodiment of the text matching system of the present invention. The text matching system 20 may include or be divided into one or more program modules, which are stored in a storage medium and executed by one or more processors to implement the present invention and implement the text matching methods described above. The program module referred to in the embodiments of the present invention refers to a series of computer program instruction segments capable of performing specific functions, and is more suitable than the program itself for describing the execution process of the text matching system 20 in the storage medium. The following description will specifically describe the functions of the program modules of the present embodiment:
a receiving module 200, configured to receive a target text sent by a user through a client terminal.
And the encoding module 202 is configured to perform an encoding operation on the target text to obtain a target code.
An obtaining module 204, configured to obtain a plurality of candidate codes according to the target code, where the candidate codes are obtained by performing pre-coding processing on a plurality of candidate texts.
A calculating module 206, configured to calculate, according to the target code and the plurality of candidate codes, the similarities between the target text and the plurality of candidate texts corresponding to the candidate codes.
Illustratively, the calculating module 206 is further configured to: perform an interactive encoding operation on the target code and the plurality of candidate codes to obtain a plurality of target word vectors corresponding to the target text and a plurality of candidate word vectors corresponding to each candidate text; determine a joint vector according to the plurality of target word vectors and the plurality of candidate word vectors; and calculate the similarities between the candidate texts and the target text according to the joint vector to obtain the candidate text with the highest similarity to the target text.
Illustratively, the calculating module 206 is further configured to: prepend a CLS token before the first word of the target text and of each candidate text, so that the CLS serves as the first word of the target text and of each candidate text; perform encoding operations on the target CLS corresponding to the target text and the candidate CLS corresponding to the candidate text, respectively, to obtain a target CLS code and a candidate CLS code; obtain a target head vector A_CLS according to the target CLS code, and a candidate head vector B_CLS according to the candidate CLS code; encode each target word vector of the target text into the target CLS to obtain a target combination vector A'_CLS, and encode each candidate word vector of the candidate text into the candidate CLS to obtain a candidate combination vector B'_CLS; splice the target CLS code with the target combination vector A'_CLS to obtain a target splicing vector NA_CLS, and splice the candidate CLS code with the candidate combination vector B'_CLS to obtain a candidate splicing vector NB_CLS; and perform joint processing on the target splicing vector NA_CLS and the candidate splicing vector NB_CLS to obtain a joint vector.
Illustratively, the calculating module 206 is further configured to: map the target splicing vector NA_CLS and the candidate splicing vector NB_CLS through a pre-prepared mapping function, respectively, to obtain a target interaction vector RA and a candidate interaction vector RB; and interact the target interaction vector RA with the candidate interaction vector RB to obtain a joint vector.
And the output module 208 is configured to output the candidate text with the highest similarity to the target text.
EXAMPLE III
Fig. 3 is a schematic diagram of a hardware architecture of a computer device according to a third embodiment of the present invention. In the present embodiment, the computer device 2 is a device capable of automatically performing numerical calculation and/or information processing in accordance with a preset or stored instruction. The computer device 2 may be a rack server, a blade server, a tower server or a rack server (including an independent server or a server cluster composed of a plurality of servers), and the like. As shown, the computer device 2 includes, but is not limited to, at least a memory 21, a processor 22, a network interface 23, and a text matching system 20 communicatively coupled to each other via a system bus.
In this embodiment, the memory 21 includes at least one type of computer-readable storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Programmable Read-Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 21 may be an internal storage unit of the computer device 2, such as a hard disk or a memory of the computer device 2. In other embodiments, the memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the computer device 2. Of course, the memory 21 may also comprise both internal and external storage units of the computer device 2. In this embodiment, the memory 21 is generally used for storing the operating system installed in the computer device 2 and various application software, such as the program code of the text matching system 20 of the second embodiment. Further, the memory 21 may also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 22 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 22 is typically used to control the overall operation of the computer device 2. In this embodiment, the processor 22 is configured to run the program code stored in the memory 21 or to process data, for example, to run the text matching system 20, so as to implement the text matching method of the first embodiment.
The network interface 23 may comprise a wireless network interface or a wired network interface, and is generally used for establishing a communication connection between the computer device 2 and other electronic apparatuses. For example, the network interface 23 is used to connect the computer device 2 to an external terminal through a network, and to establish a data transmission channel and a communication connection between the computer device 2 and the external terminal. The network may be a wireless or wired network such as an intranet, the Internet, the Global System for Mobile Communications (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, or Wi-Fi.
It is noted that Fig. 3 only shows the computer device 2 with components 20-23, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.
In this embodiment, the text matching system 20 stored in the memory 21 may also be divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (in this embodiment, the processor 22) to carry out the present invention.
For example, Fig. 2 is a schematic diagram of the program modules implementing the text matching system 20 according to the second embodiment of the present invention, in which the text matching system 20 may be divided into a receiving module 200, an encoding module 202, an obtaining module 204, a calculating module 206, and an output module 208. A program module in the present invention refers to a series of computer program instruction segments capable of performing specific functions, and is better suited than a whole program to describing the execution process of the text matching system 20 in the computer device 2. The specific functions of the program modules 200-208 have been described in detail in the second embodiment and are not repeated here.
Example Four
This embodiment also provides a computer-readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Programmable Read-Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, or an App application store, on which a computer program is stored that implements corresponding functions when executed by a processor. The computer-readable storage medium of this embodiment stores the text matching system 20 and, when executed by a processor, implements the text matching method of the first embodiment.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner.
The above description is only a preferred embodiment of the present invention and is not intended to limit its scope; all equivalent structural or process modifications made using the contents of the present specification and the accompanying drawings, whether applied directly or indirectly in other related technical fields, are likewise included within the scope of the present invention.

Claims (10)

1. A method of text matching, the method comprising:
receiving a target text sent by a client terminal;
performing coding operation on the target text to obtain a target code;
acquiring a plurality of candidate codes according to the target code, wherein the candidate codes are obtained by pre-coding a plurality of candidate texts;
calculating recognition degrees between the target text and a plurality of candidate texts corresponding to the plurality of candidate codes according to the target code and the plurality of candidate codes, to obtain a candidate text with the highest recognition degree with the target text;
and outputting the candidate text with the highest recognition degree with the target text.
2. The text matching method of claim 1, wherein the calculating the degree of recognition of the target text and the candidate texts corresponding to the candidate codes according to the target code and the candidate codes to obtain the candidate text with the highest degree of recognition with the target text comprises:
performing interactive encoding operation on the target codes and the candidate codes to obtain a plurality of target word vectors corresponding to the target text and a plurality of candidate word vectors corresponding to each candidate text;
determining a joint vector according to the plurality of target word vectors and the plurality of candidate word vectors; and
and calculating the recognition degrees of the candidate texts and the target text according to the joint vector to obtain the candidate text with the highest recognition degree with the target text.
3. The text matching method of claim 2, wherein said determining a joint vector from the plurality of target word vectors and the plurality of candidate word vectors comprises:
filling a CLS before the first word in the target text and in each candidate text, so as to take the CLS as the first word of the target text and of each candidate text;
respectively performing an encoding operation on a target CLS corresponding to the target text and a candidate CLS corresponding to each candidate text to obtain a target CLS code and a candidate CLS code;
obtaining a target head vector A_CLS according to the target CLS code, and obtaining a candidate head vector B_CLS according to the candidate CLS code;
encoding each target word vector of the target text into the target CLS to obtain a target combination vector A'_CLS, and encoding each candidate word vector of the candidate text into the candidate CLS to obtain a candidate combination vector B'_CLS;
splicing the target CLS code with the target combination vector A'_CLS to obtain a target splicing vector NA_CLS, and splicing the candidate CLS code with the candidate combination vector B'_CLS to obtain a candidate splicing vector NB_CLS; and
performing joint processing according to the target splicing vector NA_CLS and the candidate splicing vector NB_CLS to obtain a joint vector.
4. The text matching method of claim 3, wherein the performing joint processing according to the target splicing vector NA_CLS and the candidate splicing vector NB_CLS to obtain a joint vector comprises:
performing mapping operations on the target splicing vector NA_CLS and the candidate splicing vector NB_CLS, respectively, through a mapping function prepared in advance, to obtain a target interaction vector RA and a candidate interaction vector RB; and
performing interaction processing on the target interaction vector RA and the candidate interaction vector RB to obtain the joint vector.
5. A text matching system, comprising:
the receiving module is used for receiving a target text sent by a user through a client terminal;
the encoding module is used for performing encoding operation on the target text to obtain a target code;
the acquisition module is used for acquiring a plurality of candidate codes according to the target code, wherein the candidate codes are obtained by pre-coding a plurality of candidate texts;
the calculation module is used for calculating the recognition degrees of a plurality of candidate texts corresponding to the target text and the plurality of candidate codes according to the target code and the plurality of candidate codes;
and the output module is used for outputting the candidate text with the highest recognition degree with the target text.
6. The text matching system of claim 5, wherein the computing module is further to:
performing interactive encoding operation on the target codes and the candidate codes to obtain a plurality of target word vectors corresponding to the target text and a plurality of candidate word vectors corresponding to each candidate text;
determining a joint vector according to the plurality of target word vectors and the plurality of candidate word vectors; and
and calculating the recognition degrees of the candidate texts and the target text according to the joint vector to obtain the candidate text with the highest recognition degree with the target text.
7. The text matching system of claim 6, wherein the computing module is further to:
filling a CLS before the first word in the target text and in each candidate text, so as to take the CLS as the first word of the target text and of each candidate text;
respectively performing an encoding operation on a target CLS corresponding to the target text and a candidate CLS corresponding to each candidate text to obtain a target CLS code and a candidate CLS code;
obtaining a target head vector A_CLS according to the target CLS code, and obtaining a candidate head vector B_CLS according to the candidate CLS code;
encoding each target word vector of the target text into the target CLS to obtain a target combination vector A'_CLS, and encoding each candidate word vector of the candidate text into the candidate CLS to obtain a candidate combination vector B'_CLS;
splicing the target CLS code with the target combination vector A'_CLS to obtain a target splicing vector NA_CLS, and splicing the candidate CLS code with the candidate combination vector B'_CLS to obtain a candidate splicing vector NB_CLS; and
performing joint processing according to the target splicing vector NA_CLS and the candidate splicing vector NB_CLS to obtain a joint vector.
8. The text matching system of claim 7, wherein the computing module is further configured to:
performing mapping operations on the target splicing vector NA_CLS and the candidate splicing vector NB_CLS, respectively, through a mapping function prepared in advance, to obtain a target interaction vector RA and a candidate interaction vector RB; and
performing interaction processing on the target interaction vector RA and the candidate interaction vector RB to obtain the joint vector.
9. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the text matching method according to any one of claims 1 to 4.
10. A computer-readable storage medium storing a computer program executable by at least one processor, the computer program causing the at least one processor to perform the steps of the text matching method according to any one of claims 1 to 4.
CN202010743264.1A 2020-07-29 2020-07-29 Text matching method, system and computer equipment Active CN111859939B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010743264.1A CN111859939B (en) 2020-07-29 2020-07-29 Text matching method, system and computer equipment

Publications (2)

Publication Number Publication Date
CN111859939A true CN111859939A (en) 2020-10-30
CN111859939B CN111859939B (en) 2023-07-25

Family

ID=72945048

Country Status (1)

Country Link
CN (1) CN111859939B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112580325A (en) * 2020-12-25 2021-03-30 建信金融科技有限责任公司 Rapid text matching method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740126A (en) * 2019-01-04 2019-05-10 平安科技(深圳)有限公司 Text matching technique, device and storage medium, computer equipment
CN111339256A (en) * 2020-02-28 2020-06-26 支付宝(杭州)信息技术有限公司 Method and device for text processing


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant