Summary of the invention
Embodiments of the present invention provide an information matching method and an information matching device that can improve the accuracy of extracting useful information from unstructured text.
In a first aspect, an embodiment of the present invention provides an information matching method, including:
obtaining at least one unstructured original text in advance;
extracting at least one piece of information to be matched from each original text;
receiving at least one piece of target information sent by a user;
for each piece of information to be matched in each original text, determining whether at least one piece of close information exists among the target information, where the similarity value between the semantics of the information to be matched and the semantics of each piece of close information is greater than or equal to a preset first threshold;
if so, marking the information to be matched;
when the feedback result sent by the user for the marked information to be matched is that the match is correct, recording that the information to be matched matches each piece of target information.
Preferably, after the determination is affirmative and before the information to be matched is marked, the method further includes:
S0: determining whether the number of pieces of close information is greater than or equal to a preset number of matches; if so, executing S1; otherwise, executing S2;
S1: marking the information to be matched;
S2: entering the information to be matched into a preset database; and, when the feedback result sent by the user for the information to be matched in the database is that the match is correct, recording that the information to be matched matches each piece of target information.
Preferably, the determining whether at least one piece of close information exists among the target information, where the similarity value between the semantics of the information to be matched and the semantics of each piece of close information is greater than or equal to the preset first threshold, and if so, marking the information to be matched, includes:
when no close information exists, determining whether at least one piece of verification information exists among the target information, where the similarity value between the semantics of the information to be matched and the semantics of each piece of verification information is greater than or equal to a preset second threshold; if so, executing S2; otherwise, sending no-match information to the user.
Preferably, after the information to be matched is marked, the method further includes:
when the feedback received from the user for the marked information to be matched consists of a matching-error result and a third threshold, updating the first threshold to the third threshold;
recording that the information to be matched does not match the target information.
Preferably, the obtaining at least one unstructured original text in advance includes:
obtaining at least one unstructured text in advance;
for each unstructured text, removing the Hypertext Markup Language (HTML) tags of the unstructured text, and determining that the unstructured text from which the HTML tags have been removed is an unstructured original text.
In a second aspect, an embodiment of the present invention provides an information matching device, including:
an acquiring unit, configured to obtain at least one unstructured original text in advance, and to receive at least one piece of target information sent by a user;
an extraction unit, configured to extract at least one piece of information to be matched from each original text obtained by the acquiring unit;
a matching unit, configured to: for each piece of information to be matched extracted by the extraction unit from each original text, determine whether at least one piece of close information exists among the target information obtained by the acquiring unit, where the similarity value between the semantics of the information to be matched and the semantics of each piece of close information is greater than or equal to a preset first threshold; if so, mark the information to be matched; and, when the feedback result sent by the user for the marked information to be matched is that the match is correct, record that the information to be matched matches each piece of target information.
Preferably, the matching unit is further configured to execute:
S0: determining whether the number of pieces of close information is greater than or equal to a preset number of matches; if so, executing S1; otherwise, executing S2;
S1: marking the information to be matched;
S2: entering the information to be matched into a preset database; and, when the feedback result sent by the user for the information to be matched in the database is that the match is correct, recording that the information to be matched matches each piece of target information.
Preferably, the matching unit is configured to: when no close information exists, determine whether at least one piece of verification information exists among the target information, where the similarity value between the semantics of the information to be matched and the semantics of each piece of verification information is greater than or equal to a preset second threshold; if so, execute S2; otherwise, send no-match information to the user.
Preferably, the matching unit is configured to: when the feedback received from the user for the marked information to be matched consists of a matching-error result and a third threshold, update the first threshold to the third threshold; and record that the information to be matched does not match the target information.
Preferably, the acquiring unit is configured to: obtain at least one unstructured text in advance; and, for each unstructured text, remove the Hypertext Markup Language (HTML) tags of the unstructured text and determine that the unstructured text from which the HTML tags have been removed is an unstructured original text.
In the embodiments of the present invention, after at least one unstructured original text is obtained, at least one piece of information to be matched is extracted from each original text. When at least one piece of target information sent by the user is received, each piece of information to be matched can be compared with each piece of target information, and the information to be matched that is semantically close to the close information is matched automatically. Information no longer has to be extracted from unstructured text manually at great cost in time, which improves the efficiency of extracting information from unstructured text. In addition, manual correction of the marked information to be matched continuously improves the accuracy of extracting information from unstructured text.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the scope of protection of the present invention.
As shown in Fig. 1, an embodiment of the present invention provides an information matching method, including:
Step 101: obtaining at least one unstructured original text in advance;
Step 102: extracting at least one piece of information to be matched from each original text;
Step 103: receiving at least one piece of target information sent by a user;
Step 104: for each piece of information to be matched in each original text, determining whether at least one piece of close information exists among the target information, where the similarity value between the semantics of the information to be matched and the semantics of each piece of close information is greater than or equal to a preset first threshold;
Step 105: if so, marking the information to be matched;
Step 106: when the feedback result sent by the user for the marked information to be matched is that the match is correct, recording that the information to be matched matches each piece of target information.
In this embodiment, after at least one unstructured original text is obtained, at least one piece of information to be matched is extracted from each original text. When at least one piece of target information sent by the user is received, each piece of information to be matched can be compared with each piece of target information, and the information to be matched that is semantically close to the close information is matched automatically. Information no longer has to be extracted from unstructured text manually at great cost in time, which improves the efficiency of extracting information from unstructured text. In addition, manual correction of the marked information to be matched continuously improves the accuracy of extracting information from unstructured text.
In summary, after the information to be matched that corresponds to the target information has been matched, it is corrected manually and the result of the manual correction is recorded. This forms closed-loop feedback for information matching, so that the accuracy of extracting information from unstructured text is improved the next time target information is matched.
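By way of illustration only, steps 104 to 106 can be sketched as follows. The text does not specify how the semantic similarity value is computed, so the sketch takes the scoring function as a parameter; the default `similarity` uses character overlap from Python's `difflib` purely as a stand-in and is not a semantic measure. The names `FIRST_THRESHOLD`, `similarity`, `find_close_information` and `record_feedback` are hypothetical.

```python
from difflib import SequenceMatcher

# Assumed value; the worked example later in the text uses 80%.
FIRST_THRESHOLD = 0.8


def similarity(a: str, b: str) -> float:
    """Stand-in score in [0, 1]. Character overlap only, not real semantics."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_close_information(candidate, targets,
                           threshold=FIRST_THRESHOLD, score=similarity):
    """Step 104: return the targets whose score against the candidate
    (the information to be matched) reaches the first threshold."""
    return [t for t in targets if score(candidate, t) >= threshold]


def record_feedback(records, candidate, targets, is_correct):
    """Step 106: store the user's verdict for a marked candidate."""
    records[candidate] = {"targets": list(targets), "matched": is_correct}
    return records
```

A candidate is treated as marked (step 105) whenever `find_close_information` returns a non-empty list; a real system would replace `similarity` with whatever semantic model it uses.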
In an embodiment of the present invention, after the determination is affirmative and before the information to be matched is marked, the method further includes:
S0: determining whether the number of pieces of close information is greater than or equal to a preset number of matches; if so, executing S1; otherwise, executing S2;
S1: marking the information to be matched;
S2: entering the information to be matched into a preset database; and, when the feedback result sent by the user for the information to be matched in the database is that the match is correct, recording that the information to be matched matches each piece of target information.
In this embodiment, after it is determined that the target information contains at least one piece of close information that is semantically close to the information to be matched, it is further determined whether the number of pieces of close information is not less than the preset number of matches. This indicates how close the semantics of the information to be matched are to the semantics of the target information. Once the number of pieces of close information is not less than the number of matches, the information to be matched can be marked and provisionally treated as the required information, so that the user can manually correct the marked information to be matched. This improves the efficiency of extracting information from unstructured text while continuously improving the accuracy of the extraction.
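As a minimal sketch of the S0 decision, assuming the close information has already been collected into a list, the routing between S1 and S2 reduces to a count comparison. The function name `route_by_match_count` and the string labels are hypothetical.

```python
def route_by_match_count(close_information, min_matches):
    """S0: choose S1 ("mark") when the number of close items reaches the
    preset number of matches, otherwise S2 ("database")."""
    if min_matches < 1:
        raise ValueError("min_matches must be at least 1")
    return "mark" if len(close_information) >= min_matches else "database"


# Two close items against a preset number of matches of 2 leads to marking.
decision = route_by_match_count(["bid winner", "winning company"], 2)
```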
In an embodiment of the present invention, the determining whether at least one piece of close information exists among the target information, where the similarity value between the semantics of the information to be matched and the semantics of each piece of close information is greater than or equal to the preset first threshold, and if so, marking the information to be matched, includes:
when no close information exists, determining whether at least one piece of verification information exists among the target information, where the similarity value between the semantics of the information to be matched and the semantics of each piece of verification information is greater than or equal to a preset second threshold; if so, executing S2; otherwise, sending no-match information to the user.
In this embodiment, when no close information exists among the target information, it is further determined whether the target information contains at least one piece of verification information whose semantic similarity value with the information to be matched is greater than or equal to the second threshold and less than the first threshold. If verification information exists, the information to be matched is entered into the database so that a person can decide whether it matches the target information. This avoids treating the information to be matched as non-matching merely because no close information exists, which would impair the extraction of useful information from unstructured text. If it is determined that no information matches the target information, no-match information is sent to the user so that the user knows the matching has failed.
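The two-threshold band described above can be sketched as follows, assuming the similarity values between one piece of information to be matched and every piece of target information have already been computed. The function name `classify` and the labels are hypothetical, and the threshold values in the usage line are assumptions (the text gives no value for the second threshold).

```python
def classify(scores, first_threshold, second_threshold):
    """Classify one candidate from its similarity scores against all targets.

    "close":        at least one score >= first_threshold
    "verification": no close score, but at least one score in
                    [second_threshold, first_threshold)
    "none":         neither, so no-match information would be sent
    """
    if second_threshold >= first_threshold:
        raise ValueError("second threshold must be below the first threshold")
    if any(s >= first_threshold for s in scores):
        return "close"
    if any(second_threshold <= s < first_threshold for s in scores):
        return "verification"
    return "none"


# 0.7 falls between the assumed thresholds 0.6 and 0.8.
label = classify([0.7, 0.2], first_threshold=0.8, second_threshold=0.6)
```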
In an embodiment of the present invention, after the information to be matched is marked, the method further includes:
when the feedback received from the user for the marked information to be matched consists of a matching-error result and a third threshold, updating the first threshold to the third threshold, where the third threshold is greater than the first threshold;
recording that the information to be matched does not match the target information.
In this embodiment, when the feedback sent by the user consists of a matching-error result and a third threshold, this indicates that the information to be matched does not match the target information and that the first threshold is too low, which caused the wrong information to be matched. The first threshold therefore needs to be updated to the third threshold, so that an overly low threshold does not produce too many wrong matches. It is also recorded that the information to be matched does not match the target information, so that the back-end matching model can be updated and the accuracy improved the next time target information is matched.
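A minimal sketch of this threshold update, assuming the feedback arrives as a correctness flag plus an optional third threshold. The function name `apply_feedback` and its parameters are hypothetical; rejecting a third threshold that is not greater than the first follows the condition stated in this embodiment.

```python
def apply_feedback(first_threshold, is_correct, third_threshold=None):
    """Return the first threshold to use for subsequent matching.

    The threshold is raised only when the user reports a matching error
    and supplies a third threshold.
    """
    if is_correct or third_threshold is None:
        return first_threshold
    if third_threshold <= first_threshold:
        raise ValueError("third threshold must be greater than the first threshold")
    return third_threshold


# A reported matching error with a third threshold of 0.85 raises 0.8 to 0.85.
updated = apply_feedback(0.8, is_correct=False, third_threshold=0.85)
```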
In an embodiment of the present invention, the obtaining at least one unstructured original text in advance includes:
obtaining at least one unstructured text in advance;
for each unstructured text, removing the Hypertext Markup Language (HTML) tags of the unstructured text, and determining that the unstructured text from which the HTML tags have been removed is an unstructured original text.
In this embodiment, after at least one unstructured text is obtained, it is first preprocessed, that is, the HTML tags of the unstructured text are removed. This reduces their interference with the extraction of the information to be matched and so improves the accuracy of that extraction.
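One possible way to perform this preprocessing, shown only as a sketch, is to parse the text with Python's standard `html.parser` module and keep the character data. The class `_TextOnly` and the function `strip_html_tags` are hypothetical names; the text does not prescribe any particular tag-removal technique.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects character data; tags, comments and doctype are dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html_tags(text: str) -> str:
    """Return the text with HTML tags removed and whitespace normalized."""
    parser = _TextOnly()
    parser.feed(text)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.parts).split())


raw = '<!DOCTYPE html><p dir="ltr"><b>Winning enterprise:</b> Company X</p><!-- note -->'
original_text = strip_html_tags(raw)
```

Note that this sketch keeps the contents of `script` and `style` elements as text; a production preprocessor would likely need to drop those as well.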
To illustrate the technical solution and advantages of the present invention more clearly, a feedback-based information matching method provided by an embodiment of the present invention is described in detail below. As shown in Fig. 2, the method may include the following steps:
Step 201: obtaining at least one unstructured text in advance.
Specifically, once the unstructured texts have been obtained, each unstructured text can be segmented and its pieces of information to be matched extracted, so that useful information can then be extracted.
For example, unstructured text a and unstructured text b are obtained.
Step 202: for each unstructured text, removing the tags of the unstructured text, and determining that the unstructured text from which the tags have been removed is an unstructured original text.
Specifically, after the unstructured texts are obtained, the tags of each unstructured text need to be removed, so as to reduce the difficulty of extracting the information to be matched from the unstructured text.
For example, the HTML tags that define bold text, text direction and document type are removed from unstructured text a, giving unstructured original text a;
the HTML tags that define comments, text direction and document type are removed from unstructured text b, giving unstructured original text b.
Step 203: extracting at least one piece of information to be matched from each original text.
Specifically, after each unstructured text has been preprocessed, that is, after its HTML tags have been removed, at least one piece of information to be matched can be extracted from the resulting original text.
For example, the information to be matched extracted from original text a is the winning enterprise, the bidding enterprise and the winning bid amount;
the information to be matched extracted from original text b is the customer information for purchasing product A and the customer information for purchasing product B.
Step 204: receiving at least one piece of target information sent by a user.
Specifically, a user who needs to obtain useful information from the unstructured original texts can send at least one piece of target information. After the target information is received, intelligent matching can be performed to find the information to be matched that matches each piece of target information.
For example, the target information received from the user is "bid winner" and "winning company".
Step 205: for each piece of information to be matched in each original text, determining whether at least one piece of close information exists among the target information, where the similarity value between the semantics of the information to be matched and the semantics of each piece of close information is greater than or equal to a preset first threshold; if so, executing step 206; otherwise, executing step 208.
Specifically, after the target information sent by the user is received, intelligent matching can be performed on the information to be matched extracted from the unstructured original texts, so as to obtain the information to be matched that matches the target information.
For example, the preset first threshold is a semantic similarity of 80%.
The semantics of the winning enterprise, the bidding enterprise and the winning bid amount are each compared with the semantics of "bid winner" and "winning company". It is determined that the similarity value between the semantics of the information to be matched "winning enterprise" and the semantics of each of the target information "bid winner" and "winning company" is greater than or equal to 80%, so the target information "bid winner" and "winning company" can be determined to be close information.
Step 206: determining whether the number of pieces of close information is greater than or equal to a preset number of matches; if so, executing step 207; otherwise, executing step 209.
Specifically, after it is determined that at least one piece of target information has a semantic similarity with the information to be matched that is greater than or equal to the first threshold, it is further determined whether the number of pieces of close information is greater than or equal to the preset number of matches, so as to improve the accuracy of determining whether the information to be matched is the information the user needs.
For example, the preset number of matches is 2;
the number of pieces of close information is 2, which is equal to the preset number of matches, so the information to be matched "winning enterprise" can be marked.
Step 207: marking the information to be matched, and executing step 210.
Specifically, after it is determined that the number of pieces of close information that are semantically close to the information to be matched is greater than or equal to the preset number of matches, the information to be matched can be marked so that the user can make corrections according to the mark.
Step 208: determining whether at least one piece of verification information exists among the target information, where the similarity value between the semantics of the information to be matched and the semantics of each piece of verification information is greater than or equal to a preset second threshold; if so, executing step 209; otherwise, executing step 211.
Specifically, when no close information exists among the target information, it needs to be determined whether the target information contains at least one piece of verification information whose semantic similarity value is less than the first threshold and greater than or equal to the second threshold. If so, the information to be matched needs to be entered into the database; otherwise, it can be determined that the information to be matched is not information that the user wants to match.
Step 209: entering the information to be matched into a preset database.
Specifically, when it is determined that the target information contains at least one piece of verification information whose semantic similarity value with the information to be matched is less than the first threshold and greater than or equal to the second threshold, that is, when no close information exists among the target information but at least one piece of verification information does, the information to be matched needs to be entered into the database so that the user can correct the information from the database, which improves the accuracy of the information obtained. This step is also reached from step 206 when the number of pieces of close information is less than the preset number of matches.
Step 210: receiving and recording the feedback result sent by the user.
Specifically, after the information to be matched has been entered into the database or marked, it can be spot-checked and corrected manually; that is, matching information whose similarity is close to the matching threshold is reviewed, and the result of the review and correction is returned. This improves the matching database and the accuracy when target information is matched again, so that less and less manual effort is required.
Step 211: sending no-match information to the user.
Specifically, when the target information contains no close information whose semantic similarity value is greater than or equal to the first threshold and no verification information whose semantic similarity value is greater than or equal to the second threshold, it can be determined that the information to be matched does not match any of the target information, and no-match information can be sent to the user so that the user knows the matching has failed.
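The branching of steps 205 to 211 can be summarized in a short sketch that returns the number of the step reached for one piece of information to be matched. It assumes the similarity values against all target information are already available; the function name `next_step` is hypothetical, and the second-threshold value of 0.6 in the usage line is an assumption because the text gives no example value for it.

```python
def next_step(scores, first_threshold, second_threshold, min_matches):
    """Return 207 (mark), 209 (enter into database) or 211 (no match).

    scores: similarity values between one candidate and each target.
    """
    close = [s for s in scores if s >= first_threshold]
    if close:
        # Step 205 affirmative, so step 206 compares the count.
        return 207 if len(close) >= min_matches else 209
    # Step 208: no close information, look for verification information.
    if any(s >= second_threshold for s in scores):
        return 209
    return 211


# Two targets at or above 80% with a preset number of matches of 2.
step = next_step([0.9, 0.85], first_threshold=0.8,
                 second_threshold=0.6, min_matches=2)
```

Both 207 and 209 are followed by step 210, in which the user's feedback is received and recorded.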
As shown in Fig. 3, an embodiment of the present invention provides an information matching device, including:
an acquiring unit 301, configured to obtain at least one unstructured original text in advance, and to receive at least one piece of target information sent by a user;
an extraction unit 302, configured to extract at least one piece of information to be matched from each original text obtained by the acquiring unit 301;
a matching unit 303, configured to: for each piece of information to be matched extracted by the extraction unit 302 from each original text, determine whether at least one piece of close information exists among the target information obtained by the acquiring unit 301, where the similarity value between the semantics of the information to be matched and the semantics of each piece of close information is greater than or equal to a preset first threshold; if so, mark the information to be matched; and, when the feedback result sent by the user for the marked information to be matched is that the match is correct, record that the information to be matched matches each piece of target information.
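Purely as an illustration of how the three units might be separated in software, the sketch below models each unit as a small class. All class, field and method names are hypothetical. The extraction rule and the similarity function are injected as callables because the text does not specify either.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AcquiringUnit:
    """Holds the original texts and the target information from the user."""
    original_texts: List[str] = field(default_factory=list)
    target_information: List[str] = field(default_factory=list)


@dataclass
class ExtractionUnit:
    """Applies an injected extraction rule to every original text."""
    extract: Callable[[str], List[str]]

    def run(self, acquiring: AcquiringUnit) -> List[str]:
        return [item for text in acquiring.original_texts
                for item in self.extract(text)]


@dataclass
class MatchingUnit:
    """Marks candidates with close information and records user feedback."""
    score: Callable[[str, str], float]
    first_threshold: float
    marked: Dict[str, List[str]] = field(default_factory=dict)
    records: Dict[str, bool] = field(default_factory=dict)

    def match(self, candidates: List[str], targets: List[str]) -> Dict[str, List[str]]:
        for candidate in candidates:
            close = [t for t in targets
                     if self.score(candidate, t) >= self.first_threshold]
            if close:
                self.marked[candidate] = close
        return self.marked

    def feedback(self, candidate: str, is_correct: bool) -> None:
        # Only marked candidates can receive feedback.
        if candidate in self.marked:
            self.records[candidate] = is_correct
```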
In this embodiment, after the acquiring unit obtains at least one unstructured original text, the extraction unit extracts at least one piece of information to be matched from each original text. When the acquiring unit receives at least one piece of target information sent by the user, the matching unit can compare each piece of information to be matched with each piece of target information and automatically match the information to be matched that is semantically close to the close information. Information no longer has to be extracted from unstructured text manually at great cost in time, which improves the efficiency of extracting information from unstructured text. In addition, manual correction of the marked information to be matched continuously improves the accuracy of extracting information from unstructured text.
In an embodiment of the present invention, the matching unit is further configured to execute:
S0: determining whether the number of pieces of close information is greater than or equal to a preset number of matches; if so, executing S1; otherwise, executing S2;
S1: marking the information to be matched;
S2: entering the information to be matched into a preset database; and, when the feedback result sent by the user for the information to be matched in the database is that the match is correct, recording that the information to be matched matches each piece of target information.
In an embodiment of the present invention, the matching unit is configured to: when no close information exists, determine whether at least one piece of verification information exists among the target information, where the similarity value between the semantics of the information to be matched and the semantics of each piece of verification information is greater than or equal to a preset second threshold; if so, execute S2; otherwise, send no-match information to the user.
In an embodiment of the present invention, the matching unit is configured to: when the feedback received from the user for the marked information to be matched consists of a matching-error result and a third threshold, update the first threshold to the third threshold; and record that the information to be matched does not match the target information.
In an embodiment of the present invention, the acquiring unit is configured to: obtain at least one unstructured text in advance; and, for each unstructured text, remove the Hypertext Markup Language (HTML) tags of the unstructured text and determine that the unstructured text from which the HTML tags have been removed is an unstructured original text.
The embodiments of the present invention have at least the following advantages:
1. In an embodiment of the present invention, after at least one unstructured original text is obtained, at least one piece of information to be matched is extracted from each original text. When at least one piece of target information sent by the user is received, each piece of information to be matched can be compared with each piece of target information, and the information to be matched that is semantically close to the close information is matched automatically. Information no longer has to be extracted from unstructured text manually at great cost in time, which improves the efficiency of extracting information from unstructured text. In addition, manual correction of the marked information to be matched continuously improves the accuracy of extracting information from unstructured text.
2. In an embodiment of the present invention, after it is determined that the target information contains at least one piece of close information that is semantically close to the information to be matched, it is further determined whether the number of pieces of close information is not less than the preset number of matches. This indicates how close the semantics of the information to be matched are to the semantics of the target information. Only when the number of pieces of close information is not less than the number of matches is the information to be matched marked and provisionally treated as the required information, so that the user can manually correct the marked information to be matched. This improves the efficiency of extracting information from unstructured text while continuously improving the accuracy of the extraction.
3. In an embodiment of the present invention, when no close information exists among the target information, it is further determined whether the target information contains at least one piece of verification information whose semantic similarity value with the information to be matched is greater than or equal to the second threshold and less than the first threshold. If verification information exists, the information to be matched is entered into the database so that a person can decide whether it matches the target information. This avoids treating the information to be matched as non-matching merely because no close information exists, which would impair the extraction of useful information from unstructured text. If it is determined that no information matches the target information, no-match information is sent to the user so that the user knows the matching has failed.
4. In an embodiment of the present invention, when the feedback sent by the user consists of a matching-error result and a third threshold, this indicates that the information to be matched does not match the target information and that the first threshold is too low, which caused the wrong information to be matched. The first threshold therefore needs to be updated to the third threshold, so that an overly low threshold does not produce too many wrong matches. It is also recorded that the information to be matched does not match the target information, so that the back-end matching model can be updated and the accuracy improved the next time target information is matched.
5. In an embodiment of the present invention, after at least one unstructured text is obtained, it is first preprocessed, that is, the HTML tags of the unstructured text are removed. This reduces their interference with the extraction of the information to be matched and so improves the accuracy of that extraction.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a" does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
Finally, it should be noted that the foregoing describes merely preferred embodiments of the present invention, which are intended only to illustrate the technical solution of the present invention and not to limit its scope of protection. Any modification, equivalent replacement, improvement or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.