CN108845985A - Information matching method and information matching device - Google Patents

Information matching method and information matching device

Info

Publication number
CN108845985A
Authority
CN
China
Prior art keywords
information
matched
label
urtext
close
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810521818.6A
Other languages
Chinese (zh)
Other versions
CN108845985B (en
Inventor
李锐
于治楼
段成德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Inspur Scientific Research Institute Co Ltd
Original Assignee
Jinan Inspur Hi Tech Investment and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan Inspur Hi Tech Investment and Development Co Ltd
Priority to CN201810521818.6A
Publication of CN108845985A
Application granted
Publication of CN108845985B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides an information matching method and an information matching device. The method includes: obtaining at least one piece of unstructured original text in advance; extracting at least one piece of information to be matched from each original text; receiving at least one piece of target information sent by a user; for each piece of information to be matched in each original text, determining whether at least one piece of similar information exists among the target information, where the semantic similarity value between the information to be matched and each piece of similar information is greater than or equal to a preset first threshold; if so, marking the information to be matched; and, when the feedback result sent by the user for the marked information to be matched is that the match is correct, recording that the information to be matched matches each piece of target information. This scheme can improve the accuracy of extracting useful information from unstructured text.

Description

Information matching method and information matching device
Technical field
The present invention relates to the technical field of data processing, and in particular to an information matching method and an information matching device.
Background
With the development of the Internet, the potentially profound impact and immense value of information on the Internet are gradually being recognized, and decision-making in business, economics, and other fields will increasingly rely on information and analysis. How to obtain useful information from a large volume of information has therefore become an urgent problem.
At present, unstructured text generally has to undergo some structuring before it can be used and generate value. Unstructured text can contain a great deal of useful information. Taking bidding as an example, information such as the winning bidder and the bid award time is often of interest to certain enterprises or government agencies. Unstructured texts from different sources may use different wording to express this information; for example, the winning bidder may be written as "winning enterprise", "enterprise", and so on.
However, useful information is currently extracted from unstructured text mainly by hand. Because the amount of information in unstructured text is often large, extraction takes too much time, which reduces the efficiency of extracting useful information from unstructured text.
Summary of the invention
Embodiments of the present invention provide an information matching method and an information matching device that can improve the accuracy of extracting useful information from unstructured text.
In a first aspect, an embodiment of the present invention provides an information matching method, including:
obtaining at least one piece of unstructured original text in advance;
extracting at least one piece of information to be matched from each original text;
receiving at least one piece of target information sent by a user;
for each piece of information to be matched in each original text, determining whether at least one piece of similar information exists among the target information, where the semantic similarity value between the information to be matched and each piece of similar information is greater than or equal to a preset first threshold;
if so, marking the information to be matched;
when the feedback result sent by the user for the marked information to be matched is that the match is correct, recording that the information to be matched matches each piece of target information.
Preferably, after the "if so" and before the marking of the information to be matched, the method further includes:
S0: determining whether the number of pieces of similar information is greater than or equal to a preset number of matches; if so, executing S1; otherwise, executing S2;
S1: marking the information to be matched;
S2: entering the information to be matched into a preset database;
S3: when the feedback result sent by the user for the information to be matched in the database is that the match is correct, recording that the information to be matched matches each piece of target information.
Preferably, the determining whether at least one piece of similar information exists among the target information, where the semantic similarity value between the information to be matched and each piece of similar information is greater than or equal to the preset first threshold, and if so, marking the information to be matched, includes:
when no similar information exists,
determining whether at least one piece of verification information exists among the target information, where the semantic similarity value between the information to be matched and each piece of verification information is greater than or equal to a preset second threshold; if so, executing S2; otherwise, sending a no-match message to the user.
Preferably, after the marking of the information to be matched, the method further includes:
when the feedback result sent by the user for the marked information to be matched is a matching error together with a third threshold, updating the first threshold to the third threshold;
recording that the information to be matched does not match any piece of the target information.
Preferably, the obtaining at least one piece of unstructured original text in advance includes:
obtaining at least one unstructured text in advance;
for each unstructured text, removing the HyperText Markup Language (HTML) tags from the unstructured text, and taking the unstructured text with its HTML tags removed as the unstructured original text.
In a second aspect, an embodiment of the present invention provides an information matching device, including:
an acquiring unit, configured to obtain at least one piece of unstructured original text in advance and to receive at least one piece of target information sent by a user;
an extraction unit, configured to extract at least one piece of information to be matched from each original text obtained by the acquiring unit;
a matching unit, configured to: for each piece of information to be matched extracted by the extraction unit from each original text, determine whether at least one piece of similar information exists among the target information received by the acquiring unit, where the semantic similarity value between the information to be matched and each piece of similar information is greater than or equal to a preset first threshold; if so, mark the information to be matched; and, when the feedback result sent by the user for the marked information to be matched is that the match is correct, record that the information to be matched matches each piece of target information.
Preferably, the matching unit is further configured to execute:
S0: determining whether the number of pieces of similar information is greater than or equal to a preset number of matches; if so, executing S1; otherwise, executing S2;
S1: marking the information to be matched;
S2: entering the information to be matched into a preset database;
S3: when the feedback result sent by the user for the information to be matched in the database is that the match is correct, recording that the information to be matched matches each piece of target information.
Preferably, the matching unit is configured to, when no similar information exists, determine whether at least one piece of verification information exists among the target information, where the semantic similarity value between the information to be matched and each piece of verification information is greater than or equal to a preset second threshold; if so, execute S2; otherwise, send a no-match message to the user.
Preferably, the matching unit is configured to, when the feedback result sent by the user for the marked information to be matched is a matching error together with a third threshold, update the first threshold to the third threshold, and record that the information to be matched does not match any piece of the target information.
Preferably, the acquiring unit is configured to obtain at least one unstructured text in advance and, for each unstructured text, remove the HyperText Markup Language (HTML) tags from the unstructured text and take the unstructured text with its HTML tags removed as the unstructured original text.
In the embodiments of the present invention, after at least one piece of unstructured original text is obtained, at least one piece of information to be matched is extracted from each original text. When at least one piece of target information sent by the user is received, each piece of information to be matched can then be compared with each piece of target information, so that information to be matched that is semantically similar to the target information is matched automatically. No one has to spend a large amount of time extracting information from unstructured text by hand, which improves the efficiency of extracting information from unstructured text. In addition, manual correction of the marked information to be matched continuously improves the accuracy of extracting information from unstructured text.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of an information matching method provided by an embodiment of the present invention;
Fig. 2 is a flow chart of another information matching method provided by an embodiment of the present invention;
Fig. 3 is a structural schematic diagram of an information matching device provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
As shown in Fig. 1, an embodiment of the present invention provides an information matching method, including:
Step 101: obtaining at least one piece of unstructured original text in advance;
Step 102: extracting at least one piece of information to be matched from each original text;
Step 103: receiving at least one piece of target information sent by a user;
Step 104: for each piece of information to be matched in each original text, determining whether at least one piece of similar information exists among the target information, where the semantic similarity value between the information to be matched and each piece of similar information is greater than or equal to a preset first threshold;
Step 105: if so, marking the information to be matched;
Step 106: when the feedback result sent by the user for the marked information to be matched is that the match is correct, recording that the information to be matched matches each piece of target information.
In this embodiment, after at least one piece of unstructured original text is obtained, at least one piece of information to be matched is extracted from each original text. When at least one piece of target information sent by the user is received, each piece of information to be matched can then be compared with each piece of target information, so that information to be matched that is semantically similar to the target information is matched automatically. No one has to spend a large amount of time extracting information from unstructured text by hand, which improves the efficiency of extracting information from unstructured text. In addition, manual correction of the marked information to be matched continuously improves the accuracy of extracting information from unstructured text.
In summary, after information to be matched that corresponds to the target information has been matched, it is corrected manually and the correction result is recorded. This forms closed-loop feedback for information matching, so that the accuracy of extracting information from unstructured text improves the next time target information is matched.
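Steps 104 to 106 can be sketched roughly as follows. This is only an illustration under stated assumptions: the patent does not say how the semantic similarity value is computed, so the `similarity` function below is a character-overlap stand-in taken from Python's standard library, and the function names, the feedback dictionary, and the "correct" label are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Placeholder score in [0, 1]: character overlap, NOT real semantic similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def mark_candidates(candidates, targets, first_threshold):
    """Steps 104-105: map each marked candidate to its similar target information."""
    marked = {}
    for candidate in candidates:
        similar = [t for t in targets if similarity(candidate, t) >= first_threshold]
        if similar:  # at least one piece of similar information exists
            marked[candidate] = similar
    return marked


def record_feedback(marked, targets, feedback):
    """Step 106: for each marked candidate the user confirmed, record a match
    with each piece of target information."""
    records = []
    for candidate in marked:
        if feedback.get(candidate) == "correct":
            records.extend((candidate, target) for target in targets)
    return records
```

A real system would presumably replace `similarity` with a semantic model; the control flow around it would stay the same.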
In an embodiment of the present invention, after the "if so" and before the marking of the information to be matched, the method further includes:
S0: determining whether the number of pieces of similar information is greater than or equal to a preset number of matches; if so, executing S1; otherwise, executing S2;
S1: marking the information to be matched;
S2: entering the information to be matched into a preset database;
S3: when the feedback result sent by the user for the information to be matched in the database is that the match is correct, recording that the information to be matched matches each piece of target information.
In this embodiment, after it is determined that at least one piece of similar information among the target information is semantically similar to the information to be matched, it is also necessary to determine whether the number of pieces of similar information is no less than the preset number of matches. This indicates how close the information to be matched is, semantically, to the target information. Once the number of pieces of similar information is no less than the number of matches, the information to be matched can be marked and provisionally treated as the required information, so that the user can manually correct it based on the mark. This improves the efficiency of extracting information from unstructured text while continuously improving the accuracy of the extraction.
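The count check described above can be illustrated with a small routing function. The function name, the two plain lists standing in for the set of marked information and the preset database, and the returned step labels are assumptions made for this sketch.

```python
def route_candidate(candidate, similar_targets, match_count, marked, database):
    """Compare the number of similar pieces of information with the preset
    number of matches, then either mark the candidate or store it for review."""
    if len(similar_targets) >= match_count:
        marked.append(candidate)    # enough similar information: mark it
        return "S1"
    database.append(candidate)      # too few: enter it into the database
    return "S2"
```

With a preset number of matches of 2, a candidate that is similar to two pieces of target information is marked, while a candidate similar to only one is stored for later manual confirmation.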
In an embodiment of the present invention, the determining whether at least one piece of similar information exists among the target information, where the semantic similarity value between the information to be matched and each piece of similar information is greater than or equal to the preset first threshold, and if so, marking the information to be matched, includes:
when no similar information exists,
determining whether at least one piece of verification information exists among the target information, where the semantic similarity value between the information to be matched and each piece of verification information is greater than or equal to a preset second threshold; if so, executing S2; otherwise, sending a no-match message to the user.
In this embodiment, when no similar information exists among the target information, it is also necessary to determine whether the target information contains at least one piece of verification information, that is, information whose semantic similarity value with the information to be matched is greater than or equal to the second threshold and less than the first threshold. If verification information exists, the information to be matched is entered into the database so that a person can determine whether it matches the target information. This avoids treating the information to be matched as non-matching merely because no similar information exists, which would impair the extraction of useful information from unstructured text. If it is determined that no information matching the target information exists, a no-match message is sent to the user so that the user knows the match has failed.
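The two thresholds divide the similarity scale into three bands. A minimal sketch of that classification follows, assuming the similarity values of one piece of information to be matched against every piece of target information have already been computed; the function name and the returned labels are hypothetical.

```python
def classify(scores, first_threshold, second_threshold):
    """Classify one piece of information to be matched from its similarity
    values against each piece of target information."""
    if any(s >= first_threshold for s in scores):
        return "similar"    # similar information exists
    if any(second_threshold <= s < first_threshold for s in scores):
        return "verify"     # verification information exists: store for manual review
    return "no_match"       # nothing close enough: send a no-match message
```

The middle band is what keeps a borderline candidate from being discarded outright: it is queued for a person to confirm instead.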
In an embodiment of the present invention, after the marking of the information to be matched, the method further includes:
when the feedback result sent by the user for the marked information to be matched is a matching error together with a third threshold, updating the first threshold to the third threshold, where the third threshold is greater than the first threshold;
recording that the information to be matched does not match any piece of the target information.
In this embodiment, receiving a feedback result that is a matching error together with a third threshold indicates that the information to be matched does not match the target information and that the first threshold is too low, which caused the erroneous match. The first threshold is therefore updated to the third threshold so that an overly low threshold does not produce too many erroneous matches, and the mismatch between the information to be matched and each piece of target information is recorded. This updates the back-end matching model and improves accuracy the next time target information is matched.
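The threshold update on error feedback might look like the sketch below. The `MatcherState` container and the function name are assumptions for illustration; the only rules taken from the text are that the third threshold replaces the first threshold, that it must be greater than the first threshold, and that the mismatch is recorded.

```python
from dataclasses import dataclass, field


@dataclass
class MatcherState:
    first_threshold: float
    mismatches: list = field(default_factory=list)


def apply_error_feedback(state, candidate, targets, third_threshold):
    """Handle feedback consisting of a matching error plus a third threshold."""
    if third_threshold <= state.first_threshold:
        raise ValueError("third threshold must be greater than the first threshold")
    state.first_threshold = third_threshold                  # tighten the threshold
    state.mismatches.append((candidate, tuple(targets)))     # record the mismatch
    return state
```

Because the third threshold is required to be larger, each piece of error feedback can only make later matching stricter.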
In an embodiment of the present invention, the obtaining at least one piece of unstructured original text in advance includes:
obtaining at least one unstructured text in advance;
for each unstructured text, removing the HyperText Markup Language (HTML) tags from the unstructured text, and taking the unstructured text with its HTML tags removed as the unstructured original text.
In this embodiment, after at least one unstructured text is obtained, it is first preprocessed, that is, its HTML tags are removed. This reduces interference when extracting the information to be matched and thereby improves the accuracy of that extraction.
To illustrate the technical solution and advantages of the present invention more clearly, an information matching method with feedback provided by an embodiment of the present invention is described in detail below. As shown in Fig. 2, the method may include the following steps:
Step 201: obtaining at least one unstructured text in advance.
Specifically, once the unstructured texts have been obtained, pieces of information to be matched can be extracted from each of them, and useful information can then be extracted.
For example, unstructured text a and unstructured text b are obtained.
Step 202: for each unstructured text, removing the tags from the unstructured text, and taking the unstructured text with its tags removed as the unstructured original text.
Specifically, after the unstructured texts are obtained, the tags of each unstructured text need to be removed, which reduces the difficulty of extracting information to be matched from the unstructured text.
For example, the HTML tags that define bold text, text direction, and document type are removed from unstructured text a to obtain the unstructured original text a;
the HTML tags that define comments, text direction, and document type are removed from unstructured text b to obtain the unstructured original text b.
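One possible way to perform the tag removal of step 202 is shown below, using Python's standard `html.parser` module. The patent does not name a tool, so this choice, the helper names, and the sample input are assumptions. The parser passes only text nodes to `handle_data`, so element tags, the document type declaration, and comments are all dropped.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes; tags, doctype declarations, and comments are skipped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html_tags(text: str) -> str:
    collector = _TextCollector()
    collector.feed(text)
    collector.close()  # flush any text still buffered by the parser
    # Collapse runs of whitespace left behind by the removed markup.
    return " ".join("".join(collector.parts).split())


sample = '<!DOCTYPE html><p dir="ltr">Winning <b>enterprise</b>: ACME</p><!-- note -->'
print(strip_html_tags(sample))
```

Note that the contents of script and style elements count as text for this parser, so pages containing them would need extra handling.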
Step 203: extracting at least one piece of information to be matched from each original text.
Specifically, after each unstructured text has been preprocessed, that is, after its HTML tags have been removed, at least one piece of information to be matched can be extracted from the original text.
For example, the information to be matched extracted from original text a is: winning enterprise, bidding enterprise, and winning bid amount;
the information to be matched extracted from original text b is: information on customers who purchased product A and information on customers who purchased product B.
Step 204: receiving at least one piece of target information sent by a user.
Specifically, a user who needs to obtain useful information from the unstructured original text can send at least one piece of target information. After the target information is received, automatic matching can be performed to find the information to be matched that matches the target information.
For example, the target information received from the user is: winning bidder and winning company.
Step 205: for each piece of information to be matched in each original text, determining whether at least one piece of similar information exists among the target information, where the semantic similarity value between the information to be matched and each piece of similar information is greater than or equal to a preset first threshold; if so, executing step 206; otherwise, executing step 208.
Specifically, after the target information sent by the user is received, the information to be matched that was extracted from the unstructured original text can be matched automatically to obtain the information to be matched that matches the target information.
For example, the preset first threshold is a semantic similarity of 80%.
The semantics of "winning enterprise", "bidding enterprise", and "winning bid amount" are each compared with the semantics of "winning bidder" and "winning company". The semantic similarity value between the information to be matched "winning enterprise" and each of the target information "winning bidder" and "winning company" is determined to be greater than or equal to 80%, so the target information "winning bidder" and "winning company" is determined to be similar information.
Step 206: determining whether the number of pieces of similar information is greater than or equal to a preset number of matches; if so, executing step 207; otherwise, executing step 209.
Specifically, after it is determined that at least one piece of similar information exists whose semantic similarity with the information to be matched is greater than or equal to the first threshold, it is also necessary to determine whether the number of pieces of similar information is greater than or equal to the preset number of matches, which improves the accuracy of determining whether the information to be matched is the information the user needs.
For example, the preset number of matches is 2;
the number of pieces of similar information is 2, which equals the number of matches of 2, so the information to be matched "winning enterprise" can be marked.
Step 207: marking the information to be matched, and executing step 210.
Specifically, after it is determined that the number of pieces of similar information that are semantically similar to the information to be matched is greater than or equal to the preset number of matches, the information to be matched can be marked so that the user can correct it based on the mark.
Step 208: determining whether at least one piece of verification information exists among the target information, where the semantic similarity value between the information to be matched and each piece of verification information is greater than or equal to a preset second threshold; if so, executing step 209; otherwise, executing step 211.
Specifically, when no similar information exists among the target information, it is necessary to determine whether the target information contains at least one piece of verification information whose semantic similarity value is less than the first threshold and greater than or equal to the second threshold. If so, the information to be matched needs to be entered into the database; otherwise, it can be determined that the information to be matched is not information the user wants to match.
Step 209: entering the information to be matched into a preset database.
Specifically, when no similar information exists among the target information but at least one piece of verification information exists, that is, information whose semantic similarity value with the information to be matched is less than the first threshold and greater than or equal to the second threshold, the information to be matched needs to be entered into the database so that the user can correct it from the database, which improves the accuracy of the information obtained.
Step 210: receiving and recording the feedback result sent by the user.
Specifically, after the information to be matched has been entered into the database or marked, it can be spot-checked and corrected manually; that is, matches whose similarity is close to the matching threshold are reviewed, and the review result is returned. This improves the matching database and the accuracy of subsequent matching of target information, so that less and less manual effort is needed.
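The spot check and the recording of feedback in step 210 could be organized as in the sketch below. Everything here is an assumption made for illustration: the text does not define how close to the threshold a match must be to be reviewed, so `margin` is a hypothetical parameter, and `FeedbackStore` is a hypothetical in-memory stand-in for the matching database.

```python
def needs_review(score, threshold, margin):
    """Spot-check rule: review matches whose score lies within `margin` of the threshold."""
    return abs(score - threshold) <= margin


class FeedbackStore:
    """Remembers the user's verdict for each (information to be matched, target) pair."""

    def __init__(self):
        self._verdicts = {}

    def record(self, candidate, target, is_match):
        self._verdicts[(candidate, target)] = is_match

    def lookup(self, candidate, target):
        # Returns True or False for a reviewed pair, None if it was never reviewed.
        return self._verdicts.get((candidate, target))
```

When the same pair comes up again, a stored verdict can be consulted before any similarity is computed, which is one way the amount of manual review could shrink over time.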
Step 211: sending a no-match message to the user.
Specifically, when the target information contains neither similar information whose semantic similarity value is greater than or equal to the first threshold nor verification information whose semantic similarity value is greater than or equal to the second threshold, it can be determined that the information to be matched does not match any piece of the target information, and a no-match message can be sent to the user so that the user knows the match has failed.
As shown in Fig. 3, an embodiment of the present invention provides an information matching device, including:
an acquiring unit 301, configured to obtain at least one piece of unstructured original text in advance and to receive at least one piece of target information sent by a user;
an extraction unit 302, configured to extract at least one piece of information to be matched from each original text obtained by the acquiring unit 301;
a matching unit 303, configured to: for each piece of information to be matched extracted by the extraction unit 302 from each original text, determine whether at least one piece of similar information exists among the target information received by the acquiring unit 301, where the semantic similarity value between the information to be matched and each piece of similar information is greater than or equal to a preset first threshold; if so, mark the information to be matched; and, when the feedback result sent by the user for the marked information to be matched is that the match is correct, record that the information to be matched matches each piece of target information.
In this embodiment, after the acquiring unit obtains at least one piece of unstructured original text, the extraction unit extracts at least one piece of information to be matched from each original text. When the acquiring unit receives at least one piece of target information sent by the user, the matching unit can then compare each piece of information to be matched with each piece of target information, so that information to be matched that is semantically similar to the target information is matched automatically. No one has to spend a large amount of time extracting information from unstructured text by hand, which improves the efficiency of extracting information from unstructured text. In addition, manual correction of the marked information to be matched continuously improves the accuracy of extracting information from unstructured text.
In an embodiment of the present invention, the matching unit is further configured to execute:
S0: determining whether the number of pieces of similar information is greater than or equal to a preset number of matches; if so, executing S1; otherwise, executing S2;
S1: marking the information to be matched;
S2: entering the information to be matched into a preset database;
S3: when the feedback result sent by the user for the information to be matched in the database is that the match is correct, recording that the information to be matched matches each piece of target information.
In an embodiment of the present invention, the matching unit is configured to, when no similar information exists, determine whether at least one piece of verification information exists among the target information, where the semantic similarity value between the information to be matched and each piece of verification information is greater than or equal to a preset second threshold; if so, execute S2; otherwise, send a no-match message to the user.
In an embodiment of the present invention, the matching unit is configured to, when the feedback result sent by the user for the marked information to be matched is a matching error together with a third threshold, update the first threshold to the third threshold, and record that the information to be matched does not match any piece of the target information.
In an embodiment of the present invention, the acquiring unit is configured to obtain at least one unstructured text in advance and, for each unstructured text, remove the HyperText Markup Language (HTML) tags from the unstructured text and take the unstructured text with its HTML tags removed as the unstructured original text.
The embodiments of the present invention have at least the following advantages:
1. In an embodiment of the present invention, after at least one unstructured original text is obtained, at least one piece of information to be matched is extracted from each original text. When at least one piece of target information sent by the user is received, each piece of information to be matched is compared with each piece of target information, and the information to be matched that is semantically close to the close information is identified automatically. Information no longer has to be extracted from unstructured text manually at a great cost in time, which improves the efficiency of extraction, and manual correction of the marked information to be matched continually improves the accuracy of extracting information from unstructured text.
2. In an embodiment of the present invention, after it is determined that the target information contains at least one piece of close information that is semantically close to the information to be matched, it is further determined whether the number of pieces of close information is not less than a preset number of matches, which indicates how close the information to be matched is, semantically, to the target information. Only when the number of pieces of close information is not less than the number of matches is the information to be matched marked and provisionally taken to be the required information, so that the user can manually correct the marked information to be matched. This improves the efficiency of extracting information from unstructured text while continually improving the accuracy of extraction.
3. In an embodiment of the present invention, when no close information exists among the target information, it is further determined whether the target information contains at least one piece of verification information whose semantic similarity value with the information to be matched is greater than or equal to the second threshold and less than the first threshold. If such verification information exists, the information to be matched is entered into the database so that a person can determine whether it matches the target information. This avoids treating the information to be matched as non-matching merely because no close information exists, which would hinder the extraction of useful information from unstructured text. When it is determined that no information matches the target information, a no-match message is sent to the user so that the user knows the matching has failed.
4. In an embodiment of the present invention, when the feedback result sent by the user indicates a matching error and is accompanied by a third threshold, this shows that the information to be matched does not match the target information and that the first threshold is too low, which caused incorrect information to be matched. The first threshold is therefore updated to the third threshold, so that an overly low threshold does not produce excessive incorrect matches, and the mismatch between the information to be matched and the target information is recorded, so that the background matching pattern can be updated and the accuracy of subsequent matching against the target information is improved.
5. In an embodiment of the present invention, after at least one unstructured text is obtained, it is first preprocessed, that is, its HTML tags are removed. This reduces interference with the extraction of the information to be matched and so improves the accuracy of that extraction.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
Finally, it should be noted that the foregoing describes only preferred embodiments of the present invention, is intended merely to illustrate its technical solutions, and is not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.

Claims (10)

1. An information matching method, characterized by comprising:
obtaining at least one unstructured original text in advance;
extracting at least one piece of information to be matched from each original text;
receiving at least one piece of target information sent by a user;
for each piece of information to be matched in each original text, determining whether at least one piece of close information exists among the target information, wherein the semantic similarity value between the information to be matched and each piece of close information is greater than or equal to a preset first threshold;
if so, marking the information to be matched;
when the feedback result sent by the user according to the marked information to be matched is that the match is correct, recording that the information to be matched matches each piece of target information.
2. The method according to claim 1, characterized in that,
after the "if so" and before the marking of the information to be matched, the method further comprises:
S0: determining whether the number of pieces of close information is greater than or equal to a preset number of matches; if so, executing S1; otherwise, executing S2;
S1: executing the marking of the information to be matched;
S2: entering the information to be matched into a preset database;
S2: when the feedback result sent by the user according to the information to be matched in the database is that the match is correct, executing the recording that the information to be matched matches each piece of target information.
3. The method according to claim 2, characterized in that
the determining whether at least one piece of close information exists among the target information, wherein the semantic similarity value between the information to be matched and each piece of close information is greater than or equal to the preset threshold, and, if so, marking the information to be matched, comprises:
when no close information exists,
determining whether at least one piece of verification information exists among the target information, wherein the semantic similarity value between the information to be matched and each piece of verification information is greater than or equal to a preset second threshold; if so, executing S2; otherwise, sending a no-match message to the user.
4. The method according to claim 1, characterized in that,
after the marking of the information to be matched, the method further comprises:
upon receiving a feedback result that the user sends according to the marked information to be matched and that indicates a matching error together with a third threshold, updating the first threshold to the third threshold;
recording that the information to be matched does not match the target information.
5. The method according to any one of claims 1 to 4, characterized in that
the obtaining at least one unstructured original text in advance comprises:
obtaining at least one unstructured text in advance;
for each unstructured text, removing the HyperText Markup Language (HTML) tags of the unstructured text, and taking the unstructured text with the HTML tags removed as the unstructured original text.
6. An information matching device, characterized by comprising:
an acquiring unit, configured to obtain at least one unstructured original text in advance and to receive at least one piece of target information sent by a user;
an extraction unit, configured to extract at least one piece of information to be matched from each original text obtained by the acquiring unit;
a matching unit, configured to: for each piece of information to be matched extracted by the extraction unit from each original text, determine whether at least one piece of close information exists among the target information obtained by the acquiring unit, wherein the semantic similarity value between the information to be matched and each piece of close information is greater than or equal to a preset first threshold; if so, mark the information to be matched; and, when the feedback result sent by the user according to the marked information to be matched is that the match is correct, record that the information to be matched matches each piece of target information.
7. The device according to claim 6, characterized in that
the matching unit is further configured to execute:
S0: determining whether the number of pieces of close information is greater than or equal to a preset number of matches; if so, executing S1; otherwise, executing S2;
S1: executing the marking of the information to be matched;
S2: entering the information to be matched into a preset database;
S2: when the feedback result sent by the user according to the information to be matched in the database is that the match is correct, executing the recording that the information to be matched matches each piece of target information.
8. The device according to claim 7, characterized in that
the matching unit is configured to, when no close information exists, determine whether at least one piece of verification information exists among the target information, wherein the semantic similarity value between the information to be matched and each piece of verification information is greater than or equal to a preset second threshold; if so, execute S2; otherwise, send a no-match message to the user.
9. The device according to claim 6, characterized in that
the matching unit is configured to, upon receiving a feedback result that the user sends according to the marked information to be matched and that indicates a matching error together with a third threshold, update the first threshold to the third threshold, and record that the information to be matched does not match the target information.
10. The device according to any one of claims 6 to 9, characterized in that
the acquiring unit is configured to obtain at least one unstructured text in advance and, for each unstructured text, remove the HyperText Markup Language (HTML) tags of the unstructured text and take the unstructured text with the HTML tags removed as the unstructured original text.
CN201810521818.6A 2018-05-28 2018-05-28 Information matching method and information matching device Active CN108845985B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810521818.6A CN108845985B (en) 2018-05-28 2018-05-28 Information matching method and information matching device


Publications (2)

Publication Number Publication Date
CN108845985A (en) 2018-11-20
CN108845985B (en) 2022-02-18

Family

ID=64207778

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810521818.6A Active CN108845985B (en) 2018-05-28 2018-05-28 Information matching method and information matching device

Country Status (1)

Country Link
CN (1) CN108845985B (en)



Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107784041A (en) * 2016-08-31 2018-03-09 北京国双科技有限公司 Judgement document's case by acquisition methods and device
CN107704453A (en) * 2017-10-23 2018-02-16 深圳市前海众兴电子商务有限公司 A kind of word semantic analysis, word semantic analysis terminal and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GUANGYUAN HUANG ET AL.: "Measuring Similarity between Sentence Fragments", 《2012 4TH INTERNATIONAL CONFERENCE ON INTELLIGENT HUMAN-MACHINE SYSTEMS AND CYBERNETICS》 *
张德龙 等: "相似度技术在资料信息化中的应用研究", 《电子设计工程》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111143374A (en) * 2019-12-31 2020-05-12 杭州依图医疗技术有限公司 Data auxiliary identification method, system, computing equipment and storage medium
CN111143374B (en) * 2019-12-31 2023-04-25 杭州依图医疗技术有限公司 Data auxiliary identification method, system, computing device and storage medium
CN111258295A (en) * 2020-01-15 2020-06-09 重庆长安汽车股份有限公司 System and method for verifying big data acquisition and uploading accuracy

Also Published As

Publication number Publication date
CN108845985B (en) 2022-02-18

Similar Documents

Publication Publication Date Title
CN108256074B (en) Verification processing method and device, electronic equipment and storage medium
CN106897559B (en) A kind of symptom and sign class entity recognition method and device towards multi-data source
CN107239440B (en) Junk text recognition method and device
US7281001B2 (en) Data quality system
CN104391881B (en) A kind of daily record analytic method and system based on segmentation methods
US8577155B2 (en) System and method for duplicate text recognition
CN102654874A (en) Bill data management method and system
CN111723870B (en) Artificial intelligence-based data set acquisition method, apparatus, device and medium
US20200090058A1 (en) Model variable candidate generation device and method
CN108845985A (en) A kind of information matching method and information matches device
CN113841156B (en) Control method and device based on image recognition
CN117520561A (en) Entity relation extraction method and system for knowledge graph construction in helicopter assembly field
CN111782892A (en) Similar character recognition method, device, apparatus and storage medium based on prefix tree
CN110569237A (en) System and method for realizing real-time data cleaning processing
CN110472231B (en) Method and device for identifying legal document case
CN109960707B (en) College recruitment data acquisition method and system based on artificial intelligence
US8903754B2 (en) Programmatically identifying branding within assets
CN112818693A (en) Automatic extraction method and system for electronic component model words
CN110362828B (en) Network information risk identification method and system
CN115618264A (en) Method, apparatus, device and medium for topic classification of data assets
CN115859191A (en) Fault diagnosis method and device, computer readable storage medium and computer equipment
CN115062615A (en) Financial field event extraction method and device
CN114265931A (en) Big data text mining-based consumer policy perception analysis method and system
CN111125319A (en) Enterprise basic law intelligent consultation terminal, system and method
CN111191529B (en) Method and system for processing abnormal worksheets

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220121

Address after: 250100 building S02, No. 1036, Langchao Road, high tech Zone, Jinan City, Shandong Province

Applicant after: Shandong Inspur Scientific Research Institute Co.,Ltd.

Address before: 250100 First Floor of R&D Building 2877 Kehang Road, Sun Village Town, Jinan High-tech Zone, Shandong Province

Applicant before: JINAN INSPUR HIGH-TECH TECHNOLOGY DEVELOPMENT Co.,Ltd.

GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20181120

Assignee: INSPUR SOFTWARE Co.,Ltd.

Assignor: Shandong Inspur Scientific Research Institute Co.,Ltd.

Contract record no.: X2023980030294

Denomination of invention: An information matching method and information matching device

Granted publication date: 20220218

License type: Exclusive License

Record date: 20230110
