CN105045853A - Industry data matching method and device - Google Patents

Publication number
CN105045853A
Authority
CN
China
Prior art keywords
participle
technical term
result set
matching result
duplicate removal
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510394585.4A
Other languages
Chinese (zh)
Inventor
张立珠
宋伟伟
邵辉
张壮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur General Software Co Ltd
Original Assignee
Inspur General Software Co Ltd
Application filed by Inspur General Software Co Ltd
Priority to CN201510394585.4A
Publication of CN105045853A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides an industry data matching method and device. The method comprises the following steps: configuring a terminology lexicon and determining the data to be queried; performing a first matching word segmentation on the data to be queried according to the terms in the terminology lexicon, and returning at least one terminology matching result set; deduplicating the segmented words in the at least one terminology matching result set; sorting the segmented words according to their repetition counts; and retrieving and returning result information according to the deduplicated segmented words and their order, so that technical terms are segmented accurately.

Description

A method and device for industry data matching
Technical field
The present invention relates to the field of computer applications, and in particular to a method and device for industry data matching.
Background
Search engines have become an essential tool for obtaining information in every industry. A search engine typically serves a user as follows: after receiving the user's data, it segments the data into words according to a basic dictionary and returns the information retrieved for those segmented words. For industry data that contains technical terms, however, the prior art still cannot segment the technical terms accurately.
Summary of the invention
The present invention provides a method and device for industry data matching that segment technical terms accurately.
A method for industry data matching, in which a technical term dictionary is configured, further comprising:
determining the data to be queried;
performing a first matching word segmentation on the data to be queried according to the technical terms in the technical term dictionary, and returning at least one technical term matching result set;
deduplicating the segmented words in the at least one technical term matching result set;
sorting the segmented words according to their repetition counts;
retrieving and returning result information according to the deduplicated segmented words and the order of the segmented words.
Preferably, the method further comprises: configuring a basic dictionary;
after the at least one technical term matching result set is returned, the method further comprises: judging whether the at least one technical term matching result set is empty;
if so, performing a second matching word segmentation on the data to be queried according to the basic dictionary, returning at least one basic word matching result set, and deduplicating the segmented words in the at least one basic word matching result set;
otherwise, performing the deduplication of the segmented words in the at least one technical term matching result set.
Preferably, the method further comprises: configuring a synonym dictionary;
after the at least one technical term matching result set is returned and before the segmented words in it are deduplicated, the method further comprises: performing synonym matching on the segmented words in the at least one technical term matching result set according to the synonym dictionary, and returning at least one new technical term matching result set;
deduplicating the segmented words in the at least one technical term matching result set then comprises: deduplicating the segmented words in the at least one new technical term matching result set.
Preferably, before the result information is retrieved and returned, the method further comprises:
judging whether the number of technical term matching result sets is greater than or equal to two, and if so, determining that a set with fewer segmented words has a higher retrieval priority;
retrieving and returning result information according to the deduplicated segmented words and the order of the segmented words then comprises: retrieving and returning, in turn, the result information corresponding to each technical term matching result set according to the priority of the at least one technical term matching result set, the deduplicated segmented words, and the order of the segmented words.
Preferably, returning the result information comprises:
deduplicating the result information;
sorting the pieces of information in the result information according to the repetition count of each piece, and returning the sorted result information.
An industry data matching device, comprising:
a configuration unit, configured to configure a technical term dictionary;
a technical term matching unit, configured to determine the data to be queried, perform a first matching word segmentation on the determined data to be queried according to the technical terms in the technical term dictionary configured by the configuration unit, and return at least one technical term matching result set;
a first deduplication and sorting unit, configured to deduplicate the segmented words in the at least one technical term matching result set returned by the technical term matching unit, and to sort the segmented words according to their repetition counts;
a retrieval unit, configured to retrieve and return result information according to the segmented words deduplicated by the first deduplication and sorting unit and the order of the segmented words.
Preferably, the device further comprises a first judging unit and a basic word matching unit, wherein:
the configuration unit is further configured to configure a basic dictionary;
the first judging unit is configured to judge whether the at least one technical term matching result set returned by the technical term matching unit is empty; if so, it triggers the basic word matching unit; otherwise, it triggers the first deduplication and sorting unit to deduplicate the segmented words in the at least one technical term matching result set;
the basic word matching unit is configured to perform a second matching word segmentation on the determined data to be queried according to the basic dictionary, and to return at least one basic word matching result set;
the first deduplication and sorting unit is further configured to deduplicate the segmented words in the at least one basic word matching result set returned by the basic word matching unit, and to sort the segmented words according to their repetition counts.
Preferably, the device further comprises a synonym matching unit, wherein:
the configuration unit is further configured to configure a synonym dictionary;
the synonym matching unit is configured to perform synonym matching, according to the synonym dictionary, on the segmented words in the at least one technical term matching result set returned by the technical term matching unit, and to return at least one new technical term matching result set;
the first deduplication and sorting unit is further configured to deduplicate the segmented words in the at least one new technical term matching result set returned by the synonym matching unit, and to sort the segmented words according to their repetition counts.
Preferably, the device further comprises a second judging unit, wherein:
the second judging unit is configured to judge whether the number of technical term matching result sets is greater than or equal to two, and to trigger the retrieval unit;
the retrieval unit is further configured to: when the second judging unit judges that the number of technical term matching result sets is greater than or equal to two, determine that a set with fewer segmented words has a higher retrieval priority, and retrieve and return, in turn, the result information corresponding to each technical term matching result set according to the priority of the at least one technical term matching result set, the deduplicated segmented words, and the order of the segmented words; and when the second judging unit judges that the number of technical term matching result sets equals one, retrieve and return the result information corresponding to that technical term matching result set.
Preferably, the device further comprises a second deduplication and sorting unit, configured to deduplicate the result information, sort the pieces of information in the result information according to the repetition count of each piece, and return the sorted result information.
Embodiments of the present invention provide a method and device for industry data matching. A technical term dictionary is configured; after the data to be queried is determined, a first matching word segmentation is performed on it according to the technical terms in the technical term dictionary, and at least one technical term matching result set is returned; the segmented words in the at least one technical term matching result set are deduplicated; the segmented words are sorted according to their repetition counts; and result information is retrieved and returned according to the deduplicated segmented words and their order. Through this process, technical terms are segmented accurately.
Brief description of the drawings
Fig. 1 is a flowchart of an industry data matching method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of an industry data matching method provided by another embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an industry data matching device provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an industry data matching device provided by another embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an industry data matching device provided by yet another embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
As shown in Fig. 1, an embodiment of the present invention provides a method for industry data matching, which may comprise the following steps:
Step 101: configure a technical term dictionary;
Step 102: determine the data to be queried;
Step 103: perform a first matching word segmentation on the data to be queried according to the technical terms in the technical term dictionary, and return at least one technical term matching result set;
Step 104: deduplicate the segmented words in the at least one technical term matching result set;
Step 105: sort the segmented words according to their repetition counts;
Step 106: retrieve and return result information according to the deduplicated segmented words and the order of the segmented words.
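Steps 104 and 105 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `dedupe_and_rank`, the representation of a result set as a list of strings, and the tie-breaking rule (first-seen order) are not taken from the patent text.

```python
from collections import Counter


def dedupe_and_rank(result_sets):
    """Collapse repeated segmented words and order them by repetition count.

    `result_sets` is a list of lists of strings (hypothetical representation
    of the matching result sets returned by step 103).
    """
    # Count how many times each segmented word appears across all sets.
    counts = Counter(word for words in result_sets for word in words)
    # Counter keeps first-seen order and sorted() is stable, so words with the
    # same count stay in the order in which they were first encountered.
    return sorted(counts, key=lambda word: -counts[word])


sets = [
    ["should-deposit rate", "grain"],
    ["should-deposit rate", "rotation"],
    ["should-deposit rate", "grain", "price"],
]
print(dedupe_and_rank(sets))
```

Each segmented word appears once in the output, and the word repeated in every set comes first.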
In one embodiment of the present invention, so that the industry data matching method can also be applied to non-industry data and thus has wider applicability, the method further comprises: configuring a basic dictionary. After step 103, the method further comprises: judging whether the at least one technical term matching result set is empty; if so, performing a second matching word segmentation on the data to be queried according to the basic dictionary, returning at least one basic word matching result set, and deduplicating the segmented words in the at least one basic word matching result set; otherwise, performing step 104.
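The fallback from the technical term dictionary to the basic dictionary might look like the sketch below. The patent does not specify a segmentation algorithm, so the exhaustive enumeration in `segmentations`, the whitespace-separated English stand-ins for the Chinese queries, and the names `match` and `segmentations` are all assumptions made for illustration.

```python
def segmentations(words, lexicon):
    """Return every split of `words` into consecutive phrases found in `lexicon`."""
    if not words:
        return [[]]  # one way to segment nothing: the empty segmentation
    results = []
    for end in range(1, len(words) + 1):
        head = " ".join(words[:end])
        if head in lexicon:
            for tail in segmentations(words[end:], lexicon):
                results.append([head] + tail)
    return results


def match(query, term_lexicon, basic_lexicon):
    """Try the technical term dictionary first; fall back to the basic one if empty."""
    words = query.split()
    term_sets = segmentations(words, term_lexicon)
    if term_sets:
        return "technical", term_sets
    return "basic", segmentations(words, basic_lexicon)


# Hypothetical dictionaries for illustration only.
term_lexicon = {"lowest price purchase grain", "lowest price purchase", "grain"}
basic_lexicon = {"internet card", "internet", "card", "tariff", "information"}

print(match("lowest price purchase grain", term_lexicon, basic_lexicon))
print(match("internet card tariff information", term_lexicon, basic_lexicon))
```

The first query is covered by the technical term dictionary and yields two result sets; the second yields none there, so the basic dictionary is used instead.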
In one embodiment of the present invention, so that synonyms of technical terms can also be retrieved during matching, which makes the retrieved result information as complete as possible and improves retrieval efficiency, the method further comprises: configuring a synonym dictionary. After step 103 and before step 104, the method further comprises: performing synonym matching on the segmented words in the at least one technical term matching result set according to the synonym dictionary, and returning at least one new technical term matching result set. Step 104 is then implemented as: deduplicating the segmented words in the at least one new technical term matching result set.
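A minimal sketch of the synonym matching step follows. It assumes a word-level synonym dictionary and produces one variant per segmented word by substituting every word that has a synonym; the patent does not state how variants are generated, so this substitution rule and the name `expand_with_synonyms` are illustrative assumptions.

```python
def expand_with_synonyms(word_set, synonyms):
    """Return a new result set: the original segmented words plus synonym variants."""
    expanded = list(word_set)
    for phrase in word_set:
        # Replace each word that has a synonym; leave the others unchanged.
        variant = " ".join(synonyms.get(word, word) for word in phrase.split())
        if variant != phrase and variant not in expanded:
            expanded.append(variant)
    return expanded


# Hypothetical synonym dictionary for illustration only.
synonyms = {"lowest": "minimum", "purchase": "buy"}

print(expand_with_synonyms(["lowest price purchase grain"], synonyms))
print(expand_with_synonyms(["lowest price purchase", "grain"], synonyms))
```

A segmented word without any synonym, such as "grain" here, contributes no variant, so the new set never contains duplicates introduced by the expansion itself.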
In one embodiment of the present invention, to further increase the accuracy of the retrieved result information, the method further comprises, before step 106: judging whether the number of technical term matching result sets is greater than or equal to two, and if so, determining that a set with fewer segmented words has a higher retrieval priority. Step 106 is then implemented as: retrieving and returning, in turn, the result information corresponding to each technical term matching result set according to the priority of the at least one technical term matching result set, the deduplicated segmented words, and the order of the segmented words.
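The priority rule can be illustrated as follows. The retrieval backend is not described in the patent, so the substring search over an in-memory list of documents, the sample documents, and the name `retrieve_by_priority` are hypothetical stand-ins.

```python
def retrieve_by_priority(result_sets, documents):
    """Search the set with the fewest segmented words first.

    Returns (segmented words, hits) pairs in priority order.
    """
    # sorted() is stable, so sets of equal size keep their input order.
    ordered = sorted(result_sets, key=len)
    pairs = []
    for words in ordered:
        # Toy retrieval: a document is a hit if it contains any segmented word.
        hits = [doc for doc in documents if any(word in doc for word in words)]
        pairs.append((words, hits))
    return pairs


documents = [
    "notice on lowest price purchase grain quotas",
    "grain storage inspection report",
    "how to buy rice online",
]
result_sets = [
    ["lowest price purchase", "grain", "minimum price buy"],
    ["lowest price purchase grain", "minimum price buy grain"],
]
for words, hits in retrieve_by_priority(result_sets, documents):
    print(len(words), hits)
```

The two-word set is searched first because it stays closest to what the user typed; the three-word set is broader and is searched afterwards.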
In one embodiment of the present invention, to reduce repetition in the result information and further improve retrieval efficiency and the user's retrieval experience, the method further comprises, after step 106: deduplicating the result information; and sorting the pieces of information in the result information according to the repetition count of each piece, and returning the sorted result information.
As shown in Fig. 2, this embodiment takes data matching in the grain industry as an example to describe the industry data matching method in detail. The method may comprise the following steps:
Step 201: configure a technical term dictionary for the grain industry, a synonym dictionary corresponding to the technical terms of the grain industry, and a basic dictionary;
This step can be implemented by building an intelligent lexicon and placing the grain-industry technical term dictionary, the corresponding synonym dictionary, and the basic dictionary in it. In addition, technical terms and synonyms can be added to the technical term dictionary and the synonym dictionary as the industry develops.
Step 202: determine the data to be queried;
Step 203: perform a first matching word segmentation on the data to be queried according to the technical terms in the grain-industry technical term dictionary, and return at least one technical term matching result set;
For example, the grain-industry technical term dictionary includes technical terms such as should-deposit rate, compliance rate, lowest price purchase grain, lowest price purchase, price-support grain, rotation vacancy, and rotation headroom position. If the user searches for "lowest price purchase grain", this step segments the query and returns two technical term matching result sets: {lowest price purchase grain} and {lowest price purchase, grain}.
Step 204: judge whether the at least one technical term matching result set is empty; if so, perform step 205 and step 206 in order; otherwise, perform step 207;
For example, the user searches for "internet access card tariff information", none of which is included in the grain-industry technical term dictionary. Step 203 therefore cannot segment this data, and the returned technical term matching result set is empty.
Step 205: perform a second matching word segmentation on the data to be queried according to the basic dictionary, and return at least one basic word matching result set;
Step 206: deduplicate the segmented words in the at least one basic word matching result set, and retrieve and return result information according to the deduplicated segmented words;
Steps 205 and 206 segment non-technical terms. For example, when the user searches for "internet access card tariff information", the data cannot be segmented into technical terms in step 203, so it is segmented according to the basic dictionary into several sets, such as {internet access card, tariff, information}, {internet access, card, tariff, information}, and {up, network card tariff, information}. "Tariff" and "information" are then deduplicated, retrieval is performed on the segmented words internet access card, tariff, information, internet access, card, up, and network card tariff, and the result information that contains one or more of these segmented words is returned.
Step 207: perform synonym matching on the segmented words in the at least one technical term matching result set according to the synonym dictionary, and return at least one new technical term matching result set;
For example, in {lowest price purchase grain} and {lowest price purchase, grain}, a synonym of "purchase" may be "buy" and a synonym of "lowest" may be "minimum". These synonyms are added to the sets, forming {lowest price purchase grain, minimum price buy grain} and {lowest price purchase, grain, minimum price buy}.
Step 208: judge whether the number of new technical term matching result sets is greater than or equal to two; if so, perform step 209; otherwise, perform step 210;
Step 209: determine that a set with fewer segmented words has a higher retrieval priority;
For example, of the two sets {lowest price purchase grain, minimum price buy grain} and {lowest price purchase, grain, minimum price buy}, the first has 2 segmented words and the second has 3. A set with fewer segmented words is closer to the information the user entered, so {lowest price purchase grain, minimum price buy grain} has a higher retrieval priority than {lowest price purchase, grain, minimum price buy}.
Step 210: retrieve and return result information according to the segmented words in the single new technical term matching result set;
If segmentation yields only one set, retrieval is performed on the segmented words in that set and the result information is returned.
Step 211: deduplicate the segmented words in the at least one new technical term matching result set;
For example, the two sets {lowest price purchase grain, minimum price buy grain} and {lowest price purchase, grain, minimum price buy} share no segmented word, so no deduplication is needed. If, for some query, one set contains "should-deposit rate" and another set also contains "should-deposit rate", that term must be deduplicated.
Step 212: sort the segmented words according to their repetition counts;
For example, if "should-deposit rate" appears in every set while the other segmented words appear in only one or two sets, "should-deposit rate" has the highest repetition count and is ranked first.
Step 213: retrieve, in turn, the result information corresponding to each technical term matching result set according to the priority of the at least one technical term matching result set, the deduplicated segmented words, and the order of the segmented words;
For example, information 1, information 2, and information 3 are retrieved for {lowest price purchase grain, minimum price buy grain}; information 5, information 6, information 7, and information 8 are retrieved for {lowest price purchase, grain, minimum price buy}.
Step 214: deduplicate the result information, sort the pieces of information according to the repetition count of each piece, and return the sorted result information.
For example, among the information retrieved in step 213, information 1, information 2, and information 5 are duplicates of one another, and information 3 and information 7 are duplicates of each other. The returned result information is therefore displayed in the order: information 1, information 3, information 6, information 8.
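The example for step 214 can be reproduced with the short sketch below. How two pieces of information are recognised as duplicates is not specified in the patent; here each result carries a hypothetical content key (`"A"`, `"B"`, ...) and results with equal keys are treated as duplicates. The function name `dedupe_results` is likewise illustrative.

```python
def dedupe_results(results):
    """Keep the first copy of each piece of information, ordered by repetition count.

    `results` is a list of (label, content_key) pairs in retrieval order.
    """
    first_label = {}  # content key -> label of the first copy seen
    counts = {}       # content key -> number of copies retrieved
    for label, content in results:
        first_label.setdefault(content, label)
        counts[content] = counts.get(content, 0) + 1
    # Stable sort: pieces with equal counts stay in retrieval order.
    ordered = sorted(first_label, key=lambda content: -counts[content])
    return [first_label[content] for content in ordered]


retrieved = [
    ("information 1", "A"), ("information 2", "A"), ("information 3", "B"),
    ("information 5", "A"), ("information 6", "C"), ("information 7", "B"),
    ("information 8", "D"),
]
print(dedupe_results(retrieved))
```

With information 1, 2, and 5 sharing one key and information 3 and 7 sharing another, the result is the order given in the example: information 1, information 3, information 6, information 8.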
As shown in Fig. 3, in one embodiment of the present invention, an industry data matching device comprises:
a configuration unit 301, configured to configure a technical term dictionary;
a technical term matching unit 302, configured to determine the data to be queried, perform a first matching word segmentation on the determined data to be queried according to the technical terms in the technical term dictionary configured by the configuration unit, and return at least one technical term matching result set;
a first deduplication and sorting unit 303, configured to deduplicate the segmented words in the at least one technical term matching result set returned by the technical term matching unit, and to sort the segmented words according to their repetition counts;
a retrieval unit 304, configured to retrieve and return result information according to the segmented words deduplicated by the first deduplication and sorting unit and the order of the segmented words.
As shown in Fig. 4, in another embodiment of the present invention, the industry data matching device further comprises a first judging unit 401 and a basic word matching unit 402, wherein:
the configuration unit 301 is further configured to configure a basic dictionary;
the first judging unit 401 is configured to judge whether the at least one technical term matching result set returned by the technical term matching unit is empty; if so, it triggers the basic word matching unit 402; otherwise, it triggers the first deduplication and sorting unit 303 to deduplicate the segmented words in the at least one technical term matching result set;
the basic word matching unit 402 is configured to perform a second matching word segmentation on the data to be queried according to the basic dictionary, and to return at least one basic word matching result set;
the first deduplication and sorting unit 303 is further configured to deduplicate the segmented words in the at least one basic word matching result set returned by the basic word matching unit, and to sort the segmented words according to their repetition counts.
As shown in Fig. 5, in another embodiment of the present invention, the industry data matching device further comprises a synonym matching unit 501, wherein:
the configuration unit 301 is further configured to configure a synonym dictionary;
the synonym matching unit 501 is configured to perform synonym matching, according to the synonym dictionary, on the segmented words in the at least one technical term matching result set returned by the technical term matching unit 302, and to return at least one new technical term matching result set;
the first deduplication and sorting unit 303 is further configured to deduplicate the segmented words in the at least one new technical term matching result set returned by the synonym matching unit, and to sort the segmented words according to their repetition counts.
In yet another embodiment of the present invention, the industry data matching device further comprises a second judging unit (not shown), wherein:
the second judging unit is configured to judge whether the number of technical term matching result sets is greater than or equal to two, and to trigger the retrieval unit;
the retrieval unit 304 is further configured to: when the second judging unit judges that the number of technical term matching result sets is greater than or equal to two, determine that a set with fewer segmented words has a higher retrieval priority, and retrieve and return, in turn, the result information corresponding to each technical term matching result set according to the priority of the at least one technical term matching result set, the deduplicated segmented words, and the order of the segmented words; and when the second judging unit judges that the number of technical term matching result sets equals one, retrieve and return the result information corresponding to that technical term matching result set.
In yet another embodiment of the present invention, the industry data matching device further comprises a second deduplication and sorting unit (not shown), configured to deduplicate the result information, sort the pieces of information in the result information according to the repetition count of each piece, and return the sorted result information.
The embodiments described above achieve at least the following beneficial effects:
1. A technical term dictionary is configured; after the data to be queried is determined, a first matching word segmentation is performed on it according to the technical terms in the technical term dictionary, and at least one technical term matching result set is returned; the segmented words in the at least one technical term matching result set are deduplicated; the segmented words are sorted according to their repetition counts; and result information is retrieved and returned according to the deduplicated segmented words and their order. Through this process, technical terms are segmented accurately.
2. A basic dictionary is configured, and it is judged whether the at least one technical term matching result set is empty; if so, a second matching word segmentation is performed on the data to be queried according to the basic dictionary, at least one basic word matching result set is returned, and the segmented words in it are deduplicated. Non-technical terms can therefore also be segmented, so the industry data matching method can also be applied to non-industry data and has wider applicability.
3. If industry terms are not matched automatically, a search may return many irrelevant results and can even bury the information the user actually needs. By segmenting technical terms with the method of the embodiments, the retrieval of technical terms becomes more targeted, which increases the retrieval rate and the user's satisfaction.
4. A synonym dictionary is configured, synonym matching is performed on the segmented words in the at least one technical term matching result set according to the synonym dictionary, and at least one new technical term matching result set is returned. Through this process, synonyms of technical terms can also be retrieved, so the retrieved result information is more complete and the retrieval efficiency for useful information is improved.
5. A priority is set for the sets of segmented words: it is judged whether the number of technical term matching result sets is greater than or equal to two, and if so, a set with fewer segmented words is given a higher retrieval priority, which improves the accuracy of the retrieved result information. In addition, the result information is deduplicated, the pieces of information are sorted according to the repetition count of each piece, and the sorted result information is returned, which effectively reduces repetition in the result information and further improves retrieval efficiency and the user's retrieval experience.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that comprises a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to the process, method, article, or device. In the absence of further limitations, an element defined by the statement "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises the element.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. An industry data matching method, characterized in that a technical term dictionary is configured, the method further comprising:
determining data to be queried;
performing first matching segmentation on the data to be queried according to the technical terms in the technical term dictionary, and returning at least one technical term matching result set;
deduplicating the segmented words in the at least one technical term matching result set;
sorting the segmented words according to their repetition counts;
retrieving and returning result information according to the deduplicated segmented words and the ordering of the segmented words.
2. The method according to claim 1, characterized by further comprising: configuring a basic dictionary;
after the returning of the at least one technical term matching result set, further comprising: judging whether the at least one technical term matching result set is empty;
if so, performing second matching segmentation on the data to be queried according to the basic dictionary, returning at least one basic word matching result set, and deduplicating the segmented words in the at least one basic word matching result set;
otherwise, performing the deduplicating of the segmented words in the at least one technical term matching result set.
3. The method according to claim 1, characterized by further comprising: configuring a synonym dictionary;
after the returning of the at least one technical term matching result set and before the deduplicating of the segmented words in the at least one technical term matching result set, further comprising: performing synonym matching on the segmented words in the at least one technical term matching result set according to the synonym dictionary, and returning at least one new technical term matching result set;
wherein the deduplicating of the segmented words in the at least one technical term matching result set comprises: deduplicating the segmented words in the at least one new technical term matching result set.
4. The method according to claim 1 or 3, characterized by further comprising, before the retrieving and returning of result information:
judging whether the number of technical term matching result sets is greater than or equal to two and, if so, determining that a set with fewer segmented words has a higher retrieval priority;
wherein the retrieving and returning of result information according to the deduplicated segmented words and the ordering of the segmented words comprises: retrieving and returning, in turn, the result information corresponding to each technical term matching result set according to the priority of the at least one technical term matching result set, the deduplicated segmented words, and the ordering of the segmented words.
5. The method according to any one of claims 1 to 4, characterized in that the returning of result information comprises:
deduplicating the result information;
sorting each item of information according to its repetition count in the result information, and returning the sorted result information.
6. An industry data matching device, characterized by comprising:
a configuration unit, configured to configure a technical term dictionary;
a technical term matching unit, configured to determine data to be queried, to perform first matching segmentation on the data to be queried so determined according to the technical terms in the technical term dictionary configured by the configuration unit, and to return at least one technical term matching result set;
a first deduplication and sorting unit, configured to deduplicate the segmented words in the at least one technical term matching result set returned by the technical term matching unit, and to sort the segmented words according to their repetition counts;
a retrieval unit, configured to retrieve and return result information according to the segmented words deduplicated by the first deduplication and sorting unit and the ordering of the segmented words.
7. The device according to claim 6, characterized by further comprising: a first judging unit and a basic word matching unit, wherein
the configuration unit is further configured to configure a basic dictionary;
the first judging unit is configured to judge whether the at least one technical term matching result set returned by the technical term matching unit is empty and, if so, to trigger the basic word matching unit; otherwise, to trigger the first deduplication and sorting unit to perform the deduplicating of the segmented words in the at least one technical term matching result set;
the basic word matching unit is configured to perform second matching segmentation on the determined data to be queried according to the basic dictionary, and to return at least one basic word matching result set;
the first deduplication and sorting unit is further configured to deduplicate the segmented words in the at least one basic word matching result set returned by the basic word matching unit, and to sort the segmented words according to their repetition counts.
8. The device according to claim 6, characterized by further comprising: a synonym matching unit, wherein
the configuration unit is further configured to configure a synonym dictionary;
the synonym matching unit is configured to perform synonym matching, according to the synonym dictionary, on the segmented words in the at least one technical term matching result set returned by the technical term matching unit, and to return at least one new technical term matching result set;
the first deduplication and sorting unit is further configured to deduplicate the segmented words in the at least one new technical term matching result set returned by the synonym matching unit, and to sort the segmented words according to their repetition counts.
9. The device according to claim 6 or 8, characterized by further comprising: a second judging unit, wherein
the second judging unit is configured to judge whether the number of technical term matching result sets is greater than or equal to two, and to trigger the retrieval unit;
the retrieval unit is further configured to: when the second judging unit judges that the number of technical term matching result sets is greater than or equal to two, determine that a set with fewer segmented words has a higher retrieval priority, and retrieve and return, in turn, the result information corresponding to each technical term matching result set according to the priority of the at least one technical term matching result set, the deduplicated segmented words, and the ordering of the segmented words; and when the second judging unit judges that the number of technical term matching result sets equals one, retrieve and return the result information corresponding to that technical term matching result set according to that set.
10. The device according to any one of claims 6 to 9, characterized by further comprising:
a second deduplication and sorting unit, configured to deduplicate the result information, to sort each item of information according to its repetition count in the result information, and to return the sorted result information.
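
The flow of claims 1 and 2 (match against the technical term dictionary, fall back to the basic dictionary when nothing matches, then deduplicate and sort the segmented words) could look roughly like the sketch below. The claims do not name a segmentation algorithm, so forward maximum matching is used here purely as an assumed stand-in; the function names, the `max_len` parameter, and the descending sort order are likewise illustrative assumptions.

```python
from collections import Counter

def forward_max_match(text, dictionary, max_len=6):
    """Greedy longest-match segmentation that keeps only dictionary hits.

    Assumed stand-in for the "matching segmentation" of the claims.
    """
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in dictionary:
                words.append(piece)
                i += size
                break
        else:
            i += 1  # no dictionary word starts here; skip one character
    return words

def segment_query(query, term_dict, basic_dict):
    """First matching uses the technical term dictionary; if the result is
    empty, second matching falls back to the basic dictionary."""
    words = forward_max_match(query, term_dict)
    if not words:
        words = forward_max_match(query, basic_dict)
    return words

def dedupe_and_sort(words):
    """Deduplicate segmented words; assumed order is most repeated first."""
    counts = Counter(words)
    # sorted() is stable, so ties keep their first-occurrence order
    return sorted(counts, key=lambda word: -counts[word])
```

For instance, with a hypothetical term dictionary `{"erp", "cloud"}`, the query `"erp cloud erp"` segments to `["erp", "cloud", "erp"]`, and `dedupe_and_sort` then yields `["erp", "cloud"]`, which would be handed to the retrieval step.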
CN201510394585.4A 2015-07-07 2015-07-07 Industry data matching method and device Pending CN105045853A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510394585.4A CN105045853A (en) 2015-07-07 2015-07-07 Industry data matching method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510394585.4A CN105045853A (en) 2015-07-07 2015-07-07 Industry data matching method and device

Publications (1)

Publication Number Publication Date
CN105045853A true CN105045853A (en) 2015-11-11

Family

ID=54452400

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510394585.4A Pending CN105045853A (en) 2015-07-07 2015-07-07 Industry data matching method and device

Country Status (1)

Country Link
CN (1) CN105045853A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108021553A (en) * 2017-09-30 2018-05-11 北京颐圣智能科技有限公司 Word treatment method, device and the computer equipment of disease term

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6976053B1 (en) * 1999-10-14 2005-12-13 Arcessa, Inc. Method for using agents to create a computer index corresponding to the contents of networked computers
CN101118562A (en) * 2006-08-21 2008-02-06 凌强 Herbalist doctor clinical reference system
CN102043812A (en) * 2009-10-13 2011-05-04 北京大学 Method and system for retrieving medical information
CN102073692A (en) * 2010-12-16 2011-05-25 北京农业信息技术研究中心 Agricultural field ontology library based semantic retrieval system and method
CN102411568A (en) * 2010-09-20 2012-04-11 苏州同程旅游网络科技有限公司 Chinese word segmentation method based on travel industry feature word stock
CN102768679A (en) * 2012-06-25 2012-11-07 深圳市汉络计算机技术有限公司 Searching method and searching system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
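Luo Hao (罗浩): "Research and Implementation of an Enterprise Search Engine Based on CLucene and Larbin" (基于CLucene和Larbin的企业搜索引擎的研究与实现), China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》) *
Zheng Yang et al. (郑阳等): "A Chinese Word Segmentation Method Based on Technical Term Extraction" (基于专业术语提取的中文分词方法), Popular Science & Technology (《大众科技》) *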

Similar Documents

Publication Publication Date Title
CN104077306B (en) The result ordering method and system of a kind of search engine
US9117006B2 (en) Recommending keywords
CN105069086B (en) A kind of method and system for optimizing ecommerce commercial articles searching
CN102446326B (en) A kind of method of information pushing, system and equipment
WO2021057250A1 (en) Commodity search query strategy generation method and apparatus
CN105653562B (en) The calculation method and device of correlation between a kind of content of text and inquiry request
TW201401089A (en) Search ranking method and device based on click through rates
JP2013504118A (en) Information retrieval based on query semantic patterns
CN105205188A (en) Method and device for recommending purchase material suppliers
CN110046298A (en) Query word recommendation method and device, terminal device and computer readable medium
WO2011112238A1 (en) Determining word information entropies
CN109299383A (en) Generate method, apparatus, electronic equipment and the storage medium for recommending word
CN104636429A (en) Trademark category retrieval method and device
CN103559313B (en) Searching method and device
CN108920665A (en) Recommendation score method and device based on network structure and comment text
CN116308684B (en) Online shopping platform store information pushing method and system
CN104881504A (en) Information search method and device
CN103136213A (en) Method and device for providing related words
CN105740480A (en) Air ticket recommending method and system
CN106934679A (en) information matching method and device
CN109558462A (en) Data statistical approach and device
CN102737038B (en) Degree of association defining method and device, information providing method and device
CN105095203B (en) Determination, searching method and the server of synonym
CN110827101A (en) Shop recommendation method and device
CN113536156B (en) Search result ordering method, model building method, device, equipment and medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20151111