CN109635285A - Method, apparatus, computer device and storage medium for matching enterprise full names with abbreviations - Google Patents
- Publication number: CN109635285A
- Application number: CN201811416724.9A
- Authority: CN (China)
- Prior art keywords: abbreviation, enterprise, referred, full name, target
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G (Physics) > G06 (Computing; calculating or counting) > G06F (Electric digital data processing) > G06F40/00 (Handling natural language data) > G06F40/20 (Natural language analysis) > G06F40/268 (Morphological analysis)
- G (Physics) > G06 (Computing; calculating or counting) > G06F (Electric digital data processing) > G06F40/00 (Handling natural language data) > G06F40/20 (Natural language analysis) > G06F40/279 (Recognition of textual entities)
Abstract
The present application relates to the field of big data technology and provides a method, apparatus, computer device and storage medium for matching enterprise full names with abbreviations. The method includes: performing abbreviation recognition on a text containing an abbreviation to be identified, to obtain a candidate abbreviation set; obtaining the word frequency of each candidate abbreviation in a preset text library and determining a target abbreviation according to those word frequencies; traversing a preset derived-abbreviation library (a library of abbreviations generated from enterprise full names by abbreviation rules) to obtain the derived abbreviation that matches the target abbreviation; obtaining the enterprise full name corresponding to that derived abbreviation; and, when a text is found in which the target abbreviation and the enterprise full name co-occur, determining that the enterprise full name and the target abbreviation are successfully matched. Screening the recognized abbreviations improves data accuracy at the abbreviation recognition stage, and confirming that the target abbreviation and the corresponding enterprise full name co-occur in the same text before declaring a match improves the accuracy of the matching result.
Description
Technical field
The present application relates to the field of big data technology, and in particular to a method, apparatus, computer device and storage medium for matching enterprise full names with abbreviations.
Background
With the development of big data technology, public opinion analysis has emerged. When the correspondence between an enterprise's full name and its abbreviations is unknown, mining that correspondence from text is a task that public opinion analysis cannot avoid. In everyday usage, an enterprise with a long full name is customarily referred to by a conventional abbreviation. For example, "Bank of China Co., Ltd." often appears in abbreviated form, such as "Bank of China" (中国银行) or the shorter 中行.
The appearance of abbreviations makes public opinion analysis more difficult. Traditional approaches to matching full names with abbreviations mainly search web pages for text and filter candidate text pairs by similarity or other rules, or generate abbreviations from the character vectors or word vectors of the enterprise full name. These approaches are prone to incorrect correspondences, so the accuracy of matching enterprise full names with abbreviations is low.
Summary of the invention
In view of the above technical problem, it is necessary to provide a method, apparatus, computer device and storage medium for matching enterprise full names with abbreviations that can improve matching accuracy.
A method for matching enterprise full names with abbreviations, the method comprising:
performing abbreviation recognition on a text containing an abbreviation to be identified, to obtain a candidate abbreviation set;
obtaining the word frequency, in a preset text library, of each candidate abbreviation in the candidate abbreviation set, and determining a target abbreviation according to the word frequency of each candidate abbreviation;
traversing a preset derived-abbreviation library according to the target abbreviation, to obtain the derived abbreviation that matches the target abbreviation;
obtaining the enterprise full name corresponding to the derived abbreviation; and
when a text is found in which the target abbreviation and the enterprise full name co-occur, determining that the enterprise full name and the target abbreviation are successfully matched.
In one embodiment, before performing abbreviation recognition on the text containing an abbreviation to be identified to obtain the candidate abbreviation set, the method further comprises:
obtaining multiple pieces of sample data containing enterprise abbreviations;
labeling each piece of sample data according to its known abbreviation, to obtain a sample data set carrying abbreviation labels; and
training a named entity recognition model on the sample data set, the named entity recognition model being used for abbreviation recognition.
In one embodiment, obtaining the word frequency, in the preset text library, of each candidate abbreviation in the candidate abbreviation set and determining the target abbreviation according to the word frequency of each candidate abbreviation comprises:
when the text containing an abbreviation to be identified contains multiple classes of candidate abbreviations, classifying the candidate abbreviations in the candidate abbreviation set according to their word sequences;
obtaining the word frequency, in the preset text library, of each candidate abbreviation in each class; and
determining the target abbreviation of each class according to the word frequencies of the candidate abbreviations in that class.
In one embodiment, before traversing the preset derived-abbreviation library according to the target abbreviation to obtain the derived abbreviation that matches the target abbreviation, the method further comprises:
obtaining an enterprise full name library, and classifying the enterprise full names in it according to their composition patterns;
abbreviating each class of enterprise full names according to the preset abbreviation rule corresponding to its composition pattern, to obtain a set of derived abbreviations corresponding to the enterprise full names; and
building, from the set of derived abbreviations, the preset derived-abbreviation library corresponding to the enterprise full name library.
In one embodiment, before abbreviating each class of enterprise full names according to the preset abbreviation rule corresponding to its composition pattern to obtain the set of derived abbreviations, the method further comprises:
obtaining sample data containing known matches between enterprise full names and abbreviations; and
analyzing the composition patterns of the enterprise full names in the sample data and, according to the enterprise abbreviations in the sample data, determining the preset abbreviation rule corresponding to each composition pattern.
In one embodiment, after determining that the enterprise full name and the target abbreviation are successfully matched when a text is found in which they co-occur, the method further comprises:
updating the successfully matched enterprise full name and target abbreviation into a preset enterprise full name and abbreviation matching database.
In one embodiment, after updating the successfully matched enterprise full name and target abbreviation into the preset enterprise full name and abbreviation matching database, the method further comprises:
searching, according to preset keywords, for texts that state a matching relationship between an enterprise full name and an enterprise abbreviation; and
extracting the matched enterprise full name and enterprise abbreviation from those texts and updating them into the preset enterprise full name and abbreviation matching database.
An apparatus for matching enterprise full names with abbreviations, the apparatus comprising:
a candidate abbreviation set obtaining module, configured to perform abbreviation recognition on a text containing an abbreviation to be identified, to obtain a candidate abbreviation set;
a target abbreviation determining module, configured to obtain the word frequency, in a preset text library, of each candidate abbreviation in the candidate abbreviation set, and to determine a target abbreviation according to the word frequency of each candidate abbreviation;
a derived abbreviation obtaining module, configured to traverse a preset derived-abbreviation library according to the target abbreviation, to obtain the derived abbreviation that matches the target abbreviation;
an enterprise full name obtaining module, configured to obtain the enterprise full name corresponding to the derived abbreviation; and
a matching result determining module, configured to determine that the enterprise full name and the target abbreviation are successfully matched when a text is found in which the target abbreviation and the enterprise full name co-occur.
A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, performs the following steps:
performing abbreviation recognition on a text containing an abbreviation to be identified, to obtain a candidate abbreviation set;
obtaining the word frequency, in a preset text library, of each candidate abbreviation in the candidate abbreviation set, and determining a target abbreviation according to the word frequency of each candidate abbreviation;
traversing a preset derived-abbreviation library according to the target abbreviation, to obtain the derived abbreviation that matches the target abbreviation;
obtaining the enterprise full name corresponding to the derived abbreviation; and
when a text is found in which the target abbreviation and the enterprise full name co-occur, determining that the enterprise full name and the target abbreviation are successfully matched.
A computer-readable storage medium storing a computer program which, when executed by a processor, performs the following steps:
performing abbreviation recognition on a text containing an abbreviation to be identified, to obtain a candidate abbreviation set;
obtaining the word frequency, in a preset text library, of each candidate abbreviation in the candidate abbreviation set, and determining a target abbreviation according to the word frequency of each candidate abbreviation;
traversing a preset derived-abbreviation library according to the target abbreviation, to obtain the derived abbreviation that matches the target abbreviation;
obtaining the enterprise full name corresponding to the derived abbreviation; and
when a text is found in which the target abbreviation and the enterprise full name co-occur, determining that the enterprise full name and the target abbreviation are successfully matched.
In the above method, apparatus, computer device and storage medium, abbreviation recognition is performed on a text containing an abbreviation to be identified to obtain a candidate abbreviation set, and a target abbreviation is selected according to the word frequencies of the candidates. The preset derived-abbreviation library corresponding to the enterprises is then traversed to obtain the enterprise full name that matches the target abbreviation, and texts are searched to confirm whether the target abbreviation and the enterprise full name co-occur in the same text, which confirms whether the match succeeds. In the scheme as a whole, screening the recognized abbreviations improves data accuracy at the abbreviation recognition stage, and confirming co-occurrence of the target abbreviation and the corresponding enterprise full name in the same text improves the accuracy of the matching result.
Brief description of the drawings
Fig. 1 is a diagram of the application scenario of the method for matching enterprise full names with abbreviations in one embodiment;
Fig. 2 is a flow diagram of the method for matching enterprise full names with abbreviations in one embodiment;
Fig. 3 is a flow diagram of the method for matching enterprise full names with abbreviations in another embodiment;
Fig. 4 is a structural block diagram of the apparatus for matching enterprise full names with abbreviations in one embodiment;
Fig. 5 is a diagram of the internal structure of the computer device in one embodiment.
Detailed description
To make the objects, technical solutions and advantages of the present application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the application and do not limit it.
The method for matching enterprise full names with abbreviations provided by the present application can be applied in the application environment shown in Fig. 1, in which a terminal 102 communicates with a server 104 over a network. The server 104 performs abbreviation recognition on a text containing an abbreviation to be identified to obtain a candidate abbreviation set; obtains the word frequency of each candidate abbreviation in a preset text library and determines a target abbreviation according to those word frequencies; traverses a preset derived-abbreviation library according to the target abbreviation to obtain the matching derived abbreviation; obtains the enterprise full name corresponding to that derived abbreviation; determines that the enterprise full name and the target abbreviation are successfully matched when a text is found in which they co-occur; and pushes the matched enterprise full name and target abbreviation to the terminal 102. The terminal 102 may be, but is not limited to, a personal computer, laptop, smartphone, tablet or portable wearable device. The server 104 may be implemented as a standalone server or as a cluster of multiple servers.
In one embodiment, as shown in Fig. 2, a method for matching enterprise full names with abbreviations is provided. Taking its application to the server in Fig. 1 as an example, the method comprises the following steps:
Step S200: perform abbreviation recognition on a text containing an abbreviation to be identified, to obtain a candidate abbreviation set.
An abbreviation is a short word form compressed from a long, complex name. For proper nouns such as the name of a specific enterprise, the abbreviation is also an officially recognized designation. For conciseness, an enterprise with a long full name is usually referred to by its abbreviation, particularly in public opinion texts with strict length limits such as news headlines, where enterprises are often recorded by abbreviation. To obtain the text containing an abbreviation to be identified, a web crawler can be used to fetch public opinion texts, which are then split into sentences. In some embodiments, the text containing an abbreviation to be identified may be the headline of a news-type public opinion text, or a sentence in the lead of a news item, that contains the abbreviation.
Abbreviation recognition is the process in which a named entity recognition model used for abbreviation recognition performs feature vector extraction and named entity recognition on the text, yielding the multiple candidate abbreviations that the text may contain. The named entity recognition model is trained on a sample data set carrying abbreviation labels and, from the feature vectors of a text, recognizes named entities, which here are the abbreviations in the text. Because such texts are tersely worded, more than one abbreviation may be recognized. For example, for a headline that reads, in translation, "Aerospace Power (航天动力): plans to transfer a 70.94% stake in Xihang Aluminum (西航铝业) through a public listing", the candidate abbreviations obtained may include "航天", "动力", "航天动力", "西航铝业", "西航铝" and "西航".
In an embodiment, abbreviation recognition with the named entity recognition model proceeds as follows: the text is segmented into words to obtain its word sequence; feature vectors are generated from the word sequence; and the feature vectors are input into the pre-trained named entity recognition model, which identifies the abbreviations that the text may contain, forming the candidate abbreviation set.
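As a minimal sketch of how the recognition output might be turned into a candidate abbreviation set, the following assumes the model emits one BIO tag per token. The tag names (`B-ABBR`, `I-ABBR`, `O`), the function `decode_candidates` and the example tokens are illustrative assumptions and are not specified by the application.

```python
from typing import List, Set


def decode_candidates(tokens: List[str], tags: List[str]) -> Set[str]:
    """Collect every tagged span as a candidate abbreviation.

    Assumes hypothetical BIO tags: "B-ABBR" opens a span, "I-ABBR"
    continues it, and "O" marks tokens outside any abbreviation.
    """
    candidates: Set[str] = set()
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B-ABBR":
            if current:  # close a span that is still open
                candidates.add("".join(current))
            current = [token]
        elif tag == "I-ABBR" and current:
            current.append(token)
        else:
            if current:
                candidates.add("".join(current))
            current = []
    if current:  # span that runs to the end of the text
        candidates.add("".join(current))
    return candidates


# Hypothetical segmentation and model output for a short headline.
tokens = ["航天", "动力", "：", "拟", "转让", "西航", "铝业", "股权"]
tags = ["B-ABBR", "I-ABBR", "O", "O", "O", "B-ABBR", "I-ABBR", "O"]
candidate_set = decode_candidates(tokens, tags)
```

A model that proposes overlapping spans (such as "西航" alongside "西航铝业") would need a different output format; this sketch only covers a single best tag sequence.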
Step S300: obtain the word frequency, in a preset text library, of each candidate abbreviation in the candidate abbreviation set, and determine the target abbreviation according to the word frequency of each candidate abbreviation.
Several candidate abbreviations in the set may have been extracted from the same text, for example "神火电", "南神火电" and "神火", of which only one is correct. A preset text library related to the candidate abbreviations is obtained, and the word frequency of each candidate abbreviation in that library is computed. When different candidates from the same text have identical or similar word frequencies, the candidate with the longest string is taken as the target abbreviation; a candidate with a low word frequency is unlikely to be the target abbreviation. For example, if "神火电" and "火电" have the same word frequency while "南神火电" has a lower one, "神火电" is taken as the target abbreviation.
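The selection rule above can be sketched as follows. This is an illustration under stated assumptions: word frequency is approximated by plain substring counts, "close" frequencies are modelled by a hypothetical `tolerance` parameter, and the corpus strings are invented for the example.

```python
from typing import Dict, List


def count_frequency(candidate: str, corpus: List[str]) -> int:
    """Total non-overlapping occurrences of the candidate across the corpus."""
    return sum(text.count(candidate) for text in corpus)


def select_target(candidates: List[str], corpus: List[str], tolerance: int = 0) -> str:
    """Pick the longest candidate among those with the highest (or near-highest) frequency.

    `tolerance` is a hypothetical knob for "identical or similar" frequencies.
    """
    freqs: Dict[str, int] = {c: count_frequency(c, corpus) for c in candidates}
    best = max(freqs.values())
    # Low-frequency candidates are dropped; the rest compete on length.
    near_best = [c for c in candidates if best - freqs[c] <= tolerance]
    return max(near_best, key=len)


# Invented corpus: "神火电" and "火电" each occur 3 times, "南神火电" once.
corpus = ["神火电公告", "神火电业绩", "南神火电"]
target = select_target(["神火电", "火电", "南神火电"], corpus)
```

With substring counting, a shorter candidate can never be rarer than a longer one that contains it, which is why the tie is broken in favour of the longest string.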
Step S400: according to the target abbreviation, traverse the preset derived-abbreviation library to obtain the derived abbreviation that matches the target abbreviation.
The preset derived-abbreviation library is a database of derived abbreviations, obtained by abbreviating existing enterprise full name data according to set abbreviation rules. In embodiments, enterprise full names can be abbreviated either by the set abbreviation rules or by an abbreviation model. When traversing the library yields a derived abbreviation identical to the target abbreviation, the target abbreviation recognized from the text is matched with an abbreviation derived from an enterprise full name, which associates the enterprise full name with the enterprise abbreviation.
Step S500: obtain the enterprise full name corresponding to the derived abbreviation.
The enterprise full name library associated with the preset derived-abbreviation library is obtained, and the enterprise full name corresponding to the determined derived abbreviation is found through the mapping between enterprise full names and derived abbreviations. In embodiments, enterprise full names can be obtained from the business registration data of each enterprise, and the enterprise full name library is built from those full names.
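Steps S400 and S500 can be illustrated with a small in-memory index. Note that this sketch swaps the literal traversal described above for a dictionary lookup keyed by derived abbreviation; the function names and the example pairs are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_index(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Index (full name, derived abbreviation) pairs by derived abbreviation."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for full_name, derived in pairs:
        index[derived].add(full_name)
    return dict(index)


def lookup_full_names(target: str, index: Dict[str, Set[str]]) -> Set[str]:
    """Return every full name whose derived abbreviation equals the target."""
    return set(index.get(target, set()))


# Hypothetical library contents.
index = build_index([
    ("中国银行股份有限公司", "中国银行"),
    ("中国银行股份有限公司", "中行"),
])
full_names = lookup_full_names("中行", index)
```

A set is returned because one abbreviation may be derivable from several full names; the co-occurrence check in the next step is what narrows them down.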
Step S600: when a text is found in which the target abbreviation and the enterprise full name co-occur, determine that the enterprise full name and the target abbreviation are successfully matched.
Co-occurrence is the phenomenon of feature terms appearing together; here the feature terms are the target abbreviation and the enterprise full name. Public opinion data is searched with both as search targets. If a text containing both the target abbreviation and the enterprise full name is found, the two are determined to match; conversely, if no text contains both, the match fails.
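A minimal sketch of the co-occurrence check follows. The function names and the sample texts are illustrative, and the `require_separate_mention` flag is an added assumption: it handles the case where the abbreviation is itself a substring of the full name, which the application does not discuss.

```python
from typing import List


def cooccurring_texts(target: str, full_name: str, texts: List[str],
                      require_separate_mention: bool = False) -> List[str]:
    """Return the texts that contain both the target abbreviation and the full name."""
    hits = []
    for text in texts:
        if full_name not in text:
            continue
        # Optionally blank out the full name so that an abbreviation which is
        # merely a substring of it does not count as a separate mention.
        haystack = text.replace(full_name, "\n") if require_separate_mention else text
        if target in haystack:
            hits.append(text)
    return hits


def is_match(target: str, full_name: str, texts: List[str],
             require_separate_mention: bool = False) -> bool:
    """The match succeeds as soon as one co-occurring text exists."""
    return bool(cooccurring_texts(target, full_name, texts, require_separate_mention))


# Invented public opinion snippets.
texts = ["中国银行股份有限公司（简称中行）发布年报", "中行今日发布公告"]
matched = is_match("中行", "中国银行股份有限公司", texts)
```

Without the stricter flag, "中国银行" trivially co-occurs with "中国银行股份有限公司" in any text that contains the full name, so a deployment would have to decide whether that counts as evidence.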
In the above method for matching enterprise full names with abbreviations, abbreviation recognition is performed on a text containing an abbreviation to be identified to obtain a candidate abbreviation set, and a target abbreviation is selected according to the word frequencies of the candidates. The preset derived-abbreviation library corresponding to the enterprises is traversed to obtain the enterprise full name that matches the target abbreviation, and texts are searched to confirm whether the target abbreviation and the enterprise full name co-occur in the same text, which confirms whether the match succeeds. In the scheme as a whole, screening the recognized abbreviations improves data accuracy at the abbreviation recognition stage. Confirming co-occurrence of the target abbreviation and the corresponding enterprise full name in the same text avoids the errors introduced by generating abbreviations solely from the character vectors or word vectors of the enterprise full name and matching the full name library directly against the abbreviation library, and so improves the accuracy of the matching result.
In some embodiments, the above method can also be applied to matching the full names and abbreviations of legally established government organs, public institutions, enterprises, associations and other legally established organizations, which may include government departments, research institutes, universities and colleges, companies, international organizations and the like.
In one embodiment, as shown in Fig. 3, before step S200 (performing abbreviation recognition on the text containing an abbreviation to be identified to obtain the candidate abbreviation set), the method further comprises:
Step S120: obtain multiple pieces of sample data containing enterprise abbreviations.
Step S140: label each piece of sample data according to its known abbreviation, to obtain a sample data set carrying abbreviation labels.
Step S160: train a named entity recognition model on the sample data set; the named entity recognition model is used for abbreviation recognition.
Sample data containing enterprise abbreviations refers to text whose abbreviations are known. Abbreviation labeling means segmenting the sample data into words and labeling it with the known abbreviations. The labeled sample data is then trained into word vectors that carry the abbreviation labels, and the word vectors of the multiple samples are used as the input data for training the named entity recognition model. The named entity recognition model is a Bi-LSTM+CRF model, in which the CRF layer produces the globally optimal output sequence, in effect reusing the information from the LSTM. Bi-LSTM, short for bidirectional LSTM, considers both past features (extracted by the forward pass) and future features (extracted by the backward pass). It is equivalent to two LSTMs, one reading the input sequence forwards and one reading it backwards, whose outputs are combined as the final result. Word vectors can be trained with tools such as gensim word2vec or GloVe. After training, accuracy is used as the evaluation metric for the named entity recognition model; when accuracy does not reach the set threshold range, the model parameters are adjusted to optimize the model. The trained model takes as input the word vectors of a text containing an abbreviation to be identified, recognizes the abbreviations the text may contain, and outputs them to form the candidate abbreviation set.
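The labeling step can be sketched as follows. To stay self-contained, this example labels at the character level rather than on segmented words, and uses a hypothetical BIO tag scheme (`B-ABBR`, `I-ABBR`, `O`); neither choice is prescribed by the application, and the resulting pairs would still need to be converted to word vectors before training.

```python
from typing import List, Tuple


def label_sample(text: str, known_abbr: str) -> List[Tuple[str, str]]:
    """Tag every occurrence of the known abbreviation in the sample text.

    Returns (character, tag) pairs using hypothetical BIO tags.
    """
    tags = ["O"] * len(text)
    if known_abbr:  # an empty abbreviation would match everywhere
        start = text.find(known_abbr)
        while start != -1:
            tags[start] = "B-ABBR"
            for i in range(start + 1, start + len(known_abbr)):
                tags[i] = "I-ABBR"
            start = text.find(known_abbr, start + len(known_abbr))
    return list(zip(text, tags))


# Invented sample sentence with the known abbreviation "中行".
labeled = label_sample("中行发布公告", "中行")
```

Pairs of this shape are the usual input to sequence-labeling trainers, which is why the sketch stops at the tagged sequence rather than attempting the Bi-LSTM+CRF training itself.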
In one embodiment, as shown in Fig. 3, step S300 (obtaining the word frequency, in the preset text library, of each candidate abbreviation in the candidate abbreviation set and determining the target abbreviation according to the word frequency of each candidate abbreviation) comprises:
Step S320: when the text containing an abbreviation to be identified contains multiple classes of candidate abbreviations, classify the candidate abbreviations in the candidate abbreviation set according to their word sequences.
Step S330: obtain the word frequency, in the preset text library, of each candidate abbreviation in each class.
Step S340: determine the target abbreviation of each class according to the word frequencies of the candidate abbreviations in that class.
A word sequence refers to the words that make up a term and the association relationships between those words. The word sequence can be determined with a sequence labeling method, and the alternative abbreviations are then classified according to their word sequences. The preliminary processing by the named entity recognition model inevitably generates some noise, so the resulting alternative abbreviations contain some noisy data. To remove it, cross denoising is performed against the texts in the preset text library. Taking one class of alternative abbreviations as an example, the word frequency of each alternative abbreviation in the class is obtained over multiple texts in the preset text library, and the target abbreviation of each class is selected according to word sequence length and word frequency. This step is the denoising process. In one embodiment, similar words are first gathered into one class: words that contain the same word sequence are grouped together, for example "Shenhuo Dian", "Nan Shenhuo Dian" and "Shenhuo" form one class. The word frequency of each word in the class is then counted over multiple news texts. When different words have the same or similar word frequencies in the same news text, the one with the longest word sequence is taken as the target abbreviation. An alternative abbreviation with a low word frequency is unlikely to be the target abbreviation, but its abbreviation data and word frequency information can still be retained.
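The classification and denoising steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a "shared word sequence" is approximated by a substring test, word frequency is summed over the whole text library rather than compared per news text, and the function names and the `tolerance` threshold for "similar" frequencies are hypothetical. The romanized strings are placeholders for the example abbreviations.

```python
def group_by_shared_sequence(candidates):
    """Group candidates so that a candidate joins the first class in which
    some member is contained in it (a crude stand-in for a shared word sequence)."""
    groups = []
    # Shorter candidates first, so they act as the shared sequence of a class.
    for cand in sorted(set(candidates), key=lambda c: (len(c), c)):
        for group in groups:
            if any(member in cand for member in group):
                group.append(cand)
                break
        else:
            groups.append([cand])
    return groups


def word_frequency(candidate, text_library):
    """Total number of occurrences of the candidate across all texts.
    Note: a short candidate is also counted inside longer ones."""
    return sum(text.count(candidate) for text in text_library)


def pick_target(group, text_library, tolerance=0.2):
    """Among candidates whose frequency is close to the highest one,
    take the longest; `tolerance` is an assumed closeness threshold."""
    freqs = {c: word_frequency(c, text_library) for c in group}
    best = max(freqs.values())
    close = [c for c in group if freqs[c] >= best * (1 - tolerance)]
    return max(close, key=len), freqs


texts = [
    "ShenhuoDian shares rose; ShenhuoDian announced results.",
    "Analysts follow ShenhuoDian closely.",
]
groups = group_by_shared_sequence(["ShenhuoDian", "NanShenhuoDian", "Shenhuo"])
target, freqs = pick_target(groups[0], texts)
```

Here "NanShenhuoDian" never occurs in the texts and is filtered out as noise, while "Shenhuo" and "ShenhuoDian" have equal frequency, so the longer one is kept as the target abbreviation.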
In one embodiment, as shown in Fig. 3, before step S400, traversing the preset abbreviated name library according to the target abbreviation and obtaining the abbreviated name that matches the target abbreviation, the method further includes:
Step S360: obtain the enterprise full name library, and classify the enterprise full names in the enterprise full name library according to the composition pattern of each enterprise full name.
Step S370: perform contraction processing on each class of enterprise full names according to the preset contraction rule corresponding to the composition pattern, to obtain the abbreviated name set corresponding to each enterprise full name.
Step S380: construct, from the abbreviated name sets, the preset abbreviated name library corresponding to the enterprise full name library.
In this embodiment, enterprise full names are obtained from the collected industrial and commercial registration data of each enterprise, and the enterprise full name library is constructed from the full name data of each enterprise. Enterprise full names can be divided into several classes according to their composition pattern. The first class is "place + name + industry category + company attribute"; examples are "Shenzhen Tencent Computer Systems Co., Ltd." and "Jiangsu Yabang Dyestuff Co., Ltd.". The second class is "name + industry category + company attribute". Other classes include "name + place + company attribute" and "name + company attribute". When abbreviated names are generated from full name data by contraction, their length can be limited to five characters or fewer. The contraction rules for the alternative abbreviations generated directly from such full names usually fall into several types: name only, such as "Tencent" and "Yabang"; name + industry, such as "Tencent Computer", "Yabang Dyestuff" and "Haitong Securities"; place or place abbreviation + name, such as "China Ping An"; and name + company attribute abbreviation, such as "Tencent Holdings" and "Apple Inc.". In general, when the name, the industry attribute and so on exceed four characters, a shortened form is used, formed by taking the first two characters or by taking characters at intervals, such as "Sinopec" (China Petrochemical Corporation). Following this generation logic, a set of generated alternative abbreviations can be produced for each full name. For example, Changan Ford Automobile Co., Ltd. yields abbreviated names such as "Ford", "Ford Motor" and "Changan Ford", which form the abbreviated name set corresponding to that enterprise full name. Since the enterprise full name library contains multiple enterprise full names, the preset abbreviated name library corresponding to the enterprise full name library is constructed from the abbreviated name sets of the enterprise full names, and a mapping relationship exists between the preset abbreviated name library and the enterprise full name library.
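The generation logic above can be sketched as follows. This is a hedged illustration only: the `FullName` structure, the way a full name is split into components, the `attribute_short` field and the `max_chars` parameter are all assumptions made for the example (the five-character limit in the text applies to the original-language names, so it is left off by default here).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FullName:
    """A full name already split into components (the split itself is assumed)."""
    text: str
    place: str = ""
    name: str = ""
    industry: str = ""
    attribute_short: str = ""  # shortened company attribute, e.g. "Holdings"


def generate_abbreviations(full_name, max_chars=None):
    """Apply the four contraction rules: name, name + industry,
    place + name, name + company attribute abbreviation."""
    rules = [
        (full_name.name,),
        (full_name.name, full_name.industry),
        (full_name.place, full_name.name),
        (full_name.name, full_name.attribute_short),
    ]
    # A rule only applies when every component it needs is present.
    candidates = [" ".join(parts) for parts in rules if all(parts)]
    if max_chars is not None:
        candidates = [c for c in candidates if len(c) <= max_chars]
    return list(dict.fromkeys(candidates))  # de-duplicate, keep order


def build_library(full_names):
    """Map every generated abbreviated name back to its full name(s)."""
    library = defaultdict(set)
    for full_name in full_names:
        for abbreviation in generate_abbreviations(full_name):
            library[abbreviation].add(full_name.text)
    return library


ford = FullName("Changan Ford Automobile Co., Ltd.",
                place="Changan", name="Ford", industry="Motor")
library = build_library([ford])
```

Keeping the library as a mapping from abbreviated name to a set of full names makes the later traversal a simple lookup, and allows one abbreviated name to point at several full names until the co-occurrence check disambiguates them.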
In one embodiment, before step S370, performing contraction processing on each class of enterprise full names according to the preset contraction rule corresponding to the composition pattern and obtaining the abbreviated name set corresponding to each enterprise full name, the method further includes:
Obtain sample data containing matching relationships between enterprise full names and abbreviations.
Analyze the composition pattern of the enterprise full names in the sample data, and determine the preset contraction rule corresponding to each composition pattern according to the enterprise abbreviations in the sample data.
From sample data of known enterprise abbreviations and enterprise full names, the matching relationship between each enterprise abbreviation and its full name can be obtained. From the composition pattern of an enterprise full name and its corresponding abbreviation, the contraction rule applied to that full name in the sample data can be derived. By counting the contraction rules applied to enterprise full names across multiple sample data, the preset contraction rules for enterprise full names are determined. In some embodiments, one enterprise full name may have multiple corresponding preset contraction rules.
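One way to picture this rule-determination step is to test, for every sample, which candidate rule reproduces the known abbreviation, and to count the hits per composition pattern. The sketch below assumes the full names are already split into components; the rule table, the function names and the `min_count` threshold are hypothetical, and the component splits of the sample names are illustrative.

```python
from collections import Counter, defaultdict

# Candidate contraction rules, each expressed as the components it concatenates.
RULES = {
    "name": ("name",),
    "name+industry": ("name", "industry"),
    "place+name": ("place", "name"),
}
COMPONENT_ORDER = ("place", "name", "industry", "attribute")


def composition_pattern(parts):
    """Describe a full name by the components it contains, in a fixed order."""
    return "+".join(k for k in COMPONENT_ORDER if parts.get(k))


def learn_rules(samples, min_count=1):
    """Count, per composition pattern, which rules reproduce the known
    abbreviation, and keep rules seen at least `min_count` times."""
    counts = defaultdict(Counter)
    for parts, known_abbreviation in samples:
        pattern = composition_pattern(parts)
        for rule_name, keys in RULES.items():
            if all(parts.get(k) for k in keys):
                if " ".join(parts[k] for k in keys) == known_abbreviation:
                    counts[pattern][rule_name] += 1
    return {
        pattern: [rule for rule, n in counter.most_common() if n >= min_count]
        for pattern, counter in counts.items()
    }


samples = [
    ({"place": "Shenzhen", "name": "Tencent",
      "industry": "Computer Systems", "attribute": "Co., Ltd."}, "Tencent"),
    ({"place": "Jiangsu", "name": "Yabang",
      "industry": "Dyestuff", "attribute": "Co., Ltd."}, "Yabang Dyestuff"),
    ({"name": "Haitong", "industry": "Securities",
      "attribute": "Co., Ltd."}, "Haitong Securities"),
]
learned = learn_rules(samples)
```

A pattern can end up with several rules, which matches the remark that one enterprise full name may have multiple corresponding preset contraction rules.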
In one embodiment, as shown in Fig. 3, after step S600, determining that the enterprise full name and the target abbreviation are successfully matched when a text in which the target abbreviation and the enterprise full name co-occur is found, the method further includes:
Step S720: update the successfully matched enterprise full name and target abbreviation to the preset enterprise full name and abbreviation matching database.
Updating the successfully matched enterprise full name and enterprise abbreviation to the preset enterprise full name and abbreviation matching database makes it possible, when performing public opinion analysis on all kinds of data, to quickly determine the enterprise full name corresponding to an enterprise abbreviation, which improves the efficiency of public opinion analysis.
In one embodiment, as shown in Fig. 3, after step S720, updating the successfully matched enterprise full name and target abbreviation to the preset enterprise full name and abbreviation matching database, the method further includes:
Step S740: search, according to a preset keyword, for texts containing a matching relationship between an enterprise full name and an enterprise abbreviation.
Step S760: extract the matched enterprise full name and enterprise abbreviation from the text, and update them to the preset enterprise full name and abbreviation matching database.
The text can be a public opinion text such as news, and the preset keyword can be a word used to identify an enterprise's full name and abbreviation, such as "... referred to as ...". In this embodiment, news documents, especially news headlines, are scanned, and full name and abbreviation matching results such as "A, referred to as B" are extracted directly by preset rules. The entities corresponding to such data are updated to the preset enterprise full name and abbreviation matching database. When a large amount of public opinion data and a large amount of enterprise full name data are available but the full name corresponding to an enterprise abbreviation in a text cannot be found, searching the preset enterprise full name and abbreviation matching database avoids cases in which one abbreviation corresponds to multiple full names, the correspondence between abbreviation and full name is ambiguous, or the abbreviation and full name are entirely unrelated, which improves matching accuracy.
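A rule of the "A, referred to as B" kind can be sketched with a regular expression. The pattern below is an assumption made for illustration: it works on English renderings, expects the full name to be a run of capitalized words ending in "Co., Ltd." and the abbreviation to appear in double quotes inside parentheses. It is deliberately crude (a capitalized word directly before the full name would be swallowed into it), and the function names are hypothetical.

```python
import re

# Full name: capitalized words followed by the company attribute "Co., Ltd.".
# Keyword: an optional "hereinafter" plus "referred to as", then a quoted abbreviation.
PAIR_PATTERN = re.compile(
    r"(?P<full>(?:[A-Z][\w&-]*\s+)+Co\., Ltd\.)\s*"
    r'\((?:hereinafter\s+)?referred to as\s+"(?P<abbr>[^"]+)"\)'
)


def extract_pairs(texts):
    """Return (full name, abbreviation) pairs found by the keyword rule."""
    pairs = []
    for text in texts:
        for match in PAIR_PATTERN.finditer(text):
            pairs.append((match.group("full"), match.group("abbr")))
    return pairs


def update_database(database, pairs):
    """Record each extracted pair in an abbreviation -> full names mapping."""
    for full_name, abbreviation in pairs:
        database.setdefault(abbreviation, set()).add(full_name)
    return database


news = [
    'Shenzhen Tencent Computer Systems Co., Ltd. '
    '(hereinafter referred to as "Tencent") released a statement.',
    "Markets were quiet on Tuesday.",
]
pairs = extract_pairs(news)
database = update_database({}, pairs)
```

Pairs obtained this way are stated explicitly in the text, so they can be written to the matching database without the frequency and co-occurrence checks used for recognized abbreviations.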
It should be understood that although the steps in the flowcharts of Figs. 2-3 are shown in sequence as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless expressly stated otherwise herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some of the steps in Figs. 2-3 may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily completed at the same moment but can be executed at different times, and they are not necessarily executed sequentially but can be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in Fig. 4, an enterprise full name and abbreviation matching device is provided, comprising:
An alternative abbreviation set obtaining module 200, configured to perform abbreviation recognition processing on a text containing an abbreviation to be identified, to obtain an alternative abbreviation set;
A target abbreviation determining module 300, configured to obtain the word frequency in a preset text library of each alternative abbreviation in the alternative abbreviation set, and to determine the target abbreviation according to the word frequency of each alternative abbreviation;
An abbreviated name obtaining module 400, configured to traverse the preset abbreviated name library according to the target abbreviation, to obtain the abbreviated name that matches the target abbreviation;
An enterprise full name obtaining module 500, configured to obtain the enterprise full name corresponding to the abbreviated name;
A matching result determining module 600, configured to determine that the enterprise full name and the target abbreviation are successfully matched when a text in which the target abbreviation and the enterprise full name co-occur is found.
In one embodiment, the enterprise full name and abbreviation matching device further includes a named entity recognition model training module, configured to obtain multiple sample data containing enterprise abbreviations, perform abbreviation labeling on each sample data according to the known abbreviation corresponding to that sample data to obtain a sample data set carrying abbreviation labels, and train on the sample data set to obtain a named entity recognition model, the named entity recognition model being used to perform abbreviation recognition processing.
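The abbreviation labeling step that prepares the training set can be sketched as follows. The BIO tagging scheme, the `ABBR` label and whitespace tokenization are assumptions chosen for illustration; the text only states that each sample is labeled with its known abbreviation. Model training itself is not shown.

```python
def label_sample(tokens, abbreviation_tokens):
    """Tag every occurrence of the known abbreviation with B-ABBR / I-ABBR
    and every other token with O."""
    tags = ["O"] * len(tokens)
    n = len(abbreviation_tokens)
    if n == 0:
        return tags
    i = 0
    while i <= len(tokens) - n:
        if tokens[i:i + n] == abbreviation_tokens:
            tags[i] = "B-ABBR"                      # first token of the span
            tags[i + 1:i + n] = ["I-ABBR"] * (n - 1)  # remaining tokens
            i += n
        else:
            i += 1
    return tags


def build_training_set(samples):
    """Turn (text, known abbreviation) samples into (tokens, tags) pairs."""
    dataset = []
    for text, known_abbreviation in samples:
        tokens = text.split()
        dataset.append((tokens, label_sample(tokens, known_abbreviation.split())))
    return dataset


dataset = build_training_set([
    ("Haitong Securities rose as Haitong Securities expanded",
     "Haitong Securities"),
])
tokens, tags = dataset[0]
```

The resulting token and tag sequences are the usual input format for sequence labeling models, which is why this sketch stops at the labeled data set.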
In one embodiment, the target abbreviation determining module 300 is further configured to, when multiple classes of alternative abbreviations exist in the text containing the abbreviation to be identified, classify the alternative abbreviations in the alternative abbreviation set according to their word sequences, obtain the word frequency in the preset text library of each alternative abbreviation in each class, and determine the target abbreviation of each class according to the word frequency of each alternative abbreviation in that class.
In one embodiment, the enterprise full name and abbreviation matching device further includes a preset abbreviated name library construction module, configured to obtain the enterprise full name library, classify the enterprise full names in the enterprise full name library according to the composition pattern of each enterprise full name, perform contraction processing on each class of enterprise full names according to the preset contraction rule corresponding to the composition pattern to obtain the abbreviated name set corresponding to each enterprise full name, and construct, from the abbreviated name sets, the preset abbreviated name library corresponding to the enterprise full name library.
In one embodiment, the preset abbreviated name library construction module is further configured to obtain sample data containing matching relationships between enterprise full names and abbreviations, analyze the composition pattern of the enterprise full names in the sample data, and determine the preset contraction rule corresponding to each composition pattern according to the enterprise abbreviations in the sample data.
In one embodiment, the enterprise full name and abbreviation matching device further includes an enterprise full name and abbreviation matching database update module, configured to update the successfully matched enterprise full name and target abbreviation to the preset enterprise full name and abbreviation matching database.
In one embodiment, the enterprise full name and abbreviation matching database update module is further configured to search, according to a preset keyword, for texts containing a matching relationship between an enterprise full name and an enterprise abbreviation, extract the matched enterprise full name and enterprise abbreviation from the text, and update them to the preset enterprise full name and abbreviation matching database.
The above enterprise full name and abbreviation matching device performs abbreviation recognition processing on a text containing an abbreviation to be identified to obtain an alternative abbreviation set, obtains the target abbreviation according to the word frequencies of the alternative abbreviations, traverses the preset abbreviated name library corresponding to enterprise abbreviations to obtain the enterprise full name that matches the target abbreviation, and searches for a text to confirm whether the target abbreviation and the enterprise full name co-occur in the same text, thereby confirming whether the enterprise full name and the abbreviation are successfully matched. In the overall scheme, on the one hand, screening the recognized abbreviations improves data accuracy in the abbreviation recognition stage; on the other hand, after the enterprise full name corresponding to the target abbreviation is obtained, confirming whether the target abbreviation and the corresponding enterprise full name co-occur in the same text determines whether the match is successful, which improves the accuracy of the matching result.
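The lookup and co-occurrence confirmation can be sketched as follows. The data structures and function names are hypothetical, and the company names are invented placeholders. One design choice here is an assumption that goes beyond the text: because an abbreviation is often a substring of its own full name, occurrences of the full name are blanked out before the abbreviation is searched for, so that the abbreviation must appear independently in the same text.

```python
def find_cooccurrence(target_abbreviation, full_name, text_library):
    """Return the first text in which the full name appears and the
    abbreviation also appears outside of the full name, else None."""
    for text in text_library:
        if full_name not in text:
            continue
        remainder = text.replace(full_name, " ")  # ignore the full name itself
        if target_abbreviation in remainder:
            return text
    return None


def match_full_names(target_abbreviation, abbreviation_library, text_library):
    """Look up candidate full names for the target abbreviation and keep
    those confirmed by co-occurrence in some text."""
    confirmed = []
    candidates = abbreviation_library.get(target_abbreviation, ())
    for full_name in sorted(candidates):
        if find_cooccurrence(target_abbreviation, full_name, text_library):
            confirmed.append(full_name)
    return confirmed


abbreviation_library = {
    "Acme": {"Acme Computer Systems Co., Ltd.",
             "Northfield Acme Trading Co., Ltd."},
}
text_library = [
    "Acme Computer Systems Co., Ltd. said Acme will expand.",
    "Northfield Acme Trading Co., Ltd. opened a store.",
]
confirmed = match_full_names("Acme", abbreviation_library, text_library)
```

In this example both full names are candidates for "Acme", but only the first is confirmed, because the second text never uses the abbreviation on its own.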
For specific limitations of the enterprise full name and abbreviation matching device, refer to the limitations of the enterprise full name and abbreviation matching method above, which are not repeated here. Each module in the above enterprise full name and abbreviation matching device can be implemented in whole or in part by software, hardware, or a combination thereof. The modules can be embedded in hardware form in, or be independent of, a processor in a computer device, or can be stored in software form in a memory of the computer device, so that the processor can call them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device can be a terminal, and its internal structure can be as shown in Fig. 5. The computer device includes a processor, a memory, a network interface, a display screen and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used to communicate with an external terminal through a network connection. When executed by the processor, the computer program implements an enterprise full name and abbreviation matching method. The display screen of the computer device can be a liquid crystal display or an electronic ink display. The input device of the computer device can be a touch layer covering the display screen, a key, trackball or touchpad provided on the housing of the computer device, or an external keyboard, touchpad or mouse.
Those skilled in the art will understand that the structure shown in Fig. 5 is only a block diagram of the part of the structure relevant to the solution of this application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory and a processor. The memory stores a computer program, and the processor implements the following steps when executing the computer program:
Perform abbreviation recognition processing on a text containing an abbreviation to be identified, to obtain an alternative abbreviation set;
Obtain the word frequency in a preset text library of each alternative abbreviation in the alternative abbreviation set, and determine the target abbreviation according to the word frequency of each alternative abbreviation;
Traverse the preset abbreviated name library according to the target abbreviation, to obtain the abbreviated name that matches the target abbreviation;
Obtain the enterprise full name corresponding to the abbreviated name;
When a text in which the target abbreviation and the enterprise full name co-occur is found, determine that the enterprise full name and the target abbreviation are successfully matched.
In one embodiment, the processor further implements the following steps when executing the computer program:
Obtain multiple sample data containing enterprise abbreviations;
Perform abbreviation labeling on each sample data according to the known abbreviation corresponding to that sample data, to obtain a sample data set carrying abbreviation labels;
Train on the sample data set to obtain a named entity recognition model, the named entity recognition model being used to perform abbreviation recognition processing.
In one embodiment, the processor further implements the following steps when executing the computer program:
When multiple classes of alternative abbreviations exist in the text containing the abbreviation to be identified, classify the alternative abbreviations in the alternative abbreviation set according to their word sequences;
Obtain the word frequency in the preset text library of each alternative abbreviation in each class;
Determine the target abbreviation of each class according to the word frequency of each alternative abbreviation in that class.
In one embodiment, the processor further implements the following steps when executing the computer program:
Obtain the enterprise full name library, and classify the enterprise full names in the enterprise full name library according to the composition pattern of each enterprise full name;
Perform contraction processing on each class of enterprise full names according to the preset contraction rule corresponding to the composition pattern, to obtain the abbreviated name set corresponding to each enterprise full name;
Construct, from the abbreviated name sets, the preset abbreviated name library corresponding to the enterprise full name library.
In one embodiment, the processor further implements the following steps when executing the computer program:
Obtain sample data containing matching relationships between enterprise full names and abbreviations;
Analyze the composition pattern of the enterprise full names in the sample data, and determine the preset contraction rule corresponding to each composition pattern according to the enterprise abbreviations in the sample data.
In one embodiment, the processor further implements the following step when executing the computer program:
Update the successfully matched enterprise full name and target abbreviation to the preset enterprise full name and abbreviation matching database.
In one embodiment, the processor further implements the following steps when executing the computer program:
Search, according to a preset keyword, for texts containing a matching relationship between an enterprise full name and an enterprise abbreviation;
Extract the matched enterprise full name and enterprise abbreviation from the text, and update them to the preset enterprise full name and abbreviation matching database.
The above computer device for implementing the enterprise full name and abbreviation matching method performs abbreviation recognition processing on a text containing an abbreviation to be identified to obtain an alternative abbreviation set, obtains the target abbreviation according to the word frequencies of the alternative abbreviations, traverses the preset abbreviated name library corresponding to enterprise abbreviations to obtain the enterprise full name that matches the target abbreviation, and searches for a text to confirm whether the target abbreviation and the enterprise full name co-occur in the same text, thereby confirming whether the enterprise full name and the abbreviation are successfully matched. In the overall scheme, on the one hand, screening the recognized abbreviations improves data accuracy in the abbreviation recognition stage; on the other hand, after the enterprise full name corresponding to the target abbreviation is obtained, confirming whether the target abbreviation and the corresponding enterprise full name co-occur in the same text determines whether the match is successful, which improves the accuracy of the matching result.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. The computer program implements the following steps when executed by a processor:
Perform abbreviation recognition processing on a text containing an abbreviation to be identified, to obtain an alternative abbreviation set;
Obtain the word frequency in a preset text library of each alternative abbreviation in the alternative abbreviation set, and determine the target abbreviation according to the word frequency of each alternative abbreviation;
Traverse the preset abbreviated name library according to the target abbreviation, to obtain the abbreviated name that matches the target abbreviation;
Obtain the enterprise full name corresponding to the abbreviated name;
When a text in which the target abbreviation and the enterprise full name co-occur is found, determine that the enterprise full name and the target abbreviation are successfully matched.
In one embodiment, the computer program further implements the following steps when executed by the processor:
Obtain multiple sample data containing enterprise abbreviations;
Perform abbreviation labeling on each sample data according to the known abbreviation corresponding to that sample data, to obtain a sample data set carrying abbreviation labels;
Train on the sample data set to obtain a named entity recognition model, the named entity recognition model being used to perform abbreviation recognition processing.
In one embodiment, the computer program further implements the following steps when executed by the processor:
When multiple classes of alternative abbreviations exist in the text containing the abbreviation to be identified, classify the alternative abbreviations in the alternative abbreviation set according to their word sequences;
Obtain the word frequency in the preset text library of each alternative abbreviation in each class;
Determine the target abbreviation of each class according to the word frequency of each alternative abbreviation in that class.
In one embodiment, the computer program further implements the following steps when executed by the processor:
Obtain the enterprise full name library, and classify the enterprise full names in the enterprise full name library according to the composition pattern of each enterprise full name;
Perform contraction processing on each class of enterprise full names according to the preset contraction rule corresponding to the composition pattern, to obtain the abbreviated name set corresponding to each enterprise full name;
Construct, from the abbreviated name sets, the preset abbreviated name library corresponding to the enterprise full name library.
In one embodiment, the computer program further implements the following steps when executed by the processor:
Obtain sample data containing matching relationships between enterprise full names and abbreviations;
Analyze the composition pattern of the enterprise full names in the sample data, and determine the preset contraction rule corresponding to each composition pattern according to the enterprise abbreviations in the sample data.
In one embodiment, the computer program further implements the following step when executed by the processor:
Update the successfully matched enterprise full name and target abbreviation to the preset enterprise full name and abbreviation matching database.
In one embodiment, the computer program further implements the following steps when executed by the processor:
Search, according to a preset keyword, for texts containing a matching relationship between an enterprise full name and an enterprise abbreviation;
Extract the matched enterprise full name and enterprise abbreviation from the text, and update them to the preset enterprise full name and abbreviation matching database.
The above computer-readable storage medium for implementing the enterprise full name and abbreviation matching method performs abbreviation recognition processing on a text containing an abbreviation to be identified to obtain an alternative abbreviation set, obtains the target abbreviation according to the word frequencies of the alternative abbreviations, traverses the preset abbreviated name library corresponding to enterprise abbreviations to obtain the enterprise full name that matches the target abbreviation, and searches for a text to confirm whether the target abbreviation and the enterprise full name co-occur in the same text, thereby confirming whether the enterprise full name and the abbreviation are successfully matched. In the overall scheme, on the one hand, screening the recognized abbreviations improves data accuracy in the abbreviation recognition stage; on the other hand, after the enterprise full name corresponding to the target abbreviation is obtained, confirming whether the target abbreviation and the corresponding enterprise full name co-occur in the same text determines whether the match is successful, which improves the accuracy of the matching result.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed it may include the processes of the embodiments of the above methods. Any reference to memory, storage, database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of this application. Their description is relatively specific and detailed, but it should not therefore be construed as limiting the scope of the patent. It should be pointed out that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent application shall be subject to the appended claims.
Claims (10)
1. a kind of enterprise's full name and abbreviation matching method, which comprises
Abbreviation identifying processing is carried out to the text comprising abbreviation to be identified, obtains alternative referred to as set;
Word frequency of each alternative abbreviation in pre-set text library in the alternative abbreviation set is obtained, according to each alternative abbreviation
Word frequency determines target referred to as;
Referred to as according to the target, the default abbreviation abbreviation library of traversal obtains the abbreviation abbreviation with the target abbreviation matching;
Obtain enterprise's full name referred to as corresponding with the abbreviation;
When finding text of the target referred to as with enterprise's full name co-occurrence, enterprise's full name and the target are determined
Abbreviation matching success.
2. the method according to claim 1, wherein the described pair of text comprising abbreviation to be identified carries out abbreviation knowledge
Other places reason, obtains before alternatively referred to as gathering, further includes:
Obtain multiple sample datas comprising enterprise's abbreviation;
According to the corresponding known abbreviation of each sample data, the processing of abbreviation mark is carried out to each sample data, acquisition is taken
Sample data set with abbreviation mark;
According to the sample data set, training obtains Named Entity Extraction Model, and the Named Entity Extraction Model is for carrying out
Abbreviation identifying processing.
3. the method according to claim 1, wherein described obtain each alternative abbreviation in the alternative abbreviation set
Word frequency in pre-set text library determines that target referred to as includes: according to the word frequency of each alternative abbreviation
It is right according to the sequence of terms of alternative abbreviation when abbreviation alternative there are multiclass in the text comprising abbreviation to be identified
Alternatively referred to as classifying in the alternative abbreviation set;
Word frequency of each alternative abbreviation of each classification in pre-set text library is obtained,
According to the word frequency of each alternative abbreviation of each classification, the target of each classification is determined referred to as.
4. the method according to claim 1, wherein described according to the target abbreviation, the default abbreviation letter of traversal
Before the abbreviation abbreviation of title library, acquisition and the target abbreviation matching, further includes:
Enterprise's full name library is obtained to divide enterprise's full name in enterprise's full name library according to the compositional model of enterprise's full name
Class;
According to default contraction rule corresponding with the compositional model, abbreviation processing is carried out to all kinds of enterprise's full name, is obtained
Abbreviation corresponding with enterprise's full name is referred to as gathered;
Referred to as gathered according to the abbreviation, constructs the default abbreviation abbreviation library corresponding with enterprise's full name library.
5. The method according to claim 4, wherein before performing contraction processing on each class of enterprise full names according to the preset contraction rule corresponding to the composition pattern to obtain the contracted abbreviation set corresponding to the enterprise full names, the method further comprises:
obtaining sample data containing matching relationships between enterprise full names and abbreviations; and
analyzing the composition pattern of the enterprise full names in the sample data and, according to the enterprise abbreviations in the sample data, determining the preset contraction rule corresponding to the composition pattern.
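One way to read claim 5 is as a counting exercise over labeled samples. The sketch below assumes each sample full name has already been split into named parts, and it keeps, for each composition pattern, the part combinations that reproduce the known abbreviation at least `min_support` times. The part names, the threshold, and the fictional samples are assumptions made for illustration only.

```python
from collections import Counter, defaultdict
from itertools import combinations

PART_ORDER = ("region", "trade", "industry", "org")


def learn_rules(samples, min_support=1):
    """Derive contraction rules from (parsed full name, abbreviation) pairs."""
    counts = defaultdict(Counter)
    for parts, abbreviation in samples:
        present = [k for k in PART_ORDER if parts.get(k)]
        pattern = "+".join(present)
        # Try every ordered subset of parts and keep those that
        # concatenate to exactly the abbreviation seen in the sample.
        for size in range(1, len(present) + 1):
            for combo in combinations(present, size):
                if "".join(parts[k] for k in combo) == abbreviation:
                    counts[pattern][combo] += 1
    return {
        pattern: [combo for combo, n in counter.most_common() if n >= min_support]
        for pattern, counter in counts.items()
    }


# Fictional sample data: parsed full name plus its known abbreviation.
samples = [
    ({"region": "北京", "trade": "星河", "industry": "科技", "org": "有限公司"}, "星河科技"),
    ({"region": "上海", "trade": "蓝海", "industry": "物流", "org": "有限公司"}, "蓝海物流"),
    ({"region": "广州", "trade": "青松", "industry": "医药", "org": "有限公司"}, "青松"),
]
rules = learn_rules(samples, min_support=2)
```

Raising `min_support` drops rules backed by only a few samples, at the cost of missing rarer abbreviation styles.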
6. The method according to claim 1, wherein after determining that the enterprise full name and the target abbreviation are successfully matched when a text in which the target abbreviation co-occurs with the corresponding enterprise full name is found, the method further comprises:
updating the successfully matched enterprise full name and target abbreviation into a preset enterprise full name and abbreviation matching database.
7. The method according to claim 6, wherein after updating the successfully matched enterprise full name and target abbreviation into the preset enterprise full name and abbreviation matching database, the method further comprises:
searching, according to preset keywords, for text containing a matching relationship between an enterprise full name and an enterprise abbreviation; and
extracting the matched enterprise full name and enterprise abbreviation from the text, and updating them into the preset enterprise full name and abbreviation matching database.
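The database updates in claims 6 and 7 might look like the sketch below. It assumes the preset keywords are phrases such as 以下简称 ("hereinafter referred to as") and that the relationship appears in the common form "full name (简称 abbreviation)". The regular expression, the in-memory dictionary used as the database, and the fictional company names are all illustrative assumptions.

```python
import re

KEYWORDS = ("以下简称", "简称")  # hypothetical preset keywords

# Assumed surface form: <full name ending in 公司>(以下简称"<abbreviation>")
RELATION = re.compile(
    r'(?P<full>[\u4e00-\u9fa5]+?公司)'    # full name ending in 公司 ("company")
    r'[（(](?:以下)?简称[“"]?'             # opening bracket, keyword, optional quote
    r'(?P<abbr>[\u4e00-\u9fa5A-Za-z]+)'   # the abbreviation itself
    r'[”"]?[）)]'                         # optional quote, closing bracket
)


def record_match(database, full_name, abbreviation):
    """Claim 6: store a confirmed full name / abbreviation pair."""
    database.setdefault(full_name, set()).add(abbreviation)


def harvest_pairs(texts, database):
    """Claim 7: find keyword-bearing texts and extract explicit pairs."""
    for text in texts:
        if not any(keyword in text for keyword in KEYWORDS):
            continue  # no preset keyword, skip this text
        for m in RELATION.finditer(text):
            record_match(database, m["full"], m["abbr"])
    return database


database = {}
record_match(database, "北京星河科技有限公司", "星河")
harvest_pairs(
    [
        "北京星河科技有限公司（以下简称“星河科技”）发布公告。",
        "上海蓝海物流有限公司(简称蓝海物流)中标。",
        "今日天气晴。",
    ],
    database,
)
```

A pattern this simple will also capture any Chinese characters that directly precede the full name without punctuation, so real text would need segmentation or entity recognition first.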
8. An apparatus for matching an enterprise full name with an abbreviation, wherein the apparatus comprises:
an alternative abbreviation set obtaining module, configured to perform abbreviation identification processing on a text containing an abbreviation to be identified, to obtain an alternative abbreviation set;
a target abbreviation determining module, configured to obtain the word frequency, in a preset text library, of each alternative abbreviation in the alternative abbreviation set, and to determine a target abbreviation according to the word frequency of each alternative abbreviation;
a contracted abbreviation obtaining module, configured to traverse a preset contracted-abbreviation library according to the target abbreviation and obtain a contracted abbreviation matching the target abbreviation;
an enterprise full name obtaining module, configured to obtain the enterprise full name corresponding to the contracted abbreviation; and
a matching result determining module, configured to determine that the enterprise full name and the target abbreviation are successfully matched when a text in which the target abbreviation co-occurs with the corresponding enterprise full name is found.
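The way the modules of claim 8 chain together can be sketched as one function. Several simplifications are assumed: the abbreviation identification step is not modeled (the alternative set is passed in), "matching" between the target and a contracted abbreviation is taken to be exact equality, and co-occurrence means both strings appear in the same text. All names and data are hypothetical.

```python
def match_full_names(candidates, term_frequency, contraction_library, corpus):
    """Return (full name, target abbreviation) pairs confirmed by co-occurrence."""
    # Target abbreviation determining module: highest word frequency wins.
    target = max(candidates, key=lambda c: term_frequency.get(c, 0))

    matches = []
    # Contracted abbreviation obtaining module: traverse the preset library.
    for contraction, full_names in contraction_library.items():
        if contraction != target:  # assumption: matching is exact equality
            continue
        # Enterprise full name obtaining module.
        for full_name in sorted(full_names):
            # Matching result determining module. The full name is removed
            # before looking for the abbreviation, because the full name
            # usually contains the abbreviation as a substring.
            if any(
                full_name in text and target in text.replace(full_name, "")
                for text in corpus
            ):
                matches.append((full_name, target))
    return matches


result = match_full_names(
    candidates=["星河", "星河科", "河科技"],
    term_frequency={"星河": 50, "星河科": 1},
    contraction_library={"星河": {"北京星河科技有限公司", "杭州星河贸易有限公司"}},
    corpus=["北京星河科技有限公司（以下简称星河）今日上市。", "星河发布新品。"],
)
```

In this made-up example only the first full name co-occurs with the abbreviation in a text, so the second library entry is not confirmed.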
9. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811416724.9A CN109635285A (en) | 2018-11-26 | 2018-11-26 | Enterprise's full name and abbreviation matching method, apparatus, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109635285A (en) | 2019-04-16 |
Family
ID=66069599
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811416724.9A Pending CN109635285A (en) | 2018-11-26 | 2018-11-26 | Enterprise's full name and abbreviation matching method, apparatus, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109635285A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100161663A1 (en) * | 2008-12-19 | 2010-06-24 | International Business Machines Corporation | Searching For A Business Name In A Database |
CN108170662A (en) * | 2016-12-07 | 2018-06-15 | 富士通株式会社 | The disambiguation method of breviaty word and disambiguation equipment |
CN108460016A (en) * | 2018-02-09 | 2018-08-28 | 中云开源数据技术(上海)有限公司 | A kind of entity name analysis recognition method |
Non-Patent Citations (1)
Title |
---|
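王敬东 (Wang Jingdong) et al.: "Fast abbreviation extraction algorithm based on reverse-order scanning and co-occurrence analysis" (基于逆序扫描和共现分析的缩略语快速提取算法), Application Research of Computers (计算机应用研究), vol. 35, no. 3, pages 700-704 * |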
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111914093A (en) * | 2019-05-09 | 2020-11-10 | 深圳中兴飞贷金融科技有限公司 | Data processing method and apparatus, storage medium, and electronic device |
CN112925961A (en) * | 2019-12-06 | 2021-06-08 | 北京海致星图科技有限公司 | Intelligent question and answer method and device based on enterprise entity |
CN111339319A (en) * | 2020-03-02 | 2020-06-26 | 北京百度网讯科技有限公司 | Disambiguation method and device for enterprise name, electronic equipment and storage medium |
CN111339319B (en) * | 2020-03-02 | 2023-08-04 | 北京百度网讯科技有限公司 | Enterprise name disambiguation method and device, electronic equipment and storage medium |
CN111723575A (en) * | 2020-06-12 | 2020-09-29 | 杭州未名信科科技有限公司 | Method, device, electronic equipment and medium for recognizing text |
CN111783460A (en) * | 2020-06-15 | 2020-10-16 | 苏宁金融科技(南京)有限公司 | Enterprise abbreviation extraction method and device, computer equipment and storage medium |
CN111782907A (en) * | 2020-07-01 | 2020-10-16 | 北京知因智慧科技有限公司 | News classification method and device and electronic equipment |
CN111782907B (en) * | 2020-07-01 | 2024-03-01 | 北京知因智慧科技有限公司 | News classification method and device and electronic equipment |
CN112036172B (en) * | 2020-09-09 | 2022-04-15 | 平安科技(深圳)有限公司 | Entity identification method and device based on abbreviated data of model and computer equipment |
CN112036172A (en) * | 2020-09-09 | 2020-12-04 | 平安科技(深圳)有限公司 | Entity identification method and device based on abbreviated data of model and computer equipment |
CN112613299A (en) * | 2020-12-25 | 2021-04-06 | 北京知因智慧科技有限公司 | Method and device for constructing enterprise synonym library and electronic equipment |
CN113705194A (en) * | 2021-04-12 | 2021-11-26 | 腾讯科技(深圳)有限公司 | Extraction method and electronic equipment for short |
CN113468315B (en) * | 2021-09-02 | 2021-12-10 | 北京华云安信息技术有限公司 | Vulnerability vendor name matching method |
CN113468315A (en) * | 2021-09-02 | 2021-10-01 | 北京华云安信息技术有限公司 | Vulnerability vendor name matching method |
CN114048304A (en) * | 2021-10-26 | 2022-02-15 | 盐城金堤科技有限公司 | Effective keyword determination method and device, storage medium and electronic equipment |
CN114676319A (en) * | 2022-03-01 | 2022-06-28 | 广州云趣信息科技有限公司 | Method and device for acquiring name of merchant and readable storage medium |
CN114676319B (en) * | 2022-03-01 | 2023-11-24 | 广州云趣信息科技有限公司 | Method and device for acquiring merchant name and readable storage medium |
CN115309863A (en) * | 2022-08-09 | 2022-11-08 | 中电金信软件有限公司 | Method and device for expanding list content, electronic equipment and readable storage medium |
CN115309863B (en) * | 2022-08-09 | 2023-09-19 | 中电金信软件有限公司 | Expansion method and device of list content, electronic equipment and readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109635285A (en) | Enterprise's full name and abbreviation matching method, apparatus, computer equipment and storage medium | |
CN109684543B (en) | User behavior prediction and information delivery method, device, server and storage medium | |
US8468167B2 (en) | Automatic data validation and correction | |
CN109815333A (en) | Information acquisition method, device, computer equipment and storage medium | |
AU2013329525B2 (en) | System and method for recursively traversing the internet and other sources to identify, gather, curate, adjudicate, and qualify business identity and related data | |
US20210366055A1 (en) | Systems and methods for generating accurate transaction data and manipulation | |
CN109389303A (en) | Querying method, device, computer equipment and the storage medium of business connection | |
CN105512180A (en) | Search recommendation method and device | |
CN110457680A (en) | Entity disambiguation method, device, computer equipment and storage medium | |
CN112651236B (en) | Method and device for extracting text information, computer equipment and storage medium | |
Zhang et al. | Multimodal pre-training based on graph attention network for document understanding | |
JP2024037719A (en) | Domain-specific language interpreter and interactive visual interface for rapid screening | |
CN110362798B (en) | Method, apparatus, computer device and storage medium for judging information retrieval analysis | |
Sun et al. | Analyzing Cross-domain Transportation Big Data of New York City with Semi-supervised and Active Learning. | |
CN111400340B (en) | Natural language processing method, device, computer equipment and storage medium | |
CN109033427A (en) | The screening technique and device of stock, computer equipment and readable storage medium storing program for executing | |
CN117011581A (en) | Image recognition method, medium, device and computing equipment | |
CN104102704A (en) | System control displaying method and system control displaying device | |
US20230138491A1 (en) | Continuous learning for document processing and analysis | |
Altun et al. | SKETRACK: Stroke‐Based Recognition of Online Hand‐Drawn Sketches of Arrow‐Connected Diagrams and Digital Logic Circuit Diagrams | |
Wu et al. | Demonstration of panda: a weakly supervised entity matching system | |
CN117251777A (en) | Data processing method, device, computer equipment and storage medium | |
Ali et al. | Analysis of feature selection methods in software defect prediction models | |
US20230134218A1 (en) | Continuous learning for document processing and analysis | |
Bartoli et al. | Semisupervised wrapper choice and generation for print-oriented documents |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||