CN109635285A - Enterprise full name and abbreviation matching method, apparatus, computer device and storage medium

Info

Publication number
CN109635285A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811416724.9A
Other languages
Chinese (zh)
Inventor
张依
汪伟
肖京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201811416724.9A
Publication of CN109635285A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/268 Morphological analysis
    • G06F40/279 Recognition of textual entities

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

This application relates to the technical field of big data and provides a method, apparatus, computer device and storage medium for matching enterprise full names and abbreviations. The method includes: performing abbreviation recognition on a text containing an abbreviation to be identified to obtain a candidate abbreviation set; obtaining the word frequency of each candidate abbreviation in the set within a preset text library and determining a target abbreviation according to those word frequencies; traversing a preset generated-abbreviation library to obtain the generated abbreviation that matches the target abbreviation; obtaining the enterprise full name corresponding to that generated abbreviation; and, when a text in which the target abbreviation and the enterprise full name co-occur is found, determining that the enterprise full name and the target abbreviation are successfully matched. On the one hand, screening the recognized abbreviations improves data accuracy at the abbreviation recognition stage. On the other hand, after the enterprise full name corresponding to the target abbreviation is obtained, the match is confirmed by checking whether the target abbreviation and that full name co-occur in the same text, which improves the accuracy of the matching result.

Description

Enterprise full name and abbreviation matching method, apparatus, computer device and storage medium
Technical field
This application relates to the technical field of big data, and in particular to an enterprise full name and abbreviation matching method, apparatus, computer device and storage medium.
Background
With the development of big data technology, public opinion analysis technology has emerged. When the correspondence between an enterprise's full name and its abbreviation is unknown, mining that correspondence from text is a task that public opinion analysis cannot avoid. In everyday usage, an enterprise with a long full name is customarily referred to by a conventional abbreviation. For example, "Bank of China Co., Ltd." often appears in abbreviated form, such as "Bank of China" or "Zhonghang" (中行).
The appearance of abbreviations makes public opinion analysis more difficult. Traditional approaches to matching full names with abbreviations mainly search web pages for text and filter candidate text pairs by similarity or other rules, or generate abbreviations from the enterprise full name using character vectors or word vectors. These approaches produce erroneous abbreviations and erroneous correspondences, so the accuracy of matching enterprise full names with abbreviations is low.
Summary of the invention
In view of the above technical problem, it is necessary to provide an enterprise full name and abbreviation matching method, apparatus, computer device and storage medium that can improve matching accuracy.
An enterprise full name and abbreviation matching method, the method comprising:
performing abbreviation recognition on a text containing an abbreviation to be identified to obtain a candidate abbreviation set;
obtaining the word frequency of each candidate abbreviation in the candidate abbreviation set within a preset text library, and determining a target abbreviation according to the word frequency of each candidate abbreviation;
traversing a preset generated-abbreviation library according to the target abbreviation to obtain a generated abbreviation that matches the target abbreviation;
obtaining the enterprise full name corresponding to the generated abbreviation;
when a text in which the target abbreviation and the enterprise full name co-occur is found, determining that the enterprise full name and the target abbreviation are successfully matched.
In one embodiment, before performing abbreviation recognition on the text containing the abbreviation to be identified to obtain the candidate abbreviation set, the method further includes:
obtaining a plurality of sample data items containing enterprise abbreviations;
labeling each sample data item with its corresponding known abbreviation to obtain a sample data set carrying abbreviation labels;
training a named entity recognition model on the sample data set, the named entity recognition model being used to perform the abbreviation recognition.
In one embodiment, obtaining the word frequency of each candidate abbreviation in the candidate abbreviation set within the preset text library and determining the target abbreviation according to the word frequency of each candidate abbreviation includes:
when the text containing the abbreviation to be identified contains multiple categories of candidate abbreviations, classifying the candidate abbreviations in the candidate abbreviation set according to their word sequences;
obtaining the word frequency of each candidate abbreviation of each category within the preset text library;
determining the target abbreviation of each category according to the word frequency of each candidate abbreviation of that category.
In one embodiment, before traversing the preset generated-abbreviation library according to the target abbreviation to obtain the generated abbreviation that matches the target abbreviation, the method further includes:
obtaining an enterprise full name library, and classifying the enterprise full names in the library according to their composition patterns;
abbreviating the enterprise full names of each category according to the preset abbreviation rule corresponding to their composition pattern, to obtain the generated-abbreviation set corresponding to each enterprise full name;
building, from the generated-abbreviation sets, the preset generated-abbreviation library corresponding to the enterprise full name library.
In one embodiment, before abbreviating the enterprise full names of each category according to the preset abbreviation rule corresponding to their composition pattern to obtain the generated-abbreviation set corresponding to each enterprise full name, the method further includes:
obtaining sample data containing matching relationships between enterprise full names and abbreviations;
analyzing the composition patterns of the enterprise full names in the sample data and, according to the enterprise abbreviations in the sample data, determining the preset abbreviation rule corresponding to each composition pattern.
In one embodiment, after determining that the enterprise full name and the target abbreviation are successfully matched when a text in which they co-occur is found, the method further includes:
updating the successfully matched enterprise full name and target abbreviation into a preset enterprise full name and abbreviation matching database.
In one embodiment, after updating the successfully matched enterprise full name and target abbreviation into the preset enterprise full name and abbreviation matching database, the method further includes:
searching, according to preset keywords, for text containing a matching relationship between an enterprise full name and an enterprise abbreviation;
extracting the matched enterprise full name and enterprise abbreviation from the text and updating them into the preset enterprise full name and abbreviation matching database.
An enterprise full name and abbreviation matching apparatus, the apparatus comprising:
a candidate abbreviation set acquisition module, configured to perform abbreviation recognition on a text containing an abbreviation to be identified to obtain a candidate abbreviation set;
a target abbreviation determination module, configured to obtain the word frequency of each candidate abbreviation in the candidate abbreviation set within a preset text library and determine a target abbreviation according to the word frequency of each candidate abbreviation;
a generated abbreviation acquisition module, configured to traverse a preset generated-abbreviation library according to the target abbreviation to obtain a generated abbreviation that matches the target abbreviation;
an enterprise full name acquisition module, configured to obtain the enterprise full name corresponding to the generated abbreviation;
a matching result determination module, configured to determine that the enterprise full name and the target abbreviation are successfully matched when a text in which the target abbreviation and the enterprise full name co-occur is found.
A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the following steps when executing the computer program:
performing abbreviation recognition on a text containing an abbreviation to be identified to obtain a candidate abbreviation set;
obtaining the word frequency of each candidate abbreviation in the candidate abbreviation set within a preset text library, and determining a target abbreviation according to the word frequency of each candidate abbreviation;
traversing a preset generated-abbreviation library according to the target abbreviation to obtain a generated abbreviation that matches the target abbreviation;
obtaining the enterprise full name corresponding to the generated abbreviation;
when a text in which the target abbreviation and the enterprise full name co-occur is found, determining that the enterprise full name and the target abbreviation are successfully matched.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps:
performing abbreviation recognition on a text containing an abbreviation to be identified to obtain a candidate abbreviation set;
obtaining the word frequency of each candidate abbreviation in the candidate abbreviation set within a preset text library, and determining a target abbreviation according to the word frequency of each candidate abbreviation;
traversing a preset generated-abbreviation library according to the target abbreviation to obtain a generated abbreviation that matches the target abbreviation;
obtaining the enterprise full name corresponding to the generated abbreviation;
when a text in which the target abbreviation and the enterprise full name co-occur is found, determining that the enterprise full name and the target abbreviation are successfully matched.
With the above enterprise full name and abbreviation matching method, apparatus, computer device and storage medium, abbreviation recognition is performed on a text containing an abbreviation to be identified to obtain a candidate abbreviation set, and a target abbreviation is obtained according to the word frequencies of the candidate abbreviations. A preset generated-abbreviation library corresponding to enterprise abbreviations is traversed to obtain the enterprise full name that matches the target abbreviation, and a text search confirms whether the target abbreviation and the enterprise full name co-occur in the same text, which determines whether the full name and the abbreviation are successfully matched. In the overall scheme, on the one hand, screening the recognized abbreviations improves data accuracy at the abbreviation recognition stage. On the other hand, after the enterprise full name corresponding to the target abbreviation is obtained, checking whether the two co-occur in the same text confirms the match and improves the accuracy of the matching result.
Brief description of the drawings
Fig. 1 is an application scenario diagram of the enterprise full name and abbreviation matching method in one embodiment;
Fig. 2 is a flow diagram of the enterprise full name and abbreviation matching method in one embodiment;
Fig. 3 is a flow diagram of the enterprise full name and abbreviation matching method in another embodiment;
Fig. 4 is a structural block diagram of the enterprise full name and abbreviation matching apparatus in one embodiment;
Fig. 5 is an internal structure diagram of the computer device in one embodiment.
Detailed description
To make the objectives, technical solutions and advantages of this application clearer, the application is further described below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the application and not to limit it.
The enterprise full name and abbreviation matching method provided by this application can be applied in the application environment shown in Fig. 1, in which a terminal 102 communicates with a server 104 over a network. The server 104 performs abbreviation recognition on a text containing an abbreviation to be identified to obtain a candidate abbreviation set, obtains the word frequency of each candidate abbreviation in the set within a preset text library, and determines a target abbreviation according to those word frequencies. It then traverses a preset generated-abbreviation library according to the target abbreviation to obtain the matching generated abbreviation, and obtains the enterprise full name corresponding to that generated abbreviation. When a text in which the target abbreviation and the enterprise full name co-occur is found, the server determines that the enterprise full name and the target abbreviation are successfully matched and pushes the matched pair to the terminal 102. The terminal 102 may be, but is not limited to, a personal computer, laptop, smartphone, tablet or portable wearable device, and the server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, as shown in Fig. 2, an enterprise full name and abbreviation matching method is provided. Taking its application to the server in Fig. 1 as an example, the method includes the following steps:
Step S200: perform abbreviation recognition on a text containing an abbreviation to be identified to obtain a candidate abbreviation set.
An abbreviation is a short word form obtained by compressing a long, complex name. For proper nouns, the abbreviation of a specific enterprise is also an officially recognized form of address. For brevity, enterprises with long full names are generally referred to by abbreviation, especially in public opinion texts with strict length limits, such as news headlines, where enterprises are usually recorded by abbreviation. The text containing the abbreviation to be identified can be obtained by using a web crawler to collect public opinion texts and splitting them into sentences. In some embodiments, the text containing the abbreviation to be identified may be the headline of a news-type public opinion text, or a sentence in the lead section of a news item that contains the abbreviation to be identified.
Abbreviation recognition is the process of using a named entity recognition model to extract feature vectors from the text containing the abbreviation to be identified and perform named entity recognition, yielding the multiple candidate abbreviations the text may contain. The named entity recognition model is trained on a sample data set carrying abbreviation labels and, given the feature vector of a text, recognizes the abbreviations in the text as named entities. Because such text is tersely worded, more than one abbreviation may be recognized. For example, for the text "Aerospace Power: plans to transfer a 70.94% equity stake in Xihang Aluminum Industry by public listing", the candidate abbreviations obtained by abbreviation recognition may include "Aerospace", "Power", "Aerospace Power", "Xihang Aluminum Industry", "Xihang Aluminum" and "Xihang". In an embodiment, abbreviation recognition with the named entity recognition model proceeds as follows: the text is segmented into words to obtain its word sequence, a feature vector is generated from the word sequence, and the feature vector is input to the pre-trained named entity recognition model, which identifies the multiple abbreviations the text may contain. These form the candidate abbreviation set.
Step S300: obtain the word frequency of each candidate abbreviation in the candidate abbreviation set within a preset text library, and determine a target abbreviation according to the word frequency of each candidate abbreviation.
The multiple candidate abbreviations in the candidate abbreviation set may have been extracted from the same text, for example "mind kindling electricity", "south mind kindling electricity" and "mind kindling", of which only one is correct. A preset text library related to the candidate abbreviations is obtained, and the word frequency of each candidate abbreviation in that library is computed. When different candidate abbreviations from the same text have identical or similar word frequencies, the candidate with the longest string length is taken as the target abbreviation; a candidate with a low word frequency is unlikely to be the target abbreviation. For example, if "mind kindling electricity" has the same word frequency as "kindling electricity" while "south mind kindling electricity" has a lower word frequency, "mind kindling electricity" is taken as the target abbreviation.
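The selection rule in step S300 can be sketched in Python. This is a minimal illustration rather than the applicant's implementation: the function names, the substring-count definition of word frequency, and the `tolerance` parameter that decides when two frequencies count as "close" are all assumptions made for the example.

```python
def count_frequency(candidate: str, corpus: list[str]) -> int:
    """Count occurrences of `candidate` across all texts of the corpus."""
    return sum(text.count(candidate) for text in corpus)


def pick_target(candidates: list[str], corpus: list[str],
                tolerance: float = 0.1) -> str:
    """Pick the longest candidate among those with (near-)top frequency.

    `tolerance` is a hypothetical knob: a candidate is "close" to the top
    frequency if it reaches at least (1 - tolerance) of it.
    """
    if not candidates:
        raise ValueError("candidates must not be empty")
    freqs = {c: count_frequency(c, corpus) for c in candidates}
    top = max(freqs.values())
    near_top = [c for c, f in freqs.items() if f >= top * (1 - tolerance)]
    # Among equally frequent candidates, prefer the longest string.
    return max(near_top, key=len)
```

Under substring counting, a candidate that is contained in a longer candidate can never be rarer than it, so a noisy candidate with an extra leading character drops out on frequency, while a truncated candidate ties with the correct one and loses on length.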
Step S400: according to the target abbreviation, traverse the preset generated-abbreviation library to obtain the generated abbreviation that matches the target abbreviation.
The preset generated-abbreviation library is a database of generated abbreviations obtained by abbreviating existing enterprise full name data according to set abbreviation rules. In embodiments, the abbreviation of enterprise full names can be carried out either by the set abbreviation rules or by an abbreviation model. When traversing the preset generated-abbreviation library yields a generated abbreviation identical to the target abbreviation, the target abbreviation recognized from the text is matched with a generated abbreviation derived from an enterprise full name, which links the enterprise full name to the enterprise abbreviation.
Step S500: obtain the enterprise full name corresponding to the generated abbreviation.
The enterprise full name library associated with the preset generated-abbreviation library is obtained. Using the mapping between enterprise full names and generated abbreviations, the enterprise full name corresponding to the determined generated abbreviation can be identified. In embodiments, enterprise full names can be obtained from the business registration data of each enterprise, and the enterprise full name library is built from this full name data.
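Steps S400 and S500 together amount to a lookup from a target abbreviation to the full names whose generated abbreviations equal it. The sketch below swaps the traversal described in the text for an inverted dictionary index, which gives the same result with a single lookup. The data layout, function names and the second sample company are assumptions for illustration only.

```python
from collections import defaultdict


def build_abbreviation_index(generated: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert {full name: generated abbreviations} into {abbreviation: full names}."""
    index: dict[str, set[str]] = defaultdict(set)
    for full_name, abbreviations in generated.items():
        for abbreviation in abbreviations:
            index[abbreviation].add(full_name)
    return dict(index)


def find_full_names(target: str, index: dict[str, set[str]]) -> set[str]:
    """Return every full name whose generated abbreviations include `target`."""
    return index.get(target, set())


# Hypothetical library content, keyed by enterprise full name.
generated = {
    "Chang'an Ford Automobile Co., Ltd.": {"Ford", "Ford Motor", "Chang'an Ford"},
    "Ford Motor Company": {"Ford"},
}
index = build_abbreviation_index(generated)
candidates = find_full_names("Ford", index)
```

One abbreviation can map to several full names, which is why the co-occurrence check in the next step is still needed to confirm a match.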
Step S600: when a text in which the target abbreviation and the enterprise full name co-occur is found, determine that the enterprise full name and the target abbreviation are successfully matched.
Co-occurrence is the phenomenon of feature words appearing together; here the feature words are the target abbreviation and the enterprise full name. Public opinion data is searched with the target abbreviation and the enterprise full name as search targets. When a text containing both the target abbreviation and the enterprise full name is retrieved, the match is determined to be successful. Conversely, if no text contains both, the match fails.
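A co-occurrence check of this kind can be sketched as follows. The function name and the in-memory list of texts are assumptions. One design choice is also an assumption not stated in the source: because an abbreviation is often a substring of the full name, occurrences of the full name are blanked out before looking for the abbreviation, so that the full name alone does not count as co-occurrence.

```python
def cooccurs(target: str, full_name: str, texts: list[str]) -> bool:
    """Return True if any text contains the full name and, separately, the abbreviation."""
    for text in texts:
        if full_name not in text:
            continue
        # Blank out the full name so an embedded abbreviation is not counted.
        remainder = text.replace(full_name, " ")
        if target in remainder:
            return True
    return False


texts = [
    "Bank of China Co., Ltd. (Bank of China) published its annual report.",
    "Bank of China Co., Ltd. published its annual report.",
]
matched = cooccurs("Bank of China", "Bank of China Co., Ltd.", texts)
```

In this sample only the first text counts, because the second mentions the abbreviation solely as part of the full name.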
In the above enterprise full name and abbreviation matching method, abbreviation recognition is performed on a text containing an abbreviation to be identified to obtain a candidate abbreviation set, and a target abbreviation is obtained according to the word frequencies of the candidate abbreviations. A preset generated-abbreviation library corresponding to enterprise abbreviations is traversed to obtain the enterprise full name that matches the target abbreviation, and a text search confirms whether the target abbreviation and the enterprise full name co-occur in the same text, which determines whether the full name and the abbreviation are successfully matched. In the overall scheme, on the one hand, screening the recognized abbreviations improves data accuracy at the abbreviation recognition stage. On the other hand, after the enterprise full name corresponding to the target abbreviation is obtained, checking whether the two co-occur in the same text confirms the match. This avoids the errors introduced by generating abbreviations solely from the character vectors or word vectors of the enterprise full name and matching the enterprise full name library directly against the enterprise abbreviation library, and so improves the accuracy of the matching result.
In some embodiments, the above method can also be applied to legally constituted organs, public institutions, enterprises, associations and other lawfully established organizations, and may include matching the full names and abbreviations of government departments, research institutions, universities and colleges, companies, international organizations and the like.
In one embodiment, as shown in Fig. 3, before step S200, performing abbreviation recognition on the text containing the abbreviation to be identified to obtain the candidate abbreviation set, the method further includes:
Step S120: obtain a plurality of sample data items containing enterprise abbreviations.
Step S140: label each sample data item with its corresponding known abbreviation to obtain a sample data set carrying abbreviation labels.
Step S160: train a named entity recognition model on the sample data set, the named entity recognition model being used to perform abbreviation recognition.
Sample data containing an enterprise abbreviation is text whose abbreviation is known. Abbreviation labeling means segmenting the sample data into words and annotating it with the known abbreviation. The labeled sample data is trained into word vectors that carry the abbreviation labels, and the word vectors of multiple sample data items are used as input data to train the named entity recognition model. The named entity recognition model is a Bi-LSTM+CRF model, in which the CRF layer produces the globally optimal output sequence, in effect reusing the information from the LSTM. Bi-LSTM, or bidirectional LSTM, considers both past features (extracted by the forward pass) and future features (extracted by the backward pass). It is equivalent to two LSTMs, one reading the input sequence forward and one reading it in reverse, whose outputs are combined as the final result. The word vectors can be trained with tools such as gensim word2vec or GloVe. After training, accuracy is used as the evaluation metric of the named entity recognition model; when accuracy does not reach the set threshold range, the model parameters are adjusted to optimize the model. The trained model takes the word vectors of a text containing an abbreviation to be identified, recognizes the abbreviations that text may contain, and outputs the possible candidate abbreviations, which form the candidate abbreviation set.
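The labeling step can be illustrated with a short sketch that turns a segmented sample and its known abbreviation into per-token tags. The BIO tagging scheme and the `ABBR` label name are assumptions chosen because they are a common input format for Bi-LSTM+CRF taggers; the source only says that the sample data carries abbreviation labels.

```python
def bio_tags(tokens: list[str], abbreviation_tokens: list[str]) -> list[str]:
    """Tag every occurrence of the known abbreviation with B-ABBR / I-ABBR, others with O."""
    tags = ["O"] * len(tokens)
    n = len(abbreviation_tokens)
    if n == 0:
        return tags
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == abbreviation_tokens:
            tags[i] = "B-ABBR"            # first token of the abbreviation
            for j in range(i + 1, i + n):
                tags[j] = "I-ABBR"        # remaining tokens of the abbreviation
    return tags


# Illustrative segmented headline and its known abbreviation.
tokens = ["Aerospace", "Power", "plans", "equity", "transfer"]
tags = bio_tags(tokens, ["Aerospace", "Power"])
```

Each (token, tag) sequence produced this way is one training example; the tokens would be mapped to word vectors before being fed to the model.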
In one embodiment, as shown in Fig. 3, step S300, obtaining the word frequency of each candidate abbreviation in the candidate abbreviation set within the preset text library and determining the target abbreviation according to the word frequency of each candidate abbreviation, includes:
Step S320: when the text containing the abbreviation to be identified contains multiple categories of candidate abbreviations, classify the candidate abbreviations in the candidate abbreviation set according to their word sequences.
Step S330: obtain the word frequency of each candidate abbreviation of each category within the preset text library.
Step S340: determine the target abbreviation of each category according to the word frequency of each candidate abbreviation of that category.
A word sequence is the set of words that make up a term together with the relationships between them. Word sequences can be determined by sequence labeling, and candidate abbreviations are grouped according to their word sequences. The preliminary processing by the named entity recognition model inevitably produces some noise, so the candidate abbreviations contain noisy data. To denoise, the candidates are cross-checked against texts in the preset text library. Taking one category of candidate abbreviations as an example, the word frequency of each candidate in the category across multiple texts of the preset text library is obtained, and the target abbreviation of that category is selected according to word sequence length and word frequency. This step is the denoising process. In one embodiment, similar words are first gathered into one category: words that contain the same word sequence are grouped together, so that, for example, "mind kindling electricity", "south mind kindling electricity" and "mind kindling" form one category. The word frequency of each word in the category across multiple news texts is then counted. When different words from the same news text have identical or similar word frequencies, the one with the longest word sequence is taken as the target abbreviation; a word with low frequency is unlikely to be the target abbreviation. For candidates with low word frequency, the abbreviation data and word frequency information can be retained.
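The grouping of candidates that share a word sequence can be sketched as below. Treating "contains the same word sequence" as a substring relation between candidates is a simplifying assumption, as are the function name and the sample strings.

```python
def group_candidates(candidates: list[str]) -> list[list[str]]:
    """Group candidates so that strings containing one another share a category."""
    groups: list[list[str]] = []
    # Visit long candidates first so each group is seeded by its longest member.
    for cand in sorted(candidates, key=len, reverse=True):
        for group in groups:
            if any(cand in member or member in cand for member in group):
                group.append(cand)
                break
        else:
            groups.append([cand])
    return groups


# Illustrative candidates recognized from a single headline.
candidates = ["Aerospace", "Power", "Aerospace Power", "Xihang Aluminum", "Xihang"]
groups = group_candidates(candidates)
```

Each resulting category is then screened independently by word frequency and length, so a headline that mentions two enterprises yields one target abbreviation per enterprise.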
In one embodiment, as shown in Fig. 3, before step S400, traversing the preset generated-abbreviation library according to the target abbreviation to obtain the generated abbreviation that matches the target abbreviation, the method further includes:
Step S360: obtain an enterprise full name library, and classify the enterprise full names in the library according to their composition patterns.
Step S370: abbreviate the enterprise full names of each category according to the preset abbreviation rule corresponding to their composition pattern, to obtain the generated-abbreviation set corresponding to each enterprise full name.
Step S380: build, from the generated-abbreviation sets, the preset generated-abbreviation library corresponding to the enterprise full name library.
In this embodiment, enterprise full names are obtained from the collected business registration data of each enterprise, and the enterprise full name library is built from this full name data. Full names can be divided into several classes by composition pattern. The first class is "place + name + industry category + company attribute", for example "Shenzhen Tencent Computer Systems Co., Ltd." and "Jiangsu Yabang Dyestuff Co., Ltd."; the second class is "name + industry category + company attribute"; further classes include "name + place + company attribute" and "name + company attribute". When contracted abbreviations are generated from full name data, their length can be limited to five characters or fewer. The contraction rules used to generate candidate abbreviations directly from a full name usually fall into several kinds: name only, such as "Tencent" and "Yabang"; name + industry, such as "Tencent Computer", "Yabang Dyestuff" and "Haitong Securities"; place or place abbreviation + name, such as "China Ping An"; and name + company attribute abbreviation, such as "Tencent Holdings" and "Apple Inc.". When the name, the industry attribute or another component exceeds four characters, that component is usually shortened as well, by taking its first two characters or by taking characters at intervals, as in "Sinopec" (中石化, from China Petrochemical Corporation). Following this generation logic, a series of candidate contracted abbreviations can be produced for a single full name. For example, Chang'an Ford Automobile Co., Ltd. yields "Ford", "Ford Motor" and "Chang'an Ford", and these abbreviations form the contracted abbreviation set corresponding to that full name. Because the enterprise full name library contains multiple full names, the preset contracted abbreviation library corresponding to the full name library is built from the contracted abbreviation sets of all of them, and a mapping relationship exists between the preset contracted abbreviation library and the enterprise full name library.
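The generation logic above can be illustrated with a minimal Python sketch. It assumes the full name has already been split into its components; the `FullName` structure, the field names and the sample values are hypothetical and are not taken from the application. Only the four rule kinds and the five-character limit come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_LEN = 5  # length limit (in characters) mentioned in the text


@dataclass
class FullName:
    """An enterprise full name already split into components (hypothetical structure)."""
    name: str
    place: Optional[str] = None
    industry: Optional[str] = None
    attribute_abbr: Optional[str] = None  # shortened company attribute, e.g. "公司"


def contracted_abbreviations(full: FullName) -> List[str]:
    """Apply the four rule kinds and keep candidates within MAX_LEN characters."""
    candidates = [full.name]                                # name only
    if full.industry:
        candidates.append(full.name + full.industry)        # name + industry
    if full.place:
        candidates.append(full.place + full.name)           # place + name
    if full.attribute_abbr:
        candidates.append(full.name + full.attribute_abbr)  # name + attribute abbreviation
    result: List[str] = []
    for cand in candidates:
        # Drop over-long candidates and duplicates, preserving rule order.
        if len(cand) <= MAX_LEN and cand not in result:
            result.append(cand)
    return result


tencent = FullName(name="腾讯", place="深圳", industry="计算机", attribute_abbr="公司")
print(contracted_abbreviations(tencent))
```

Shortening an over-long component (first two characters or interval characters) is left out of the sketch, since the text does not say how the characters are chosen.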
In one embodiment, before step S370 (applying contraction processing to each class of enterprise full names according to the preset contraction rule corresponding to the composition pattern, and obtaining the contracted abbreviation set corresponding to each enterprise full name), the method further includes:
Obtaining sample data containing matching relationships between enterprise full names and abbreviations.
Analyzing the composition pattern of each enterprise full name in the sample data and, according to the enterprise abbreviations in the sample data, determining the preset contraction rule corresponding to the composition pattern.
From sample data of known enterprise abbreviations and enterprise full names, the matching relationship between each abbreviation and its full name can be obtained. From the composition pattern of a full name and its corresponding abbreviation, the contraction rule applied to that full name in the sample data can be derived. By counting the contraction rules observed across multiple samples, the preset contraction rules for enterprise full names are determined. In some embodiments, one enterprise full name may have multiple corresponding preset contraction rules.
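One way to read this counting step is sketched below in Python. The sketch assumes each sample already carries its composition pattern and its components; the helper names `infer_rule` and `preset_rules`, the component labels and the `min_count` threshold are hypothetical, and only rules that concatenate one or two whole components are considered.

```python
from collections import Counter
from itertools import permutations


def infer_rule(components, abbreviation):
    """Return the component labels whose concatenation equals the abbreviation, or None."""
    for size in (1, 2):
        for labels in permutations(components, size):
            if "".join(components[label] for label in labels) == abbreviation:
                return labels
    return None


def preset_rules(samples, min_count=1):
    """Count inferred rules per composition pattern and keep the frequent ones."""
    counts = {}
    for pattern, components, abbreviation in samples:
        rule = infer_rule(components, abbreviation)
        if rule is not None:
            counts.setdefault(pattern, Counter())[rule] += 1
    return {
        pattern: [rule for rule, n in counter.most_common() if n >= min_count]
        for pattern, counter in counts.items()
    }


PATTERN_A = ("place", "name", "industry", "attribute")
samples = [
    (PATTERN_A, {"place": "深圳", "name": "腾讯", "industry": "计算机", "attribute": "有限公司"}, "腾讯"),
    (PATTERN_A, {"place": "江苏", "name": "亚邦", "industry": "染料", "attribute": "股份有限公司"}, "亚邦"),
    (PATTERN_A, {"place": "江苏", "name": "亚邦", "industry": "染料", "attribute": "股份有限公司"}, "亚邦染料"),
]
rules = preset_rules(samples)
print(rules[PATTERN_A])
```

A pattern can end up with several rules, which is consistent with the remark that one full name may have multiple preset contraction rules.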
In one embodiment, as shown in FIG. 3, after step S600 (determining that the enterprise full name and the target abbreviation match successfully when a text in which the target abbreviation and the enterprise full name co-occur is found), the method further includes:
Step S720: update the successfully matched enterprise full name and target abbreviation to a preset enterprise full name and abbreviation matching database.
Updating successfully matched enterprise full names and abbreviations to this database makes it possible, when public opinion analysis is performed on various kinds of data, to quickly determine the enterprise full name corresponding to an enterprise abbreviation, which improves the efficiency of the analysis.
In one embodiment, as shown in FIG. 3, after step S720 (updating the successfully matched enterprise full name and target abbreviation to the preset enterprise full name and abbreviation matching database), the method further includes:
Step S740: search, according to a preset keyword, for texts containing a matching relationship between an enterprise full name and an enterprise abbreviation.
Step S760: extract the matched enterprise full name and enterprise abbreviation from the text, and update them to the preset enterprise full name and abbreviation matching database.
The text may be public opinion text such as news, and the preset keyword may be a word that identifies a full name and abbreviation pair, such as "... abbreviated as ...". In this embodiment, news documents, and especially news headlines, are scanned, and full name and abbreviation matching results such as "A, abbreviated as B" are extracted directly by preset rules; the corresponding entities are then updated to the preset enterprise full name and abbreviation matching database. When a large volume of public opinion data and a large number of enterprise full names are available but the full name corresponding to an enterprise abbreviation in a text cannot be found, looking up the preset enterprise full name and abbreviation matching database avoids cases in which one abbreviation corresponds to multiple full names, in which the correspondence between abbreviation and full name is ambiguous, or in which the two are entirely unrelated, and thereby improves matching accuracy.
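A keyword rule of the "A, abbreviated as B" kind might be sketched in Python as follows. The regular expression is only an assumption: it takes the keyword to be 简称 ("abbreviated as"), expects the pair in the form `<full name ending in 公司>(简称"B")`, and is not the rule set used in the application. The function name `extract_pairs` is likewise hypothetical.

```python
import re

# Hypothetical rule: a full name ending in 公司, followed by a parenthesised
# "简称" (or "以下简称") clause that holds a 2-5 character abbreviation.
PAIR_PATTERN = re.compile(
    r'([\u4e00-\u9fff]+公司)[((](?:以下)?简称[“"]?([\u4e00-\u9fff]{2,5})[”"]?[))]'
)


def extract_pairs(text):
    """Return (full name, abbreviation) tuples found by the keyword rule."""
    return PAIR_PATTERN.findall(text)


headline = "深圳市腾讯计算机系统有限公司(简称“腾讯”)发布公告"
print(extract_pairs(headline))
```

Because the full name group matches any run of Chinese characters ending in 公司, text that directly precedes the company name without punctuation would be captured too; a production rule would need entity boundaries from a tokenizer or recognizer.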
It should be understood that although the steps in the flowcharts of FIGS. 2-3 are shown in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless expressly stated otherwise herein, there is no strict restriction on the order of execution, and the steps may be executed in other orders. Moreover, at least some of the steps in FIGS. 2-3 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment and may be executed at different times; nor are they necessarily executed one after another, but may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 4, an enterprise full name and abbreviation matching apparatus is provided, comprising:
A candidate abbreviation set obtaining module 200, configured to perform abbreviation recognition on a text containing abbreviations to be identified, obtaining a candidate abbreviation set;
A target abbreviation determining module 300, configured to obtain the word frequency, in a preset text library, of each candidate abbreviation in the candidate abbreviation set, and to determine a target abbreviation according to the word frequency of each candidate abbreviation;
A contracted abbreviation obtaining module 400, configured to traverse a preset contracted abbreviation library according to the target abbreviation and obtain the contracted abbreviation that matches the target abbreviation;
An enterprise full name obtaining module 500, configured to obtain the enterprise full name corresponding to the contracted abbreviation;
A matching result determining module 600, configured to determine that the enterprise full name and the target abbreviation match successfully when a text in which the target abbreviation and the enterprise full name co-occur is found.
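The lookup and co-occurrence check performed by modules 400 to 600 could be sketched in Python as below. The library layout (a dictionary from contracted abbreviation to a list of full names), the function names and the sample corpus are assumptions made for illustration. Because an abbreviation is often a substring of its own full name, the sketch looks for the abbreviation outside the full name; the application does not state how co-occurrence is tested.

```python
def co_occur(target, full_name, doc):
    """True if the full name and a separate mention of the abbreviation appear in doc."""
    if full_name not in doc:
        return False
    # Remove the full name first so that the abbreviation embedded in it does not count.
    return target in doc.replace(full_name, "")


def match_full_name(target, abbreviation_library, corpus):
    """Traverse the library and return the first full name confirmed by co-occurrence."""
    for contracted, full_names in abbreviation_library.items():
        if contracted != target:
            continue
        for full_name in full_names:
            if any(co_occur(target, full_name, doc) for doc in corpus):
                return full_name
    return None


library = {
    "腾讯": ["深圳市腾讯计算机系统有限公司"],
    "亚邦": ["江苏亚邦染料股份有限公司"],
}
corpus = [
    "深圳市腾讯计算机系统有限公司今日发布财报,腾讯表示营收增长。",
    "亚邦发布新产品。",
]
print(match_full_name("腾讯", library, corpus))
print(match_full_name("亚邦", library, corpus))
```

In the sample corpus the second abbreviation has no text in which it appears together with its full name, so no match is confirmed for it.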
In one embodiment, the enterprise full name and abbreviation matching apparatus further includes a named entity recognition model training module, configured to obtain multiple pieces of sample data containing enterprise abbreviations, annotate the abbreviations in each piece of sample data according to its known abbreviation to obtain a sample data set carrying abbreviation annotations, and train a named entity recognition model on the sample data set, the named entity recognition model being used for abbreviation recognition.
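The annotation step might look like the following Python sketch, which assumes a character-level BIO tagging scheme. The application does not name an annotation scheme, so the scheme, the `ORG_ABBR` label and the function name `bio_tags` are all hypothetical.

```python
def bio_tags(text, abbreviation, label="ORG_ABBR"):
    """Tag every occurrence of a known abbreviation in text with B-/I- labels."""
    tags = ["O"] * len(text)
    if not abbreviation:
        return list(zip(text, tags))
    start = text.find(abbreviation)
    while start != -1:
        tags[start] = f"B-{label}"  # first character of the abbreviation
        for i in range(start + 1, start + len(abbreviation)):
            tags[i] = f"I-{label}"  # remaining characters
        start = text.find(abbreviation, start + len(abbreviation))
    return list(zip(text, tags))


sample = bio_tags("腾讯发布财报", "腾讯")
print(sample)
```

Sequences tagged this way are a common input format for sequence labelling models, which is one possible form of the sample data set carrying abbreviation annotations.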
In one embodiment, the target abbreviation determining module 300 is further configured to: when multiple classes of candidate abbreviations exist in the text containing abbreviations to be identified, classify the candidate abbreviations in the candidate abbreviation set according to their word sequences; obtain the word frequency, in the preset text library, of each candidate abbreviation of each class; and determine the target abbreviation of each class according to the word frequency of each candidate abbreviation of that class.
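As a rough Python illustration of per-class selection, the sketch below groups candidates that start with the same shorter candidate and picks the most frequent member of each group. Grouping by shared prefix is only one possible reading of "classify according to word sequence", and the pre-segmented text library, the function names and the sample tokens are assumptions.

```python
from collections import Counter


def group_candidates(candidates):
    """Group candidates under the shortest candidate that they start with."""
    groups = {}
    for cand in sorted(set(candidates), key=lambda c: (len(c), c)):
        key = next((k for k in groups if cand.startswith(k)), cand)
        groups.setdefault(key, []).append(cand)
    return groups


def target_per_class(candidates, tokenized_corpus):
    """Pick, for each class, the candidate with the highest token frequency."""
    freq = Counter(token for doc in tokenized_corpus for token in doc)
    return {
        key: max(members, key=lambda m: freq[m])
        for key, members in group_candidates(candidates).items()
    }


# Hypothetical pre-segmented text library (one token list per document).
tokenized_corpus = [
    ["腾讯控股", "发布", "财报"],
    ["腾讯控股", "股价", "上涨"],
    ["腾讯", "推出", "新", "产品"],
    ["亚邦染料", "公告"],
]
targets = target_per_class(["腾讯", "腾讯控股", "亚邦染料"], tokenized_corpus)
print(targets)
```

Frequencies are counted over tokens rather than raw substrings; with substring counting a short candidate would always score at least as high as any longer candidate that contains it.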
In one embodiment, the enterprise full name and abbreviation matching apparatus further includes a preset contracted abbreviation library building module, configured to obtain an enterprise full name library, classify the enterprise full names in the library according to their composition patterns, apply contraction processing to each class of enterprise full names according to the preset contraction rule corresponding to the composition pattern to obtain the contracted abbreviation set corresponding to each enterprise full name, and build, from the contracted abbreviation sets, the preset contracted abbreviation library corresponding to the enterprise full name library.
In one embodiment, the preset contracted abbreviation library building module is further configured to obtain sample data containing matching relationships between enterprise full names and abbreviations, analyze the composition pattern of each enterprise full name in the sample data, and determine, according to the enterprise abbreviations in the sample data, the preset contraction rule corresponding to the composition pattern.
In one embodiment, the enterprise full name and abbreviation matching apparatus further includes an enterprise full name and abbreviation matching database update module, configured to update the successfully matched enterprise full name and target abbreviation to the preset enterprise full name and abbreviation matching database.
In one embodiment, the enterprise full name and abbreviation matching database update module is further configured to search, according to a preset keyword, for texts containing a matching relationship between an enterprise full name and an enterprise abbreviation, extract the matched enterprise full name and enterprise abbreviation from the text, and update them to the preset enterprise full name and abbreviation matching database.
The above enterprise full name and abbreviation matching apparatus performs abbreviation recognition on a text containing abbreviations to be identified to obtain a candidate abbreviation set, obtains a target abbreviation according to the word frequencies of the candidate abbreviations, traverses the preset contracted abbreviation library corresponding to enterprise abbreviations to obtain the enterprise full name that matches the target abbreviation, and confirms whether the enterprise full name and the abbreviation match successfully by searching for a text in which the target abbreviation and the enterprise full name co-occur. In this scheme, screening the recognized abbreviations improves data accuracy at the abbreviation recognition stage, and confirming, after the enterprise full name corresponding to the target abbreviation has been obtained, whether the two co-occur in the same text improves the accuracy of the matching result.
For specific limitations of the enterprise full name and abbreviation matching apparatus, refer to the limitations of the enterprise full name and abbreviation matching method above, which are not repeated here. Each module of the apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a terminal, and its internal structure may be as shown in FIG. 5. The computer device includes a processor, a memory, a network interface, a display screen and an input device connected by a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface is used to communicate with external terminals over a network connection. The computer program, when executed by the processor, implements an enterprise full name and abbreviation matching method. The display screen may be a liquid crystal display or an electronic ink display. The input device may be a touch layer covering the display screen, a key, trackball or trackpad arranged on the housing of the computer device, or an external keyboard, trackpad or mouse.
Those skilled in the art will understand that the structure shown in FIG. 5 is only a block diagram of the part of the structure relevant to the scheme of the present application and does not limit the computer device to which the scheme is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory and a processor. The memory stores a computer program, and the processor implements the following steps when executing the computer program:
Performing abbreviation recognition on a text containing abbreviations to be identified, obtaining a candidate abbreviation set;
Obtaining the word frequency, in a preset text library, of each candidate abbreviation in the candidate abbreviation set, and determining a target abbreviation according to the word frequency of each candidate abbreviation;
Traversing a preset contracted abbreviation library according to the target abbreviation, obtaining the contracted abbreviation that matches the target abbreviation;
Obtaining the enterprise full name corresponding to the contracted abbreviation;
When a text in which the target abbreviation and the enterprise full name co-occur is found, determining that the enterprise full name and the target abbreviation match successfully.
In one embodiment, the processor further implements the following steps when executing the computer program:
Obtaining multiple pieces of sample data containing enterprise abbreviations;
Annotating the abbreviations in each piece of sample data according to its known abbreviation, obtaining a sample data set carrying abbreviation annotations;
Training a named entity recognition model on the sample data set, the named entity recognition model being used for abbreviation recognition.
In one embodiment, the processor further implements the following steps when executing the computer program:
When multiple classes of candidate abbreviations exist in the text containing abbreviations to be identified, classifying the candidate abbreviations in the candidate abbreviation set according to their word sequences;
Obtaining the word frequency, in the preset text library, of each candidate abbreviation of each class;
Determining the target abbreviation of each class according to the word frequency of each candidate abbreviation of that class.
In one embodiment, the processor further implements the following steps when executing the computer program:
Obtaining an enterprise full name library, and classifying the enterprise full names in the library according to their composition patterns;
Applying contraction processing to each class of enterprise full names according to the preset contraction rule corresponding to the composition pattern, obtaining the contracted abbreviation set corresponding to each enterprise full name;
Building, from the contracted abbreviation sets, the preset contracted abbreviation library corresponding to the enterprise full name library.
In one embodiment, the processor further implements the following steps when executing the computer program:
Obtaining sample data containing matching relationships between enterprise full names and abbreviations;
Analyzing the composition pattern of each enterprise full name in the sample data and, according to the enterprise abbreviations in the sample data, determining the preset contraction rule corresponding to the composition pattern.
In one embodiment, the processor further implements the following step when executing the computer program:
Updating the successfully matched enterprise full name and target abbreviation to a preset enterprise full name and abbreviation matching database.
In one embodiment, the processor further implements the following steps when executing the computer program:
Searching, according to a preset keyword, for texts containing a matching relationship between an enterprise full name and an enterprise abbreviation;
Extracting the matched enterprise full name and enterprise abbreviation from the text, and updating them to the preset enterprise full name and abbreviation matching database.
The above computer device for implementing the enterprise full name and abbreviation matching method performs abbreviation recognition on a text containing abbreviations to be identified to obtain a candidate abbreviation set, obtains a target abbreviation according to the word frequencies of the candidate abbreviations, traverses the preset contracted abbreviation library corresponding to enterprise abbreviations to obtain the enterprise full name that matches the target abbreviation, and confirms whether the enterprise full name and the abbreviation match successfully by searching for a text in which the target abbreviation and the enterprise full name co-occur. In this scheme, screening the recognized abbreviations improves data accuracy at the abbreviation recognition stage, and confirming, after the enterprise full name corresponding to the target abbreviation has been obtained, whether the two co-occur in the same text improves the accuracy of the matching result.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. The computer program implements the following steps when executed by a processor:
Performing abbreviation recognition on a text containing abbreviations to be identified, obtaining a candidate abbreviation set;
Obtaining the word frequency, in a preset text library, of each candidate abbreviation in the candidate abbreviation set, and determining a target abbreviation according to the word frequency of each candidate abbreviation;
Traversing a preset contracted abbreviation library according to the target abbreviation, obtaining the contracted abbreviation that matches the target abbreviation;
Obtaining the enterprise full name corresponding to the contracted abbreviation;
When a text in which the target abbreviation and the enterprise full name co-occur is found, determining that the enterprise full name and the target abbreviation match successfully.
In one embodiment, the computer program further implements the following steps when executed by the processor:
Obtaining multiple pieces of sample data containing enterprise abbreviations;
Annotating the abbreviations in each piece of sample data according to its known abbreviation, obtaining a sample data set carrying abbreviation annotations;
Training a named entity recognition model on the sample data set, the named entity recognition model being used for abbreviation recognition.
In one embodiment, the computer program further implements the following steps when executed by the processor:
When multiple classes of candidate abbreviations exist in the text containing abbreviations to be identified, classifying the candidate abbreviations in the candidate abbreviation set according to their word sequences;
Obtaining the word frequency, in the preset text library, of each candidate abbreviation of each class;
Determining the target abbreviation of each class according to the word frequency of each candidate abbreviation of that class.
In one embodiment, the computer program further implements the following steps when executed by the processor:
Obtaining an enterprise full name library, and classifying the enterprise full names in the library according to their composition patterns;
Applying contraction processing to each class of enterprise full names according to the preset contraction rule corresponding to the composition pattern, obtaining the contracted abbreviation set corresponding to each enterprise full name;
Building, from the contracted abbreviation sets, the preset contracted abbreviation library corresponding to the enterprise full name library.
In one embodiment, the computer program further implements the following steps when executed by the processor:
Obtaining sample data containing matching relationships between enterprise full names and abbreviations;
Analyzing the composition pattern of each enterprise full name in the sample data and, according to the enterprise abbreviations in the sample data, determining the preset contraction rule corresponding to the composition pattern.
In one embodiment, the computer program further implements the following step when executed by the processor:
Updating the successfully matched enterprise full name and target abbreviation to a preset enterprise full name and abbreviation matching database.
In one embodiment, the computer program further implements the following steps when executed by the processor:
Searching, according to a preset keyword, for texts containing a matching relationship between an enterprise full name and an enterprise abbreviation;
Extracting the matched enterprise full name and enterprise abbreviation from the text, and updating them to the preset enterprise full name and abbreviation matching database.
The above computer-readable storage medium for implementing the enterprise full name and abbreviation matching method performs abbreviation recognition on a text containing abbreviations to be identified to obtain a candidate abbreviation set, obtains a target abbreviation according to the word frequencies of the candidate abbreviations, traverses the preset contracted abbreviation library corresponding to enterprise abbreviations to obtain the enterprise full name that matches the target abbreviation, and confirms whether the enterprise full name and the abbreviation match successfully by searching for a text in which the target abbreviation and the enterprise full name co-occur. In this scheme, screening the recognized abbreviations improves data accuracy at the abbreviation recognition stage, and confirming, after the enterprise full name corresponding to the target abbreviation has been obtained, whether the two co-occur in the same text improves the accuracy of the matching result.
Those of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database or other media used in the embodiments provided by the present application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of these features contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art may make various modifications and improvements without departing from the concept of the present application, all of which fall within the protection scope of the application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. An enterprise full name and abbreviation matching method, the method comprising:
Performing abbreviation recognition on a text containing abbreviations to be identified, obtaining a candidate abbreviation set;
Obtaining the word frequency, in a preset text library, of each candidate abbreviation in the candidate abbreviation set, and determining a target abbreviation according to the word frequency of each candidate abbreviation;
Traversing a preset contracted abbreviation library according to the target abbreviation, obtaining the contracted abbreviation that matches the target abbreviation;
Obtaining the enterprise full name corresponding to the contracted abbreviation;
When a text in which the target abbreviation and the enterprise full name co-occur is found, determining that the enterprise full name and the target abbreviation match successfully.
2. The method according to claim 1, characterized in that, before performing abbreviation recognition on the text containing abbreviations to be identified and obtaining the candidate abbreviation set, the method further comprises:
Obtaining multiple pieces of sample data containing enterprise abbreviations;
Annotating the abbreviations in each piece of sample data according to the known abbreviation corresponding to that sample data, obtaining a sample data set carrying abbreviation annotations;
Training a named entity recognition model on the sample data set, the named entity recognition model being used for abbreviation recognition.
3. The method according to claim 1, characterized in that obtaining the word frequency, in the preset text library, of each candidate abbreviation in the candidate abbreviation set and determining the target abbreviation according to the word frequency of each candidate abbreviation comprises:
When multiple classes of candidate abbreviations exist in the text containing abbreviations to be identified, classifying the candidate abbreviations in the candidate abbreviation set according to their word sequences;
Obtaining the word frequency, in the preset text library, of each candidate abbreviation of each class;
Determining the target abbreviation of each class according to the word frequency of each candidate abbreviation of that class.
4. The method according to claim 1, characterized in that, before traversing the preset contracted abbreviation library according to the target abbreviation and obtaining the contracted abbreviation that matches the target abbreviation, the method further comprises:
Obtaining an enterprise full name library, and classifying the enterprise full names in the enterprise full name library according to their composition patterns;
Applying contraction processing to each class of enterprise full names according to the preset contraction rule corresponding to the composition pattern, obtaining the contracted abbreviation set corresponding to each enterprise full name;
Building, from the contracted abbreviation sets, the preset contracted abbreviation library corresponding to the enterprise full name library.
5. The method according to claim 4, characterized in that, before applying contraction processing to each class of enterprise full names according to the preset contraction rule corresponding to the composition pattern and obtaining the contracted abbreviation set corresponding to each enterprise full name, the method further comprises:
Obtaining sample data containing matching relationships between enterprise full names and abbreviations;
Analyzing the composition pattern of the enterprise full names in the sample data and, according to the enterprise abbreviations in the sample data, determining the preset contraction rule corresponding to the composition pattern.
6. The method according to claim 1, characterized in that, after determining that the enterprise full name and the target abbreviation match successfully when a text in which the target abbreviation and the corresponding enterprise full name co-occur is found, the method further comprises:
Updating the successfully matched enterprise full name and target abbreviation to a preset enterprise full name and abbreviation matching database.
7. The method according to claim 6, characterized in that, after updating the successfully matched enterprise full name and target abbreviation to the preset enterprise full name and abbreviation matching database, the method further comprises:
Searching, according to a preset keyword, for texts containing a matching relationship between an enterprise full name and an enterprise abbreviation;
Extracting the matched enterprise full name and enterprise abbreviation from the text, and updating them to the preset enterprise full name and abbreviation matching database.
8. An enterprise full name and abbreviation matching apparatus, characterized in that the apparatus comprises:
A candidate abbreviation set obtaining module, configured to perform abbreviation recognition on a text containing abbreviations to be identified, obtaining a candidate abbreviation set;
A target abbreviation determining module, configured to obtain the word frequency, in a preset text library, of each candidate abbreviation in the candidate abbreviation set, and to determine a target abbreviation according to the word frequency of each candidate abbreviation;
A contracted abbreviation obtaining module, configured to traverse a preset contracted abbreviation library according to the target abbreviation and obtain the contracted abbreviation that matches the target abbreviation;
An enterprise full name obtaining module, configured to obtain the enterprise full name corresponding to the contracted abbreviation;
A matching result determining module, configured to determine that the enterprise full name and the target abbreviation match successfully when a text in which the target abbreviation and the corresponding enterprise full name co-occur is found.
9. A computer device, including a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program implements the steps of the method according to any one of claims 1 to 7 when executed by a processor.
CN201811416724.9A 2018-11-26 2018-11-26 Enterprise's full name and abbreviation matching method, apparatus, computer equipment and storage medium Pending CN109635285A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811416724.9A CN109635285A (en) 2018-11-26 2018-11-26 Enterprise's full name and abbreviation matching method, apparatus, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811416724.9A CN109635285A (en) 2018-11-26 2018-11-26 Enterprise's full name and abbreviation matching method, apparatus, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN109635285A true CN109635285A (en) 2019-04-16

Family

ID=66069599

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811416724.9A Pending CN109635285A (en) 2018-11-26 2018-11-26 Enterprise's full name and abbreviation matching method, apparatus, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109635285A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111339319A (en) * 2020-03-02 2020-06-26 北京百度网讯科技有限公司 Disambiguation method and device for enterprise name, electronic equipment and storage medium
CN111723575A (en) * 2020-06-12 2020-09-29 杭州未名信科科技有限公司 Method, device, electronic equipment and medium for recognizing text
CN111783460A (en) * 2020-06-15 2020-10-16 苏宁金融科技(南京)有限公司 Enterprise abbreviation extraction method and device, computer equipment and storage medium
CN111782907A (en) * 2020-07-01 2020-10-16 北京知因智慧科技有限公司 News classification method and device and electronic equipment
CN111914093A (en) * 2019-05-09 2020-11-10 深圳中兴飞贷金融科技有限公司 Data processing method and apparatus, storage medium, and electronic device
CN112036172A (en) * 2020-09-09 2020-12-04 平安科技(深圳)有限公司 Entity identification method and device based on abbreviated data of model and computer equipment
CN112613299A (en) * 2020-12-25 2021-04-06 北京知因智慧科技有限公司 Method and device for constructing enterprise synonym library and electronic equipment
CN112925961A (en) * 2019-12-06 2021-06-08 北京海致星图科技有限公司 Intelligent question and answer method and device based on enterprise entity
CN113468315A (en) * 2021-09-02 2021-10-01 北京华云安信息技术有限公司 Vulnerability vendor name matching method
CN113705194A (en) * 2021-04-12 2021-11-26 腾讯科技(深圳)有限公司 Extraction method and electronic equipment for short
CN114048304A (en) * 2021-10-26 2022-02-15 盐城金堤科技有限公司 Effective keyword determination method and device, storage medium and electronic equipment
CN114676319A (en) * 2022-03-01 2022-06-28 广州云趣信息科技有限公司 Method and device for acquiring name of merchant and readable storage medium
CN115309863A (en) * 2022-08-09 2022-11-08 中电金信软件有限公司 Method and device for expanding list content, electronic equipment and readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100161663A1 (en) * 2008-12-19 2010-06-24 International Business Machines Corporation Searching For A Business Name In A Database
CN108170662A (en) * 2016-12-07 2018-06-15 富士通株式会社 The disambiguation method of breviaty word and disambiguation equipment
CN108460016A (en) * 2018-02-09 2018-08-28 中云开源数据技术(上海)有限公司 A kind of entity name analysis recognition method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
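Wang Jingdong et al.: "A fast abbreviation extraction algorithm based on reverse-order scanning and co-occurrence analysis", Application Research of Computers (计算机应用研究), vol. 35, no. 3, pages 700 - 704 *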

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111914093A (en) * 2019-05-09 2020-11-10 深圳中兴飞贷金融科技有限公司 Data processing method and apparatus, storage medium, and electronic device
CN112925961A (en) * 2019-12-06 2021-06-08 北京海致星图科技有限公司 Intelligent question and answer method and device based on enterprise entity
CN111339319A (en) * 2020-03-02 2020-06-26 北京百度网讯科技有限公司 Disambiguation method and device for enterprise name, electronic equipment and storage medium
CN111339319B (en) * 2020-03-02 2023-08-04 北京百度网讯科技有限公司 Enterprise name disambiguation method and device, electronic equipment and storage medium
CN111723575A (en) * 2020-06-12 2020-09-29 杭州未名信科科技有限公司 Method, device, electronic equipment and medium for recognizing text
CN111783460A (en) * 2020-06-15 2020-10-16 苏宁金融科技(南京)有限公司 Enterprise abbreviation extraction method and device, computer equipment and storage medium
CN111782907A (en) * 2020-07-01 2020-10-16 北京知因智慧科技有限公司 News classification method and device and electronic equipment
CN111782907B (en) * 2020-07-01 2024-03-01 北京知因智慧科技有限公司 News classification method and device and electronic equipment
CN112036172B (en) * 2020-09-09 2022-04-15 平安科技(深圳)有限公司 Entity identification method and device based on abbreviated data of model and computer equipment
CN112036172A (en) * 2020-09-09 2020-12-04 平安科技(深圳)有限公司 Entity identification method and device based on abbreviated data of model and computer equipment
CN112613299A (en) * 2020-12-25 2021-04-06 北京知因智慧科技有限公司 Method and device for constructing enterprise synonym library and electronic equipment
CN113705194A (en) * 2021-04-12 2021-11-26 腾讯科技(深圳)有限公司 Extraction method and electronic equipment for short
CN113468315B (en) * 2021-09-02 2021-12-10 北京华云安信息技术有限公司 Vulnerability vendor name matching method
CN113468315A (en) * 2021-09-02 2021-10-01 北京华云安信息技术有限公司 Vulnerability vendor name matching method
CN114048304A (en) * 2021-10-26 2022-02-15 盐城金堤科技有限公司 Effective keyword determination method and device, storage medium and electronic equipment
CN114676319A (en) * 2022-03-01 2022-06-28 广州云趣信息科技有限公司 Method and device for acquiring name of merchant and readable storage medium
CN114676319B (en) * 2022-03-01 2023-11-24 广州云趣信息科技有限公司 Method and device for acquiring merchant name and readable storage medium
CN115309863A (en) * 2022-08-09 2022-11-08 中电金信软件有限公司 Method and device for expanding list content, electronic equipment and readable storage medium
CN115309863B (en) * 2022-08-09 2023-09-19 中电金信软件有限公司 Expansion method and device of list content, electronic equipment and readable storage medium

Similar Documents

Publication Publication Date Title
CN109635285A (en) Enterprise's full name and abbreviation matching method, apparatus, computer equipment and storage medium
CN109684543B (en) User behavior prediction and information delivery method, device, server and storage medium
US8468167B2 (en) Automatic data validation and correction
CN109815333A (en) Information acquisition method, device, computer equipment and storage medium
AU2013329525B2 (en) System and method for recursively traversing the internet and other sources to identify, gather, curate, adjudicate, and qualify business identity and related data
US20210366055A1 (en) Systems and methods for generating accurate transaction data and manipulation
CN109389303A (en) Querying method, device, computer equipment and the storage medium of business connection
CN105512180A (en) Search recommendation method and device
CN110457680A (en) Entity disambiguation method, device, computer equipment and storage medium
CN112651236B (en) Method and device for extracting text information, computer equipment and storage medium
Zhang et al. Multimodal pre-training based on graph attention network for document understanding
JP2024037719A (en) Domain-specific language interpreter and interactive visual interface for rapid screening
CN110362798B (en) Method, apparatus, computer device and storage medium for judging information retrieval analysis
Sun et al. Analyzing Cross-domain Transportation Big Data of New York City with Semi-supervised and Active Learning.
CN111400340B (en) Natural language processing method, device, computer equipment and storage medium
CN109033427A (en) The screening technique and device of stock, computer equipment and readable storage medium storing program for executing
CN117011581A (en) Image recognition method, medium, device and computing equipment
CN104102704A (en) System control displaying method and system control displaying device
US20230138491A1 (en) Continuous learning for document processing and analysis
Altun et al. SKETRACK: Stroke‐Based Recognition of Online Hand‐Drawn Sketches of Arrow‐Connected Diagrams and Digital Logic Circuit Diagrams
Wu et al. Demonstration of panda: a weakly supervised entity matching system
CN117251777A (en) Data processing method, device, computer equipment and storage medium
Ali et al. Analysis of feature selection methods in software defect prediction models
US20230134218A1 (en) Continuous learning for document processing and analysis
Bartoli et al. Semisupervised wrapper choice and generation for print-oriented documents

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination