CN108171276A - Method and apparatus for generating information - Google Patents
- Publication number
- CN108171276A CN201810045681.1A
- Authority
- CN
- China
- Prior art keywords
- information
- word
- term vector
- enterprise name
- enterprise
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0637—Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Human Resources & Organizations (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Strategic Management (AREA)
- Entrepreneurship & Innovation (AREA)
- Economics (AREA)
- Educational Administration (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Game Theory and Decision Science (AREA)
- Development Economics (AREA)
- Marketing (AREA)
- Operations Research (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- General Business, Economics & Management (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The embodiments of the present application disclose a method and apparatus for generating information. One specific embodiment of the method includes: extracting enterprise information of a target enterprise, where the enterprise information includes an enterprise name and business scope information; extracting first feature information from the enterprise name and the business scope information; extracting second feature information from the remaining information; and fusing the first feature information with the second feature information and inputting the fused feature information into a pre-trained industry identification model to obtain the industry category of the target enterprise. This embodiment improves the flexibility of information generation.
Description
Technical field
The embodiments of the present application relate to the field of computer technology, specifically to the field of Internet technology, and more particularly to a method and apparatus for generating information.
Background
With the development of computer technology, enterprise-related data analysis (such as enterprise risk analysis and enterprise graph construction) usually requires assigning each enterprise to the correct industry and adding an industry label to it.
The existing approach typically identifies keywords in the enterprise information by rule and dictionary matching and then manually maps these keywords to preset industry categories, which usually consumes considerable manpower.
Summary of the invention
The embodiments of the present application propose a method and apparatus for generating information.
In a first aspect, an embodiment of the present application provides a method for generating information, the method including: extracting enterprise information of a target enterprise, where the enterprise information includes an enterprise name and business scope information; extracting first feature information from the enterprise name and the business scope information; extracting second feature information from the remaining information, where the remaining information is the information in the enterprise information other than the enterprise name and the business scope information; and fusing the first feature information with the second feature information and inputting the fused feature information into a pre-trained industry identification model to obtain the industry category of the target enterprise, where the industry identification model characterizes the correspondence between feature information and industry categories.
In some embodiments, extracting the first feature information from the enterprise name and the business scope information includes: segmenting the enterprise name and the business scope information into words separately, and determining the word vector of each resulting word; extracting a keyword from the enterprise name; and parsing the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information to generate the first feature information.
In some embodiments, parsing the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information to generate the first feature information includes: inputting the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information separately into a pre-trained feature extraction model to obtain feature vectors corresponding respectively to the enterprise name, the keyword, and the business scope information, and determining these feature vectors as the first feature information, where the feature extraction model is used to extract text features.
In some embodiments, the feature extraction model consists of the convolutional layer and the max-pooling layer of a pre-trained convolutional neural network.
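As an illustration only, a convolutional layer followed by max pooling over a matrix of word vectors might be sketched in Python as below. The function names, the ReLU activation, and the hand-written kernel values are assumptions made for this sketch; in the embodiment the kernels would come from the pre-trained convolutional neural network, and the source does not specify their shape.

```python
from typing import List

Matrix = List[List[float]]


def conv1d_over_words(word_vectors: Matrix, kernel: Matrix, bias: float = 0.0) -> List[float]:
    """Slide one kernel spanning len(kernel) consecutive word vectors over the text.

    Each output is the sum of element-wise products between the kernel and a
    window of word vectors, plus a bias, passed through ReLU (an assumption).
    """
    window = len(kernel)
    outputs = []
    for start in range(len(word_vectors) - window + 1):
        total = bias
        for row, kernel_row in zip(word_vectors[start:start + window], kernel):
            total += sum(x * w for x, w in zip(row, kernel_row))
        outputs.append(max(0.0, total))  # ReLU activation
    return outputs


def extract_features(word_vectors: Matrix, kernels: List[Matrix]) -> List[float]:
    """Apply every kernel, then keep each kernel's maximum response (max pooling)."""
    # default=0.0 covers texts shorter than the kernel window (hypothetical choice).
    return [max(conv1d_over_words(word_vectors, k), default=0.0) for k in kernels]


# Hypothetical 2-dimensional word vectors for a three-word text.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],      # responds to the first two words in sequence
    [[-1.0, -1.0], [-1.0, -1.0]],  # always negative here, so ReLU clips it to 0
]
features = extract_features(vectors, kernels)
```

Max pooling yields one value per kernel regardless of text length, so enterprise names and business scope descriptions of different lengths produce feature vectors of the same size.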
In some embodiments, the industry identification model is the fully connected layer of the convolutional neural network.
In some embodiments, segmenting the enterprise name and the business scope information separately and determining the word vector of each resulting word includes: segmenting the enterprise name and the business scope information separately, and inputting each word in the enterprise name and each word in the business scope information into a pre-trained word vector model to obtain the word vectors of the words in the enterprise name and the word vectors of the words in the business scope information, where the word vector model is used to generate the word vector of a word.
In some embodiments, the remaining information includes at least one of the following items of the target enterprise: place of business, registration type, scale, date of establishment, and location; and extracting the second feature information from the remaining information includes: determining the one-hot code corresponding to each item in the remaining information; and concatenating the one-hot codes corresponding to the items in the remaining information to generate the second feature information.
In a second aspect, an embodiment of the present application provides an apparatus for generating information, the apparatus including: a first extraction unit configured to extract enterprise information of a target enterprise, where the enterprise information includes an enterprise name and business scope information; a second extraction unit configured to extract first feature information from the enterprise name and the business scope information; a third extraction unit configured to extract second feature information from the remaining information, where the remaining information is the information in the enterprise information other than the enterprise name and the business scope information; and an input unit configured to fuse the first feature information with the second feature information and input the fused feature information into a pre-trained industry identification model to obtain the industry category of the target enterprise, where the industry identification model characterizes the correspondence between feature information and industry categories.
In some embodiments, the second extraction unit includes: a word segmentation module configured to segment the enterprise name and the business scope information separately and determine the word vector of each resulting word; an extraction module configured to extract a keyword from the enterprise name; and a generation module configured to parse the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information to generate the first feature information.
In some embodiments, the generation module is further configured to: input the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information separately into a pre-trained feature extraction model to obtain feature vectors corresponding respectively to the enterprise name, the keyword, and the business scope information, and determine these feature vectors as the first feature information, where the feature extraction model is used to extract text features.
In some embodiments, the feature extraction model consists of the convolutional layer and the max-pooling layer of a pre-trained convolutional neural network.
In some embodiments, the industry identification model is the fully connected layer of the convolutional neural network.
In some embodiments, the word segmentation module is further configured to: segment the enterprise name and the business scope information separately, and input each word in the enterprise name and each word in the business scope information into a pre-trained word vector model to obtain the word vectors of the words in the enterprise name and the word vectors of the words in the business scope information, where the word vector model is used to generate the word vector of a word.
In some embodiments, the remaining information includes at least one of the following items of the target enterprise: place of business, registration type, scale, date of establishment, and location; and the third extraction unit includes: a determining module configured to determine the one-hot code corresponding to each item in the remaining information; and a concatenation module configured to concatenate the one-hot codes corresponding to the items in the remaining information to generate the second feature information.
In a third aspect, an embodiment of the present application provides a server, including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of any embodiment of the method for generating information.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements the method of any embodiment of the method for generating information.
The method and apparatus for generating information provided by the embodiments of the present application extract the enterprise information of a target enterprise, extract first feature information from the enterprise name and the business scope information and second feature information from the remaining information, then fuse the first feature information with the second feature information and input the fused feature information into a pre-trained industry identification model to obtain the industry category of the target enterprise. The feature information in the enterprise information is thereby fully extracted and the industry category of the enterprise is determined from it, without manual keyword matching, which improves the flexibility of information generation.
Brief description of the drawings
Other features, objects, and advantages of the present application will become more apparent upon reading the detailed description of non-limiting embodiments made with reference to the following drawings:
Fig. 1 is a diagram of an exemplary system architecture to which the present application may be applied;
Fig. 2 is a flowchart of one embodiment of the method for generating information according to the present application;
Fig. 3 is a schematic diagram of an application scenario of the method for generating information according to the present application;
Fig. 4 is a flowchart of another embodiment of the method for generating information according to the present application;
Fig. 5 is a schematic structural diagram of one embodiment of the apparatus for generating information according to the present application;
Fig. 6 is a schematic structural diagram of a computer system suitable for implementing the server of the embodiments of the present application.
Detailed description
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the relevant invention and do not limit the invention. It should also be noted that, for ease of description, only the parts related to the relevant invention are shown in the drawings.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which the method for generating information or the apparatus for generating information of the present application may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber-optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 in order to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as web browser applications, search applications, instant messaging tools, email clients, and social platform software.
The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.
The server 105 may be a server that provides various services, for example a storage server that stores and manages enterprise information. The storage server may store, manage, and analyze the enterprise information uploaded by the terminal devices 101, 102, 103, and generate a processing result (such as an enterprise classification).
It should be noted that the method for generating information provided by the embodiments of the present application is generally executed by the server 105; correspondingly, the apparatus for generating information is generally provided in the server 105.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers, as required by the implementation.
Continuing to refer to Fig. 2, a flow 200 of one embodiment of the method for generating information according to the present application is shown. The method for generating information includes the following steps:
Step 201, extract the enterprise information of the target enterprise.
In this embodiment, the electronic device on which the method for generating information runs (for example, the server 105 shown in Fig. 1) may store the enterprise information of a large number of enterprises in advance, and the electronic device may extract the enterprise information of the target enterprise from it. The target enterprise may be an enterprise whose industry category has not yet been labeled, or an enterprise pre-designated by a technician whose industry category is to be determined. The enterprise information of the target enterprise may be text containing various information related to the target enterprise. For example, the information related to the target enterprise may include the enterprise name (such as "Certain International Technology (Shenzhen) Co., Ltd."), business scope information (such as "electronic products"), the composition of the enterprise's personnel, the population served by the enterprise, and so on.
It should be noted that, in some application scenarios, the enterprise information of the target enterprise may also be sent to the electronic device by a terminal device (for example, the terminal devices 101, 102, 103 shown in Fig. 1) through a wired or wireless connection. The wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra-wideband) connection, and other wireless connections now known or developed in the future.
Step 202, extract first feature information from the enterprise name and the business scope information.
In this embodiment, the electronic device may use various text feature extraction methods to extract the first feature information from the enterprise name and the business scope information. The first feature information may be information that characterizes the text features of the enterprise name and the business scope information, for example in vector form. A text feature may be any information that characterizes the basic elements of a text (such as its semantics, keywords, and feature words).
In some optional implementations of this embodiment, the content of the above-mentioned web page may be analyzed statistically. For example, the frequency of occurrence of each word in that content may be counted and the words sorted by frequency; one or more of the highest-ranked words are then selected as the keywords to be extracted; finally, various word vector generation methods (for example, the open-source word vector tool word2vec) may be used to generate the word vectors of the keywords, and the generated word vectors are determined as the first feature information.
In some optional implementations of this embodiment, the electronic device may extract the first feature information using a statistics-based text feature extraction method. As an example, the enterprise name and the business scope information may first be processed with, for example, a full segmentation method to divide them into words. Then, the importance of the resulting words may be calculated (for example, using the term frequency-inverse document frequency (TF-IDF) method), and keywords are obtained from the result. The main idea of TF-IDF is that if a word or phrase occurs with a high term frequency (TF) in one article but rarely in other articles, the word or phrase is considered to discriminate well between classes and to be suitable for classification. Inverse document frequency (IDF) means that the fewer the documents that contain a word or phrase, the larger its IDF, and the better the word or phrase discriminates between classes. TF-IDF can therefore be used to calculate the importance of a word or phrase within an article. Finally, various word vector generation methods (for example, the open-source word vector tool word2vec) may be used to generate the word vectors of the keywords, and the generated word vectors are determined as the first feature information. It should be noted that the full segmentation method and the TF-IDF method are well-known techniques that are widely studied and applied, and are not described in detail here.
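A minimal Python sketch of the TF-IDF importance calculation is given below for illustration. The smoothed IDF formula log((1 + N) / (1 + df)) is one common variant chosen here as an assumption (the source does not fix a formula), and the function name and toy corpus are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(document: List[str], corpus: List[List[str]]) -> Dict[str, float]:
    """Score each word of `document` by term frequency times inverse document frequency."""
    counts = Counter(document)
    scores = {}
    for word, count in counts.items():
        tf = count / len(document)  # share of the document taken up by this word
        containing = sum(1 for doc in corpus if word in doc)
        # Smoothed IDF (an assumed variant): zero when every document contains the word.
        idf = math.log((1 + len(corpus)) / (1 + containing))
        scores[word] = tf * idf
    return scores


# Hypothetical corpus of segmented business scope texts.
corpus = [
    ["electronic", "product", "sales"],
    ["food", "sales"],
    ["software", "development", "sales"],
]
scores = tf_idf(corpus[0], corpus)
```

In this toy corpus the word "sales" occurs in every document and therefore scores zero, whereas "electronic" occurs in only one document and scores higher, which matches the idea that rarer words discriminate better between classes.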
In some optional implementations of this embodiment, the electronic device may extract the first feature information using a semantics-based text feature extraction method. As an example, the enterprise name and the business scope information may first be segmented into words separately, and the word vector of each resulting word is determined using various word vector generation methods; then, the extracted word vectors may be parsed to generate the first feature information. Here, a pre-trained feature extraction model may be used to parse the extracted word vectors and extract the first feature information. The feature extraction model may be obtained by using a machine learning method and training samples to perform supervised training on any of various existing models capable of text feature extraction (such as a recurrent neural network (RNN) or a long short-term memory network (LSTM)). In practice, the feature information output by the feature extraction model may be represented in vector form.
In some optional implementations of this embodiment, the electronic device may also extract the first feature information as follows:
The first step can utilize various participle modes (such as Forward Maximum Method segmenting method, reverse maximum matching participle
Method etc.) above-mentioned enterprise name and above-mentioned business scope information are segmented respectively, and utilize various term vector generating modes
Determine the term vector (such as using the term vector calculating instrument word2vec to increase income) of each word after participle.
Second step can extract keyword from above-mentioned enterprise name.Herein, above-mentioned keyword can be for enterprise
The word that category of employment plays an important roll.For example, enterprise name is " certain International Technology (Shenzhen) Co., Ltd ", then keyword can
To be " science and technology ".In practice, the mode that string matching is carried out with the preset keyword in preset keyword set may be used
Extract the keyword in above-mentioned enterprise name.
In the third step, the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information may be parsed to generate the first feature information. Here, the word vectors may be parsed in various preset ways. As an example, the generated word vectors may first be combined into a matrix; the matrix may then undergo processing such as convolution and downsampling, which may be performed repeatedly, and the vector finally obtained is taken as the first feature vector.
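The segmentation and keyword-matching steps can be sketched in Python as follows. This is an illustrative sketch only: the dictionary, the preset keyword set, the sample enterprise name, and the `max_len` limit are all hypothetical, and forward maximum matching is just one of the segmentation methods the text allows.

```python
from typing import List, Set


def forward_max_match(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Segment text greedily from the left, always taking the longest dictionary word."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is accepted as a fallback when nothing matches.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def extract_keywords(words: List[str], preset_keywords: Set[str]) -> List[str]:
    """Keep the segmented words that appear in the preset keyword set."""
    return [w for w in words if w in preset_keywords]


# Hypothetical dictionary, keyword set, and enterprise name.
dictionary = {"国际", "科技", "有限公司"}
words = forward_max_match("某国际科技有限公司", dictionary)
keywords = extract_keywords(words, {"科技"})
```

Here the name is split into four words and only the word meaning "technology" survives keyword matching; reverse maximum matching would scan from the right instead and can give a different split for ambiguous strings.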
In some optional implementations of this embodiment, the third step, in which the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information are parsed to generate the first feature information, may be carried out as follows: the electronic device may input the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information separately into a pre-trained feature extraction model to obtain feature vectors corresponding respectively to the enterprise name, the keyword, and the business scope information, and determine these feature vectors as the first feature information. The feature extraction model may be used to extract text features. Here, the feature extraction model may be obtained by using a machine learning method and training samples to perform supervised training on any of various existing models capable of text feature extraction (such as a recurrent neural network (RNN), a long short-term memory network (LSTM), or a restricted Boltzmann machine (RBM)).
Step 203, extract second feature information from the remaining information.
In this embodiment, the electronic device may refer to the information in the enterprise information other than the enterprise name and the business scope information as the remaining information, and extract the second feature information from the remaining information. Here, the electronic device may extract the second feature information in various ways. As an example, a statistics-based text feature extraction method may be used: first, the remaining information is processed with, for example, a full segmentation method to divide it into words; then, the importance of the resulting words is calculated using the TF-IDF method, and keywords are obtained from the result; finally, the word vectors of the keywords are generated and determined as the second feature information. It should be noted that the second feature information may be represented in vector form.
In some optional implementations of this embodiment, the remaining information may include at least one of the following items of the target enterprise: place of business, registration type, scale, date of establishment, and location (such as province or city). The electronic device may extract the second feature information in the following steps. In the first step, the one-hot code corresponding to each item in the remaining information may be determined. In practice, one-hot coding, also known as one-bit-effective coding, uses an N-bit status register (N being a positive integer) to encode N states; each state has its own register bit, and at any time exactly one bit is active. For example, to encode six states, the natural binary codes are 000, 001, 010, 011, 100, 101, while the one-hot codes may be 000001, 000010, 000100, 001000, 010000, 100000. In general, one-hot coding can be used to handle the discrete features of text and, to some extent, also expands the feature set; a one-hot code can be represented as a vector. It should be noted that one-hot coding is a well-known technique that is widely studied and applied, and is not described in detail here. In the second step, the one-hot codes corresponding to the items in the remaining information may be concatenated to generate the second feature information, that is, the concatenated code is used as the second feature information (a vector). For example, if there are 23 provinces in total, a 23-dimensional vector may be constructed in which each dimension represents one province; the element corresponding to the province where the target enterprise is located is 1, and the other elements of the vector are 0.
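The two steps above, one-hot coding of each item followed by concatenation, might look like this in Python. The field names and category lists are hypothetical placeholders, not values taken from the application.

```python
from typing import Dict, List, Sequence


def one_hot(value: str, categories: Sequence[str]) -> List[int]:
    """Return a vector with a single 1 at the position of `value`."""
    return [1 if category == value else 0 for category in categories]


def second_feature(record: Dict[str, str], vocab: Dict[str, Sequence[str]]) -> List[int]:
    """Concatenate the one-hot codes of every field, in the order given by `vocab`."""
    vector: List[int] = []
    for field, categories in vocab.items():
        vector.extend(one_hot(record[field], categories))
    return vector


# Hypothetical category lists for two items of the remaining information.
vocab = {
    "registration_type": ["limited", "partnership"],
    "scale": ["small", "medium", "large"],
}
record = {"registration_type": "limited", "scale": "large"}
feature = second_feature(record, vocab)  # [1, 0, 0, 0, 1]
```

The length of the resulting vector is the sum of the category counts, so a 23-province field would contribute 23 dimensions on its own.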
Step 204, fuse the first feature information with the second feature information, and input the fused feature information into a pre-trained industry identification model to obtain the industry category of the target enterprise.
In this embodiment, the electronic device may first fuse the first feature information with the second feature information. Since both can be represented in vector form, they may be fused by vector concatenation. The electronic device may then input the fused feature information into the pre-trained industry identification model to obtain the industry category of the target enterprise. The industry identification model may be used to characterize the correspondence between feature information and industry categories. As an example, the industry identification model may be a correspondence table, pre-established by a technician on the basis of statistics over a large amount of data, that maps each piece of feature information to an industry category.
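For illustration, fusion by concatenation and a table-based industry identification model can be sketched in Python as below. The table contents, the category label, and the function names are hypothetical; exact matching on float vectors is brittle and is used here only to keep the sketch short.

```python
from typing import Dict, List, Optional, Tuple


def fuse(first: List[float], second: List[float]) -> List[float]:
    """Fuse two feature vectors by concatenation."""
    return first + second


def lookup_industry(fused: List[float], table: Dict[Tuple[float, ...], str]) -> Optional[str]:
    """Look up the fused vector in a pre-built feature-to-category table."""
    return table.get(tuple(fused))  # None when the vector is not in the table


# Hypothetical table with a single entry.
table = {(2.0, 0.0, 1.0, 0.0): "technology"}
fused = fuse([2.0, 0.0], [1.0, 0.0])
category = lookup_industry(fused, table)
```

A lookup table only covers feature vectors seen when it was built, which is one reason the optional implementations that follow train a classifier instead.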
In some optional implementations of this embodiment, the industry identification model may be obtained by training as follows. First, preset training samples may be extracted, where the training samples may include enterprise information samples of multiple enterprises and the enterprise category label corresponding to each enterprise information sample. Then, first feature information and second feature information may be extracted from each enterprise information sample; the extraction may use the approaches of step 202 and step 203, respectively, which are not repeated here. Next, the first feature information and second feature information extracted from each enterprise information sample may be fused, and, using a machine learning method, with the fused feature information as input and the enterprise category label corresponding to that enterprise information sample as output, an existing model capable of classification (for example, a naive Bayesian model (NBM) or a support vector machine (SVM)) or a classification function (for example, a softmax function) may be trained. The trained model or classification function is determined as the industry identification model.
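To make the training procedure concrete, the following sketch fits a linear softmax classifier on fused feature vectors with plain stochastic gradient descent. It is only one possible instance of the classification function mentioned above; the toy samples, labels, learning rate, and epoch count are assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(x, weights, biases):
    """Linear score of every class for the feature vector x."""
    return [sum(w * xi for w, xi in zip(ws, x)) + b
            for ws, b in zip(weights, biases)]


def train_softmax(samples, labels, num_classes, epochs=200, lr=0.5):
    """Fit weights and biases by per-sample gradient descent on cross-entropy."""
    dim = len(samples[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            probs = softmax(class_scores(x, weights, biases))
            for c in range(num_classes):
                # Gradient of the cross-entropy loss w.r.t. the score of class c.
                grad = probs[c] - (1.0 if c == y else 0.0)
                for i in range(dim):
                    weights[c][i] -= lr * grad * x[i]
                biases[c] -= lr * grad
    return weights, biases


def predict(x, weights, biases):
    """Return the index of the highest-scoring class."""
    scores = class_scores(x, weights, biases)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy fused feature vectors with hypothetical category labels 0 and 1.
SAMPLES = [[1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 1, 0], [0, 1, 0, 1]]
LABELS = [0, 0, 1, 1]
W, B = train_softmax(SAMPLES, LABELS, num_classes=2)
print([predict(x, W, B) for x in SAMPLES])
```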
In some optional implementations of this embodiment, after the industry category of the target enterprise is obtained, an industry identifier may be added to the stored information of the target enterprise, where the industry identifier indicates the industry category of the target enterprise. In practice, adding an industry identifier facilitates subsequent enterprise risk analysis, enterprise graph construction, and the like.
With continued reference to Fig. 3, Fig. 3 is a schematic diagram of an application scenario of the method for generating information according to this embodiment. In the application scenario of Fig. 3, a storage server that stores and manages enterprise information first extracts, from a locally stored enterprise information list, the enterprise information 301 of an enterprise (including the enterprise name 302, the business scope information 303, and the remaining information 304 other than these two items). The storage server then extracts first feature information 305 from the enterprise name 302 and the business scope information 303, and extracts second feature information 306 from the remaining information 304. Finally, it fuses the first feature information with the second feature information and inputs the fused feature information 307 into the pre-trained industry identification model to obtain the industry category 308 of the target enterprise.
The method provided by the above embodiment of the present application extracts the enterprise information of a target enterprise, extracts first feature information from the enterprise name and the business scope information, extracts second feature information from the remaining information, fuses the first feature information with the second feature information, and inputs the fused feature information into a pre-trained industry identification model to obtain the industry category of the target enterprise. The feature information in the enterprise information is thereby fully extracted, and the industry category of the enterprise is determined from the extracted feature information without manual keyword matching, which improves the flexibility of information generation.
With further reference to Fig. 4, a flow 400 of another embodiment of the method for generating information is shown. The flow 400 of the method for generating information includes the following steps:
Step 401, extract the enterprise information of the target enterprise.
In this embodiment, the electronic device on which the method for generating information runs (for example, the server 105 shown in Fig. 1) may store in advance the enterprise information of a large number of enterprises, and may extract the enterprise information of the target enterprise from it. The enterprise information of the target enterprise may include various text information related to the target enterprise; for example, it may include an enterprise name and business scope information.
Step 402, segment the enterprise name and the business scope information separately, and input each word in the enterprise name and each word in the business scope information into a pre-trained word vector model to obtain the word vector of each word in the enterprise name and the word vector of each word in the business scope information.
In this embodiment, the electronic device may use various word segmentation approaches (for example, forward maximum matching or reverse maximum matching) to segment the enterprise name and the business scope information separately, and input each word in the enterprise name and each word in the business scope information into a pre-trained word vector model to obtain the word vector of each word in the enterprise name and the word vector of each word in the business scope information. The word vector model is used to generate word vectors for words. It may be obtained by using a machine learning method to train an existing model capable of generating word vectors (for example, the model used by the open-source word vector tool word2vec) on training samples composed of a large number of enterprise names and business scope information. Because the model is trained on such targeted samples, the resulting word vectors work better than randomly initialized vectors or vectors from a word vector model trained on text from an unrestricted domain (for example, text unrelated to enterprises).
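Forward maximum matching, one of the segmentation approaches named above, can be sketched as follows. The dictionary is a hypothetical toy example, and treating characters not covered by the dictionary as single-character tokens is an assumption of this sketch rather than something stated in the application.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Segment `text` left to right, always taking the longest dictionary
    word that starts at the current position; characters not covered by
    the dictionary become single-character tokens."""
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first and shrink until a match is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical dictionary: "software", "technology", "development",
# and the compound "technology development".
DICTIONARY = {"软件", "技术", "开发", "技术开发"}
print(forward_max_match("软件技术开发", DICTIONARY))  # ['软件', '技术开发']
```

Reverse maximum matching applies the same idea while scanning from the end of the text towards the beginning.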
Step 403, extract a keyword from the enterprise name.
In this embodiment, the electronic device may extract a keyword from the enterprise name. Here, the keyword may be a word that plays an important role in determining the industry category of the enterprise. For example, if the enterprise name is "Some International Technology (Shenzhen) Co., Ltd.", the keyword may be "Technology". In practice, the keyword in the enterprise name may be extracted by string matching against the preset keywords in a preset keyword set.
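A minimal sketch of keyword extraction by string matching against a preset keyword set is shown below. The keyword set and the function name are hypothetical; returning the matches in order of their position in the name is a design choice of this sketch.

```python
def extract_keywords(enterprise_name, keyword_set):
    """Return the preset keywords that occur in the name, ordered by
    the position of their first occurrence."""
    hits = [(enterprise_name.find(k), k)
            for k in keyword_set if k in enterprise_name]
    return [k for _, k in sorted(hits)]


# Hypothetical preset keyword set.
KEYWORDS = {"Technology", "Catering", "Logistics"}
name = "Some International Technology (Shenzhen) Co., Ltd."
print(extract_keywords(name, KEYWORDS))  # ['Technology']
```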
Step 404, input the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information separately into a pre-trained feature extraction model to obtain feature vectors corresponding to the enterprise name, the keyword, and the business scope information, respectively, and determine these feature vectors as the first feature information.
In this embodiment, the electronic device may input the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information separately into a pre-trained feature extraction model to obtain feature vectors corresponding to the enterprise name, the keyword, and the business scope information, respectively, and determine these feature vectors as the first feature information. The feature extraction model may consist of the convolutional layer and the max-pooling layer of a pre-trained convolutional neural network (CNN). Here, the convolutional neural network may include one or more convolutional layers, a max-pooling layer, a vector concatenation layer, and a fully connected (FC) layer. A convolutional layer performs convolution on the matrix input to it, and can also perform feature extraction and downsampling on the input matrix; the max-pooling layer downsamples its input matrix and outputs a vector; the vector concatenation layer concatenates the vectors output by the max-pooling layer with other feature vectors input separately to this layer, and passes the concatenated vector to the fully connected layer; the fully connected layer performs the industry category discrimination. In practice, the fully connected layer acts as the "classifier" of the whole convolutional neural network, mapping the learned features to the sample label space. Here, the electronic device may input the matrix formed by the word vectors of the words in the enterprise name, the matrix formed by the word vector of the keyword, and the matrix formed by the word vectors of the words in the business scope information separately into the convolutional neural network (that is, into its first convolutional layer), take the vectors output by the max-pooling layer for each input as the feature vectors corresponding to the enterprise name, the keyword, and the business scope information, respectively, and determine these feature vectors as the first feature information.
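The convolution and max-pooling stage can be illustrated with the following sketch, which slides each kernel over a matrix of word vectors and keeps the maximum response per kernel. The kernel values, the ReLU activation, and the zero padding of inputs shorter than the kernel window are assumptions of this sketch; the application does not specify them.

```python
def conv1d(matrix, kernel, bias=0.0):
    """Slide `kernel` (window x dim) over the rows of `matrix`
    (n_words x dim); return one ReLU-activated value per position."""
    window = len(kernel)
    dim = len(kernel[0])
    # Zero-pad short inputs (e.g. a single keyword) up to the window size.
    rows = list(matrix) + [[0.0] * dim] * max(0, window - len(matrix))
    outputs = []
    for start in range(len(rows) - window + 1):
        total = bias
        for row, k_row in zip(rows[start:start + window], kernel):
            total += sum(x * k for x, k in zip(row, k_row))
        outputs.append(max(0.0, total))
    return outputs


def extract_feature_vector(matrix, kernels):
    """Max pooling over all positions: one feature per kernel."""
    return [max(conv1d(matrix, kernel)) for kernel in kernels]


# Toy word-vector matrix (3 words, 2 dimensions) and two hypothetical kernels.
name_matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[-1.0, 0.0], [0.0, -1.0]],
]
print(extract_feature_vector(name_matrix, kernels))  # [2.0, 0.0]
```

Because pooling takes the maximum over all positions, the length of the output vector depends only on the number of kernels, so the enterprise name, the keyword, and the business scope information each yield a fixed-length feature vector regardless of how many words they contain.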
In practice, the convolutional neural network may be obtained by training as follows. First, preset training samples for training the convolutional neural network may be extracted, where the training samples may include enterprise information samples of multiple enterprises and the enterprise category label corresponding to each enterprise information sample. Then, for each enterprise information sample, the enterprise name and the business scope information in that sample may be segmented separately, and each word in the enterprise name and each word in the business scope information may be input into the pre-trained word vector model to obtain the word vector of each word in the enterprise name and the word vector of each word in the business scope information. Next, a keyword may be extracted from the enterprise name in each enterprise information sample. Then, for each enterprise information sample, second feature information may be extracted from the remaining information of that sample (for example, by concatenating the one-hot encodings of the items in the remaining information, with the second feature information represented as a vector). Finally, for each enterprise information sample, the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information may be used as the input of a pre-established convolutional neural network (that is, input to its first convolutional layer), the second feature information corresponding to that sample may be input to the vector concatenation layer of the pre-established convolutional neural network, and the enterprise category label corresponding to that sample may be used as the output of the pre-established convolutional neural network; the pre-established convolutional neural network is trained using a machine learning method to obtain the trained convolutional neural network. Here, extracting the keyword from the enterprise name and feeding its word vector to the model as part of the input gives the model clearer guidance and improves the accuracy of its discrimination. Moreover, convolutional neural networks are efficient in both training and prediction and have good nonlinear fitting ability, which helps ensure the precision of industry classification.
Step 405, determine the one-hot encoding corresponding to each item in the remaining information.
In this embodiment, the electronic device may determine the one-hot encoding corresponding to each item in the remaining information, where the remaining information is the information in the enterprise information of the target enterprise other than the enterprise name and the business scope information. The remaining information may include at least one of the following for the target enterprise: operating status, registration type, scale, establishment time, and location (for example, province or city). In practice, a one-hot encoding can be represented as a vector.
Step 406, concatenate the one-hot encodings corresponding to the items in the remaining information to generate the second feature information.
In this embodiment, the electronic device may concatenate the one-hot encodings corresponding to the items in the remaining information to generate the second feature information; that is, the concatenated encoding (which can be represented as a vector) serves as the second feature information.
Step 407, fuse the first feature information with the second feature information, and input the fused feature information into a pre-trained industry identification model to obtain the industry category of the target enterprise.
In this embodiment, the electronic device may first fuse the first feature information with the second feature information. Since both can be represented as vectors, they may be fused by vector concatenation. The electronic device may then input the fused feature information into the pre-trained industry identification model to obtain the industry category of the target enterprise. The industry identification model may be used to characterize the correspondence between feature information and industry categories. As an example, the industry identification model may be a correspondence table, pre-established by technicians based on statistics over a large amount of data, that maps each piece of feature information to an industry category. It should be noted that the fusion of the first feature information and the second feature information may be performed by the vector concatenation layer of the pre-trained convolutional neural network, and the industry identification model may be the fully connected layer of the pre-trained convolutional neural network; the fully connected layer may use a classification function (for example, a softmax function) to determine the industry category. In practice, the fully connected layer may output the probability that the target enterprise belongs to each industry category, in which case the industry category with the largest probability is determined as the industry category of the target enterprise; alternatively, the fully connected layer may directly output the industry category with the largest probability.
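The final decision step described above, turning the scores of the fully connected layer into probabilities and picking the most probable category, can be sketched as follows. The scores and category names are hypothetical values chosen for illustration.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, categories):
    """Return the most probable category together with all probabilities."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return categories[best], probs


# Hypothetical output scores of the fully connected layer.
CATEGORIES = ["software", "catering", "logistics"]
category, probabilities = classify([2.0, 0.5, -1.0], CATEGORIES)
print(category)  # software
```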
In some optional implementations of this embodiment, after the industry category of the target enterprise is obtained, an industry identifier may be added to the stored information of the target enterprise, where the industry identifier indicates the industry category of the target enterprise. In practice, adding an industry identifier facilitates subsequent enterprise risk analysis, enterprise graph construction, and the like.
As can be seen from Fig. 4, compared with the embodiment corresponding to Fig. 2, the flow 400 of the method for generating information in this embodiment highlights the steps of performing feature extraction and industry category discrimination with a pre-trained convolutional neural network. Because convolutional neural networks are efficient in training and prediction and have good nonlinear fitting ability, the precision of industry classification is improved. The flow 400 also highlights the step of extracting the keyword from the enterprise name and using its word vector as part of the model input for training and for industry category discrimination; the keyword gives the model clearer guidance and improves the accuracy of its discrimination. In addition, the flow 400 highlights the step of generating word vectors with a word vector model trained on training samples composed of a large number of enterprise names and business scope information. Because the model is trained on such targeted samples, the results are better than with randomly initialized vectors or with a word vector model trained on text from an unrestricted domain (for example, text unrelated to enterprises). The scheme described in this embodiment can therefore improve the accuracy of the generated industry category.
With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for generating information. The apparatus embodiment corresponds to the method embodiment shown in Fig. 2, and the apparatus may be applied in various electronic devices.
As shown in Fig. 5, the apparatus 500 for generating information of this embodiment includes: a first extraction unit 501, configured to extract the enterprise information of a target enterprise, where the enterprise information includes an enterprise name and business scope information; a second extraction unit 502, configured to extract first feature information from the enterprise name and the business scope information; a third extraction unit 503, configured to extract second feature information from the remaining information, where the remaining information is the information in the enterprise information other than the enterprise name and the business scope information; and an input unit 504, configured to fuse the first feature information with the second feature information and input the fused feature information into a pre-trained industry identification model to obtain the industry category of the target enterprise, where the industry identification model is used to characterize the correspondence between feature information and industry categories.
In some optional implementations of this embodiment, the second extraction unit 502 may include a word segmentation module, an extraction module, and a generation module (not shown). The word segmentation module may be configured to segment the enterprise name and the business scope information separately and determine the word vector of each word after segmentation. The extraction module may be configured to extract a keyword from the enterprise name. The generation module may be configured to parse the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information to generate the first feature information.
In some optional implementations of this embodiment, the generation module may be further configured to: input the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information separately into a pre-trained feature extraction model to obtain feature vectors corresponding to the enterprise name, the keyword, and the business scope information, respectively, and determine these feature vectors as the first feature information, where the feature extraction model is used to extract text features.
In some optional implementations of this embodiment, the feature extraction model may consist of the convolutional layer and the max-pooling layer of a pre-trained convolutional neural network.
In some optional implementations of this embodiment, the industry identification model may be the fully connected layer of the convolutional neural network.
In some embodiments, the word segmentation module may be further configured to segment the enterprise name and the business scope information separately, and input each word in the enterprise name and each word in the business scope information into a pre-trained word vector model to obtain the word vector of each word in the enterprise name and the word vector of each word in the business scope information, where the word vector model is used to generate word vectors for words.
In some optional implementations of this embodiment, the remaining information may include at least one of the following for the target enterprise: operating status, registration type, scale, establishment time, and location. The third extraction unit 503 may include a determination module and a concatenation module (not shown). The determination module may be configured to determine the one-hot encoding corresponding to each item in the remaining information. The concatenation module may be configured to concatenate the one-hot encodings corresponding to the items in the remaining information to generate the second feature information.
In the apparatus provided by the above embodiment of the present application, the first extraction unit 501 extracts the enterprise information of the target enterprise, the second extraction unit 502 extracts first feature information from the enterprise name and the business scope information, the third extraction unit 503 extracts second feature information from the remaining information, and the input unit 504 fuses the first feature information with the second feature information and inputs the fused feature information into a pre-trained industry identification model to obtain the industry category of the target enterprise. The feature information in the enterprise information is thereby fully extracted, and the industry category of the enterprise is determined from the extracted feature information without manual keyword matching, which improves the flexibility of information generation.
Referring now to Fig. 6, a schematic structural diagram of a computer system 600 suitable for implementing the server of the embodiments of the present application is shown. The server shown in Fig. 6 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in Fig. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowcharts may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609 and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above functions defined in the method of the present application are performed. It should be noted that the computer-readable medium described in the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program, where the program can be used by or in connection with an instruction execution system, apparatus, or device. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to: wireless, wire, optical cable, RF, or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, it may be described as: a processor including a first extraction unit, a second extraction unit, a third extraction unit, and an input unit. The names of these units do not, in some cases, limit the units themselves; for example, the first extraction unit may also be described as "a unit that extracts the enterprise information of a target enterprise".
In another aspect, the present application also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: extract the enterprise information of a target enterprise, where the enterprise information includes an enterprise name and business scope information; extract first feature information from the enterprise name and the business scope information; extract second feature information from the remaining information; and fuse the first feature information with the second feature information and input the fused feature information into a pre-trained industry identification model to obtain the industry category of the target enterprise.
The above description is only of the preferred embodiments of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.
Claims (16)
1. A method for generating information, comprising:
extracting enterprise information of a target enterprise, wherein the enterprise information includes an enterprise name and business scope information;
extracting first feature information from the enterprise name and the business scope information;
extracting second feature information from remaining information, wherein the remaining information is the information in the enterprise information other than the enterprise name and the business scope information;
fusing the first feature information with the second feature information, and inputting the fused feature information into a pre-trained industry identification model to obtain the industry category of the target enterprise, wherein the industry identification model is used to characterize the correspondence between feature information and industry categories.
2. The method for generating information according to claim 1, wherein said extracting first feature information from the enterprise name and the business scope information comprises:
performing word segmentation on the enterprise name and the business scope information respectively, and determining the word vector of each word obtained by the segmentation;
extracting a keyword from the enterprise name;
parsing the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information to generate the first feature information.
3. The method for generating information according to claim 2, wherein said parsing the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information to generate the first feature information comprises:
inputting the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information separately into a pre-trained feature extraction model to obtain feature vectors corresponding respectively to the enterprise name, the keyword, and the business scope information, and determining the feature vectors corresponding respectively to the enterprise name, the keyword, and the business scope information as the first feature information, wherein the feature extraction model is used to extract text features.
4. The method for generating information according to claim 3, wherein the feature extraction model consists of the convolutional layer and the max pooling layer of a pre-trained convolutional neural network.
5. The method for generating information according to claim 4, wherein the industry identification model is the fully connected layer of the convolutional neural network.
6. The method for generating information according to claim 2, wherein said performing word segmentation on the enterprise name and the business scope information respectively, and determining the word vector of each word obtained by the segmentation, comprises:
performing word segmentation on the enterprise name and the business scope information respectively, and inputting each word in the enterprise name and each word in the business scope information separately into a pre-trained word vector model to obtain the word vectors of the words in the enterprise name and the word vectors of the words in the business scope information, wherein the word vector model is used to generate word vectors of words.
7. The method for generating information according to claim 1, wherein the remaining information includes at least one of the following items of the target enterprise: place of business, registration type, scale, establishment time, and location; and
said extracting second feature information from the remaining information comprises:
determining the one-hot encoding corresponding to each item in the remaining information;
concatenating the one-hot encodings corresponding to the items in the remaining information to generate the second feature information.
8. An apparatus for generating information, comprising:
a first extraction unit, configured to extract enterprise information of a target enterprise, wherein the enterprise information includes an enterprise name and business scope information;
a second extraction unit, configured to extract first feature information from the enterprise name and the business scope information;
a third extraction unit, configured to extract second feature information from remaining information, wherein the remaining information is the information in the enterprise information other than the enterprise name and the business scope information;
an input unit, configured to fuse the first feature information with the second feature information, and input the fused feature information into a pre-trained industry identification model to obtain the industry category of the target enterprise, wherein the industry identification model is used to characterize the correspondence between feature information and industry categories.
9. The apparatus for generating information according to claim 8, wherein the second extraction unit comprises:
a word segmentation module, configured to perform word segmentation on the enterprise name and the business scope information respectively, and determine the word vector of each word obtained by the segmentation;
an extraction module, configured to extract a keyword from the enterprise name;
a generation module, configured to parse the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information to generate the first feature information.
10. The apparatus for generating information according to claim 9, wherein the generation module is further configured to:
input the word vectors of the words in the enterprise name, the word vector of the keyword, and the word vectors of the words in the business scope information separately into a pre-trained feature extraction model to obtain feature vectors corresponding respectively to the enterprise name, the keyword, and the business scope information, and determine the feature vectors corresponding respectively to the enterprise name, the keyword, and the business scope information as the first feature information, wherein the feature extraction model is used to extract text features.
11. The apparatus for generating information according to claim 10, wherein the feature extraction model consists of the convolutional layer and the max pooling layer of a pre-trained convolutional neural network.
12. The apparatus for generating information according to claim 11, wherein the industry identification model is the fully connected layer of the convolutional neural network.
13. The apparatus for generating information according to claim 9, wherein the word segmentation module is further configured to:
perform word segmentation on the enterprise name and the business scope information respectively, and input each word in the enterprise name and each word in the business scope information separately into a pre-trained word vector model to obtain the word vectors of the words in the enterprise name and the word vectors of the words in the business scope information, wherein the word vector model is used to generate word vectors of words.
14. The apparatus for generating information according to claim 8, wherein the remaining information includes at least one of the following items of the target enterprise: place of business, registration type, scale, establishment time, and location; and
the third extraction unit comprises:
a determining module, configured to determine the one-hot encoding corresponding to each item in the remaining information;
a concatenation module, configured to concatenate the one-hot encodings corresponding to the items in the remaining information to generate the second feature information.
15. A server, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-7.
16. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1-7.
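The feature computations recited in claims 3 to 5 and claim 7 can be illustrated with a small pure-Python sketch. It is only a sketch under stated assumptions: the kernels, weights, and field vocabularies are hypothetical toy values rather than trained parameters, the ReLU activation is an assumption (the claims name only a convolutional layer and a max pooling layer), and fusion is assumed to be concatenation.

```python
from typing import Dict, List

Matrix = List[List[float]]


def conv_max_pool(word_vectors: Matrix, kernels: List[Matrix]) -> List[float]:
    """Convolve each kernel over the word-vector sequence, then max-pool.

    Each kernel is a (window x dim) matrix; one pooled value is kept per kernel.
    """
    features = []
    for kernel in kernels:
        window = len(kernel)
        if len(word_vectors) < window:
            raise ValueError("sequence is shorter than the kernel window")
        responses = []
        for start in range(len(word_vectors) - window + 1):
            total = 0.0
            for offset in range(window):
                row = word_vectors[start + offset]
                total += sum(w * x for w, x in zip(kernel[offset], row))
            responses.append(max(total, 0.0))  # ReLU (assumed activation)
        features.append(max(responses))  # max pooling over positions
    return features


def one_hot_features(
    remaining: Dict[str, str], vocabularies: Dict[str, List[str]]
) -> List[float]:
    """One-hot encode each field and concatenate the codes in a fixed order."""
    features: List[float] = []
    for field, values in vocabularies.items():
        if remaining[field] not in values:
            raise ValueError(f"unknown value for field {field!r}")
        features.extend(1.0 if remaining[field] == v else 0.0 for v in values)
    return features


def classify(features: List[float], weights: Matrix, biases: List[float]) -> int:
    """Fully connected layer followed by argmax over the class scores."""
    scores = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return scores.index(max(scores))
```

In this sketch the first feature information would be the output of `conv_max_pool` for each text input, the second feature information would be the output of `one_hot_features`, and the two lists would be concatenated before being passed to `classify`. Numeric items such as the establishment time would first need to be mapped to categories before one-hot encoding, which is not shown here.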
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810045681.1A CN108171276B (en) | 2018-01-17 | 2018-01-17 | Method and apparatus for generating information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108171276A (en) | 2018-06-15 |
CN108171276B (en) | 2019-07-23 |
Family
ID=62514587
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810045681.1A Active CN108171276B (en) | 2018-01-17 | 2018-01-17 | Method and apparatus for generating information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108171276B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102881125B (en) * | 2012-09-25 | 2014-06-18 | 杭州立高科技有限公司 | Alarm monitoring system based on multi-information fusion centralized processing platform |
US9058515B1 (en) * | 2012-01-12 | 2015-06-16 | Kofax, Inc. | Systems and methods for identification document processing and business workflow integration |
CN105590102A (en) * | 2015-12-30 | 2016-05-18 | 中通服公众信息产业股份有限公司 | Front car face identification method based on deep learning |
CN106372648A (en) * | 2016-10-20 | 2017-02-01 | 中国海洋大学 | Multi-feature-fusion-convolutional-neural-network-based plankton image classification method |
CN106779467A (en) * | 2016-12-31 | 2017-05-31 | 成都数联铭品科技有限公司 | Enterprises ' industry categorizing system based on automatic information screening |
CN107169036A (en) * | 2017-04-19 | 2017-09-15 | 畅捷通信息技术股份有限公司 | Determine the method and system of the affiliated category of employment of enterprise |
CN108241867A (en) * | 2016-12-26 | 2018-07-03 | 阿里巴巴集团控股有限公司 | A kind of sorting technique and device |
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109388712A (en) * | 2018-09-21 | 2019-02-26 | 平安科技(深圳)有限公司 | A kind of trade classification method and terminal device based on machine learning |
CN110941826A (en) * | 2018-09-21 | 2020-03-31 | 武汉安天信息技术有限责任公司 | Malicious android software detection method and device |
CN109359197A (en) * | 2018-10-31 | 2019-02-19 | 税友软件集团股份有限公司 | A kind of tax type authentication method, device and computer readable storage medium |
CN109359197B (en) * | 2018-10-31 | 2021-01-05 | 税友软件集团股份有限公司 | Tax type authentication method, device and computer readable storage medium |
CN111125550A (en) * | 2018-11-01 | 2020-05-08 | 百度在线网络技术(北京)有限公司 | Interest point classification method, device, equipment and storage medium |
CN111125550B (en) * | 2018-11-01 | 2023-11-24 | 百度在线网络技术(北京)有限公司 | Point-of-interest classification method, device, equipment and storage medium |
CN111126422B (en) * | 2018-11-01 | 2023-10-31 | 百度在线网络技术(北京)有限公司 | Method, device, equipment and medium for establishing industry model and determining industry |
CN111126422A (en) * | 2018-11-01 | 2020-05-08 | 百度在线网络技术(北京)有限公司 | Industry model establishing method, industry determining method, industry model establishing device, industry determining equipment and industry determining medium |
CN111242146A (en) * | 2018-11-09 | 2020-06-05 | 蔚来汽车有限公司 | POI information classification based on convolutional neural network |
CN111242146B (en) * | 2018-11-09 | 2023-08-25 | 蔚来(安徽)控股有限公司 | POI information classification based on convolutional neural network |
CN109710906A (en) * | 2018-12-06 | 2019-05-03 | 深圳市标准技术研究院 | Business scope auxiliary makes a report on method, apparatus, terminal device and storage medium |
CN109801118A (en) * | 2018-12-24 | 2019-05-24 | 航天信息股份有限公司 | Identify method, apparatus, medium and the equipment of the manufacturing business of designated trade |
CN112307199A (en) * | 2019-07-14 | 2021-02-02 | 阿里巴巴集团控股有限公司 | Information identification method, data processing method, device and equipment, information interaction method |
CN112487794A (en) * | 2019-08-21 | 2021-03-12 | 顺丰科技有限公司 | Industry classification method and device, terminal equipment and storage medium |
CN112487794B (en) * | 2019-08-21 | 2023-09-22 | 顺丰科技有限公司 | Industry classification method, device, terminal equipment and storage medium |
CN110781955A (en) * | 2019-10-24 | 2020-02-11 | ***股份有限公司 | Method and device for classifying label-free objects and detecting nested codes and computer-readable storage medium |
CN111104791B (en) * | 2019-11-14 | 2024-02-20 | 北京金堤科技有限公司 | Industry information acquisition method and device, electronic equipment and medium |
CN111104791A (en) * | 2019-11-14 | 2020-05-05 | 北京金堤科技有限公司 | Industry information acquisition method and apparatus, electronic device and medium |
CN111538837A (en) * | 2020-04-27 | 2020-08-14 | 北京同邦卓益科技有限公司 | Method and device for analyzing enterprise operation range information |
CN111914090A (en) * | 2020-08-18 | 2020-11-10 | 生态环境部环境规划院 | Method and device for enterprise industry classification identification and characteristic pollutant identification |
CN112163153A (en) * | 2020-09-30 | 2021-01-01 | 深圳前海微众银行股份有限公司 | Industry label determination method, device, equipment and storage medium |
CN112163153B (en) * | 2020-09-30 | 2024-05-03 | 深圳前海微众银行股份有限公司 | Industry label determining method, device, equipment and storage medium |
CN112487263A (en) * | 2020-11-26 | 2021-03-12 | 杭州安恒信息技术股份有限公司 | Information processing method, system, equipment and computer readable storage medium |
CN113869639B (en) * | 2021-08-26 | 2023-11-07 | 中国环境科学研究院 | Yangtze river basin enterprise screening method and device, electronic equipment and storage medium |
CN113869639A (en) * | 2021-08-26 | 2021-12-31 | 中国环境科学研究院 | Yangtze river basin enterprise screening method and device, electronic equipment and storage medium |
CN114785410A (en) * | 2022-04-25 | 2022-07-22 | 贵州电网有限责任公司 | Accurate identification system based on optical fiber coding |
CN114785410B (en) * | 2022-04-25 | 2024-02-27 | 贵州电网有限责任公司 | Accurate recognition system based on optical fiber coding |
Also Published As
Publication number | Publication date |
---|---|
CN108171276B (en) | 2019-07-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108171276B (en) | Method and apparatus for generating information | |
US11151177B2 (en) | Search method and apparatus based on artificial intelligence | |
CN113326764B (en) | Method and device for training image recognition model and image recognition | |
CN108153901A (en) | The information-pushing method and device of knowledge based collection of illustrative plates | |
CN107491534A (en) | Information processing method and device | |
US20180365231A1 (en) | Method and apparatus for generating parallel text in same language | |
CN107220386A (en) | Information-pushing method and device | |
CN108090162A (en) | Information-pushing method and device based on artificial intelligence | |
CN107105031A (en) | Information-pushing method and device | |
CN107168952A (en) | Information generating method and device based on artificial intelligence | |
CN109697641A (en) | The method and apparatus for calculating commodity similarity | |
CN109492772A (en) | The method and apparatus for generating information | |
CN108121800A (en) | Information generating method and device based on artificial intelligence | |
CN108287927B (en) | For obtaining the method and device of information | |
CN108121699A (en) | For the method and apparatus of output information | |
CN108804327A (en) | A kind of method and apparatus of automatic Data Generation Test | |
CN109697239A (en) | Method for generating the method for graph text information and for generating image data base | |
CN109299477A (en) | Method and apparatus for generating text header | |
CN110119445A (en) | The method and apparatus for generating feature vector and text classification being carried out based on feature vector | |
CN107145485A (en) | Method and apparatus for compressing topic model | |
CN106919711A (en) | The method and apparatus of the markup information based on artificial intelligence | |
CN107526718A (en) | Method and apparatus for generating text | |
CN108182472A (en) | For generating the method and apparatus of information | |
CN109711733A (en) | For generating method, electronic equipment and the computer-readable medium of Clustering Model | |
CN109146152A (en) | Incident classification prediction technique and device on a kind of line |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||