CN109165279A - information extraction method and device - Google Patents
- Publication number
- CN109165279A (application CN201811048118.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- training corpus
- word
- label
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
Landscapes
- Business, Economics & Management (AREA)
- Strategic Management (AREA)
- Engineering & Computer Science (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- Finance (AREA)
- Entrepreneurship & Innovation (AREA)
- Game Theory and Decision Science (AREA)
- Data Mining & Analysis (AREA)
- Economics (AREA)
- Marketing (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Machine Translation (AREA)
Abstract
This application discloses an information extraction method and a corresponding device. The method comprises: obtaining first data, where the first data are data from which a target efficacy is to be extracted; inputting the first data into a bidirectional long short-term memory (LSTM) network and outputting second data, where the second data are used to indicate labels of the first data; and extracting target labels from the second data and obtaining the target efficacy according to the target labels, where there is a correspondence between the target labels and the target efficacy. A corresponding device is also provided. Using this application, the efficiency of information extraction can be effectively improved.
Description
Technical field
This application relates to the field of computer technology, and in particular to an information extraction method and device.
Background art
At present there are many beauty products on the market, and different websites describe the same cosmetics in different ways. To mine the effects of different cosmetics, entity extraction is generally performed using rules or regular expressions; that is, rules or regular expressions are used to extract descriptions of a cosmetic's effects from a large corpus of cosmetic descriptions.
However, the above method makes extraction inefficient.
Summary of the invention
This application provides an information extraction method and device, which can effectively improve the efficiency of information extraction.
In a first aspect, an embodiment of the present application provides an information extraction method, comprising:
obtaining first data, the first data being data from which a target efficacy is to be extracted;
inputting the first data into a bidirectional long short-term memory (LSTM) network and outputting second data, where the second data are used to indicate labels of the first data;
extracting target labels from the second data and obtaining the target efficacy according to the target labels, where there is a correspondence between the target labels and the target efficacy.
In the embodiments of the present application, a bidirectional long short-term memory (LSTM) network is used to obtain the labels of the data from which the target efficacy is to be extracted, so that the target labels corresponding to the target efficacy are extracted from the resulting second data, and the target efficacy is then obtained. Implementing the embodiments of the present application is not only simple to realize but can also improve the efficiency of extracting the target efficacy.
In one possible implementation, the length of the first data is the same as the length of the second data.
In the embodiments of the present application, ensuring that the first data and the second data have the same length effectively guarantees that the information extraction device obtains a label for every word in the first data, and effectively avoids the bidirectional LSTM network leaving part of the first data unlabeled, which also improves the efficiency of extracting the target efficacy.
In one possible implementation, the first data include a natural language description of a product, and the target efficacy includes an effect of the product.
In the embodiments of the present application, the product may include a cosmetic product; that is, by processing the description of a cosmetic product, the effects of that cosmetic product can be obtained. Implementing the embodiments of the present application can improve the extraction accuracy for the cosmetics domain in a corpus and avoids manually enumerating exhaustive regular expressions.
In one possible implementation, before the first data are input into the bidirectional long short-term memory LSTM network, the method further includes:
obtaining a training corpus;
labeling each word in the training corpus to obtain the label of each word in the training corpus;
encoding each word in the training corpus to obtain the one-dimensional vector of each word in the training corpus;
inputting the label of each word in the training corpus and the one-dimensional vector of each word in the training corpus into the bidirectional LSTM network to train the bidirectional LSTM network, where there is a correspondence between the label of each word and the one-dimensional vector of each word in the training corpus.
In the embodiments of the present application, the bidirectional LSTM network is trained with the training corpus so as to learn the weight parameters of the bidirectional LSTM, and the trained network is then used to realize the information extraction method provided by the embodiments of the present application, improving the feasibility of the method.
In one possible implementation, obtaining the training corpus comprises:
obtaining the training corpus using a web crawler.
In the embodiments of the present application, collecting the training corpus with a web crawler ensures that a large and sufficient training corpus can be collected, improving the training effect of the bidirectional LSTM network.
In a second aspect, an embodiment of the present application provides an information extraction device, comprising:
an obtaining unit, configured to obtain first data, the first data being data from which a target efficacy is to be extracted;
an input-output unit, configured to input the first data into a bidirectional long short-term memory (LSTM) network and output second data, where the second data are used to indicate labels of the first data;
an extracting unit, configured to extract target labels from the second data and obtain the target efficacy according to the target labels, where there is a correspondence between the target labels and the target efficacy.
In one possible implementation, the length of the first data is the same as the length of the second data.
In one possible implementation, the first data include a natural language description of a product, and the target efficacy includes an effect of the product.
In one possible implementation, the obtaining unit is further configured to obtain a training corpus;
the device further includes:
a labeling unit, configured to label each word in the training corpus to obtain the label of each word in the training corpus;
an encoding unit, configured to encode each word in the training corpus to obtain the one-dimensional vector of each word in the training corpus;
a training unit, configured to input the label of each word in the training corpus and the one-dimensional vector of each word in the training corpus into the bidirectional LSTM network to train the bidirectional LSTM network, where there is a correspondence between the label of each word and the one-dimensional vector of each word in the training corpus.
In one possible implementation, the obtaining unit is specifically configured to obtain the training corpus using a web crawler.
In a third aspect, an embodiment of the present application further provides an information extraction device, comprising: a processor, a memory and an input/output interface, the processor, the memory and the input/output interface being interconnected via a line, where the memory stores program instructions; when the program instructions are executed by the processor, the processor performs the corresponding method described in the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program, the computer program including program instructions which, when executed by the processor of an information extraction device, cause the processor to perform the method described in the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the method described in the first aspect.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the present application or in the background art more clearly, the drawings needed in the embodiments or the background art are described below.
Fig. 1 is a schematic flowchart of an information extraction method provided by an embodiment of the present application;
Fig. 2 is a schematic flowchart of a training method for a bidirectional LSTM network provided by an embodiment of the present application;
Fig. 3 is a schematic scenario diagram of a bidirectional LSTM network provided by an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an information extraction device provided by an embodiment of the present application;
Fig. 5 is a schematic structural diagram of another information extraction device provided by an embodiment of the present application;
Fig. 6 is a schematic structural diagram of another information extraction device provided by an embodiment of the present application.
Detailed description of embodiments
To make the purposes, technical solutions and advantages of the present application clearer, the application is described in further detail below with reference to the drawings.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of an information extraction method provided by an embodiment of the present application. The information extraction method can be applied to an information extraction device, which may be a terminal device such as a mobile phone, a tablet or a computer, or may be a server, etc.; the embodiments of the present application do not limit which kind of device the information extraction device is. Optionally, the information extraction device may also be a stand-alone chip, etc., which is likewise not limited. As shown in Fig. 1, the information extraction method may include:
101. Obtain first data, the first data being data from which a target efficacy is to be extracted.
In the embodiments of the present application, the first data may be any natural language description. Specifically, the first data may include a natural language description of a product, e.g., a natural language description of a cosmetic. The target efficacy includes an effect of the product; that is, the data containing the natural language description of a cosmetic may mention one or more of the cosmetic's effects, such as whitening, moisturizing, oil control, anti-aging and spot removal.
It is understood that the embodiments do not limit where the information extraction device obtains the first data from. For example, the information extraction device may photograph the description on a cosmetics bottle with a camera and obtain the first data after recognizing it; or the information extraction device may obtain the first data directly from a web page; or the information extraction device may obtain the first data from another device used to obtain such data. The embodiments of the present application place no limit on this.
102. Input the first data into the bidirectional long short-term memory (LSTM) network and output second data, where the second data are used to indicate the labels of the first data.
In the embodiments of the present application, the bidirectional LSTM network may be a trained network that the information extraction device obtains from another device, or a network trained by the information extraction device itself; the embodiments place no limit on this.
In the embodiments of the present application, an LSTM network is a kind of recurrent neural network over time that can be used to process and predict events with relatively long intervals and delays in a time series. For example, an LSTM network can learn document summarization, speech recognition, handwriting recognition, and so on. Specifically, the LSTM network can automatically judge whether information added to the network is useful: a unit (cell) is provided for judging whether incoming information is useful, so that when information enters the LSTM network it can be judged by rule; information that meets the network's criteria is kept, while information that does not is forgotten. Specifically, a cell may include three gates, i.e., an input gate, a forget gate and an output gate. The input gate can be used to admit certain input information into memory, the forget gate can be used to selectively forget certain past information, and the output gate can combine the admitted input with what remains after forgetting and then output the result.
Thus, in the embodiments of the present application, when the first data are input into the LSTM network, the network can judge through its cells whether the first data contain information that meets its criteria; if so, the LSTM network keeps the useful information, e.g., the input gate may choose to memorize the input first data. After the LSTM merges the information learned during training with the first data, the LSTM network can analyze or learn from the useful information and then output the second data through the output gate. It is understood that the above is only one form of LSTM network provided by the embodiments of the present application; in a concrete realization, the network may have many variations or improvements, so the LSTM network illustrated above should not be interpreted as a restriction on the embodiments.
It is understood that, as for how to train or learn the LSTM network, or how to make the LSTM network output useful information corresponding to the input, reference may be made to the training method shown in Fig. 2, which is not detailed again here.
In the embodiments of the present application, the second data can be used to indicate the labels of the first data, and the labels may take digital form. Optionally, the second data may also be used to indicate the effects of the first data. For example, if the first data are "salicylic acid has a fine whitening effect", whitening is indicated by the digit 1 and other data by 0, then inputting the first data into the bidirectional LSTM may output second data such as "0000110000"; that is, the "11" in the second data can indicate whitening. It is understood that the above is only an example and should not be construed as a restriction on the embodiments.
Optionally, to guarantee that the bidirectional LSTM network labels every word of the input first data, in the embodiments of the present application the length of the first data is the same as the length of the second data. That is, by guaranteeing that the two lengths match, every word in the first data can be labeled, yielding the label of each word of the first data.
103. Extract target labels from the second data and obtain the target efficacy according to the target labels, where there is a correspondence between the target labels and the target efficacy.
In the embodiments of the present application, the target labels can correspond to the target efficacy. Continuing the example above, if the output second data are "0000110000", the target label can be "11"; since the digit 1 represents whitening, the target efficacy can be obtained from the target label.
For another example, whitening is indicated by the digit 1, oil control by 2, moisturizing by 3, anti-aging by 4, and the digit 0 indicates "other", i.e., data other than whitening, oil control, moisturizing and anti-aging. If the first data are "resist the invasion of the years, relieve dry skin, lightly smooth away fine lines and wrinkles", then after the first data are input into the bidirectional LSTM network, the output second data may be "000000003300440000"; extracting the target labels 33 and 44 from them shows that the first data describe moisturizing and anti-aging. That is, although moisturizing and anti-aging are not explicitly recorded in the first data, semantic analysis shows that the meaning expressed is moisturizing and anti-aging, and the bidirectional LSTM network can derive the corresponding labels (including the target labels) by analyzing the input first data, e.g., by learning sentence structure and word or character information. It is understood that the embodiments place no limit on the specific form in which the second data are represented.
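As a minimal sketch of step 103, the target efficacies can be read off a per-character tag sequence using the illustrative digit scheme above; the function name and the exact tag values are assumptions for illustration, not the patent's fixed scheme:

```python
# Illustrative tag scheme from the examples above (not the patent's fixed scheme):
# 0 = other, 1 = whitening, 2 = oil control, 3 = moisturizing, 4 = anti-aging
EFFECTS = {1: "whitening", 2: "oil control", 3: "moisturizing", 4: "anti-aging"}

def extract_effects(tags):
    """Collect the set of target efficacies indicated by a tag sequence
    (the 'second data') output by the bidirectional LSTM."""
    return {EFFECTS[t] for t in tags if t in EFFECTS}

# A tag sequence analogous to "...003300...4400..." in the text:
tags = [0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 0, 0, 4, 4, 0, 0, 0, 0]
effects = extract_effects(tags)  # {"moisturizing", "anti-aging"}
```

A run over "0000110000" would likewise yield only {"whitening"}, matching the first example.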
In the embodiments of the present application, the labels of the data from which the target efficacy is to be extracted are obtained using the bidirectional LSTM network, so that target labels corresponding to the target efficacy are extracted from the resulting second data, and the target efficacy is then obtained. Implementing the embodiments of the present application is not only simple to realize but can also improve the extraction efficiency of the target efficacy.
The following takes the case where the information extraction device trains the bidirectional LSTM network as an example to illustrate how the information extraction device trains the network. Referring to Fig. 2, Fig. 2 is a schematic flowchart of a method for training a bidirectional LSTM network provided by an embodiment of the present application. The method can be applied to an information extraction device, which may be a terminal device or a server, etc.; the embodiments place no limit on this. As shown in Fig. 2, the training method may include:
201. Obtain a training corpus.
In the embodiments of the present application, the training corpus may include natural language descriptions related to cosmetic products. Specifically, in the embodiments of the present application, obtaining the training corpus includes:
obtaining the training corpus using a web crawler.
A web crawler, also known as a web spider or web robot, is a program or script that automatically crawls content on the web according to certain rules. Therefore, by using a web crawler to obtain a large training corpus from the web, the embodiments of the present application can effectively improve the training effect of the bidirectional LSTM network.
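A hedged sketch of corpus collection using Python's standard library: the page markup and the idea of harvesting `<p>` text are assumptions for illustration, and a real crawler would also fetch pages (e.g., with `urllib`), follow links and respect robots.txt:

```python
from html.parser import HTMLParser

class DescriptionCollector(HTMLParser):
    """Minimal sketch: pull the text inside <p> tags of a product page
    to build up training corpus. The markup below is illustrative."""
    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p and data.strip():
            self.paragraphs.append(data.strip())

# A stand-in for one fetched page:
page = "<html><body><p>Salicylic acid has a good whitening effect.</p></body></html>"
collector = DescriptionCollector()
collector.feed(page)
```

Each collected paragraph would then be labeled and encoded as described in steps 202 and 203.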
202. Label each word in the training corpus to obtain the label of each word in the training corpus.
In the embodiments of the present application, labeling each word in the training corpus may include labeling each word according to a preset rule of correspondence between labels and effects, so as to classify each word. The label of each word in the training corpus can be expressed in digital form, e.g., whitening indicated by the digit 1, oil control by 2, moisturizing by 3, anti-aging by 4, and "other" by 0; the embodiments of the present application do not limit the description of the effects to these.
It is understood that labeling each word in the training corpus to obtain the label of each word may involve the following two situations:
Situation 1: the training corpus contains natural language descriptions of effects; that is, certain effects such as whitening or oil control appear directly in the corpus. In this case, the information extraction device can label the training corpus directly.
Situation 2: the training corpus does not directly contain natural language descriptions of effects; that is, words such as whitening, oil control or anti-aging are not explicitly present. In this case, the information extraction device can label the training corpus after performing semantic analysis on it.
It is understood that the method by which the information extraction device labels each word in the training corpus may be, first, that the device receives a labeling instruction input by a user and labels each word in the training corpus according to that instruction, thereby obtaining the label of each word; or second, that the device labels automatically according to a preset rule, e.g., a rule set by the user before the device performs labeling. The preset rule can express the correspondence between effects and labels, e.g., the digit 1 indicates whitening, 2 indicates oil control, 3 indicates moisturizing, 4 indicates anti-aging, and 0 indicates other; the embodiments of the present application place no limit on the rule.
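The preset rule described above can be sketched as a simple keyword-to-digit table; the keyword spellings and the whitespace tokenization are illustrative assumptions, not the patent's actual rule:

```python
# Illustrative preset rule: correspondence between effect keywords and labels
RULES = {"whitening": 1, "oil-control": 2, "moisturizing": 3, "anti-aging": 4}

def label_tokens(tokens):
    """Label each token with its effect digit; 0 means 'other'."""
    return [RULES.get(tok, 0) for tok in tokens]

tokens = ["salicylic", "acid", "whitening", "effect", "is", "fine"]
labels = label_tokens(tokens)  # [0, 0, 1, 0, 0, 0]
```

Situation 2 above (effects only implied, never named) is exactly what this keyword rule cannot handle and what the trained bidirectional LSTM is meant to cover.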
203. Encode each word in the training corpus to obtain the one-dimensional vector of each word in the training corpus.
In the embodiments of the present application, after each word in the training corpus is encoded, every word in the training corpus can be converted into a vector of fixed length. It is understood that encoding each word in the training corpus may also include encoding each character in the training corpus.
For example, the fixed-length vector may be a vector of length 100. If the training corpus includes "salicylic acid has a fine whitening effect", encoding it can yield [X1, X2, X3, X4, X5, X6, X7, X8, X9, X10], where 1 to 10 indicate positions along the length of the sentence in the training corpus and each X indicates a one-dimensional vector.
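One common way to realize this encoding step is to map each word to an integer id and pad the sentence to a fixed length (an embedding layer inside the network then turns each id into a dense vector); the length of 10, the pad id of 0 and the vocabulary handling here are illustrative assumptions:

```python
def encode(tokens, vocab, max_len=10):
    """Assign each token an integer id (growing the vocabulary as
    needed) and pad the sentence to a fixed length; 0 is the pad id."""
    ids = [vocab.setdefault(tok, len(vocab) + 1) for tok in tokens]
    return (ids + [0] * max_len)[:max_len]

vocab = {}
encoded = encode(["salicylic", "acid", "whitening"], vocab)
# encoded == [1, 2, 3, 0, 0, 0, 0, 0, 0, 0]
```

Because every sentence comes out at the same length, the label sequence y can be padded to match, preserving the one-to-one correspondence between vectors and labels that step 204 requires.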
204. Input the label of each word in the training corpus and the one-dimensional vector of each word in the training corpus into the bidirectional LSTM network to train the bidirectional LSTM network, where there is a correspondence between the label of each word and the one-dimensional vector of each word in the training corpus.
In the embodiments of the present application, the one-dimensional vectors of the words in the training corpus may be represented as [X1, X2, X3, X4, …, Xn], and the labels y of the words in the training corpus may be [0, 0, 0, 1, …, 0]. The one-dimensional vectors and y are input into the bidirectional LSTM network for learning; specifically, as shown in Fig. 3, the learning can be computed by the following formulas.
it=σ (Wix(t)+Uih(t-1)+bi) (1)
ft=σ (Wfx(t)+Ufh(t-1)+bf) (2)
ot=σ (Wox(t)+Uoh(t-1)+bo) (3)
ht=σt*tanh(Ct) (6)
Wherein, σ is sigmoid function, and W, U indicate weight, and h (t) indicates that the hidden state of t moment, b indicate biasing,
Middle C indicates unit memory, and i indicates that input vector, f indicate forgetting rate, and 0 indicates output vector.
Wherein i determines knots modification of the vector X for the memory state C in unit of input, f then have decided on whether using
Memory state before and to use how much.
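A minimal scalar sketch of one cell update implementing the gate equations above; the parameter values and dictionary keys are illustrative, and a real network uses vectors and weight matrices rather than scalars:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """One time step of a (scalar) LSTM cell; p holds the weights and biases."""
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wc"] * x + p["uc"] * h_prev + p["bc"])  # candidate memory
    c = f * c_prev + i * g       # keep part of the old memory, admit the new
    h = o * math.tanh(c)         # hidden state: filtered cell memory
    return h, c

# Illustrative parameters; training would adjust these values.
p = {k: 0.5 for k in ("wi", "ui", "bi", "wf", "uf", "bf",
                      "wo", "uo", "bo", "wc", "uc", "bc")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.0, p=p)
```

Training amounts to adjusting the entries of p (the W, U and b of the formulas) so that the hidden states h, fed to a classifier, reproduce the labels y.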
The weight parameters in the bidirectional LSTM network, such as W, U and b, can be trained from the one-dimensional vectors Xn and the corresponding y. It is understood that training or learning the LSTM network means training the above parameters, namely the weight parameters in the forget gate, input gate and output gate of the cell; after the weight parameters are determined, the LSTM network can be used to apply the method shown in Fig. 1 provided by the embodiments of the present application.
By implementing the embodiments of the present application and collecting a large training corpus, the training of the bidirectional LSTM network can be completed effectively, which not only improves the training effect of the bidirectional LSTM network but also enables the information extraction device to improve the efficiency of information extraction.
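The "bidirectional" aspect of the network can be sketched as two recurrences, one left-to-right and one right-to-left, whose states are combined per position before classification; the toy forward/backward/classify functions below merely stand in for the trained sub-networks:

```python
def bilstm_tags(tokens, forward, backward, classify):
    """Run one recurrence over the tokens in each direction, then
    classify each position from both hidden states."""
    f_states, h = [], 0.0
    for t in tokens:                 # left-to-right pass
        h = forward(t, h)
        f_states.append(h)
    b_states, h = [], 0.0
    for t in reversed(tokens):       # right-to-left pass
        h = backward(t, h)
        b_states.append(h)
    b_states.reverse()               # align with token order
    return [classify(f, b) for f, b in zip(f_states, b_states)]

# Toy stand-ins: each "recurrence" just accumulates, the "classifier" sums.
tags = bilstm_tags([1, 2, 3],
                   forward=lambda t, h: h + t,
                   backward=lambda t, h: h + t,
                   classify=lambda f, b: f + b)  # [7, 8, 9]
```

Because every position sees both left and right context, a word like "wrinkles" can be tagged as anti-aging even when the decisive cue comes later in the sentence.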
The method of the embodiments of the present application has been described above; the device of the embodiments of the present application is provided below.
Referring to Fig. 4, Fig. 4 is a schematic structural diagram of an information extraction device provided by an embodiment of the present application. The information extraction device can be used to execute the methods shown in Fig. 1 and Fig. 2. As shown in Fig. 4, the information extraction device includes:
an obtaining unit 401, configured to obtain first data, the first data being data from which a target efficacy is to be extracted;
an input-output unit 402, configured to input the first data into the bidirectional long short-term memory (LSTM) network and output second data, where the second data are used to indicate the labels of the first data;
an extracting unit 403, configured to extract target labels from the second data and obtain the target efficacy according to the target labels, where there is a correspondence between the target labels and the target efficacy.
In the embodiments of the present application, the labels of the data from which the target efficacy is to be extracted are obtained using the bidirectional LSTM network, so that target labels corresponding to the target efficacy are extracted from the resulting second data, and the target efficacy is then obtained. Implementing the embodiments of the present application is not only simple to realize but can also improve the extraction efficiency of the target efficacy.
Specifically, the obtaining unit 401 is further configured to obtain a training corpus.
Optionally, as shown in Fig. 5, the device further includes:
a labeling unit 404, configured to label each word in the training corpus to obtain the label of each word in the training corpus;
an encoding unit 405, configured to encode each word in the training corpus to obtain the one-dimensional vector of each word in the training corpus;
a training unit 406, configured to input the label of each word in the training corpus and the one-dimensional vector of each word in the training corpus into the bidirectional LSTM network to train the bidirectional LSTM network, where there is a correspondence between the label of each word and the one-dimensional vector of each word in the training corpus.
Specifically, the obtaining unit 401 is specifically configured to obtain the training corpus using a web crawler.
It is understood that for the specific implementation of the information extraction devices shown in Fig. 4 and Fig. 5, reference may be made to the implementation of the methods shown in Fig. 1 and Fig. 2, which is not detailed again here.
Referring to Fig. 6, Fig. 6 is a schematic structural diagram of an information extraction device provided by an embodiment of the present application. The information extraction device includes a processor 601, a memory 602 and an input/output interface 603, which are interconnected by a bus.
The memory 602 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM) or compact disc read-only memory (CD-ROM), and is used for related instructions and data.
The input/output interface 603 can, for example, be used to communicate with other devices; e.g., it can be used to obtain a trained bidirectional LSTM network sent by another device, or to obtain the first data, etc.; the embodiments of the present application place no limit on this.
The processor 601 may be one or more central processing units (CPUs); in the case where the processor 601 is a single CPU, the CPU may be a single-core CPU or a multi-core CPU.
Specifically, for the realization of each operation, reference may also be made to the corresponding description of the method embodiments shown in Fig. 1 and Fig. 2, and to the corresponding description of the device embodiments shown in Fig. 4 and Fig. 5. For example, in one embodiment the processor 601 can be used to execute the methods shown in steps 101 to 103, and can also be used to execute the methods performed by the obtaining unit 401, the input-output unit 402, the extracting unit 403, etc.
Those of ordinary skill in the art will appreciate that all or part of the processes in the above method embodiments can be completed by a computer program instructing the relevant hardware; the program can be stored in a computer-readable storage medium and, when executed, may include the processes of each of the above method embodiments. The aforementioned storage medium includes media that can store program code, such as ROM, random access memory (RAM), magnetic disks or optical discs.
Claims (10)
1. An information extraction method, characterized by comprising:
obtaining first data, the first data being data from which a targeted efficacy is to be extracted;
inputting the first data into a bidirectional long short-term memory (LSTM) network and outputting second data, wherein the second data is used to indicate labels of the first data; and
extracting target labels from the second data, and obtaining the targeted efficacy according to the target labels, wherein there is a correspondence between the target labels and the targeted efficacy.
2. The method according to claim 1, wherein the length of the first data is the same as the length of the second data.
3. The method according to claim 1 or 2, wherein the first data comprises a natural-language description of a product, and the targeted efficacy comprises an efficacy of the product.
4. The method according to claim 3, wherein before inputting the first data into the bidirectional LSTM network, the method further comprises:
obtaining a training corpus;
labeling each word in the training corpus to obtain a label of each word in the training corpus;
encoding each word in the training corpus to obtain a one-dimensional vector of each word in the training corpus; and
inputting the label of each word in the training corpus and the one-dimensional vector of each word in the training corpus into the bidirectional LSTM network to train the bidirectional LSTM network, wherein there is a correspondence between the label of each word and the one-dimensional vector of each word in the training corpus.
5. The method according to claim 4, wherein obtaining the training corpus comprises:
obtaining the training corpus by means of a web crawler.
6. An information extraction apparatus, characterized by comprising:
an acquiring unit, configured to obtain first data, the first data being data from which a targeted efficacy is to be extracted;
an input-output unit, configured to input the first data into a bidirectional long short-term memory (LSTM) network and output second data, wherein the second data is used to indicate labels of the first data; and
an extracting unit, configured to extract target labels from the second data and obtain the targeted efficacy according to the target labels, wherein there is a correspondence between the target labels and the targeted efficacy.
7. The apparatus according to claim 6, wherein:
the acquiring unit is further configured to obtain a training corpus; and
the apparatus further comprises:
a labeling unit, configured to label each word in the training corpus to obtain a label of each word in the training corpus;
an encoding unit, configured to encode each word in the training corpus to obtain a one-dimensional vector of each word in the training corpus; and
a training unit, configured to input the label of each word in the training corpus and the one-dimensional vector of each word in the training corpus into the bidirectional LSTM network to train the bidirectional LSTM network, wherein there is a correspondence between the label of each word and the one-dimensional vector of each word in the training corpus.
8. The apparatus according to claim 7, wherein the acquiring unit is specifically configured to obtain the training corpus by means of a web crawler.
9. An information extraction apparatus, characterized by comprising a processor, a memory, and an input/output interface, the processor, the memory, and the input/output interface being interconnected by a line, wherein the memory stores program instructions, and when the program instructions are executed by the processor, the processor is caused to perform the method according to any one of claims 1 to 5.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, the computer program comprising program instructions that, when executed by a processor of an information extraction apparatus, cause the processor to perform the method according to any one of claims 1 to 5.
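The training-data preparation of claims 4 and 7 — labeling each word and encoding it as a one-dimensional vector paired with its label — might be sketched as follows. The tag set and the one-hot encoding scheme are illustrative assumptions; the patent does not fix either.

```python
def build_training_pairs(corpus, tag_set=("O", "B-EFF", "I-EFF")):
    """Encode each word of the training corpus as a one-hot (one-dimensional)
    vector and pair it with its label id, preserving the word-vector/label
    correspondence required by claim 4. Tag names are hypothetical.

    corpus -- list of sentences, each a list of (word, label) pairs
    """
    # Build a vocabulary over the whole corpus and fixed id maps.
    vocab = sorted({w for sent in corpus for w, _ in sent})
    word_id = {w: i for i, w in enumerate(vocab)}
    tag_id = {t: i for i, t in enumerate(tag_set)}
    pairs = []
    for sent in corpus:
        vectors, labels = [], []
        for word, tag in sent:
            vec = [0] * len(vocab)       # one-dimensional vector for this word
            vec[word_id[word]] = 1
            vectors.append(vec)
            labels.append(tag_id[tag])   # label kept in step with its vector
        pairs.append((vectors, labels))
    return pairs, word_id, tag_id
```

Each `(vectors, labels)` pair would then be fed to the bidirectional LSTM network for training, as the claims describe; the per-word alignment is what lets the network emit one label per input position at inference time (cf. claim 2).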
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811048118.6A CN109165279A (en) | 2018-09-06 | 2018-09-06 | information extraction method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109165279A true CN109165279A (en) | 2019-01-08 |
Family
ID=64894493
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811048118.6A Pending CN109165279A (en) | 2018-09-06 | 2018-09-06 | information extraction method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109165279A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106156003A (en) * | 2016-06-30 | 2016-11-23 | 北京大学 | A kind of question sentence understanding method in question answering system |
CN106569998A (en) * | 2016-10-27 | 2017-04-19 | 浙江大学 | Text named entity recognition method based on Bi-LSTM, CNN and CRF |
CN107563407A (en) * | 2017-08-01 | 2018-01-09 | 同济大学 | A kind of character representation learning system of the multi-modal big data in network-oriented space |
CN107797988A (en) * | 2017-10-12 | 2018-03-13 | 北京知道未来信息技术有限公司 | A kind of mixing language material name entity recognition method based on Bi LSTM |
CN107908614A (en) * | 2017-10-12 | 2018-04-13 | 北京知道未来信息技术有限公司 | A kind of name entity recognition method based on Bi LSTM |
CN107943911A (en) * | 2017-11-20 | 2018-04-20 | 北京大学深圳研究院 | Data pick-up method, apparatus, computer equipment and readable storage medium storing program for executing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20190108 |