CN107943911A - Data extraction method, apparatus, computer device and readable storage medium - Google Patents
- Publication number
- CN107943911A (application number CN201711155534.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- feature tag
- neural network
- network model
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24564—Applying rules; Deductive queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
- G06F40/117—Tagging; Marking up; Designating a block; Setting of attributes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Probability & Statistics with Applications (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Machine Translation (AREA)
Abstract
The present invention relates to a data extraction method, apparatus, computer device and readable storage medium. The data extraction method includes: splitting pending data to obtain data sets; inputting the data sets into a preset neural network model to obtain initial extraction data and feature labels corresponding to the initial extraction data; extracting corresponding target extraction data from the initial extraction data according to a preset rule template; and outputting the target extraction data in association with the feature labels corresponding to the target extraction data. When extracting data in different formats, the above data extraction method is free from the limitations of fixed extraction rules: the mapping between the data sets and the feature labels is combined with a customized extraction rule to perform the extraction, which reduces the error rate when extracting data in different formats and yields better extraction results.
Description
Technical field
The present invention relates to the computer field, and more particularly to a data extraction method, apparatus, computer device and readable storage medium.
Background technology
With the rapid development of modern information and storage technologies and the rapid spread of the Internet, people encounter all kinds of information on the network in daily life. In the big-data era, what people lack is not information, but the ability to obtain useful information of interest from the massive, complicated and miscellaneous information available. The advantage of data extraction technology is that it simplifies natural language processing: it focuses only on the relevant information and ignores unrelated content.
Traditional data extraction methods mainly rely on rule-based extraction: the information elements of interest are identified and located, and extraction rules are then customized according to linguistic features and the relevant data format. Such customized rules can only target data in one specific format; when facing data in different formats, segmentation errors and the rigidity of a single extraction rule often make the error rate of data extraction very high.
Summary of the invention
In view of this, it is necessary to provide a data extraction method, apparatus, computer device and readable storage medium that address the high error rate of traditional data extraction methods.
A data extraction method includes:
splitting pending data to obtain data sets;
inputting the data sets into a preset neural network model to obtain initial extraction data and feature labels corresponding to the initial extraction data;
extracting corresponding target extraction data from the initial extraction data according to a preset rule template; and
outputting the target extraction data in association with the feature labels corresponding to the target extraction data.
In one of the embodiments, the step of splitting the pending data to obtain the data sets includes:
splitting the pending data according to punctuation marks to obtain the data sets.
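As a minimal, hypothetical sketch of this embodiment (the function name and the exact punctuation set are illustrative assumptions, not taken from the patent), splitting pending text into data sets at punctuation marks might look like:

```python
import re

def split_into_data_sets(text):
    """Split pending text into data sets at sentence-ending punctuation.

    The punctuation set below (Chinese and Western sentence marks) is an
    assumption; the patent only states that splitting follows punctuation.
    """
    pieces = re.split(r"[。；;.!?！？]", text)
    # Drop empty fragments and surrounding whitespace.
    return [p.strip() for p in pieces if p.strip()]
```

Applied to the resume example given later in the description, this would yield three separate data sets, one per clause.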
In one of the embodiments, the step of inputting the data sets into the preset neural network model to obtain the initial extraction data and the feature labels corresponding to the initial extraction data includes:
inputting the data sets into the preset neural network model to obtain alternative labels and alternative data sets corresponding to the alternative labels;
obtaining the probability of each alternative label corresponding to a data set; and
choosing the alternative label with the highest probability as the feature label, the alternative data set corresponding to that feature label serving as the initial extraction data set.
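A hedged sketch of the label selection in this embodiment, assuming the model's per-label probabilities are already available as a plain mapping (the function and variable names are illustrative):

```python
def choose_feature_label(label_probs):
    """Return the alternative label with the highest probability; that
    label becomes the feature label, and its alternative data set
    becomes the initial extraction data set."""
    return max(label_probs, key=label_probs.get)
```

For example, if the model yields {"name": 0.8, "school": 0.15, "other": 0.05} for a piece of data, "name" is chosen as the feature label.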
In one of the embodiments, after the step of outputting the feature labels in association with the target extraction data, the method further includes:
when the feature labels and target extraction data output in association contain an error, receiving an adjustment instruction for the preset rule template; and
adjusting the preset rule template according to the adjustment instruction.
In one of the embodiments, the method further includes:
preprocessing sample data according to a preprocessing rule to obtain sample sets;
obtaining the feature label corresponding to each sample set; and
inputting the sample sets and the feature labels into an initial neural network model to obtain the preset neural network model.
In one of the embodiments, the step of inputting the sample sets and the feature labels into the initial neural network model to obtain the preset neural network model includes:
dividing the sample sets into a training set and a validation set;
inputting the training set and the feature labels corresponding to the training set into the initial neural network model to obtain a trained neural network model;
inputting the validation set into the trained neural network model to obtain validation feature labels; and
when a validation feature label is inconsistent with the feature label corresponding to the training set, correcting the trained neural network model with the feature label corresponding to the training set to obtain the preset neural network model.
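The training/validation split in this embodiment could be sketched as follows; the 80/20 ratio, the shuffling, and the fixed seed are assumptions for illustration, not values stated in the patent:

```python
import random

def split_samples(sample_set, train_ratio=0.8, seed=42):
    """Divide the sample set into a training set and a validation set."""
    samples = list(sample_set)
    random.Random(seed).shuffle(samples)   # deterministic shuffle for the sketch
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]
```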
In one of the embodiments, the step of preprocessing the sample data according to the preprocessing rule to obtain the sample sets includes:
segmenting a sample according to a preset word-segmentation logic to obtain a character set;
expressing each character in the character set as a word vector through a preset vector model and the number of characters in the character set;
expressing the characters in the character set as a word sequence according to a preset rule; and
obtaining the sample sets from the word vectors and the word sequence.
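The preprocessing steps above (character segmentation, word vectors sized by the character count, and a word sequence) can be sketched as follows; the one-hot vectors stand in for the preset vector model, whose details the patent leaves unspecified:

```python
def preprocess_sample(text, vocab):
    """Segment a sample into characters, express each character as a
    one-hot word vector over the given vocabulary, and record the word
    sequence (the position of each character)."""
    chars = list(text)                      # character-level segmentation
    index = {ch: i for i, ch in enumerate(vocab)}
    vectors = []
    for ch in chars:
        vec = [0.0] * len(vocab)
        if ch in index:                     # unknown characters stay all-zero
            vec[index[ch]] = 1.0
        vectors.append(vec)
    sequence = list(range(len(chars)))      # order of the characters
    return vectors, sequence
```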
A data extraction apparatus includes:
a splitting module, configured to split pending data to obtain data sets;
a labeling module, configured to input the data sets into a preset neural network model to obtain initial extraction data and feature labels corresponding to the initial extraction data;
an extraction module, configured to extract corresponding target extraction data from the initial extraction data according to a preset rule template; and
an output module, configured to output the target extraction data in association with the feature labels corresponding to the target extraction data.
A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above method when executing the program.
A readable storage medium stores a computer program which, when executed by a processor, implements the steps of the above method.
In the above data extraction method, apparatus, computer device and readable storage medium, the pending data is first split; feature labels are then added to the split data sets by the self-learning ability of the neural network model; the target extraction data contained in the data sets is extracted through the rule template; and finally the target extraction data is output with its corresponding feature labels. When extracting data in different formats, as long as the data to be extracted can be recognized by the computer, the neural network model can establish the mapping between the data sets and the feature labels, free from the limitations of fixed extraction rules. This mapping is then combined with a customized extraction rule to perform the extraction, which reduces the error rate when extracting data in different formats and yields better extraction results.
Brief description of the drawings
Fig. 1 is a flow chart of a data extraction method in an embodiment;
Fig. 2 is a flow chart of step S104 of the embodiment shown in Fig. 1;
Fig. 3 is a flow chart of a preprocessing step in an embodiment;
Fig. 4 is a flow chart of step S302 of the embodiment shown in Fig. 3;
Fig. 5 is a flow chart of step S104 of the embodiment shown in Fig. 1;
Fig. 6 is a structural diagram of a data extraction apparatus in an embodiment;
Fig. 7 is a structural diagram of a computer device in an embodiment.
Detailed description of the embodiments
To make the purpose, technical scheme and advantages of the present invention clearer, the present invention is further elaborated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention and are not intended to limit it.
Before the embodiments of the present invention are described in detail, it should be noted that the embodiments mainly concern the combination of method steps and system components related to the data extraction method, apparatus, computer device and readable storage medium. Accordingly, the system components and method steps are represented in appropriate positions by ordinary symbols in the accompanying drawings, showing only the details relevant to understanding the embodiments of the present invention, so that the disclosure is not obscured by details that would be obvious to those of ordinary skill in the art having the benefit of this description.
Herein, relational terms such as left and right, up and down, front and back, first and second are used only to distinguish one entity or action from another entity or action, and do not necessarily require or imply any actual relation or order between such entities or actions. The terms "comprise", "include" or any other variant are intended to cover a non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements, but also other elements not explicitly listed, or elements inherent to such a process, method, article or device.
Referring to Fig. 1, Fig. 1 provides a flow chart of a data extraction method in an embodiment. The method includes the following steps.
S102: splitting the pending data to obtain data sets.
The pending data is the target data on which data extraction is to be performed, including text data, PDFs, pictures and the like; for example, a resume file. A data set refers to data that can be input into the preset neural network model; it may be a character set, a picture set, or a mixed set of text and pictures.
Specifically, the pending data is split to obtain the data sets: the target data is segmented according to a predetermined logic to obtain the data sets to be input into the preset neural network model. Splitting the data prevents the data congestion and low processing efficiency caused by inputting the raw data into the neural network model all at once; moreover, splitting according to a preset rule keeps the content of each split data set internally related, which facilitates the next step of data processing by the neural network model.
For example, in a resume data extraction, the resume content to be extracted is: "Li Ming graduated from Tsinghua University in 2000, has been engaged in XX work at a certain company since 2001, and the company once won an XX award." The resume to be extracted is first split into three data sets, namely data set 1: "Li Ming graduated from the Automation major of Tsinghua University in 2000"; data set 2: "has been engaged in XX work at a certain company since 2001"; data set 3: "the company once won an XX award".
S104: inputting the data sets into the preset neural network model to obtain initial extraction data and feature labels corresponding to the initial extraction data.
The preset neural network model is a neural network model with fixed processing rules obtained through sample training. The initial extraction data is obtained by inputting the data sets split in step S102 into the preset neural network model; a feature label is a mark that the preset neural network model attaches to data in a data set.
Specifically, after the data sets are input into the preset neural network model, the model uses its trained rules to identify the data sets that are relevant to the target data to be extracted, adds feature labels to the relevant data, discards the data sets without feature labels, and retains only the data sets to which feature labels have been added, thereby obtaining the initial extraction data.
For example, in the above resume extraction, the preset neural network model has learned during training how to identify data relevant to the target extraction data and how to add feature labels to it. For instance, for the feature label "name": when the preset neural network model reads characters such as Li or Wang, it can identify that these characters may represent a surname, and according to Chinese-language logic the content after the surname is most likely a given name. For the feature label "graduation school": when the preset neural network model reads data such as Tsinghua University or Wuhan University, it can identify that these data may represent a graduation school. After the above three data sets are input into the preset neural network model, the feature label "name" is added to "Li Ming" in data set 1, the feature label "graduation school" is added to "Tsinghua University", and the feature label "major" is added to "Automation"; the feature label "employer" is added to "a certain company" in data set 2. Data set 1 thus receives three feature labels, data set 2 receives one, and data set 3 receives none. The preset neural network model therefore outputs data sets 1 and 2 as the initial extraction data sets, the feature labels added to data sets 1 and 2 are the feature labels corresponding to the initial extraction data, and data set 3, to which no feature label was added, is discarded.
S106: extracting corresponding target extraction data from the initial extraction data according to the preset rule template.
The preset rule template is a hand-coded rule template obtained by summarizing sample data and combining experience with data characteristics. For example, for the text "Mr. Lin Fei, Chinese nationality, without permanent right of residence overseas, born in 1968": "Chinese nationality" is located as the nationality pers.country, "Mr." locates the person as male pers.male, "Lin Fei" is located as the name pers.name, and "born in 1968" locates 1968 as the birthday pers.birth. After such a refined rule is formulated, the information in the text segment is matched as follows.
Specifically, extracting the target extraction data from the initial extraction data according to the preset rule template is a process of rule-based extraction. Rule-based extraction formulates a template manually, based on information such as keywords, word positioning, string matching, regular expressions and entity information, to extract the key information. The target extraction data is the extraction target obtained through rule-based extraction, i.e. the goal of this data extraction method.
For the resume text, the text is first split according to punctuation marks, outputting several text segments P = {p_1, p_2 ... p_n}. Secondly, it is judged whether the information elements contained in each text segment of P = {p_1, p_2 ... p_n} include the information elements to be extracted from the resume; the text segments containing those information elements are taken out to form a new text segment set P2 = {pr_1, pr_2 ... pr_n}. Finally, once the required information elements are found, the text segments in P2 = {pr_1, pr_2 ... pr_n} and the corresponding information element label data form pairs of training data, the corresponding information element extraction rules are trained, and the information is stored.
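To illustrate the rule-template matching, here is a minimal regex-based sketch over an English rendering of the "Mr. Lin Fei" example. The field names (pers.name, pers.country, pers.birth) appear in the description; the patterns themselves are assumptions for this sketch, not the patent's actual rules:

```python
import re

# Illustrative rule template: label -> pattern with one capture group.
RULES = {
    "pers.name":    re.compile(r"Mr\.\s+([A-Z][a-z]+(?:\s[A-Z][a-z]+)*)"),
    "pers.country": re.compile(r"([A-Z][a-z]+)\s+nationality"),
    "pers.birth":   re.compile(r"born\s+in\s+(\d{4})"),
}

def apply_rule_template(text):
    """Run each rule over a text segment and keep the first match,
    yielding the target extraction data for that segment."""
    result = {}
    for label, pattern in RULES.items():
        m = pattern.search(text)
        if m:
            result[label] = m.group(1)
    return result
```

Running this on "Mr. Lin Fei, Chinese nationality, born in 1968." yields the name, nationality and birth-year fields as a dictionary.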
S108: outputting the target extraction data in association with the feature labels corresponding to the target extraction data.
Specifically, a feature label and its target extraction data form a data pair, and the final output of this data extraction method is in fact a group of such data pairs. After the feature labels and the extracted target extraction data are output as pairs, the parsed data can be used to build an information database, which is undoubtedly very helpful for subsequent applications such as data mining and recommendation systems.
If the preset rule template in this resume data extraction is a template for extracting the name, graduation school, major and employer, then according to this preset rule template, when the name, graduation school and major are to be extracted, the extraction target is located in data set 1 according to the feature labels, and "Li Ming", "Tsinghua University" and "Automation" are extracted from data set 1 according to the preset rule template. Similarly, when the employer data is extracted, the extraction target is located in data set 2 according to the feature labels, and "a certain company" is extracted from data set 2 according to the preset rule template. Finally, the feature labels are output in association with the extracted target extraction data, that is: name - Li Ming; graduation school - Tsinghua University; major - Automation; employer - a certain company.
In the above data extraction method, when data extraction is performed, the pending data is first split into data sets according to the splitting rule. Regardless of whether the data to be extracted shares the same format or the same encoding, as long as the preset neural network model has been trained and the pending data can be recognized by the computer, the preset neural network model can establish a mapping between the content of the data to be extracted and the feature labels. This overcomes the weakness of traditional rule-based extraction, where the customized rules can only target resumes in one specific format and prove inadequate in the face of massive and varied resume texts, requiring constant addition, modification and maintenance of existing rules as well as handling of conflicts between rules. On the other hand, although a neural network has a powerful self-learning ability to automatically learn the correlated features between resume information elements in text, its learning process carries a certain error rate. In this application, neural network technology is combined with extraction rules: the rule template judges the resume information elements contained in the data and then accurately extracts the information elements from the resume, which greatly reduces the error rate of the neural network learning, so that the extraction accuracy is higher and the extraction results are better.
In one of the embodiments, when extracting data, the splitting may be performed according to punctuation marks, according to paragraphs (i.e. line breaks), or according to the color or font size of the text, and so on. Specifically, splitting the pending data into data sets according to punctuation marks is preferred, since punctuation marks are easier to identify and conform to the usage conventions of text data.
In one of the embodiments, after the step S108 of outputting the feature labels in association with the target extraction data, the method further includes: when the feature labels and target extraction data output in association contain an error, receiving an adjustment instruction for the preset rule template; and adjusting the preset rule template according to the adjustment instruction.
The adjustment instruction is an instruction to modify the preset rule template according to the characteristics of the erroneous output when the output feature labels and target extraction data contain errors.
Specifically, the output of the neural network template may differ for different pending data; in actual use, an output part whose feature labels show a higher error rate can be judged to reflect a general problem in the corresponding part of the preset rule template. Therefore, the error rate of each part of the feature labels and target extraction data output by the neural network template can be counted, and the parts of the preset rule template corresponding to the high-error output feature labels can then be adjusted manually. For example, whether errors exist between the feature labels and target extraction data output in association can be identified manually or automatically; in the case of automatic identification, the erroneous parts can be marked, for example by changing the font color or style, and the user then corrects the errors and adapts the preset rule template accordingly, so that the same errors do not occur again. As the system is used more, its accuracy becomes higher, which benefits subsequent efficiency.
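Counting per-label error rates, as described above, could be as simple as the following sketch; the record format (feature label, error flag) is an assumption for illustration:

```python
from collections import Counter

def error_rates(records):
    """Tally the fraction of erroneous outputs per feature label so that
    the high-error parts of the preset rule template can be located and
    adjusted. Each record is a (label, is_error) pair."""
    total, wrong = Counter(), Counter()
    for label, is_error in records:
        total[label] += 1
        if is_error:
            wrong[label] += 1
    return {label: wrong[label] / total[label] for label in total}
```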
In the rule-based extraction of the above data extraction method, since the extraction rules formulate a template from data such as keywords, word positioning, string matching, regular expressions and entity information to extract the key data, the template may not satisfy all data extraction requirements. When the output feature labels and target extraction data contain errors, expert intervention and verification are needed to ensure the accuracy of the rules; that is, an expert modifies the high-error parts of the preset rule template to ensure the accuracy of the data extraction.
Referring to Fig. 2, in one of the embodiments, step S104 of the above method, i.e. the step of inputting the data sets into the preset neural network model to obtain the initial extraction data and the feature labels corresponding to the initial extraction data, may include:
S202: inputting the data sets into the preset neural network model to obtain alternative data sets and alternative labels corresponding to the alternative data sets.
An alternative label is a feature label that the preset neural network model, using the rules it has learned, judges may correspond to the data in an input data set. An alternative data set is the initial extraction data obtained by the preset neural network model according to the alternative labels; an alternative label is not necessarily the final feature label.
Specifically, when this step is performed, the data sets input into the preset neural network model are only marked or uniformly reformatted; the data sets without alternative labels are not discarded, which prevents data loss caused by erroneous alternative labels.
S204: obtaining the probability of each alternative label corresponding to a data set.
Specifically, when selecting the feature labels, a linear layer can be added at the output of the neural network model. Using the output produced from the word vectors and word sequence at the input of the neural network model, the probability of each alternative label being the final output feature label is counted and screened in the linear layer of the output; the alternative label with the highest probability is finally chosen to form a new semantic label for the output features, entity marking is done with the new output semantic label, and a new feature label is obtained. A softmax classifier can also be attached at the output of the neural network model to predict the weight of each feature label.
S206: choosing the alternative label with the highest probability as the feature label, the alternative data set corresponding to the feature label serving as the initial extraction data set.
When the neural network model is actually used to obtain the initial extraction data set, the amount of data in the input data sets is huge, and text is an important information carrier. Because of the diversity of text, the same word has different meanings in different contexts; even the difference between a question mark and a full stop can affect the meaning of a whole sentence, and, especially for Chinese, different punctuation can reveal completely different information. Polysemy, colloquialisms, technical terms, mixed languages and sentence patterns all affect semantic parsing. Since forms vary and the expression of Chinese characters is not fixed, the most accurate initial extraction data and corresponding feature labels cannot be obtained directly; if the criteria in the neural network model were fixed once and for all, a large amount of data could easily be judged invalid, causing data loss. Therefore, the data sets are first input into the preset neural network model to obtain alternative labels and alternative data sets corresponding to the alternative labels; by counting the probability of each alternative label being the final output feature label, the alternative label with the highest probability can be chosen as the actually output feature label, and the initial extraction data set is marked with it.
For example, in the above resume extraction, for input data set 1, "Li Ming graduated from Tsinghua University in 2000", the preset neural network model may, according to the surname rules it learned in training, identify the character "Li" as a surname; the character "Li" together with the following one or two characters may then be identified as a name. However, the result of the first prediction alone is certainly inaccurate, so this result is used only as the first alternative label; the second prediction, made together with the second character "Ming", gives the alternative label marking "Li Ming", and this process is repeated until the last character. In the end, the weight of the alternative label "name" for "Li Ming" is the highest, so this feature label is taken as the final output and is associated with "Li Ming" in data set 1.
This example shows the judgment process for only one layer of feature labels; in practical applications, a linear layer is connected at the output of the neural network model, and this linear layer performs a multi-layer judgment process to ensure the adequacy of data judgment.
Selecting the most probable feature label by this statistical method and matching the initial extraction data with its corresponding feature labels can reduce the error rate of the neural network model when analyzing data in multiple formats.
Referring to Fig. 3, in one of the embodiments, the above data extraction method further includes a preprocessing step, which may be performed before the embodiment shown in Fig. 1. The preprocessing step may include the following steps.
S302: preprocessing the sample data according to the preprocessing rule to obtain the sample sets.
The preprocessing rule is a manually set rule that processes the sample data into data that can be used to train the initial neural network model, i.e. the sample sets. Specifically, to ensure that the trained preset neural network model better matches real usage, the preprocessing rule can be set to the same data splitting method as in the above data extraction method.
S304: obtaining the feature label corresponding to each sample set.
The feature label corresponding to a sample set is the feature label corresponding to the data of the sample set obtained in step S302; the mapping between the feature labels and the data in the sample sets is obtained through data mining. Data mining generally refers to the process of searching, through algorithms, for information hidden in large amounts of data. Data mining is usually related to computer science and achieves this goal through many methods such as statistics, online analytical processing, information retrieval, machine learning, expert systems (relying on past rules of thumb) and pattern recognition; the powerful learning ability of a deep neural network is used to learn the correlated features between the data, and the generated model is used to extract new data.
S306: Input the sample set and the feature tags into the initial neural network model to obtain the preset neural network model.
Here, the initial neural network model is the blank template created when a new neural network model is built. This template must be trained before it can learn the relevant associations; after training, the initial neural network model becomes the preset neural network model of this step. In this application, once the processed sample set and the feature tags corresponding to it are input into the initial neural network model, the learning ability of the neural network yields a preset neural network model capable of identifying the feature tag that matches the input data.
The above data extraction method thus further includes a preprocessing flow that trains the preset neural network model before it is used for actual data extraction: after the processed sample set and its corresponding feature tags are input into the initial neural network model, the learning ability of the neural network yields a preset neural network model that can identify the feature tag matching the input data.
In one embodiment, the step in S302 of preprocessing the sample data according to the preprocessing rule to obtain the sample set further includes the following steps; refer to Fig. 4:
S402: Segment the sample into individual characters according to a preset segmentation logic to obtain a character set.
Specifically, after data extraction yields the pending data, let the pending data text be D = {D_1, …, D_n}, where D_n denotes the n-th data text. The resume data text D = {D_1, …, D_n} is then processed and split: a trained segmentation model treats every character inside each text segment and sentence as an individual character, yielding the character set w = {wd_1, …, wd_n}, where wd_n denotes the n-th character.
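As a minimal sketch of the character-level split in step S402 (an illustrative assumption, not the trained segmentation model the text mentions), every non-whitespace character of each text can simply be treated as one unit:

```python
def char_segment(texts):
    """Treat every non-whitespace character of each text as one unit,
    yielding the character set w = {wd_1, ..., wd_n}."""
    chars = []
    for text in texts:
        chars.extend(ch for ch in text if not ch.isspace())
    return chars

# Pending resume texts D = {D_1, ..., D_n} (hypothetical example data)
D = ["张三 求职", "电话 1380"]
w = char_segment(D)  # ['张', '三', '求', '职', '电', '话', '1', '3', '8', '0']
```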
S404: Represent each character in the character set as a character vector by means of a preset vector model and the number of characters in the character set.
Here, the preset vector model is a vector model trained in advance on characters or words; the Skip-gram model from Word2Vec or Stanford's GloVe can be used. The characters of the pending data are represented as vectors of dimension d = N, which initialize the character-vector table of the neural network and are then fine-tuned within the neural network system. A character vector is a character-level feature-representation vector; that is, each character is represented by a vector of fixed dimension, where representing each character with, say, d = 100 dimensions is a parameter chosen from engineering experience.
For example, for the character set w = {wd_1, …, wd_n} split in step S402, the pre-trained character- or word-based vector model is loaded, and each character in w is represented as a vector of dimension d = N (e.g. 100 dimensions), giving the character vectors v = {v_1, …, v_n}.
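The lookup in S404 can be sketched as follows. This is an assumption-laden illustration: a random table stands in for the pretrained Word2Vec/GloVe embeddings, and `build_vector_table` and `to_char_vectors` are hypothetical names, not part of any real library:

```python
import random

def build_vector_table(charset, d=100, seed=0):
    """Hypothetical stand-in for a pretrained character-vector table
    (in practice initialized from Word2Vec/GloVe and fine-tuned)."""
    rng = random.Random(seed)
    return {ch: [rng.uniform(-1, 1) for _ in range(d)] for ch in set(charset)}

def to_char_vectors(chars, table, d=100):
    """Look up the d-dimensional vector v_i for each character wd_i;
    characters missing from the table fall back to a zero vector."""
    return [table.get(ch, [0.0] * d) for ch in chars]

w = ["电", "光", "防", "爆"]
table = build_vector_table(w, d=100)
v = to_char_vectors(w, table)  # v = {v_1, ..., v_n}, each of dimension d = 100
```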
S406: Represent the characters in the character set as a character sequence according to a preset rule.
Here, the character sequence describes the contextual features of a language fragment and is the most basic observation sequence; precisely because of the intrinsic properties of Chinese characters and their fixed arrangement, each character in the sequence exhibits a certain role characteristic. For example, if a text segment contains M character sets and N denotes the number of characters in each character set, the segment can be represented by a sequence of M N-dimensional (0,1) vectors, with each character represented by one N-dimensional (0,1) vector. The character sequence corresponding to "电光防爆科技股份有限公司" (Electric Light Explosion-proof Technology Co., Ltd.) is {电, 光, 防, 爆, 科, 技, 股, 份, 有, 限, 公, 司}.
Specifically, for the character set w = {wd_1, …, wd_n} split in step S402, each character in w is expressed, according to the preset rule, as a character sequence B = {B_1, B_2, B_3, …, B_n | n > 0}, where B_n is a Chinese character or a symbol string.
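The N-dimensional (0,1) representation described for step S406 is a one-hot encoding over the character vocabulary; a minimal sketch, assuming a small hypothetical vocabulary:

```python
def to_one_hot_sequence(chars, vocab):
    """Represent each character as an N-dimensional (0,1) vector,
    where N is the number of characters in the vocabulary (step S406)."""
    index = {ch: i for i, ch in enumerate(vocab)}
    seq = []
    for ch in chars:
        vec = [0] * len(vocab)
        vec[index[ch]] = 1  # exactly one position is 1 per character
        seq.append(vec)
    return seq

vocab = ["电", "光", "防", "爆"]
B = to_one_hot_sequence(["防", "爆"], vocab)
```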
S408: Obtain the sample set from the character vectors and the character sequence.
Specifically, the character-vector training set and the character-sequence training set into which the data to be extracted are divided serve simultaneously as input features of the preset neural network model. After processing such as format unification, these two training sets become the sample set for training the neural network model, so that the strong self-learning ability of the deep neural network can automatically learn the correlated features among the pending data and thereby train the preset neural network model.
When the character-vector training set and the character-sequence training set are used together as input features of the preset neural network model, dropout = N can be set, where N is an engineering-experience parameter, to prevent over-fitting. The sample set used to train the neural network model is also inspected, and the format of the sample-set data is made consistent with the format of the actual pending data, which facilitates actual data extraction.
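The dropout regularization mentioned above can be sketched in isolation. This is the standard "inverted dropout" formulation, shown on plain feature lists as an illustration rather than the patent's actual network layer:

```python
import random

def dropout(features, rate=0.5, training=True, seed=42):
    """Inverted dropout: during training, zero each feature value with
    probability `rate` (an engineering-experience parameter) and rescale
    the survivors; at inference time the features pass through unchanged."""
    if not training or rate <= 0:
        return list(features)
    rng = random.Random(seed)
    scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else x * scale for x in features]
```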
In Chinese word formation, words are highly flexible, so the vocabulary is enormous and its lexical features, though rich, are hard to learn; treating keywords as word combinations makes the roles of words extremely complex. For example, part of a keyword may be split off into other non-keywords. In other words, word segmentation yields far more features than character segmentation, which greatly increases the complexity of machine learning. The quality of the word representation directly affects the recognition of resume-information elements. In a Chinese environment, the text must first be segmented, and segmentation has long been a bottleneck in the industry: the quality of this early segmentation directly affects the subsequent recognition of named entities and can degrade performance. We therefore use the character vectors and the character sequence as the feature input, effectively avoiding the segmentation problem. Because character segmentation yields fewer features than word segmentation, it also reduces the influence of segmentation errors and greatly lowers the complexity of machine learning.
Referring to Fig. 5, in one embodiment, the step S104 of the above method, in which the data set is input into the preset neural network model to obtain the initial extraction data and the feature tag corresponding to the initial extraction data, further includes the following steps:
S502: Divide the sample set into a training set and a verification set.
Here, the training set is used to train the initial model; the model parameters are then adjusted so that the model's performance on the verification set is optimal.
Specifically, the preprocessed sample set is split: N% of the sample set forms the training set and N% of the training set forms the verification set, where the value of N is chosen from engineering experience.
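A minimal sketch of this split, under the reading above that the verification set is drawn from the training set; both percentages are engineering-experience parameters, and the example values are assumptions:

```python
import random

def split_samples(samples, train_pct=90, verify_pct=20, seed=7):
    """Shuffle the sample set, take train_pct% as the training set, then
    draw verify_pct% of the training set as the verification set."""
    data = list(samples)
    random.Random(seed).shuffle(data)
    cut = len(data) * train_pct // 100
    train = data[:cut]
    verify = train[: len(train) * verify_pct // 100]
    return train, verify

train, verify = split_samples(range(100))
```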
S504: Input the training set and the feature tags corresponding to the training set into the initial neural network model to obtain the trained neural network model.
Specifically, to train a neural network model, the initial neural network model must first be established, after which the training data are input into the neural network system with its preset rules; the training data here are the training set and the feature tags corresponding to the training set. Through the learning ability of the neural network, the trained neural network model is obtained; given an input data set, this trained model outputs, according to the input data set and its own preset rules, feature tags and the data set corresponding to each feature tag.
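The text does not specify the network architecture, so as a deliberately tiny stand-in a one-layer perceptron tagger can illustrate how tag-labelled training data shape a model that maps feature vectors to feature tags; the feature tags and vectors below are hypothetical:

```python
def predict(w, x):
    """Return the feature tag whose weight vector scores x highest."""
    return max(w, key=lambda t: sum(wi * xi for wi, xi in zip(w[t], x)))

def train_tagger(samples, labels, epochs=20, lr=0.1):
    """Toy one-layer stand-in for the initial neural network model:
    learns one weight vector per feature tag via perceptron updates."""
    tags = sorted(set(labels))
    dim = len(samples[0])
    w = {t: [0.0] * dim for t in tags}
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = predict(w, x)
            if pred != y:  # nudge weights toward the expected tag
                for i, xi in enumerate(x):
                    w[y][i] += lr * xi
                    w[pred][i] -= lr * xi
    return w

X = [[1, 0], [0, 1], [1, 0], [0, 1]]    # hypothetical feature vectors
y = ["name", "phone", "name", "phone"]  # hypothetical feature tags
model = train_tagger(X, y)
```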
S506: Input the verification set into the trained neural network model to obtain verification feature tags.
Specifically, the trained neural network model obtained in step S504 is not necessarily fully accurate. To lower the error of the trained neural network model in the actual data extraction flow, a certain number of samples are taken from the training set as the verification set; the verification set is then input into the trained neural network model to check its accuracy. In this step, the verification feature tags corresponding to the verification set are obtained.
S508: When a verification feature tag is inconsistent with the feature tag corresponding to the training set, correct the trained neural network model with the feature tag corresponding to the training set to obtain the preset neural network model.
When a verification feature tag is inconsistent with the corresponding feature tag of the training set, the trained neural network model has made an error in application. The error type of the verification feature tag is then analyzed, the cause of the error is found, and the trained neural network model is corrected; the corrected trained neural network model is the preset neural network model used in real data extraction.
The above steps constitute the verification process when training the preset neural network model. Through one or even several rounds of verification, it is ensured that the preset neural network model is applicable to the actual data extraction flow, guaranteeing the accuracy of data extraction.
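The verify-then-correct cycle of S506-S508 can be sketched as a loop; `model_predict` and `retrain` are hypothetical callables supplied by the training system, and the dict-backed model below is only a simulation:

```python
def verify_and_correct(model_predict, retrain, verify_set, verify_tags, max_rounds=5):
    """Compare the model's tags against the expected feature tags on the
    verification set and correct (retrain) until they agree or the round
    limit is reached."""
    for _ in range(max_rounds):
        wrong = [(x, t) for x, t in zip(verify_set, verify_tags)
                 if model_predict(x) != t]
        if not wrong:
            return True   # preset model is fit for the actual extraction flow
        retrain(wrong)    # correct with the expected feature tags
    return False

# Simulated model: a lookup table that "retraining" simply updates.
memory = {}
ok = verify_and_correct(memory.get, memory.update, ["a", "b"], ["T1", "T2"])
```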
Continuing the resume-extraction example above, the segmented resume sample data are divided into 90% training resume samples and 20% verification resume samples drawn from the training samples. The data in the training resume samples are associated with feature tags and input into the initial neural network model for learning, which yields the trained neural network model. The verification resume samples are then input into the trained neural network model to check the accuracy of the feature tags it outputs for the verification sample data. If the output is wrong, the trained network model is shown to be faulty, and the trained neural network model is corrected with the feature tags corresponding to the resume training samples; the corrected neural network model can associate feature tags with resume sample data and is the correct preset neural network model.
In one of the embodiments, referring to Fig. 6, a structure diagram of a data extraction device in an embodiment is provided. The data extraction device 600 includes:
a splitting module 602 for splitting the pending data to obtain a data set;
a labeling module 604 for inputting the data set into the preset neural network model to obtain initial extraction data and a feature tag corresponding to the initial extraction data;
an extraction module 606 for extracting the corresponding target extraction data from the initial extraction data according to the preset rules template; and
an output module 608 for outputting the target extraction data in association with the feature tag corresponding to the target extraction data.
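The pipeline of modules 602-608 can be sketched as a small class. This is a hedged illustration only: the modules are plain callables here, a tag set stands in for the rules template, and the labeler below is a toy rule rather than the preset neural network model:

```python
class DataExtractionDevice:
    """Sketch of device 600 with its four modules as plain callables."""

    def __init__(self, splitter, labeler, rule_template):
        self.splitter = splitter              # splitting module 602
        self.labeler = labeler                # labeling module 604
        self.rule_template = rule_template    # drives extraction module 606

    def run(self, pending_data):
        dataset = self.splitter(pending_data)                     # 602
        initial = [(seg, self.labeler(seg)) for seg in dataset]   # 604
        targets = [(seg, tag) for seg, tag in initial
                   if tag in self.rule_template]                  # 606
        return targets                            # 608: tag-associated output

device = DataExtractionDevice(
    splitter=lambda s: s.split(","),
    labeler=lambda seg: "phone" if seg.isdigit() else "name",
    rule_template={"phone"},
)
result = device.run("Alice,13800000000")
```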
In one of the embodiments, the splitting module 602 in the above data extraction device can also be used to split the pending data according to punctuation marks to obtain the data set.
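A minimal sketch of punctuation-based splitting; the particular punctuation inventory below is an illustrative assumption:

```python
import re

def split_by_punctuation(text):
    """Split pending data into a data set on common Chinese and Western
    punctuation marks, dropping empty segments."""
    parts = re.split(r"[,.;:!?,。;:!?、\n]+", text)
    return [p.strip() for p in parts if p.strip()]

data_set = split_by_punctuation("姓名:张三,电话:13800000000。")
```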
In one of the embodiments, the labeling module 604 in the above data extraction device can include:
an alternative unit for inputting the data set into the preset neural network model to obtain an alternative data set and alternative labels corresponding to the alternative data set;
a statistics unit for obtaining the probability of each alternative label corresponding to the alternative data set; and
a selection unit for choosing the alternative label with the maximum probability as the feature tag, with the alternative data set corresponding to the feature tag serving as the initial extraction data set.
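The selection unit's maximum-probability rule is a simple argmax; the labels and probability values below are hypothetical examples:

```python
def choose_feature_tag(label_probs):
    """Selection unit: pick the alternative label with the maximum
    probability as the feature tag. `label_probs` maps each alternative
    label to its probability."""
    return max(label_probs, key=label_probs.get)

tag = choose_feature_tag({"name": 0.2, "phone": 0.7, "email": 0.1})
```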
In one of the embodiments, the above data extraction device can also include:
a receiving module for receiving an adjustment instruction for the preset rules template after the feature tag and the target extraction data have been output in association, when the associated output feature tag and target extraction data contain an error; and
an adjustment module for adjusting the preset rules template according to the adjustment instruction.
In one of the embodiments, the above data extraction device can also include:
an acquisition module for preprocessing the sample data according to the preprocessing rule to obtain the sample set;
a tag-definition module for obtaining the feature tag corresponding to each sample set; and
a forming module for inputting the sample set and the feature tags into the initial neural network model to obtain the preset neural network model.
In one of the embodiments, the labeling module 604 in the above data extraction device can include:
a sample-decomposition unit for dividing the sample set into a training set and a verification set;
a training unit for inputting the training set and the feature tags corresponding to the training set into the initial neural network model to obtain the trained neural network model;
a verification-tag unit for inputting the verification set into the trained neural network model to obtain verification feature tags; and
a verification unit for correcting, when a verification feature tag is inconsistent with the feature tag corresponding to the training set, the initial neural network model with the feature tag corresponding to the training set to obtain the preset neural network model.
In one of the embodiments, the above acquisition module can include:
a segmentation subunit for segmenting the sample into individual characters according to the preset segmentation logic to obtain the character set;
a vectorization subunit for representing each character in the character set as a character vector by means of the preset vector model and the number of characters in the character set;
a sequence subunit for representing the characters in the character set as a character sequence according to the preset rule; and
a gathering subunit for obtaining the sample set from the character vectors and the character sequence.
For the specific limitations of the data extraction device, reference can be made to the limitations of the data extraction method above; they are not repeated here.
In one of the embodiments, referring to Fig. 7, a structure diagram of a computer equipment performing data extraction in an embodiment is provided. The computer equipment, which can be a general server or any other suitable computer equipment capable of performing data extraction, includes a memory, a processor, an operating system, a database and a data extraction program stored on the memory and runnable on the processor, where the memory can include internal storage. When executing the data extraction program, the processor implements the following steps: splitting the pending data to obtain a data set; inputting the data set into the preset neural network model to obtain initial extraction data and a feature tag corresponding to the initial extraction data; extracting the target extraction data corresponding to the preset rules template from the initial extraction data; and outputting the target extraction data in association with the feature tag corresponding to the target extraction data.
In one of the embodiments, the step, implemented when the processor executes the program, of splitting the pending data to obtain the data set can include: splitting the pending data according to punctuation marks to obtain the data set.
In one of the embodiments, the step, implemented when the processor executes the program, of inputting the data set into the preset neural network model to obtain the initial extraction data and the feature tag corresponding to the initial extraction data can include: inputting the data set into the preset neural network model to obtain an alternative data set and alternative labels corresponding to the alternative data set; obtaining the probability of each alternative label corresponding to the alternative data set; and choosing the alternative label with the maximum probability as the feature tag, with the alternative data set corresponding to the feature tag serving as the initial extraction data set.
In one of the embodiments, after the step, implemented when the processor executes the program, of outputting the feature tag in association with the target extraction data, the following is further included: when the associated output feature tag and target extraction data contain an error, receiving an adjustment instruction for the preset rules template; and adjusting the preset rules template according to the adjustment instruction.
In one of the embodiments, the processor also implements the following steps when executing the program: preprocessing the sample data according to the preprocessing rule to obtain the sample set; obtaining the feature tag corresponding to each sample set; and inputting the sample set and the feature tags into the initial neural network model to obtain the preset neural network model.
In one of the embodiments, the step, implemented when the processor executes the program, of inputting the sample set and the feature tags into the initial neural network model to obtain the preset neural network model includes: dividing the sample set into a training set and a verification set; inputting the training set and the feature tags corresponding to the training set into the initial neural network model to obtain the trained neural network model; inputting the verification set into the trained neural network model to obtain verification feature tags; and, when a verification feature tag is inconsistent with the feature tag corresponding to the training set, correcting the initial neural network model with the feature tag corresponding to the training set to obtain the preset neural network model.
In one of the embodiments, the step, implemented when the processor executes the program, of preprocessing the sample data according to the preprocessing rule to obtain the sample set includes: segmenting the sample into individual characters according to the preset segmentation logic to obtain the character set; representing each character in the character set as a character vector by means of the preset vector model and the number of characters in the character set; representing the characters in the character set as a character sequence according to the preset rule; and obtaining the sample set from the character vectors and the character sequence.
For the specific limitations of the computer equipment, reference can be made to the limitations of the data extraction method above; they are not repeated here.
In one embodiment, continuing to refer to Fig. 7, a computer storage medium is provided on which a computer program is stored. When executed by a processor, the program implements the following steps: splitting the pending data to obtain a data set; inputting the data set into the preset neural network model to obtain initial extraction data and a feature tag corresponding to the initial extraction data; extracting the target extraction data corresponding to the preset rules template from the initial extraction data; and outputting the target extraction data in association with the feature tag corresponding to the target extraction data.
In one of the embodiments, the step, implemented when the processor executes the program, of splitting the pending data to obtain the data set can include: splitting the pending data according to punctuation marks to obtain the data set.
In one of the embodiments, the step, implemented when the processor executes the program, of inputting the data set into the preset neural network model to obtain the initial extraction data and the feature tag corresponding to the initial extraction data includes: inputting the data set into the preset neural network model to obtain an alternative data set and alternative labels corresponding to the alternative data set; obtaining the probability of each alternative label corresponding to the alternative data set; and choosing the alternative label with the maximum probability as the feature tag, with the alternative data set corresponding to the feature tag serving as the initial extraction data set.
In one of the embodiments, after the step, implemented when the processor executes the program, of outputting the feature tag in association with the target extraction data, the following is further included: when the associated output feature tag and target extraction data contain an error, receiving an adjustment instruction for the preset rules template; and adjusting the preset rules template according to the adjustment instruction.
In one of the embodiments, the following steps are also implemented when the program is executed by the processor: preprocessing the sample data according to the preprocessing rule to obtain the sample set; obtaining the feature tag corresponding to each sample set; and inputting the sample set and the feature tags into the initial neural network model to obtain the preset neural network model.
In one of the embodiments, the step, implemented when the processor executes the program, of inputting the sample set and the feature tags into the initial neural network model to obtain the preset neural network model includes: dividing the sample set into a training set and a verification set; inputting the training set and the feature tags corresponding to the training set into the initial neural network model to obtain the trained neural network model; inputting the verification set into the trained neural network model to obtain verification feature tags; and, when a verification feature tag is inconsistent with the feature tag corresponding to the training set, correcting the initial neural network model with the feature tag corresponding to the training set to obtain the preset neural network model.
In one of the embodiments, the step, implemented when the processor executes the program, of preprocessing the sample data according to the preprocessing rule to obtain the sample set includes: segmenting the sample into individual characters according to the preset segmentation logic to obtain the character set; representing each character in the character set as a character vector by means of the preset vector model and the number of characters in the character set; representing the characters in the character set as a character sequence according to the preset rule; and obtaining the sample set from the character vectors and the character sequence.
For the specific limitations of the computer storage medium, reference can be made to the limitations of the data extraction method above; they are not described here again.
A person of ordinary skill in the art will appreciate that all or part of the flows in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the flows of the embodiments of the above methods. The computer-readable storage medium can be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), or the like.
The technical features of the above embodiments can be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered within the scope recorded in this specification.
The above embodiments express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be pointed out that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present invention, and these all belong to the protection scope of the present invention. Therefore, the protection scope of the patent of the present invention shall be determined by the appended claims.
Claims (10)
- 1. A data extraction method, characterized by comprising: splitting pending data to obtain a data set; inputting the data set into a preset neural network model to obtain initial extraction data and a feature tag corresponding to the initial extraction data; extracting target extraction data corresponding to a preset rules template from the initial extraction data; and outputting the target extraction data in association with the feature tag corresponding to the target extraction data.
- 2. The method according to claim 1, characterized in that the step of splitting pending data to obtain a data set comprises: splitting the pending data according to punctuation marks to obtain the data set.
- 3. The method according to claim 1, characterized in that the step of inputting the data set into the preset neural network model to obtain the initial extraction data and the feature tag corresponding to the initial extraction data comprises: inputting the data set into the preset neural network model to obtain an alternative data set and alternative labels corresponding to the alternative data set; obtaining the probability of each alternative label corresponding to the alternative data set; and choosing the alternative label with the maximum probability as the feature tag, with the alternative data set corresponding to the feature tag serving as the initial extraction data set.
- 4. The method according to claim 1, characterized in that, after the step of outputting the feature tag in association with the target extraction data, the method further comprises: when the associated output feature tag and the target extraction data contain an error, receiving an adjustment instruction for the preset rules template; and adjusting the preset rules template according to the adjustment instruction.
- 5. The method according to any one of claims 1 to 4, characterized in that the method further comprises: preprocessing sample data according to a preprocessing rule to obtain a sample set; obtaining the feature tag corresponding to each sample set; and inputting the sample set and the feature tags into an initial neural network model to obtain the preset neural network model.
- 6. The method according to claim 5, characterized in that the step of inputting the sample set and the feature tags into the initial neural network model to obtain the preset neural network model comprises: dividing the sample set into a training set and a verification set; inputting the training set and the feature tags corresponding to the training set into the initial neural network model to obtain a trained neural network model; inputting the verification set into the trained neural network model to obtain verification feature tags; and, when a verification feature tag is inconsistent with the feature tag corresponding to the training set, correcting the trained neural network model with the feature tag corresponding to the training set to obtain the preset neural network model.
- 7. The method according to claim 5, characterized in that the step of preprocessing the sample data according to the preprocessing rule to obtain the sample set comprises: segmenting the sample into individual characters according to a preset segmentation logic to obtain a character set; representing each character in the character set as a character vector by means of a preset vector model and the number of characters in the character set; representing the characters in the character set as a character sequence according to a preset rule; and obtaining the sample set from the character vectors and the character sequence.
- 8. An information extraction device, characterized by comprising: a splitting module for splitting pending data to obtain a data set; a labeling module for inputting the data set into a preset neural network model to obtain initial extraction data and a feature tag corresponding to the initial extraction data; an extraction module for extracting corresponding target extraction data from the initial extraction data according to a preset rules template; and an output module for outputting the target extraction data in association with the feature tag corresponding to the target extraction data.
- 9. A computer equipment comprising a memory, a processor and a computer program stored on the memory and runnable on the processor, characterized in that the processor, when executing the program, implements the steps of the method of any one of claims 1 to 7.
- 10. A readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the steps of the method of any one of claims 1 to 7 are implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711155534.1A CN107943911A (en) | 2017-11-20 | 2017-11-20 | Data pick-up method, apparatus, computer equipment and readable storage medium storing program for executing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711155534.1A CN107943911A (en) | 2017-11-20 | 2017-11-20 | Data pick-up method, apparatus, computer equipment and readable storage medium storing program for executing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107943911A true CN107943911A (en) | 2018-04-20 |
Family
ID=61929143
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711155534.1A Pending CN107943911A (en) | 2017-11-20 | 2017-11-20 | Data pick-up method, apparatus, computer equipment and readable storage medium storing program for executing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107943911A (en) |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108733837A (en) * | 2018-05-28 | 2018-11-02 | 杭州依图医疗技术有限公司 | A kind of the natural language structural method and device of case history text |
- 2017-11-20: CN application CN201711155534.1A filed; published as CN107943911A/en; status: active, Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101470699A (en) * | 2007-12-28 | 2009-07-01 | 日电(中国)有限公司 | Information extraction model training apparatus, information extraction apparatus and information extraction system and method thereof |
CN105426360A (en) * | 2015-11-12 | 2016-03-23 | 中国建设银行股份有限公司 | Keyword extraction method and device |
US20170144378A1 (en) * | 2015-11-25 | 2017-05-25 | Lawrence Livermore National Security, Llc | Rapid closed-loop control based on machine learning |
CN106874256A (en) * | 2015-12-11 | 2017-06-20 | 北京国双科技有限公司 | Method and device for recognizing named entities in a domain |
CN107193843A (en) * | 2016-03-15 | 2017-09-22 | 阿里巴巴集团控股有限公司 | Character string selection method and device based on AC automata and postfix expressions |
CN106777336A (en) * | 2017-01-13 | 2017-05-31 | 深圳爱拼信息科技有限公司 | Company name component extraction system and method based on deep learning |
Cited By (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108733837A (en) * | 2018-05-28 | 2018-11-02 | 杭州依图医疗技术有限公司 | Natural language structuring method and device for medical record text |
WO2019227584A1 (en) * | 2018-05-31 | 2019-12-05 | 平安科技(深圳)有限公司 | Method for parsing and processing resume data information, device, apparatus, and storage medium |
CN108874942A (en) * | 2018-06-04 | 2018-11-23 | 科大讯飞股份有限公司 | Information determination method, apparatus, device and readable storage medium |
CN108829683B (en) * | 2018-06-29 | 2022-06-10 | 北京百度网讯科技有限公司 | Hybrid label learning neural network model and training method and device thereof |
CN108829683A (en) * | 2018-06-29 | 2018-11-16 | 北京百度网讯科技有限公司 | Hybrid label learning neural network model and training method and device thereof |
CN109145125A (en) * | 2018-08-20 | 2019-01-04 | 长城计算机软件与***有限公司 | Method, system and storage medium for dynamic information extraction |
CN109165279A (en) * | 2018-09-06 | 2019-01-08 | 深圳和而泰数据资源与云技术有限公司 | Information extraction method and device |
CN109308304A (en) * | 2018-09-18 | 2019-02-05 | 深圳和而泰数据资源与云技术有限公司 | Information extraction method and device |
CN109255128B (en) * | 2018-10-11 | 2023-11-28 | 北京小米移动软件有限公司 | Multi-level label generation method, device and storage medium |
CN109255128A (en) * | 2018-10-11 | 2019-01-22 | 北京小米移动软件有限公司 | Multi-level label generation method, device and storage medium |
CN109657115B (en) * | 2018-10-18 | 2023-04-14 | 平安科技(深圳)有限公司 | Crawling data self-repairing method, device, equipment and medium |
CN109657115A (en) * | 2018-10-18 | 2019-04-19 | 平安科技(深圳)有限公司 | Crawling data self-repairing method, device, equipment and medium |
CN109583594A (en) * | 2018-11-16 | 2019-04-05 | 东软集团股份有限公司 | Deep learning training method, device, equipment and readable storage medium |
CN109616215B (en) * | 2018-11-23 | 2021-07-09 | 金色熊猫有限公司 | Medical data extraction method, device, storage medium and electronic equipment |
CN109616215A (en) * | 2018-11-23 | 2019-04-12 | 金色熊猫有限公司 | Medical data extraction method, device, storage medium and electronic equipment |
CN111221975A (en) * | 2018-11-26 | 2020-06-02 | 珠海格力电器股份有限公司 | Method and device for extracting field and computer storage medium |
CN111221975B (en) * | 2018-11-26 | 2021-12-14 | 珠海格力电器股份有限公司 | Method and device for extracting field and computer storage medium |
CN109635288B (en) * | 2018-11-29 | 2023-05-23 | 东莞理工学院 | Resume extraction method based on deep neural network |
CN109635288A (en) * | 2018-11-29 | 2019-04-16 | 东莞理工学院 | Resume extraction method based on deep neural network |
CN110213239B (en) * | 2019-05-08 | 2021-06-01 | 创新先进技术有限公司 | Suspicious transaction message generation method and device and server |
CN110213239A (en) * | 2019-05-08 | 2019-09-06 | 阿里巴巴集团控股有限公司 | Suspicious transaction message generation method, device and server |
WO2020252919A1 (en) * | 2019-06-20 | 2020-12-24 | 平安科技(深圳)有限公司 | Resume identification method and apparatus, and computer device and storage medium |
CN110717039A (en) * | 2019-09-17 | 2020-01-21 | 平安科技(深圳)有限公司 | Text classification method and device, electronic equipment and computer-readable storage medium |
CN110717039B (en) * | 2019-09-17 | 2023-10-13 | 平安科技(深圳)有限公司 | Text classification method and apparatus, electronic device, and computer-readable storage medium |
CN110795468A (en) * | 2019-10-10 | 2020-02-14 | 中国建设银行股份有限公司 | Data extraction method and device |
CN110866393A (en) * | 2019-11-19 | 2020-03-06 | 北京网聘咨询有限公司 | Resume information extraction method and system based on domain knowledge base |
CN111078737B (en) * | 2019-11-25 | 2023-03-21 | 北京明略软件***有限公司 | Commonality analysis method and device, data processing equipment and readable storage medium |
CN111078737A (en) * | 2019-11-25 | 2020-04-28 | 北京明略软件***有限公司 | Commonality analysis method and device, data processing equipment and readable storage medium |
CN111309572B (en) * | 2020-02-13 | 2021-05-04 | 上海复深蓝软件股份有限公司 | Test analysis method and device, computer equipment and storage medium |
CN111309572A (en) * | 2020-02-13 | 2020-06-19 | 上海复深蓝软件股份有限公司 | Test analysis method and device, computer equipment and storage medium |
CN111428484A (en) * | 2020-04-14 | 2020-07-17 | 广州云从鼎望科技有限公司 | Information management method, system, device and medium |
CN111753546A (en) * | 2020-06-23 | 2020-10-09 | 深圳市华云中盛科技股份有限公司 | Document information extraction method and device, computer equipment and storage medium |
CN111753546B (en) * | 2020-06-23 | 2024-03-26 | 深圳市华云中盛科技股份有限公司 | Method, device, computer equipment and storage medium for extracting document information |
TWI820845B (en) * | 2022-08-03 | 2023-11-01 | 中國信託商業銀行股份有限公司 | Training data labeling method and its computing device, article labeling model establishment method and its computing device, and article labeling method and its computing device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107943911A (en) | Data extraction method, apparatus, computer equipment and readable storage medium | |
CN108959242B (en) | Target entity identification method and device based on part-of-speech characteristics of Chinese characters | |
CN109635288A (en) | Resume extraction method based on deep neural network | |
CN112149421A (en) | Software programming field entity identification method based on BERT embedding | |
CN109391706A (en) | Domain name detection method, device, equipment and storage medium based on deep learning | |
CN103678684A (en) | Chinese word segmentation method based on navigation information retrieval | |
CN107301163B (en) | Formula-containing text semantic parsing method and device | |
CN110555206A (en) | Named entity identification method, device, equipment and storage medium | |
CN105068990B (en) | Multi-strategy English long-sentence segmentation method for machine translation | |
CN110008309A (en) | Short phrase extraction method and device | |
CN110232123A (en) | Text sentiment analysis method and device, computing equipment and readable medium | |
CN111124487A (en) | Code clone detection method and device and electronic equipment | |
CN111143531A (en) | Question-answer pair construction method, system, device and computer readable storage medium | |
CN111831624A (en) | Data table creating method and device, computer equipment and storage medium | |
CN115713085A (en) | Document theme content analysis method and device | |
CN116029280A (en) | Method, device, computing equipment and storage medium for extracting key information of document | |
CN111814476B (en) | Entity relation extraction method and device | |
CN111382243A (en) | Text category matching method, text category matching device and terminal | |
CN114842982B (en) | Knowledge expression method, device and system for medical information system | |
CN108829898B (en) | HTML content page release time extraction method and system | |
CN110866394A (en) | Company name identification method and device, computer equipment and readable storage medium | |
CN116578700A (en) | Log classification method, log classification device, equipment and medium | |
Vu-Manh et al. | Improving Vietnamese dependency parsing using distributed word representations | |
CN115757815A (en) | Knowledge graph construction method and device and storage medium | |
CN115392255A (en) | Few-sample machine reading understanding method for bridge detection text |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 2018-04-20 |