CN107908642A - Industry text entities extracting method based on distributed platform - Google Patents

Industry text entities extracting method based on distributed platform

Info

Publication number
CN107908642A
CN107908642A
Authority
CN
China
Prior art keywords
text
extraction
model
feature
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710902720.0A
Other languages
Chinese (zh)
Other versions
CN107908642B (en)
Inventor
武克杰
周书勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Huatong Sheng Yun Technology Co Ltd
Original Assignee
Jiangsu Huatong Sheng Yun Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Huatong Sheng Yun Technology Co Ltd filed Critical Jiangsu Huatong Sheng Yun Technology Co Ltd
Priority to CN201710902720.0A priority Critical patent/CN107908642B/en
Publication of CN107908642A publication Critical patent/CN107908642A/en
Application granted granted Critical
Publication of CN107908642B publication Critical patent/CN107908642B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an industry text entity extraction method based on a distributed platform, comprising: training a text data set with a deep-learning neural network to obtain a relation feature model; generating multiple resilient distributed datasets (RDDs) of relation features from the extracted relation features; extracting class features from the datasets in the RDDs with a class feature model trained by an improved non-linear SVM classification algorithm; finding the context entity model corresponding to each extracted class feature, and extracting the entity data in the texts of that class with the trained entity model; judging whether the number of texts of the corresponding context exceeds a set threshold, and if it does, retraining that context entity model and extracting the entity data in the texts of the corresponding class with the retrained model, otherwise saving the text entity features and text data. The method can handle text entity features under different contexts and effectively improves both the efficiency of entity extraction and the accuracy of the extracted entities.

Description

Industry text entities extracting method based on distributed platform
Technical field
The present invention relates to a method for extracting text entities, and more particularly to an industry text entity extraction method based on a distributed platform.
Background technology
Traditional text extraction techniques include pattern-matching relation extraction, dictionary-driven relation extraction, machine-learning-based relation extraction, and the like. Most of these methods first segment the text into words and treat the words with the highest frequency as the effective entities. They are suitable for texts whose entities are relatively simple, but under different contexts they cannot distinguish entities effectively, and they wrongly split entities that should stay whole or wrongly merge entities that should stay separate.
Moreover, for erroneous words that never appeared in the original text, traditional detection methods find it difficult to perform extraction through word segmentation.
Many entity extraction methods based on deep learning have appeared recently. Their extraction algorithms fall into two kinds of models: those with good computational performance but lower extraction accuracy, and those with higher extraction accuracy but slow computation. For example, fast linear entity extraction models and convolutional neural networks belong to the fast kind, while non-linear entity extraction models and deep neural network models belong to the more accurate kind.
Chinese patent document CN2017100036859 discloses an online traditional-Chinese-medicine text named-entity recognition method based on deep learning. That method enriches the training sample set with a web crawler and extracts text features with a neural network, which can improve the accuracy of entity extraction to some extent; however, as the training samples increase, the corresponding extraction entity model also grows, the training time gradually increases, and the feature extraction time increases as well.
Summary of the invention
In view of the above technical problems, the object of the present invention is to provide an industry text entity extraction method based on a distributed platform that uses multiple resilient distributed entity extraction models on the Spark platform to handle text entity features under different contexts, which effectively improves the efficiency of entity extraction as well as the accuracy of the extracted entities. At the same time, the weights in the support vector machine classification algorithm are improved, enhancing the generalization ability over text and further improving accuracy.
The technical scheme of the invention is as follows:
An industry text entity extraction method based on a distributed platform comprises the following steps:
S01: Train a text data set with a deep-learning neural network to obtain a relation feature model, and extract the relation features in the target text with the relation feature model;
S02: Generate multiple resilient distributed datasets (RDDs) of relation features from the extracted relation features;
S03: Extract class features from the datasets in the RDDs with a class feature model trained by an improved non-linear SVM classification algorithm;
S04: Find the corresponding context entity model according to the extracted class features, and extract the entity data in the texts of the corresponding class with the trained entity model;
S05: Judge whether the number of texts of the corresponding context exceeds a set threshold T; if it does, retrain that context entity model and extract the entity data in the texts of the corresponding class with the retrained entity model; otherwise, save the text entity features and text data.
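The threshold test of step S05 can be sketched as follows in Python; the function and parameter names are illustrative only and do not appear in the original disclosure, and `retrain_fn` stands in for the word2vec-based retraining described later in the embodiment:

```python
def update_entity_model(texts, model, retrain_fn, threshold=10000):
    """Step S05 decision: retrain the context entity model once the
    number of texts of that context exceeds the threshold T, otherwise
    keep the current model (and, per the method, save the features).

    threshold=10000 follows the value of T suggested in the embodiment.
    """
    if len(texts) > threshold:
        model = retrain_fn(texts)  # retrain on the accumulated corpus
    return model

# Illustrative use: below the threshold the old model is kept,
# above it the retraining function is invoked.
kept = update_entity_model(["t1", "t2"], "model-v1",
                           lambda ts: "model-v2", threshold=5)
retrained = update_entity_model(["t"] * 6, "model-v1",
                                lambda ts: "model-v2", threshold=5)
```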
Preferably, step S01 specifically comprises:
S11: Segment the text with the open-source ansj segmenter, count each word's frequency in all texts and in the current text, and remove common auxiliary words, stop words and excessively frequent words; according to the relation between each word's frequency in the current text and in all texts, extract N words, and place each class in the same folder;
S12: Randomly assign each of the N words a data feature of A dimensions, so that each text forms an N×A data matrix;
S13: Use each word feature as an input neuron of the deep-learning neural network; perform convolution in the first hidden layer, sub-sampling and local averaging in the second hidden layer, a second convolution in the third hidden layer, a second round of sub-sampling and local averaging in the fourth hidden layer, and a fully connected layer, converting the text into B-dimensional data; obtain the relation feature model through repeated testing and accuracy debugging.
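The word-frequency selection of step S11 admits a minimal Python sketch. The ansj segmenter itself is not reproduced here; tokens are assumed to be pre-segmented, and the stop-word list, function names and ratio parameter are all illustrative assumptions:

```python
from collections import Counter

# Illustrative stop-word list; the method removes auxiliary words,
# stop words and excessively frequent words after ansj segmentation.
STOP_WORDS = {"the", "of", "and"}

def select_keywords(doc_tokens, corpus_tokens, n=5, max_corpus_ratio=0.05):
    """Step S11 sketch: keep words that are frequent in the current
    text but not so frequent across all texts that they behave like
    stop words, then return the top-N by in-document frequency."""
    corpus_freq = Counter(corpus_tokens)
    total = len(corpus_tokens)
    candidates = Counter(
        t for t in doc_tokens
        if t not in STOP_WORDS and corpus_freq[t] / total <= max_corpus_ratio
    )
    return [w for w, _ in candidates.most_common(n)]

corpus = ["entity"] * 2 + ["spark"] + ["common"] * 97   # 100 tokens
doc = ["entity", "entity", "spark", "common", "the"]
keywords = select_keywords(doc, corpus)
```

Here "common" is dropped because it covers 97% of the corpus, and "the" is dropped as a stop word, leaving the two salient words.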
Preferably, step S03 specifically comprises:
S31: Adjust the weights and offsets in the non-linear SVM classification algorithm so that the error between the input relation features and the features of the labelled samples falls within a set range, and save the class feature model of the text;
S32: The chosen classification model is the improved non-linear SVM classification algorithm. Its training objective function is J(w, ε) = (1/2)‖w‖² + C Σ_i s_i ε_i², with the prediction classification condition y = w'φ(x_i) + b + ε_i, which yields the discriminant function f(x) = Σ_i α_i φ(x, x_i) + b, where α_i are the weights, C is the penalty factor, an empirical parameter, i is the RDD number, w is the weight vector, s_i is the Euclidean distance between positive and negative samples in the relation features, b is the threshold during classification, ε_i is the error, and φ is the non-linear kernel function;
S33: Gradually adjust the penalty factor and test to select the optimal penalty factor, wherein the non-linear kernel function is φ(x, xs) = min(x(i), xs(i)), where x(i) and xs(i) are the feature vectors extracted from any two text relation-feature samples; the label of each class of relation-feature samples is the corresponding class number; the α_i and b of the discriminant function are obtained through repeated offline training, and the discriminant function f(x) = Σ_i α_i φ(x, x_i) + b is the corresponding class feature model.
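The non-linear kernel and the discriminant function described above admit a minimal Python sketch. The α_i and b values would come from repeated offline training; here they are passed in as illustrative placeholders, and all names are assumptions of this sketch:

```python
def intersection_kernel(x, xs):
    """Non-linear kernel: sum over dimensions of min(x(i), xs(i)),
    i.e. the histogram-intersection form named in the method."""
    return sum(min(a, b) for a, b in zip(x, xs))

def discriminant(x, support_vectors, alphas, b):
    """f(x) = sum_i alpha_i * K(x, x_i) + b; alpha_i and b would be
    obtained by offline training, here supplied directly."""
    return sum(a * intersection_kernel(x, sv)
               for a, sv in zip(alphas, support_vectors)) + b

k = intersection_kernel([1, 2, 3], [2, 1, 3])            # 1 + 1 + 3
score = discriminant([1, 2], [[1, 1], [0, 2]], [0.5, -0.5], 0.1)
```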
Preferably, in step S03, sample texts that extract badly or contain obvious errors are placed into a new class, and the test samples are adjusted step by step so that the test sample classes reach an optimum.
Compared with the prior art, the advantages of the invention are:
The invention improves the classification algorithm model, mainly by adding a weighting coefficient for the penalty factor to the training objective function, which enhances the generalization ability of the trained classification model, and by employing the non-linear kernel function min(x(i), xs(i)), so that the corresponding class of a text can be found accurately. At the same time, through the distributed Spark platform, the text entity extraction model is divided into extraction models for multiple scenes, which solves the heavy training and computational load of traditional text entity extraction; the entities in each text can be extracted rapidly and more accurately.
Brief description of the drawings
The invention will be further described with reference to the accompanying drawings and embodiments:
Fig. 1 is the flow chart of the industry text entities extracting method of the invention based on distributed platform.
Embodiment
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in more detail below with reference to specific embodiments and the accompanying drawings. It should be understood that these descriptions are merely illustrative and are not intended to limit the scope of the invention. In addition, descriptions of known structures and techniques are omitted below, to avoid unnecessarily obscuring the concept of the invention.
Embodiment:
As shown in Fig. 1, the industry text entity extraction method based on a distributed platform comprises the following steps:
(1) During text collection, obtain the text information of each industry through the akka communication module of the open-source Spark platform, and transmit the text data collected by the monitoring devices, from which entities are to be extracted, to the distributed Spark platform.
(2) Build a Spark platform cluster, with one server as the management node and four servers as service nodes. The management node mainly records the dependencies between data flows and is responsible for task scheduling and generating new RDDs. The service nodes mainly implement the analysis algorithms and the storage of data.
(3) Train the existing text data set with the deep-learning neural network method to obtain the relation feature model, then extract the relation features in new texts with the relation feature model;
The generation of the relation feature model specifically includes:
S21: First segment the text with the open-source ansj segmenter, then compute by statistics each word's frequency in all texts and in the current text; remove common auxiliary words, stop words and high-frequency words; then, according to the relation between the word frequency in the current text and in all texts, extract the N primary words, and put each class into the same folder.
S22: Then randomly assign each word a data feature of 200 dimensions, so that each text sample forms an N×200 data matrix.
S23: Use the relation feature of each word as an input neuron of the deep-learning neural network; then perform convolution in the first hidden layer, sub-sampling and local averaging in the second hidden layer, a second convolution in the third hidden layer, a second round of sub-sampling and local averaging in the fourth hidden layer, and a fully connected layer, converting the N×200 data into 1000-dimensional data. 70% of the data is used for training and 30% for testing. Through repeated accuracy tests, gradually adjust the model generated by the deep network; the network model that finally reaches the optimum is the relation model generated for the text.
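The random word-feature initialisation of step S22 can be sketched as follows; the fixed seed, value range and function name are illustrative assumptions, not part of the disclosure:

```python
import random

def embed_text(tokens, dim=200, seed=0):
    """Step S22 sketch: assign every distinct word a random feature
    vector of 200 dimensions, so a text of N words becomes an N x 200
    matrix (the input to the convolutional network of step S23)."""
    rng = random.Random(seed)      # fixed seed for reproducibility
    vocab = {}
    for t in tokens:
        if t not in vocab:
            vocab[t] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    return [vocab[t] for t in tokens]

matrix = embed_text(["entity", "extraction", "entity"])
```

Repeated words share one vector, so the two occurrences of "entity" map to identical rows of the matrix.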
(4) Convert the extracted relation feature text data into resilient distributed (RDD) relation feature text data, then divide it into multiple RDDs according to the contextual feature stream of the text for sharded processing.
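The sharding of step (4) can be illustrated with plain Python lists standing in for Spark RDDs; a real implementation would use the RDD API, and the record layout and names below are assumptions of this sketch:

```python
from collections import defaultdict

def partition_by_context(records, context_of):
    """Step (4) sketch: split the relation-feature stream into one
    shard per linguistic context; on Spark each shard would become
    its own RDD, here plain lists stand in for RDDs."""
    shards = defaultdict(list)
    for rec in records:
        shards[context_of(rec)].append(rec)
    return dict(shards)

shards = partition_by_context(
    [("medical", "t1"), ("finance", "t2"), ("medical", "t3")],
    lambda rec: rec[0])
```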
(5) Convert the resilient distributed RDD feature text data into class features with the class feature model trained by the improved non-linear SVM classification algorithm. The training data set is an existing, already well-classified industry text data set; at the same time, taking advantage of the fast computation of the Spark distributed platform, a corrected industry text data set can be retrained quickly to obtain a new class feature model.
Sample texts that extract badly or contain obvious errors are placed into a new class, and the test samples are adjusted step by step so that the test sample classes reach an optimum; new text sets can form different classes. Distributed by feature through the Spark platform, all samples can quickly have their entities extracted by the entity model of the corresponding type. As the number of classes increases, the corresponding multi-class entity models become more robust and the entity extraction accuracy improves.
The class feature model is trained by the improved non-linear SVM classification algorithm through the following steps:
Choose the improved support vector machine model as the training classification model. Its training objective function is J(w, ε) = (1/2)‖w‖² + C Σ_i s_i ε_i², with the corresponding constraint y = w'φ(x_i) + b + ε_i. From the objective function and the constraint, the discriminant function f(x) = Σ_i α_i φ(x, x_i) + b is derived, where α_i are the weights, C is the penalty factor, an adjustable parameter, i runs from 1 to the number n of training text samples, w is the weight vector, s_i is the Euclidean distance between positive and negative samples and serves as the weighting coefficient of the penalty factor in the objective function, b is the threshold, ε_i is the error, and φ is the non-linear kernel function;
Set the penalty factor between 1 and 100 and extract features from the positive and negative samples prepared in advance. The corresponding kernel function φ is min(x(i), xs(i)), where x(i) and xs(i) are the feature vectors extracted from any two positive and negative samples; the label of a positive sample is 1 and the label of a negative sample is -1. Offline training yields the α_i and b of the discriminant function, and the discriminant function f(x) = Σ_i α_i φ(x, x_i) + b is the corresponding non-linear SVM detection model;
From the result y_i judged by the detection model, output the class of text context corresponding to that value.
(6) Find the corresponding context entity model according to the class features of the text, and extract the entity data in the texts of the corresponding type with the trained entity model; the context entity model is the text entity model trained on the existing industry text data set with the open-source word2vec tool;
(7) When the number of texts of some scene exceeds the threshold T, the scene entity model is retrained with the word2vec tool; when the threshold is not exceeded, the data can first be saved on the distributed platform. In general, T is more than 10,000 samples.
It should be appreciated that the above embodiments of the present invention are intended only to exemplify or explain the principles of the invention and are not to be construed as limiting it. Therefore, any modification, equivalent substitution, improvement and the like made without departing from the spirit and scope of the present invention shall be included in the protection scope of the present invention. Furthermore, the appended claims are intended to cover all changes and modifications falling within the scope and boundary of the claims, or the equivalents of such scope and boundary.

Claims (4)

1. An industry text entity extraction method based on a distributed platform, characterized by comprising the following steps:
S01: training a text data set with a deep-learning neural network to obtain a relation feature model, and extracting the relation features in a target text with the relation feature model;
S02: generating multiple resilient distributed datasets (RDDs) of relation features from the extracted relation features;
S03: extracting class features from the datasets in the RDDs with a class feature model trained by an improved non-linear SVM classification algorithm;
S04: finding the corresponding context entity model according to the extracted class features, and extracting the entity data in the texts of the corresponding class with the trained entity model;
S05: judging whether the number of texts of the corresponding context exceeds a set threshold T; if it does, retraining that context entity model and extracting the entity data in the texts of the corresponding class with the retrained entity model; otherwise, saving the text entity features and text data.
2. The industry text entity extraction method based on a distributed platform according to claim 1, characterized in that step S01 specifically comprises:
S11: segmenting the text with the open-source ansj segmenter, counting each word's frequency in all texts and in the current text, removing common auxiliary words, stop words and excessively frequent words, extracting N words according to the relation between the word frequency in the current text and in all texts, and placing each class in the same folder;
S12: randomly assigning each of the N words a data feature of A dimensions, so that each text forms an N×A data matrix;
S13: using each word feature as an input neuron of the deep-learning neural network, performing convolution in the first hidden layer, sub-sampling and local averaging in the second hidden layer, a second convolution in the third hidden layer, a second round of sub-sampling and local averaging in the fourth hidden layer, and a fully connected layer, converting the text into B-dimensional data, and obtaining the relation feature model through repeated testing and accuracy debugging.
3. The industry text entity extraction method based on a distributed platform according to claim 1, characterized in that step S03 specifically comprises:
S31: adjusting the weights and offsets in the non-linear SVM classification algorithm so that the error between the input relation features and the features of the labelled samples falls within a set range, and saving the class feature model of the text;
S32: the chosen classification model is the improved non-linear SVM classification algorithm, whose training objective function is J(w, ε) = (1/2)‖w‖² + C Σ_i s_i ε_i², with the prediction classification condition y = w'φ(x_i) + b + ε_i, yielding the discriminant function f(x) = Σ_i α_i φ(x, x_i) + b, where α_i are the weights, C is the penalty factor, an empirical parameter, i is the RDD number, w is the weight vector, s_i is the Euclidean distance between positive and negative samples in the relation features, b is the threshold during classification, ε_i is the error, and φ is the non-linear kernel function;
S33: gradually adjusting the penalty factor and testing to select the optimal penalty factor, wherein the non-linear kernel function is φ(x, xs) = min(x(i), xs(i)), x(i) and xs(i) being the feature vectors extracted from any two text relation-feature samples; the label of each class of relation-feature samples is the corresponding class number; the α_i and b of the discriminant function are obtained through repeated offline training, and the discriminant function f(x) = Σ_i α_i φ(x, x_i) + b is the corresponding class feature model.
4. The industry text entity extraction method based on a distributed platform according to claim 1, characterized in that in step S03, sample texts that extract badly or contain obvious errors are placed into a new class, and the test samples are adjusted step by step so that the test sample classes reach an optimum.
CN201710902720.0A 2017-09-29 2017-09-29 Industry text entity extraction method based on distributed platform Active CN107908642B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710902720.0A CN107908642B (en) 2017-09-29 2017-09-29 Industry text entity extraction method based on distributed platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710902720.0A CN107908642B (en) 2017-09-29 2017-09-29 Industry text entity extraction method based on distributed platform

Publications (2)

Publication Number Publication Date
CN107908642A true CN107908642A (en) 2018-04-13
CN107908642B CN107908642B (en) 2021-11-12

Family

ID=61840291

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710902720.0A Active CN107908642B (en) 2017-09-29 2017-09-29 Industry text entity extraction method based on distributed platform

Country Status (1)

Country Link
CN (1) CN107908642B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109508757A (en) * 2018-10-30 2019-03-22 北京陌上花科技有限公司 Data processing method and device for Text region
CN109754014A (en) * 2018-12-29 2019-05-14 北京航天数据股份有限公司 Industry pattern training method, device, equipment and medium
CN111274348A (en) * 2018-12-04 2020-06-12 北京嘀嘀无限科技发展有限公司 Service feature data extraction method and device and electronic equipment
CN111382570A (en) * 2018-12-28 2020-07-07 深圳市优必选科技有限公司 Text entity recognition method and device, computer equipment and storage medium
CN111950279A (en) * 2019-05-17 2020-11-17 百度在线网络技术(北京)有限公司 Entity relationship processing method, device, equipment and computer readable storage medium
CN112052646A (en) * 2020-08-27 2020-12-08 安徽聚戎科技信息咨询有限公司 Text data labeling method
CN114756385A (en) * 2022-06-16 2022-07-15 合肥中科类脑智能技术有限公司 Elastic distributed training method in deep learning scene

Citations (9)

Publication number Priority date Publication date Assignee Title
US20100250547A1 (en) * 2001-08-13 2010-09-30 Xerox Corporation System for Automatically Generating Queries
CN104933164A (en) * 2015-06-26 2015-09-23 华南理工大学 Method for extracting relations among named entities in Internet massive data and system thereof
CN105389378A (en) * 2015-11-19 2016-03-09 广州精标信息科技有限公司 System for integrating separate data
CN106168965A (en) * 2016-07-01 2016-11-30 竹间智能科技(上海)有限公司 Knowledge mapping constructing system
CN106599032A (en) * 2016-10-27 2017-04-26 浙江大学 Text event extraction method in combination of sparse coding and structural perceptron
CN106599041A (en) * 2016-11-07 2017-04-26 中国电子科技集团公司第三十二研究所 Text processing and retrieval system based on big data platform
US20170124181A1 (en) * 2015-10-30 2017-05-04 Oracle International Corporation Automatic fuzzy matching of entities in context
CN106682220A (en) * 2017-01-04 2017-05-17 华南理工大学 Online traditional Chinese medicine text named entity identifying method based on deep learning
US20170169094A1 (en) * 2015-12-15 2017-06-15 International Business Machines Corporation Statistical Clustering Inferred From Natural Language to Drive Relevant Analysis and Conversation With Users

Patent Citations (9)

Publication number Priority date Publication date Assignee Title
US20100250547A1 (en) * 2001-08-13 2010-09-30 Xerox Corporation System for Automatically Generating Queries
CN104933164A (en) * 2015-06-26 2015-09-23 华南理工大学 Method for extracting relations among named entities in Internet massive data and system thereof
US20170124181A1 (en) * 2015-10-30 2017-05-04 Oracle International Corporation Automatic fuzzy matching of entities in context
CN105389378A (en) * 2015-11-19 2016-03-09 广州精标信息科技有限公司 System for integrating separate data
US20170169094A1 (en) * 2015-12-15 2017-06-15 International Business Machines Corporation Statistical Clustering Inferred From Natural Language to Drive Relevant Analysis and Conversation With Users
CN106168965A (en) * 2016-07-01 2016-11-30 竹间智能科技(上海)有限公司 Knowledge mapping constructing system
CN106599032A (en) * 2016-10-27 2017-04-26 浙江大学 Text event extraction method in combination of sparse coding and structural perceptron
CN106599041A (en) * 2016-11-07 2017-04-26 中国电子科技集团公司第三十二研究所 Text processing and retrieval system based on big data platform
CN106682220A (en) * 2017-01-04 2017-05-17 华南理工大学 Online traditional Chinese medicine text named entity identifying method based on deep learning

Non-Patent Citations (2)

Title
Ren Yuwei et al., "Named Entity Recognition in Search Logs", New Technology of Library and Information Service *
Zhang Fan et al., "Medical Named Entity Recognition Based on Deep Learning", Computing Technology and Automation *

Cited By (11)

Publication number Priority date Publication date Assignee Title
CN109508757A (en) * 2018-10-30 2019-03-22 北京陌上花科技有限公司 Data processing method and device for Text region
CN111274348A (en) * 2018-12-04 2020-06-12 北京嘀嘀无限科技发展有限公司 Service feature data extraction method and device and electronic equipment
CN111274348B (en) * 2018-12-04 2023-05-12 北京嘀嘀无限科技发展有限公司 Service feature data extraction method and device and electronic equipment
CN111382570A (en) * 2018-12-28 2020-07-07 深圳市优必选科技有限公司 Text entity recognition method and device, computer equipment and storage medium
CN111382570B (en) * 2018-12-28 2024-05-03 深圳市优必选科技有限公司 Text entity recognition method, device, computer equipment and storage medium
CN109754014A (en) * 2018-12-29 2019-05-14 北京航天数据股份有限公司 Industry pattern training method, device, equipment and medium
CN109754014B (en) * 2018-12-29 2021-04-27 北京航天数据股份有限公司 Industrial model training method, device, equipment and medium
CN111950279A (en) * 2019-05-17 2020-11-17 百度在线网络技术(北京)有限公司 Entity relationship processing method, device, equipment and computer readable storage medium
CN112052646A (en) * 2020-08-27 2020-12-08 安徽聚戎科技信息咨询有限公司 Text data labeling method
CN112052646B (en) * 2020-08-27 2024-03-29 安徽聚戎科技信息咨询有限公司 Text data labeling method
CN114756385A (en) * 2022-06-16 2022-07-15 合肥中科类脑智能技术有限公司 Elastic distributed training method in deep learning scene

Also Published As

Publication number Publication date
CN107908642B (en) 2021-11-12

Similar Documents

Publication Publication Date Title
CN107908642A (en) Industry text entities extracting method based on distributed platform
CN105809190B (en) A kind of SVM cascade classifier methods based on Feature Selection
CN104915327B (en) A kind of processing method and processing device of text information
CN107194418B (en) Rice aphid detection method based on antagonistic characteristic learning
CN106960214A (en) Object identification method based on image
CN105095856A (en) Method for recognizing human face with shielding based on mask layer
CN107871101A (en) A kind of method for detecting human face and device
CN108932527A (en) Using cross-training model inspection to the method for resisting sample
CN108090099B (en) Text processing method and device
CN107818298A (en) General Raman spectral characteristics extracting method for machine learning material recognition
CN110070090A (en) A kind of logistic label information detecting method and system based on handwriting identification
CN104732248B (en) Human body target detection method based on Omega shape facilities
CN107180084A (en) Word library updating method and device
CN106611193A (en) Image content information analysis method based on characteristic variable algorithm
CN109255339B (en) Classification method based on self-adaptive deep forest human gait energy map
CN105930792A (en) Human action classification method based on video local feature dictionary
CN107145778A (en) A kind of intrusion detection method and device
CN106971180A (en) A kind of micro- expression recognition method based on the sparse transfer learning of voice dictionary
CN113489685A (en) Secondary feature extraction and malicious attack identification method based on kernel principal component analysis
Mehdipour Ghazi et al. Open-set plant identification using an ensemble of deep convolutional neural networks
CN110210433A (en) A kind of container number detection and recognition methods based on deep learning
CN107357895A (en) A kind of processing method of the text representation based on bag of words
CN108241662A (en) The optimization method and device of data mark
Gillies et al. Arabic text recognition system
CN110837818A (en) Chinese white sea rag dorsal fin identification method based on convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant